Hi,

I have some protein array data, each array in a separate text file. So I read them in and try to combine them into a single data frame using merge(). See the code below (if you download the attached data files into a specific folder, the code below should work):

fls <- list.files("C:\\folder_of_download", full.names = TRUE)  ## get file names
prot <- list()  ## a list to contain the individual files
ind <- 1
for (i in fls[1:11]) {
    cat(ind, " ")
    tmp <- read.delim(i, header = TRUE, row.names = NULL, na.strings = "null")
    colnames(tmp)[4] <- as.character(tmp$barcode[1])
    prot[[ind]] <- tmp[, -(1:2)]
    ind <- ind + 1
}

## try to merge them together
## not done in a loop so I can see where the problem occurs
pro <- merge(prot[[1]], prot[[2]], by.x = 1, by.y = 1, all = TRUE)
pro <- merge(pro, prot[[3]], by.x = 1, by.y = 1, all = TRUE)
pro <- merge(pro, prot[[4]], by.x = 1, by.y = 1, all = TRUE)
pro <- merge(pro, prot[[5]], by.x = 1, by.y = 1, all = TRUE)
pro <- merge(pro, prot[[6]], by.x = 1, by.y = 1, all = TRUE)
pro <- merge(pro, prot[[7]], by.x = 1, by.y = 1, all = TRUE)
pro <- merge(pro, prot[[8]], by.x = 1, by.y = 1, all = TRUE)
pro <- merge(pro, prot[[9]], by.x = 1, by.y = 1, all = TRUE)
pro <- merge(pro, prot[[10]], by.x = 1, by.y = 1, all = TRUE)
pro <- merge(pro, prot[[11]], by.x = 1, by.y = 1, all = TRUE)

I noticed that starting with file #8 the merges became slower and slower, and by file #11 the computer was stuck! Originally I thought something was wrong with the later files, but when I changed the order of merging, the slowdown still happened at the 8th file to be merged.

Can anyone suggest what's going on with merging?

Thanks

John

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: p1.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130111/7f91db84/attachment-0022.txt>
Name: p2.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130111/7f91db84/attachment-0023.txt>
Name: p3.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130111/7f91db84/attachment-0024.txt>
Name: p4.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130111/7f91db84/attachment-0025.txt>
Name: p5.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130111/7f91db84/attachment-0026.txt>
Name: p6.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130111/7f91db84/attachment-0027.txt>
Name: p7.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130111/7f91db84/attachment-0028.txt>
Name: p8.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130111/7f91db84/attachment-0029.txt>
Name: p9.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130111/7f91db84/attachment-0030.txt>
Name: p10.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130111/7f91db84/attachment-0031.txt>
Name: p11.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130111/7f91db84/attachment-0032.txt>
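[A likely explanation, though the thread does not confirm the file contents: merge(..., all = TRUE) returns every pairwise combination of rows that share a key, so if the key column contains duplicated protein IDs, each successive merge multiplies the matching rows and the result grows exponentially. A minimal sketch with made-up data:]

```r
## Sketch: why chained merge(all = TRUE) can blow up when the key
## column has duplicates (here "a" appears twice in every data frame).
d <- data.frame(key = c("a", "a", "b"), value = 1:3)
x <- d
for (i in 1:4) {
    x <- merge(x, d, by = 1, all = TRUE)
    cat("after merge", i, ":", nrow(x), "rows\n")
}
## the "a" rows double at every step, so the row counts run 5, 9, 17, 33, ...
```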
On Fri, Jan 11, 2013 at 12:50 PM, array chip <arrayprofile@yahoo.com> wrote:

> Hi,
>
> I have some protein array data, each array in a separate text file. So I
> read them in and try to combine them into a single data frame using
> merge(). See the code below (if you download the attached data files into
> a specific folder, the code below should work):
>
> fls <- list.files("C:\\folder_of_download", full.names = TRUE)  ## get file names
> prot <- list()  ## a list to contain the individual files
> ind <- 1
> for (i in fls[1:11]) {
>     cat(ind, " ")
>     tmp <- read.delim(i, header = TRUE, row.names = NULL, na.strings = "null")
>     colnames(tmp)[4] <- as.character(tmp$barcode[1])
>     prot[[ind]] <- tmp[, -(1:2)]
>     ind <- ind + 1
> }
>
> ## try to merge them together
> ## not done in a loop so I can see where the problem occurs
> pro <- merge(prot[[1]], prot[[2]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[3]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[4]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[5]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[6]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[7]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[8]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[9]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[10]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[11]], by.x = 1, by.y = 1, all = TRUE)
>
> I noticed that starting with file #8 the merges became slower and slower,
> and by file #11 the computer was stuck! Originally I thought something was
> wrong with the later files, but when I changed the order of merging, the
> slowdown still happened at the 8th file to be merged.
>
> Can anyone suggest what's going on with merging?

I'm not sure exactly what you're trying to do with all that code, but if you're just trying to get all eleven files into a data.frame, you could do this:

allFilesAsList <- lapply(1:11, function(i) read.delim(paste("p", i, ".txt", sep = "")))
oneBigDataFrame <- do.call(rbind, allFilesAsList)

You may need to fix the column names.
Is that anything like what you were trying to do?

James
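[Since the stated goal is a column-wise combination rather than row-stacking, here is an alternative sketch that avoids merge() entirely. It assumes, as the thread suggests but does not confirm, that each element of `prot` is a two-column data frame with a unique protein key in column 1 and a measurement in column 2, and that every file covers the same proteins; match() then aligns each array to the first file's key order:]

```r
## Align every array's measurement column to the first file's keys.
## Assumes unique keys in column 1 and one measurement in column 2.
keys <- prot[[1]][[1]]
pro <- data.frame(key = keys,
                  lapply(prot, function(d) d[[2]][match(keys, d[[1]])]))
## As with the rbind approach, the column names may need fixing afterwards.
```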
Hi Dennis,

Actually, I am trying to combine them by COLUMN, which is why I am using merge(). The first loop simply reads the protein data into R as 11 data frames, each 165 x 2. Then I use merge() to combine these data frames into one big data frame by column with the individual merge() statements. I didn't do it in a loop because I wanted to see at what point merge() would start to fail. It turns out the merge of the first 7 data frames works fine. Starting from the 8th data frame, it becomes slower and slower, and at the 11th data frame it appeared stuck on my computer.

Thanks

John

________________________________
From: Dennis Murphy <djmuser@gmail.com>
Sent: Friday, January 11, 2013 1:25 PM
Subject: Re: [R] weird merge()

Hi John:

This doesn't look right. What are you trying to do? [BTW, the variable names in the attachments have spaces, so most of R's read functions should choke on them. At the very least, replace the spaces with underscores.]

If all you are trying to do is row-concatenate them (since the two or three I looked at appear to have the same structure), then it's as simple as

pro <- do.call(rbind, prot)

If this is what you want along with an indicator for each data file, then the ldply() function in the plyr package might be useful as an alternative to do.call(). It should return an additional variable .id whose value corresponds to the number (or name) of the list component.

library(plyr)
pro2 <- ldply(prot, rbind)

If you want something different, then be more explicit about what you want, because your merge() code doesn't make a lot of sense to me.

Dennis

PS: Just a little hint: if you're using (almost) the same code repeatedly, there's probably a more efficient way to do it in R. CS types call this the DRY principle: Don't Repeat Yourself. I know you know this, but a little reminder doesn't hurt :)

> Hi,
>
> I have some protein array data, each array in a separate text file.
> So I read them in and try to combine them into a single data frame using
> merge(). See the code below (if you download the attached data files into
> a specific folder, the code below should work):
>
> fls <- list.files("C:\\folder_of_download", full.names = TRUE)  ## get file names
> prot <- list()  ## a list to contain the individual files
> ind <- 1
> for (i in fls[1:11]) {
>     cat(ind, " ")
>     tmp <- read.delim(i, header = TRUE, row.names = NULL, na.strings = "null")
>     colnames(tmp)[4] <- as.character(tmp$barcode[1])
>     prot[[ind]] <- tmp[, -(1:2)]
>     ind <- ind + 1
> }
>
> ## try to merge them together
> ## not done in a loop so I can see where the problem occurs
> pro <- merge(prot[[1]], prot[[2]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[3]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[4]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[5]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[6]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[7]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[8]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[9]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[10]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[11]], by.x = 1, by.y = 1, all = TRUE)
>
> I noticed that starting with file #8 the merges became slower and slower,
> and by file #11 the computer was stuck! Originally I thought something was
> wrong with the later files, but when I changed the order of merging, the
> slowdown still happened at the 8th file to be merged.
>
> Can anyone suggest what's going on with merging?
>
> Thanks
>
> John
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
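[Dennis's DRY hint applies directly to the merge chain itself: the eleven near-identical merge() calls can be folded into a single Reduce() call. A sketch; this is equivalent to the original chain, so by itself it will not fix the slowdown, only the repetition:]

```r
## Fold merge() over the list of data frames, keeping all rows each time,
## exactly as the hand-written chain of eleven merge() calls does.
pro <- Reduce(function(x, y) merge(x, y, by = 1, all = TRUE), prot)
```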