Hi,

I have some protein array data, each array in a separate text file. So I read them in and try to combine them into a single data frame using merge(). See the code below (if you download the attached data files into a specific folder, the code below should work):

fls <- list.files("C:\\folder_of_download", full.names = TRUE)  ## get file names
prot <- list()  ## a list to contain the individual files
ind <- 1
for (i in fls[1:11]) {
    cat(ind, " ")
    tmp <- read.delim(i, header = TRUE, row.names = NULL, na.strings = "null")
    colnames(tmp)[4] <- as.character(tmp$barcode[1])
    prot[[ind]] <- tmp[, -(1:2)]
    ind <- ind + 1
}

## try to merge them together
## not done in a loop so I can see where the problem occurs
pro <- merge(prot[[1]], prot[[2]], by.x = 1, by.y = 1, all = TRUE)
pro <- merge(pro, prot[[3]], by.x = 1, by.y = 1, all = TRUE)
pro <- merge(pro, prot[[4]], by.x = 1, by.y = 1, all = TRUE)
pro <- merge(pro, prot[[5]], by.x = 1, by.y = 1, all = TRUE)
pro <- merge(pro, prot[[6]], by.x = 1, by.y = 1, all = TRUE)
pro <- merge(pro, prot[[7]], by.x = 1, by.y = 1, all = TRUE)
pro <- merge(pro, prot[[8]], by.x = 1, by.y = 1, all = TRUE)
pro <- merge(pro, prot[[9]], by.x = 1, by.y = 1, all = TRUE)
pro <- merge(pro, prot[[10]], by.x = 1, by.y = 1, all = TRUE)
pro <- merge(pro, prot[[11]], by.x = 1, by.y = 1, all = TRUE)

I noticed that starting with file #8 the merges became slower and slower, and by file #11 the computer was stuck! Originally I thought something was wrong with the later files, but when I changed the order of merging, the slowdown still happened at the 8th file to be merged.

Can anyone suggest what's going on with merging?

Thanks

John

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: p1.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130111/7f91db84/attachment-0022.txt>
Name: p2.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130111/7f91db84/attachment-0023.txt>
Name: p3.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130111/7f91db84/attachment-0024.txt>
Name: p4.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130111/7f91db84/attachment-0025.txt>
Name: p5.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130111/7f91db84/attachment-0026.txt>
Name: p6.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130111/7f91db84/attachment-0027.txt>
Name: p7.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130111/7f91db84/attachment-0028.txt>
Name: p8.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130111/7f91db84/attachment-0029.txt>
Name: p9.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130111/7f91db84/attachment-0030.txt>
Name: p10.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130111/7f91db84/attachment-0031.txt>
Name: p11.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130111/7f91db84/attachment-0032.txt>
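[A likely explanation, though the thread does not confirm the file contents: merge(..., all = TRUE) returns every pairwise combination of rows that share a key, so if the key column contains duplicated protein IDs, each successive merge multiplies the matching rows and the result grows exponentially. A minimal sketch with made-up data:]

```r
## Sketch: why chained merge(all = TRUE) can blow up when the key
## column has duplicates (here "a" appears twice in every data frame).
d <- data.frame(key = c("a", "a", "b"), value = 1:3)
x <- d
for (i in 1:4) {
    x <- merge(x, d, by = 1, all = TRUE)
    cat("after merge", i, ":", nrow(x), "rows\n")
}
## the "a" rows double at every step, so the row counts run 5, 9, 17, 33, ...
```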
On Fri, Jan 11, 2013 at 12:50 PM, array chip <arrayprofile@yahoo.com> wrote:

> Hi,
>
> I have some protein array data, each array in a separate text file. So I
> read them in and try to combine them into a single data frame using
> merge(). See the code below (if you download the attached data files into
> a specific folder, the code below should work):
>
> fls <- list.files("C:\\folder_of_download", full.names = TRUE)  ## get file names
> prot <- list()  ## a list to contain the individual files
> ind <- 1
> for (i in fls[1:11]) {
>     cat(ind, " ")
>     tmp <- read.delim(i, header = TRUE, row.names = NULL, na.strings = "null")
>     colnames(tmp)[4] <- as.character(tmp$barcode[1])
>     prot[[ind]] <- tmp[, -(1:2)]
>     ind <- ind + 1
> }
>
> ## try to merge them together
> ## not done in a loop so I can see where the problem occurs
> pro <- merge(prot[[1]], prot[[2]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[3]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[4]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[5]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[6]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[7]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[8]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[9]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[10]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[11]], by.x = 1, by.y = 1, all = TRUE)
>
> I noticed that starting with file #8 the merges became slower and slower,
> and by file #11 the computer was stuck! Originally I thought something was
> wrong with the later files, but when I changed the order of merging, the
> slowdown still happened at the 8th file to be merged.
>
> Can anyone suggest what's going on with merging?

I'm not sure exactly what you're trying to do with all that code, but if you're just trying to get all eleven files into a data.frame, you could do this:

allFilesAsList <- lapply(1:11, function(i) read.delim(paste("p", i, ".txt", sep = "")))
oneBigDataFrame <- do.call(rbind, allFilesAsList)

You may need to fix the column names.
Is that anything like what you were trying to do?

James
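[Since the stated goal is a column-wise combination rather than row-stacking, here is an alternative sketch that avoids merge() entirely. It assumes, as the thread suggests but does not confirm, that each element of `prot` is a two-column data frame with a unique protein key in column 1 and a measurement in column 2, and that every file covers the same proteins; match() then aligns each array to the first file's key order:]

```r
## Align every array's measurement column to the first file's keys.
## Assumes unique keys in column 1 and one measurement in column 2.
keys <- prot[[1]][[1]]
pro <- data.frame(key = keys,
                  lapply(prot, function(d) d[[2]][match(keys, d[[1]])]))
## As with the rbind approach, the column names may need fixing afterwards.
```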
Hi Dennis,

Actually, I am trying to combine them by COLUMN, which is why I am using merge(). The first loop simply reads the protein data into R as 11 data frames, each 165 x 2. Then I use merge() to combine these data frames into one big data frame by column with the individual merge() statements. I didn't do it in a loop because I wanted to see at what point merge() would start to fail. It turns out the merge of the first 7 data frames works fine. Starting from the 8th data frame, it becomes slower and slower, and at the 11th data frame it appeared stuck on my computer.

Thanks

John

________________________________
From: Dennis Murphy <djmuser@gmail.com>
Sent: Friday, January 11, 2013 1:25 PM
Subject: Re: [R] weird merge()

Hi John:

This doesn't look right. What are you trying to do? [BTW, the variable names in the attachments have spaces, so most of R's read functions should choke on them. At the very least, replace the spaces with underscores.]

If all you are trying to do is row-concatenate them (since the two or three I looked at appear to have the same structure), then it's as simple as

pro <- do.call(rbind, prot)

If this is what you want along with an indicator for each data file, then the ldply() function in the plyr package might be useful as an alternative to do.call(). It should return an additional variable .id whose value corresponds to the number (or name) of the list component.

library(plyr)
pro2 <- ldply(prot, rbind)

If you want something different, then be more explicit about what you want, because your merge() code doesn't make a lot of sense to me.

Dennis

PS: Just a little hint: if you're using (almost) the same code repeatedly, there's probably a more efficient way to do it in R. CS types call this the DRY principle: Don't Repeat Yourself. I know you know this, but a little reminder doesn't hurt :)

> Hi,
>
> I have some protein array data, each array in a separate text file.
> So I read them in and try to combine them into a single data frame using
> merge(). See the code below (if you download the attached data files into
> a specific folder, the code below should work):
>
> fls <- list.files("C:\\folder_of_download", full.names = TRUE)  ## get file names
> prot <- list()  ## a list to contain the individual files
> ind <- 1
> for (i in fls[1:11]) {
>     cat(ind, " ")
>     tmp <- read.delim(i, header = TRUE, row.names = NULL, na.strings = "null")
>     colnames(tmp)[4] <- as.character(tmp$barcode[1])
>     prot[[ind]] <- tmp[, -(1:2)]
>     ind <- ind + 1
> }
>
> ## try to merge them together
> ## not done in a loop so I can see where the problem occurs
> pro <- merge(prot[[1]], prot[[2]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[3]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[4]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[5]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[6]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[7]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[8]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[9]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[10]], by.x = 1, by.y = 1, all = TRUE)
> pro <- merge(pro, prot[[11]], by.x = 1, by.y = 1, all = TRUE)
>
> I noticed that starting with file #8 the merges became slower and slower,
> and by file #11 the computer was stuck! Originally I thought something was
> wrong with the later files, but when I changed the order of merging, the
> slowdown still happened at the 8th file to be merged.
>
> Can anyone suggest what's going on with merging?
>
> Thanks
>
> John
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
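[Dennis's DRY hint applies directly to the merge chain itself: the eleven near-identical merge() calls can be folded into a single Reduce() call. A sketch; this is equivalent to the original chain, so by itself it will not fix the slowdown, only the repetition:]

```r
## Fold merge() over the list of data frames, keeping all rows each time,
## exactly as the hand-written chain of eleven merge() calls does.
pro <- Reduce(function(x, y) merge(x, y, by = 1, all = TRUE), prot)
```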