thr3ads.net - R help - [R] performance issue with merge [Dec 2010]


Maxim

2010-Dec-20 19:32 UTC

[R] performance issue with merge

Hi,

I'm trying to merge several matrices. Each matrix is built from an input
file, and the files have very similar (but not always identical) values in
the first and second columns. Columns 1 and 2 are used to generate the row
names; the value of actual interest is in column 3. I'd like to merge the
different files by the row names generated from columns 1 and 2. The files
look like this:

ID1  100  3.14
ID1  200  2.71
ID1  300  0.92
...
ID2  100  2.45
...


etc. (Sorry, I do not know how to create such files/matrices with R
commands.)
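
For anyone who wants to reproduce the layout above, here is a minimal sketch of how such files could be written from R. The directory name, the file names, and the second file's ID2 value are made up for illustration; only the three-column, tab-separated, header-less layout comes from the post.

```r
# Hypothetical sample files mimicking the layout described above:
# tab-separated, no header, columns = ID, position, value.
out_dir <- file.path(tempdir(), "merge_example")   # assumed location
dir.create(out_dir, showWarnings = FALSE)

file_a <- data.frame(id    = c("ID1", "ID1", "ID1", "ID2"),
                     pos   = c(100, 200, 300, 100),
                     value = c(3.14, 2.71, 0.92, 2.45))

# The second file deliberately lacks the ID1/200 row.
# The ID2 value here is invented for the example.
file_b <- data.frame(id    = c("ID1", "ID1", "ID2"),
                     pos   = c(100, 300, 100),
                     value = c(1.56, 1.22, 0.87))

write.table(file_a, file.path(out_dir, "a.txt"), sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
write.table(file_b, file.path(out_dir, "b.txt"), sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read one file back the same way the loop in the post does.
check <- read.delim(file.path(out_dir, "a.txt"), header = FALSE)
```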

Some files do not have the full range of values in the second column, but I
don't want to lose any values during the merge. As far as I understand, I
have to merge using the all.x and all.y arguments, which appear to
correspond to an outer join in relational databases. This would give me
something like:

ID1_100  3.14  1.56  3.45
ID1_200  2.71  NA    1.34
ID1_300  0.92  1.22  NA
...
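
The outer-join behaviour described above can be sketched on a toy example. The data frame and column names (x, y, z, f1, f2, f3) are placeholders chosen for this illustration; the values are the ones from the table above.

```r
# Three small keyed tables; some keys are missing from y and z.
x <- data.frame(key = c("ID1_100", "ID1_200", "ID1_300"),
                f1  = c(3.14, 2.71, 0.92))
y <- data.frame(key = c("ID1_100", "ID1_300"),
                f2  = c(1.56, 1.22))
z <- data.frame(key = c("ID1_100", "ID1_200"),
                f3  = c(3.45, 1.34))

# all = TRUE is shorthand for all.x = TRUE, all.y = TRUE:
# rows without a partner in the other table are kept and padded with NA.
xy  <- merge(x, y, by = "key", all = TRUE)
xyz <- merge(xy, z, by = "key", all = TRUE)
print(xyz)
```

Keeping the key as an ordinary column, rather than as row names, avoids having to rebuild the row names after every merge.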


Merging works as long as I do not set the all.x and all.y arguments in the
merge call. This is the script, with both arguments set:


DF <- data.frame(NULL)
count <- 0
filenames <- list.files()
filenames

for (i in filenames) {
    count <- count + 1
    tmp <- read.delim(i, header=FALSE)
    # build the row names from columns 1 and 2
    rwnames <- paste(tmp[,1], tmp[,2], sep="_")
    # keep only the value column, as a one-column matrix
    tmp <- as.matrix(tmp[,3])
    rownames(tmp) <- rwnames
    if (count == 1) {
        DF <- tmp
    } else {
        # outer join on the row names
        DF <- merge(DF, tmp, by="row.names", all.x = TRUE, all.y = TRUE)
        rownames(DF) <- DF$Row.names
        DF <- DF[,2:ncol(DF)]
    }
}


As soon as I set the all.x and all.y arguments, the script runs forever
without producing any result (file sizes: a couple of million lines per
file). Is it expected that this type of merge takes so much longer (I think
it never finishes)? Or do I have a conceptual misunderstanding of how merge
works?
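
One thing worth checking in a situation like this is whether any file contains repeated keys: merge forms every combination of rows whose keys match, so duplicates can multiply the row count at each pass of the loop. Whether that is what happens with these files is not known from the post. Below is a minimal sketch of an alternative that avoids repeated merges by filling a preallocated matrix with match(). The helper name combine_by_key and the key/value column names are hypothetical, and keeping only the first occurrence of a duplicated key is an assumption, not something the post specifies.

```r
# Hypothetical helper: `tables` is a named list of data frames,
# each with a character column `key` and a numeric column `value`.
combine_by_key <- function(tables) {
  # Union of all keys across files = rows of the result (outer join).
  all_keys <- sort(unique(unlist(lapply(tables,
                                        function(tab) as.character(tab$key)))))
  out <- matrix(NA_real_, nrow = length(all_keys), ncol = length(tables),
                dimnames = list(all_keys, names(tables)))
  for (j in seq_along(tables)) {
    tab <- tables[[j]]
    if (anyDuplicated(tab$key)) {
      # Assumption: keep the first occurrence of a repeated key.
      warning("duplicated keys in table ", j, "; keeping first occurrence")
      tab <- tab[!duplicated(tab$key), ]
    }
    # match() gives the row position of each key in the result.
    out[match(as.character(tab$key), all_keys), j] <- tab$value
  }
  out
}

tables <- list(
  f1 = data.frame(key = c("ID1_100", "ID1_200", "ID1_300"),
                  value = c(3.14, 2.71, 0.92)),
  f2 = data.frame(key = c("ID1_100", "ID1_300"),
                  value = c(1.56, 1.22)),
  f3 = data.frame(key = c("ID1_100", "ID1_200"),
                  value = c(3.45, 1.34))
)
wide <- combine_by_key(tables)
print(wide)
```

Keys absent from a file stay NA in that file's column, which is the same result an outer join would give when keys are unique.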


Maxim


P.S.: I'd be really happy to receive comments on how to make the code for
my simple parsing problem a bit more straightforward!
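
As one possible way to tidy the parsing, here is a sketch that moves the per-file work into a small function and folds the merges with Reduce. The function name read_keyed, the demo directory, and the demo file names are hypothetical. Reading the second column as character is an assumption made so that the key text stays exactly as it appears in the file. This only restructures the code; it does not by itself address the run time.

```r
# Hypothetical reader: returns a two-column data frame (key, <file name>).
read_keyed <- function(path) {
  raw <- read.delim(path, header = FALSE,
                    colClasses = c("character", "character", "numeric"))
  out <- data.frame(key   = paste(raw[[1]], raw[[2]], sep = "_"),
                    value = raw[[3]],
                    stringsAsFactors = FALSE)
  names(out)[2] <- basename(path)   # label the value column by its file
  out
}

# Tiny demo files (made up for this example).
dir_demo <- file.path(tempdir(), "keyed_demo")
dir.create(dir_demo, showWarnings = FALSE)
writeLines(c("ID1\t100\t3.14", "ID1\t200\t2.71"),
           file.path(dir_demo, "a.txt"))
writeLines(c("ID1\t100\t1.56", "ID1\t300\t1.22"),
           file.path(dir_demo, "b.txt"))

paths  <- list.files(dir_demo, full.names = TRUE)
tables <- lapply(paths, read_keyed)

# Fold the list with a full outer join on the key column.
merged <- Reduce(function(x, y) merge(x, y, by = "key", all = TRUE), tables)
print(merged)
```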

