Hi,
I'm trying to merge several matrices. Each matrix is read from an input
file, and the files have similar (but not always identical) values in the
first and second columns. Columns 1 and 2 are used to generate the row
names; the value I am actually interested in is in column 3. I'd like to
merge the files by the row names generated from columns 1 and 2. The files
look like this:
ID1 100 3.14
ID1 200 2.71
ID1 300 0.92
...
ID2 100 2.45
.....
etc. (Sorry, I do not know how to create such files/matrices with R
commands.)
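For illustration, a minimal sketch of how small files in this layout could be generated in R. The directory, the file names (file1.txt etc.) and the values are made up, and the files are assumed to be tab-separated because the script below reads them with read.delim:

```r
# Hypothetical example data: names and values are invented for illustration.
example_dir <- file.path(tempdir(), "merge_example")
dir.create(example_dir, showWarnings = FALSE)

f1 <- data.frame(id = c("ID1", "ID1", "ID1"), pos = c(100, 200, 300),
                 value = c(3.14, 2.71, 0.92))
f2 <- data.frame(id = c("ID1", "ID1"), pos = c(100, 300),
                 value = c(1.56, 1.22))   # no entry for position 200
f3 <- data.frame(id = c("ID1", "ID1"), pos = c(100, 200),
                 value = c(3.45, 1.34))   # no entry for position 300

# Write tab-separated files without header or row names,
# which is the layout read.delim(header = FALSE) reads.
write_example <- function(df, name) {
  write.table(df, file.path(example_dir, name), sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}
write_example(f1, "file1.txt")
write_example(f2, "file2.txt")
write_example(f3, "file3.txt")

list.files(example_dir)
```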
Some files do not cover the full range of values in the second column, but I
don't want to lose any values during the merge. As far as I understand, I
have to merge with the all.x and all.y arguments, which appears to
correspond to the full outer join of relational databases. This would give
me something like:
ID1_100 3.14 1.56 3.45
ID1_200 2.71 NA 1.34
ID1_300 0.92 1.22 NA
...
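On a toy scale, the intended outer join can be sketched as follows. The data frames a, b, c3 and the column names key, v1, v2, v3 are hypothetical and exist only for this illustration; in merge(), all = TRUE is shorthand for all.x = TRUE, all.y = TRUE:

```r
# Three small tables sharing a key column; b and c3 each lack one key.
a  <- data.frame(key = c("ID1_100", "ID1_200", "ID1_300"),
                 v1 = c(3.14, 2.71, 0.92))
b  <- data.frame(key = c("ID1_100", "ID1_300"), v2 = c(1.56, 1.22))
c3 <- data.frame(key = c("ID1_100", "ID1_200"), v3 = c(3.45, 1.34))

# Full outer join: keys missing from one table are kept and filled with NA.
ab  <- merge(a, b, by = "key", all = TRUE)
abc <- merge(ab, c3, by = "key", all = TRUE)
print(abc)
```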
Merging works as long as I do not set the all.x and all.y arguments in the
merge command:
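DF <- data.frame(NULL)
count <- 0
filenames <- list.files()
filenames
for (i in filenames) {
  count <- count + 1
  tmp <- read.delim(i, header = FALSE)
  # build the row names from columns 1 and 2, e.g. "ID1_100"
  rwnames <- paste(tmp[, 1], tmp[, 2], sep = "_")
  # keep only the value column, as a one-column matrix
  tmp <- tmp[, 3]
  tmp <- as.matrix(tmp)
  rownames(tmp) <- rwnames
  if (count == 1) {
    DF <- tmp
  } else {
    DF <- merge(DF, tmp, by = "row.names", all.x = TRUE, all.y = TRUE)
    # merge() returns the keys in a "Row.names" column; move them back
    rownames(DF) <- DF$Row.names
    DF <- DF[, 2:ncol(DF)]
  }
}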
As soon as I set the all.x and all.y arguments, the script runs seemingly
forever without producing a result (file sizes: a couple of million lines
per file). Is it expected that this type of merge takes so much longer (I
think it never finishes!)? Or do I have a conceptual misunderstanding of
how merge works?
Maxim
P.S.: I'd be really happy to receive comments on how to make the code for
this simple parsing problem a bit more straightforward!
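One possible tidier shape for the loop, offered only as a sketch under assumptions: the helpers read_one and merge_files, the key column, and the demo files a.txt and b.txt are hypothetical, and the input is assumed to be tab-separated. It keeps the key as an ordinary column instead of row names and folds merge() over the list of tables with Reduce(). It is not claimed to fix the run time on files with millions of lines.

```r
# Read one file and return a two-column data frame: key plus value.
# column_name is the name given to the value column (hypothetical helper).
read_one <- function(path, column_name) {
  tmp <- read.delim(path, header = FALSE, stringsAsFactors = FALSE)
  out <- data.frame(key = paste(tmp[[1]], tmp[[2]], sep = "_"),
                    value = tmp[[3]],
                    stringsAsFactors = FALSE)
  names(out)[2] <- column_name
  out
}

# Full outer join of all files on the key column (hypothetical helper).
merge_files <- function(paths) {
  col_names <- make.names(basename(paths), unique = TRUE)
  tables <- Map(read_one, paths, col_names)
  Reduce(function(x, y) merge(x, y, by = "key", all = TRUE), tables)
}

# Small demo on two made-up files in a temporary directory.
demo_dir <- tempfile("merge_demo")
dir.create(demo_dir)
writeLines(c("ID1\t100\t3.14", "ID1\t200\t2.71"), file.path(demo_dir, "a.txt"))
writeLines(c("ID1\t100\t1.56", "ID1\t300\t1.22"), file.path(demo_dir, "b.txt"))

merged <- merge_files(list.files(demo_dir, full.names = TRUE))
print(merged)
```

Naming each value column after its file avoids the clashing column names that repeated merges of identically named columns would otherwise produce.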