Hello all, I have read as many merge issues as I possibly could tonight and although I presume this is a small error, I have not found the solution to my problem. I'm trying to merge two data sets: dat0 and TransTable. As you can see below, dat0 has 8000 rows, whereas TransTable has 47296 rows. I would expect when I merge the two data sets, with all.x=F, and all.y=F, that the intersection would yield 8000 rows, considering dat0 is a subset of TransTable. However, I get a neat little surprise when I check the dimensions of the resultant data frame - dat0merge, the merged data frame has 8007 rows! How can this be? Where did these extra 7 rows come from? This appears to defy logic! Thank you in advance for your help. I've put my code below for reference. Tova Fuller > dim(dat0) [1] 8000 60 > dim(TransTable) [1] 47296 9 > dat0merge=merge(TransTable,dat0, by.x="Target",by.y="TargetID",all.x=F,all.y=F) > dim(dat0merge) [1] 8007 68
Dear Tova There is no reason why the merged data.frame should have exactely 8000 or less rows. The "all=FALSE" options only says that now new rows are generated for cases that appear only in one of the two data.frames. Have a look at this sample> dat1 <- data.frame(a = c(1,2,3,4), b = letters[1:4]) > dat2 <- data.frame(a = c(1,2,3,4,5,6,7,8,1), b = LETTERS[1:9])> merge(dat1,dat2, by = "a", all = FALSE)1 1 a A 2 1 a I 3 2 b B 4 3 c C 5 4 d D Since "1" appears twice in the large data.frame it is repeated as the help page ?merge says: "If there is more than one match, all possible matches contribute one row each." To compare have a look what the option "all = TRUE" changes> merge(dat1,dat2, by = "a", all = TRUE)Probably in your large data frame some rows have identical target ids and get repeated. It should be easy to check it with unique() Hope this helps Christoph -------------------------------------------------------------- Credit and Surety PML study: visit our web page www.cs-pml.org -------------------------------------------------------------- Christoph Buser <buser at stat.math.ethz.ch> Seminar fuer Statistik, LEO C13 ETH Zurich 8092 Zurich SWITZERLAND phone: x-41-44-632-4673 fax: 632-1228 http://stat.ethz.ch/~buser/ -------------------------------------------------------------- Tova Fuller writes: > Hello all, > > I have read as many merge issues as I possibly could tonight and > although I presume this is a small error, I have not found the > solution to my problem. > > I'm trying to merge two data sets: dat0 and TransTable. As you can > see below, dat0 has 8000 rows, whereas TransTable has 47296 rows. I > would expect when I merge the two data sets, with all.x=F, and > all.y=F, that the intersection would yield 8000 rows, considering > dat0 is a subset of TransTable. > > However, I get a neat little surprise when I check the dimensions of > the resultant data frame - dat0merge, the merged data frame has 8007 > rows! How can this be? Where did these extra 7 rows come from? > This appears to defy logic! > > Thank you in advance for your help. I've put my code below for > reference. > > Tova Fuller > > > dim(dat0) > [1] 8000 60 > > dim(TransTable) > [1] 47296 9 > > dat0merge=merge(TransTable,dat0, > by.x="Target",by.y="TargetID",all.x=F,all.y=F) > > dim(dat0merge) > [1] 8007 68 > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
On Fri, 22 Sep 2006, Tova Fuller wrote:> Hello all, > > I have read as many merge issues as I possibly could tonight and > although I presume this is a small error, I have not found the > solution to my problem. > > I'm trying to merge two data sets: dat0 and TransTable. As you can > see below, dat0 has 8000 rows, whereas TransTable has 47296 rows. I > would expect when I merge the two data sets, with all.x=F, and > all.y=F, that the intersection would yield 8000 rows, considering > dat0 is a subset of TransTable. > > However, I get a neat little surprise when I check the dimensions of > the resultant data frame - dat0merge, the merged data frame has 8007 > rows! How can this be? Where did these extra 7 rows come from? > This appears to defy logic!Not the help page, though. joined together. If there is more than one match, all possible matches contribute one row each. I presume you think you are doing a 1:1 match, but probably you have multiple matches for up to 7 ids. This is one of the commonest misconceptions about merge that you will find frequently in the list archives.> > Thank you in advance for your help. I've put my code below for > reference. > > Tova Fuller > > > dim(dat0) > [1] 8000 60 > > dim(TransTable) > [1] 47296 9 > > dat0merge=merge(TransTable,dat0, > by.x="Target",by.y="TargetID",all.x=F,all.y=F) > > dim(dat0merge) > [1] 8007 68 > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595