hi,

I have two data.frames, each with two columns:

x1

1 4
1 3
1 6
2 9
2 2
2 5
3 6
3 7
3 4

x2

1 -3
1 -7
2 -3
2 -2
2 -8
3 -1
3 -2
3 -1

Now I want to merge these data.frames into one data.frame.

The problem is that sometimes there is a different number of elements per category
(as above: x1 has 3 rows with the value 1 in the first column, but x2 has only 2 rows with the value 1 in the first column).

Is there an easy way to merge these two data.frames by deleting the rows that only one data.frame "has"?
In the example, the resulting data.frame would combine x1 and x2, except for row 3 of data.frame x1.

thanks for any suggestions!
Read the help page for merge:

?merge

On Jun 17, 2009, at 8:33 PM, Martin Batholdy wrote:
> Is there an easy way to merge this two data.frames by deleting the
> rows that only one data.frame "has".
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
I have tried to replicate the example on the help page:

x <- data.frame(category = sample(3, 10, replace = TRUE), rnorm(10, 5, 2))
y <- data.frame(category = sample(3, 10, replace = TRUE), rnorm(10, 8, 2))
merge(x, y, by = "category")

When I do that, I get a data.frame with 28 rows instead of 10.
What am I doing wrong?

On 18.06.2009 at 02:42, David Winsemius wrote:
> Read the help page for merge:
>
> ?merge
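A note on the row count: merge() pairs every row of x with every row of y that has the same category, so the result has as many rows as the sum, over categories, of (rows in x) times (rows in y). The sketch below uses small fixed data frames (made up for illustration, not the random data from the message above) to show where the count comes from.

```r
# Fixed stand-ins for the random example; the values are made up for illustration.
x <- data.frame(category = c(1, 1, 1, 2, 2, 3), a = 1:6)
y <- data.frame(category = c(1, 1, 2, 2, 2, 3), b = 11:16)

merged <- merge(x, y, by = "category")

# Number of rows per category in each input
nx <- table(x$category)
ny <- table(y$category)

# Every x row is paired with every y row of the same category,
# so the row count is the sum of the per-category products.
expected_rows <- sum(as.vector(nx) * as.vector(ny[names(nx)]))

nrow(merged)    # 13
expected_rows   # 13 = 3*2 + 2*3 + 1*1
```

With random categories, as in the help-page style example, the total changes from run to run, which is why a count such as 28 can appear for two inputs of 10 rows each.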
The word "merge" in the context of R suggests the use of the merge() function, but I don't think that's the right tool for what you want. The merge() function is for relational-database-type merges, which for your data would be a many-to-many merge. Not good.

In terms of the R language, you're looking for something using the cbind() function, not the merge() function (I think).

There are a couple of details that need to be clarified, and my solution below makes some assumptions.

1) Could a value in the first column appear in only one of the two data frames?
2) Is it always x1 that has more values? (In your example, x1 had the number 1 appear three times in the first column, and x2 had it appear only twice.) Does x2 sometimes have more rows? (I think your description implies that, but it's good to be explicit.)

I added extra rows to your example data frames to test my assumptions about the answers.

After trying to be clever, I decided the easiest way is brute force. Hopefully, this is what you want:

x1 <- as.data.frame( matrix( c(
  1, 4,
  1, 3,
  1, 6,
  2, 9,
  2, 2,
  2, 5,
  3, 6,
  3, 7,
  3, 4,
  4, 0,
  4, 1), byrow=TRUE, ncol=2))

x2 <- as.data.frame( matrix( c(
  1, -3,
  1, -7,
  2, -3,
  2, -2,
  2, -8,
  3, -1,
  3, -2,
  3, -1,
  4, 0,
  4, 1,
  4, 2,
  4, 3), byrow=TRUE, ncol=2))

###
## all values that occur in the first column of either data frame
ivals <- sort(unique(c(x1$V1, x2$V1)))

## start empty, so the result is built correctly even if the
## smallest value occurs in only one of the data frames
newx <- NULL

for (i in ivals) {
  tmpx1 <- x1[x1$V1 == i , ]
  tmpx2 <- x2[x2$V1 == i , ]

  ## use only as many rows as both data frames have for this value
  n.to.use <- min( nrow(tmpx1), nrow(tmpx2))

  if (n.to.use >= 1 ) {
    rtmp <- seq(n.to.use)
    tmpnew <- cbind( tmpx1[rtmp, ], V3=tmpx2[rtmp,'V2'])
    newx <- rbind( newx, tmpnew)
  }
}

The loop could be written with fewer lines of code, but I found it easier to read and understand this way.

If x1 and x2 have a very large number of rows, the above should probably be revised for better memory usage.
-Don

At 2:33 AM +0200 6/18/09, Martin Batholdy wrote:
>Is there an easy way to merge this two data.frames by deleting the
>rows that only one data.frame "has".

--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
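For larger inputs, one possible alternative to the loop is to number the rows within each category and then merge on both the category and that number, which turns the many-to-many match into a one-to-one match. This is only a sketch, not the approach from the message above. The column name idx and the suffixes are made up for illustration, and it assumes rows should be paired in their order of appearance within each category.

```r
# The original example data from the first message
x1 <- data.frame(V1 = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 V2 = c(4, 3, 6, 9, 2, 5, 6, 7, 4))
x2 <- data.frame(V1 = c(1, 1, 2, 2, 2, 3, 3, 3),
                 V2 = c(-3, -7, -3, -2, -8, -1, -2, -1))

# Number the rows within each category: 1, 2, 3, ... for each value of V1.
# (idx is a hypothetical helper column, not part of the original data.)
x1$idx <- ave(seq_along(x1$V1), x1$V1, FUN = seq_along)
x2$idx <- ave(seq_along(x2$V1), x2$V1, FUN = seq_along)

# Merging on (V1, idx) matches each row with at most one partner.
# With the default all = FALSE, rows without a partner are dropped.
paired <- merge(x1, x2, by = c("V1", "idx"), suffixes = c(".x1", ".x2"))

# Sort explicitly by category and position within the category.
paired <- paired[order(paired$V1, paired$idx), ]
paired
```

With the example data this keeps 8 rows: the third row of x1 with V1 equal to 1 (the row 1 6) has no partner in x2 and is dropped, and the columns V2.x1 and V2.x2 hold the paired values.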