Dear List, I often have to merge two or more data frames containing unique row names but with some columns (names) common to the two data frames and some columns not common. This toy example will explain the kind of setup I am talking about: mat1 <- as.data.frame(matrix(rnorm(20), nrow = 5)) mat2 <- as.data.frame(matrix(rnorm(20), nrow = 4)) rownames(mat1) <- paste("site", 1:5, sep="") rownames(mat2) <- paste("site", 6:9, sep="") names(mat1) <- paste("species", c(1,3,5,7), sep="") names(mat2) <- paste("species", c(2,3,4,7,9), sep="") mat1 mat2 So sites (rows) are unique across both data frames, but there are only 7 unique species (columns): unique(c(names(mat1), names(mat2))) merge(mat1, mat2, all = TRUE) gives almost what I want, but it drops or looses the rownames() information from the two merged data frames, and it re-orders the rows so that one simply cannot write back the correct row names. How might I go about merging two data frames as I have described, but preserve the row names and more importantly, keep the order of rows the same, so that rows from mat1 come before the rows of mat2? Many thanks, Gavin -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Gavin Simpson [T] +44 (0)20 7679 5522 ENSIS Research Fellow [F] +44 (0)20 7679 7565 ENSIS Ltd. & ECRC [E] gavin.simpsonATNOSPAMucl.ac.uk UCL Department of Geography [W] http://www.ucl.ac.uk/~ucfagls/cv/ 26 Bedford Way [W] http://www.ucl.ac.uk/~ucfagls/ London. WC1H 0AP. %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
On Thu, 3 Nov 2005, Gavin Simpson wrote:> Dear List, > > I often have to merge two or more data frames containing unique row > names but with some columns (names) common to the two data frames and > some columns not common. This toy example will explain the kind of setup > I am talking about: > > mat1 <- as.data.frame(matrix(rnorm(20), nrow = 5)) > mat2 <- as.data.frame(matrix(rnorm(20), nrow = 4)) > rownames(mat1) <- paste("site", 1:5, sep="") > rownames(mat2) <- paste("site", 6:9, sep="") > names(mat1) <- paste("species", c(1,3,5,7), sep="") > names(mat2) <- paste("species", c(2,3,4,7,9), sep="") > mat1 > mat2 > > So sites (rows) are unique across both data frames, but there are only 7 > unique species (columns): > > unique(c(names(mat1), names(mat2))) > > merge(mat1, mat2, all = TRUE) > > gives almost what I want, but it drops or looses the rownames() > information from the two merged data frames, and it re-orders the rows > so that one simply cannot write back the correct row names. > > How might I go about merging two data frames as I have described, but > preserve the row names and more importantly, keep the order of rows the > same, so that rows from mat1 come before the rows of mat2?merge(mat1, mat2, all = TRUE, sort=FALSE) seems to fix the second question. The first is mentioned tangentially in the help page details if you are merging on row names, which you are not - maybe prepend to both a column called sites: mat1a <- data.frame(sites=row.names(mat1), mat1) mat2a <- data.frame(sites=row.names(mat2), mat2) data.frame(merge(mat1a, mat2a, all = TRUE, sort=FALSE), row.names="sites") is a bit long-winded, but gets you there. Roger> > Many thanks, > > Gavin >-- Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no
you could use something like: mat1$id1 <- 1:nrow(mat1) mat2$id2 <- 1:nrow(mat2) out <- merge(mat1, mat2, all = TRUE) out[order(out$id1, out$id2), ] I hope it helps. Best, Dimitris ---- Dimitris Rizopoulos Ph.D. Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/(0)16/336899 Fax: +32/(0)16/337015 Web: http://www.med.kuleuven.be/biostat/ http://www.student.kuleuven.be/~m0390867/dimitris.htm ----- Original Message ----- From: "Gavin Simpson" <gavin.simpson at ucl.ac.uk> To: "R-help" <R-help at stat.math.ethz.ch> Sent: Thursday, November 03, 2005 2:08 PM Subject: [R] merging dataframes> Dear List, > > I often have to merge two or more data frames containing unique row > names but with some columns (names) common to the two data frames > and > some columns not common. This toy example will explain the kind of > setup > I am talking about: > > mat1 <- as.data.frame(matrix(rnorm(20), nrow = 5)) > mat2 <- as.data.frame(matrix(rnorm(20), nrow = 4)) > rownames(mat1) <- paste("site", 1:5, sep="") > rownames(mat2) <- paste("site", 6:9, sep="") > names(mat1) <- paste("species", c(1,3,5,7), sep="") > names(mat2) <- paste("species", c(2,3,4,7,9), sep="") > mat1 > mat2 > > So sites (rows) are unique across both data frames, but there are > only 7 > unique species (columns): > > unique(c(names(mat1), names(mat2))) > > merge(mat1, mat2, all = TRUE) > > gives almost what I want, but it drops or looses the rownames() > information from the two merged data frames, and it re-orders the > rows > so that one simply cannot write back the correct row names. > > How might I go about merging two data frames as I have described, > but > preserve the row names and more importantly, keep the order of rows > the > same, so that rows from mat1 come before the rows of mat2? > > Many thanks, > > Gavin > -- > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% > Gavin Simpson [T] +44 (0)20 7679 5522 > ENSIS Research Fellow [F] +44 (0)20 7679 7565 > ENSIS Ltd. & ECRC [E] gavin.simpsonATNOSPAMucl.ac.uk > UCL Department of Geography [W] > http://www.ucl.ac.uk/~ucfagls/cv/ > 26 Bedford Way [W] http://www.ucl.ac.uk/~ucfagls/ > London. WC1H 0AP. > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html >Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
The `Value' section of ?merge does say that `... in all cases the result has no special row names', so you're left to handle that on your own. One possibility is to use result <- merge(mat1, mat2, all=TRUE, sort=FALSE) so that the sorting is not done, then you can just do rownames(result) <- c(rownames(mat1), rownames(mat2)) Cheers, Andy> From: Gavin Simpson > > Dear List, > > I often have to merge two or more data frames containing unique row > names but with some columns (names) common to the two data frames and > some columns not common. This toy example will explain the > kind of setup > I am talking about: > > mat1 <- as.data.frame(matrix(rnorm(20), nrow = 5)) > mat2 <- as.data.frame(matrix(rnorm(20), nrow = 4)) > rownames(mat1) <- paste("site", 1:5, sep="") > rownames(mat2) <- paste("site", 6:9, sep="") > names(mat1) <- paste("species", c(1,3,5,7), sep="") > names(mat2) <- paste("species", c(2,3,4,7,9), sep="") > mat1 > mat2 > > So sites (rows) are unique across both data frames, but there > are only 7 > unique species (columns): > > unique(c(names(mat1), names(mat2))) > > merge(mat1, mat2, all = TRUE) > > gives almost what I want, but it drops or looses the rownames() > information from the two merged data frames, and it re-orders the rows > so that one simply cannot write back the correct row names. > > How might I go about merging two data frames as I have described, but > preserve the row names and more importantly, keep the order > of rows the > same, so that rows from mat1 come before the rows of mat2? > > Many thanks, > > Gavin > -- > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~ > %~%~%~%~% > Gavin Simpson [T] +44 (0)20 7679 5522 > ENSIS Research Fellow [F] +44 (0)20 7679 7565 > ENSIS Ltd. & ECRC [E] gavin.simpsonATNOSPAMucl.ac.uk > UCL Department of Geography [W] > http://www.ucl.ac.uk/~ucfagls/cv/ > 26 Bedford Way > [W] http://www.ucl.ac.uk/~ucfagls/ > London. WC1H 0AP. > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~ > %~%~%~%~% > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > >
Hi, what about padding both datasets with dummy missing records ... and then play with cbind and rbind ... like e.g.> species5<-c(NA,NA,NA,NA)> modmat2<-cbind(mat2,species1,species5)and then similarly with mat1 ... e.g. species2<-c(NA,NA,NA,NA,NA)> modmad1<-cbind(mat1,species2,species4,species9)> rbind(modmad1,modmat2)species1 species3 species5 species7 species2 species4 species9 site1 -0.7190044 -0.52482580 -1.1813567 -1.5584831 NA NA NA site2 -1.1782180 1.72337964 0.1652343 -0.9026087 NA NA NA site3 0.3823015 -0.07226644 -1.2907470 -0.3692091 NA NA NA site4 -1.3051131 -0.61107947 0.6264416 1.5259373 NA NA NA site5 0.2028565 -1.28374638 1.6284780 -1.2975163 NA NA NA site6 NA 1.19088414 NA 0.3159949 -0.1624538 0.5987733 0.2205512 site7 NA 0.75292176 NA 1.7524988 0.8335334 -0.7998774 -0.9788762 site8 NA -0.47803396 NA -1.3041628 1.7925165 -0.4153879 -0.4708165 site9 NA -0.20063523 NA 1.8119115 1.5351801 -1.3334419 0.5812675 it will need modifications of course if you are working with several datasets Saludos, Manuel