Hello,

I am having some trouble using the 'merge' function. I'm not sure I understand how it works.

What I want to do is:

1) Suppose I have a dataframe like:

  height width
1    1.1   2.3
2    2.1   2.5
3    1.8   1.9
4    1.6   2.1
5    1.8   2.4

2) And I generate a second dataframe sampled from this one, like:

  height width
1    1.1   2.3
3    1.8   1.9
5    1.8   2.4

3) Next, I add a new variable to this dataframe:

  height width color
1    1.1   2.3   red
3    1.8   1.9   red
5    1.8   2.4  blue

4) Now I want to merge those dataframes, so that the new variable, color, is bound to the first dataframe. Of course some cases won't have a value for it, since I generated this variable in a smaller dataframe. In those cases I want the value to be NA. The resulting dataframe should be:

  height width color
1    1.1   2.3   red
2    2.1   2.5    NA
3    1.8   1.9   red
4    1.6   2.1    NA
5    1.8   2.4  blue

I have written some code, but it is not working properly. The new variable has its values mixed up, and they do not correspond to its row.names.

# Generate the first dataframe
data1 <- data.frame(height=rnorm(20,3,0.2),width=rnorm(20,2,0.5))
# Sample a smaller dataframe from data1
data2 <- data1[sample(1:20,15,replace=F),]
# Generate the new variable
color <- sample(c("red","blue"),15,replace=T)
# Bind the new variable to data2
data2 <- cbind(data2, color)
# Merge data1 and data2$color by row.names, keeping every row of data1.
# Then build a new dataframe whose row names come from column 1, and
# sort it by the row.names of data1.
data.frame(merge(data1,data2$color, by=0, all.x=T),row.names=1)[row.names(data1),]

I'm not sure what I am doing wrong. Can anyone see where the mistake is?

Thank you!

Cheers,

Joao D.

--
View this message in context: http://r.789695.n4.nabble.com/Merge-dataframes-tp3882222p3882222.html
Sent from the R help mailing list archive at Nabble.com.

On 07.10.2011 15:34, jdanielnd wrote:
> [...]
> data.frame(merge(data1,data2$color, by=0,
> all.x=T),row.names=1)[row.names(data1),]
>
> I'm not sure what I am doing wrong. Can anyone see where the mistake is?

Just let merge do the work and prepend the rownames:

cbind(names = rownames(data1), merge(data1, data2, all.x = TRUE, sort=FALSE))

Uwe Ligges
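
One caveat when merging on the shared columns: ?merge describes the row order as unspecified when sort = FALSE, so the row names of data1 may not line up with the merged rows. A minimal sketch, assuming the row names are the intended key, is to carry them in an explicit column before merging. The column name id and the seed are made up for illustration and are not part of the original post.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
data1 <- data.frame(height = rnorm(20, 3, 0.2), width = rnorm(20, 2, 0.5))
data2 <- data1[sample(1:20, 15, replace = FALSE), ]
data2$color <- sample(c("red", "blue"), 15, replace = TRUE)

# Carry the row names as an ordinary key column ('id' is a hypothetical name)
data1$id <- rownames(data1)
data2$id <- rownames(data2)

# Left join on the key; only 'id' and 'color' are taken from data2
merged <- merge(data1, data2[, c("id", "color")], by = "id", all.x = TRUE)

# Restore the original row order of data1 and use the key as row names again
merged <- merged[match(data1$id, merged$id), ]
rownames(merged) <- merged$id
merged$id <- NULL
```

With an explicit key, the result no longer depends on the order in which merge returns its rows, and rows of data1 that were not sampled get NA for color.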

On Oct 7, 2011, at 9:34 AM, jdanielnd wrote:
> [...]
> data.frame(merge(data1,data2$color, by=0,
> all.x=T),row.names=1)[row.names(data1),]
>
> I'm not sure what I am doing wrong.

I'm not sure what you want. You get the rownames with this:

> str( merge( data1, data2$color, by=0, all.x=T) )
'data.frame': 20 obs. of 4 variables:
 $ Row.names:Class 'AsIs' chr [1:20] "1" "10" "11" "12" ...
 $ height : num 3.02 2.9 2.93 2.87 2.95 ...
 $ width : num 1.7 1.85 1.51 2.14 2.22 ...
 $ y : Factor w/ 2 levels "blue","red": 1 2 1 2 1 1 1 NA NA NA ...

If all you want is the original order then just re-sort:

newdat <- merge( data1, data2$color, by=0, all.x=T)
# Row.names is character, so convert it before ordering
newdat[order(as.numeric(as.character(newdat$Row.names))), ]

I checked to see if the Row.names were correct by also examining

merge( cbind(rownames(data1), data1), data2$color, by=0, all.x=T)

> Can anyone see where the mistake is?

David Winsemius, MD
West Hartford, CT
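
A likely reason the colors end up on the wrong rows is that data2$color is a bare vector: the row names of data2 are dropped, so merge(..., by = 0) matches the colors against fresh row names 1 to 15 instead of the sampled ones. A hedged sketch, assuming the goal is to match on the row names of data1, passes the one-column data frame data2["color"] instead. The seed and the name newdat are only for illustration.

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible
data1 <- data.frame(height = rnorm(20, 3, 0.2), width = rnorm(20, 2, 0.5))
data2 <- data1[sample(1:20, 15, replace = FALSE), ]
data2$color <- sample(c("red", "blue"), 15, replace = TRUE)

# data2["color"] is a one-column data frame that keeps data2's row names;
# data2$color would be a plain vector with the row names dropped
newdat <- merge(data1, data2["color"], by = 0, all.x = TRUE)

# Row.names comes back as character, so order numerically to restore
# the original row order of data1
newdat <- newdat[order(as.numeric(as.character(newdat$Row.names))), ]

# Move the Row.names column back into the row names
rownames(newdat) <- as.character(newdat$Row.names)
newdat$Row.names <- NULL
```

Rows of data1 that were not sampled into data2 keep NA in color, and every sampled row carries the color it was given in data2.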