Marius Hofert
2011-Aug-18 18:41 UTC
[R] Best way/practice to create a new data frame from two given ones with last column computed from the two data frames?
Dear expeRts,

What is the best approach to create a third data frame from two given ones, when
the new/third data frame has its last column computed from the last columns of the
two given data frames?

## Okay, sounds complicated, so here is an example. Assume we have the two data frames:
df1 <- data.frame(Year=rep(2001:2010, each=2), Group=c("Group 1","Group 2"), Value=1:20)
df2 <- data.frame(Year=rep(2001:2010, each=2), Group=c("Group 1","Group 2"), Value=21:40)

## To make this a bit more fun, let's say the order of elements is different...
(df1 <- df1[sample(1:nrow(df1)),])
(df2 <- df2[sample(1:nrow(df2)),])

## Now I would like to create a third data frame that has "Year" in column one,
## "Group" in column two, and each entry of column three should consist of the
## corresponding entry in df1 divided by the one in df2.

## To achieve this, one could do:
df3 <- df1[with(df1, order(Year,Group)),]
df3$Value <- df3$Value/df2[with(df2, order(Year,Group)),]$Value
colnames(df3)[3] <- "New Value" # typically, the column name changes

## or one could do:
df3 <- df1[with(df1, order(Year,Group)), -ncol(df1)]
df3 <- cbind(df3, "New Value"=df1[with(df1, order(Year,Group)),]$Value/df2[with(df2, order(Year,Group)),]$Value)

## Is there a more elegant solution? (maybe with ddply?)

## By the way:
df1[,"Value"] # works
df1[,-"Value"] # does not work
## Is there a way to exclude columns by name? That would make the code more readable.
## I know one could use...
subset(df1, select=c("Year","Group"))
## ... but it seems a bit tedious if you have lots of columns to first remove the
## column name that should be dropped and then put the remaining column names in "select"

Cheers,

Marius
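On the side question about excluding columns by name: the following is a minimal sketch of three base-R idioms that should do this, not a definitive answer from the thread. The variable names drop, a, b and c3 are made up for illustration, and the data frame is the unshuffled df1 from the question.

```r
# Toy data with the same columns as in the question
df1 <- data.frame(Year=rep(2001:2010, each=2), Group=c("Group 1","Group 2"), Value=1:20)

drop <- "Value"  # hypothetical: name(s) of the column(s) to exclude

# 1) keep every column whose name is not in 'drop'
a <- df1[, !(names(df1) %in% drop), drop=FALSE]

# 2) same idea, expressed with setdiff() on the column names
b <- df1[, setdiff(names(df1), drop), drop=FALSE]

# 3) subset() accepts a negated, unquoted column name in 'select'
c3 <- subset(df1, select=-Value)

names(a)
names(b)
names(c3)
```

The first two variants take the names to drop as a character vector, so they also work when the names are only known at run time; the subset() form is shorter but expects the name to be typed literally.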
Daniel Malter
2011-Aug-18 20:12 UTC
[R] Best way/practice to create a new data frame from two given ones with last column computed from the two data frames?
The "problem" with your first solution is that it relies on each 'year x group'
combination being present in both data frames. To avoid this, I would recommend
using merge():

df3 <- merge(df1, df2, by=c("Year","Group"))
df3$ratio <- with(df3, Value.x/Value.y)
df3

HTH,
Daniel
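To illustrate the point about missing 'year x group' combinations, here is a small sketch under the assumption that one row is absent from df2. The shortened data frames, the removed row and the names inner and left are hypothetical and only serve the example.

```r
# Smaller versions of the data frames from the question
df1 <- data.frame(Year=rep(2001:2003, each=2), Group=c("Group 1","Group 2"), Value=1:6)
df2 <- data.frame(Year=rep(2001:2003, each=2), Group=c("Group 1","Group 2"), Value=21:26)
df2 <- df2[-1, ]  # hypothetical gap: (2001, "Group 1") is missing from df2

# By default merge() keeps only the key combinations present in both frames
inner <- merge(df1, df2, by=c("Year","Group"))
inner$ratio <- with(inner, Value.x / Value.y)

# all.x=TRUE keeps every row of df1; the missing partner value becomes NA
left <- merge(df1, df2, by=c("Year","Group"), all.x=TRUE)
left$ratio <- with(left, Value.x / Value.y)

nrow(inner)
nrow(left)
left[is.na(left$ratio), ]
```

With the order()-based approaches, the two sorted Value vectors would have different lengths here, so the division would no longer pair up matching rows. merge() matches on the keys instead of on position.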
Marius Hofert
2011-Aug-18 22:00 UTC
[R] Best way/practice to create a new data frame from two given ones with last column computed from the two data frames?
Dear all,

okay, I found a one-liner based on mutate() (from plyr):

library(plyr) # provides mutate()
## sort the rows of df1 as well, so that Year and Group stay aligned with the new values
(df3 <- mutate(df1[with(df1, order(Year,Group)),], Value=Value / df2[with(df2, order(Year,Group)),"Value"]))

Cheers,

Marius
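One way to check that an order()-based result really pairs each (Year, Group) with the right ratio is to compare it against a merge()-based reference. This is only a sketch in base R; the names o1, o2, by_order, ref and chk are invented for the example, and set.seed() is there purely to make the shuffle reproducible.

```r
set.seed(1)  # only to make the shuffle reproducible
df1 <- data.frame(Year=rep(2001:2010, each=2), Group=c("Group 1","Group 2"), Value=1:20)
df2 <- data.frame(Year=rep(2001:2010, each=2), Group=c("Group 1","Group 2"), Value=21:40)
df1 <- df1[sample(nrow(df1)), ]
df2 <- df2[sample(nrow(df2)), ]

# Order-based result: sort both frames by the keys, then divide position-wise
o1 <- df1[with(df1, order(Year, Group)), ]
o2 <- df2[with(df2, order(Year, Group)), ]
by_order <- data.frame(o1[c("Year","Group")], Value=o1$Value / o2$Value)

# Reference result: match the rows on the keys explicitly
ref <- merge(df1, df2, by=c("Year","Group"))
ref$Value <- with(ref, Value.x / Value.y)

# Join the two results on the keys and compare the ratios pair by pair
chk <- merge(by_order, ref[c("Year","Group","Value")], by=c("Year","Group"))
stopifnot(nrow(chk) == nrow(df1),
          isTRUE(all.equal(chk$Value.x, chk$Value.y)))
```

The check passes only if the key columns and the computed column were reordered together, which is the easy thing to get wrong when the sorting is applied to a single column.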