Lopez, Dan
2013-Nov-12 01:04 UTC
[R] Update a variable in a dataframe based on variables in another dataframe of a different size
Below is how I am currently doing this. Is there a more efficient way to do this? The scenario is that I have two dataframes of different sizes. I need to update one binary factor variable in one of those dataframes by matching on two variables. If there is no match keep as is otherwise update. Also the variable being update, TT in this case should remain a binary factor variable (levels='HC','TER') HTDF2<-merge(H_DF,T_DF,by=c("FY","ID"),all.x=T) HTDF2$TT<-factor(ifelse(is.na(HTDF2$TT.y),HTDF2$TT.x,HTDF2$TT.y),labels=c("HC","TER")) HTDF2<-HTDF2[,-(3:4)] # REPRODUCIBLE EXAMPLE DATA FOR ABOVE..> dput(H_DF)structure(list(FY = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L), .Label = c("FY09", "FY10", "FY11", "FY12", "FY13"), class = "factor"), ID = c(1, 1, 1, 1, 2, 2, 2, 2, 2), TT = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("HC", "TER"), class = "factor")), .Names = c("FY", "ID", "TT"), class = "data.frame", row.names = c(1L, 2L, 3L, 4L, 6L, 7L, 9L, 10L, 11L))> dput(T_DF)structure(list(FY = structure(c(4L, 2L, 5L), .Label = c("FY09", "FY10", "FY11", "FY12", "FY13"), class = "factor"), ID = c(1, 2, 2), TT = structure(c(2L, 2L, 2L), .Label = c("HC", "TER"), class = "factor")), .Names = c("FY", "ID", "TT"), row.names = c(5L, 8L, 12L), class = "data.frame") Dan Lopez LLNL, HRIM - Workforce Analytics & Metrics [[alternative HTML version deleted]]
Gabor Grothendieck
2013-Nov-12 01:20 UTC
[R] Update a variable in a dataframe based on variables in another dataframe of a different size
On Mon, Nov 11, 2013 at 8:04 PM, Lopez, Dan <lopez235 at llnl.gov> wrote:> Below is how I am currently doing this. Is there a more efficient way to do this? > The scenario is that I have two dataframes of different sizes. I need to update one binary factor variable in one of those dataframes by matching on two variables. If there is no match keep as is otherwise update. Also the variable being update, TT in this case should remain a binary factor variable (levels='HC','TER') > > HTDF2<-merge(H_DF,T_DF,by=c("FY","ID"),all.x=T) > HTDF2$TT<-factor(ifelse(is.na(HTDF2$TT.y),HTDF2$TT.x,HTDF2$TT.y),labels=c("HC","TER")) > HTDF2<-HTDF2[,-(3:4)] > > > # REPRODUCIBLE EXAMPLE DATA FOR ABOVE.. >> dput(H_DF) > structure(list(FY = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, > 5L), .Label = c("FY09", "FY10", "FY11", "FY12", "FY13"), class = "factor"), > ID = c(1, 1, 1, 1, 2, 2, 2, 2, 2), TT = structure(c(1L, 1L, > 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("HC", "TER"), class = "factor")), .Names = c("FY", > "ID", "TT"), class = "data.frame", row.names = c(1L, 2L, 3L, > 4L, 6L, 7L, 9L, 10L, 11L)) >> dput(T_DF) > structure(list(FY = structure(c(4L, 2L, 5L), .Label = c("FY09", > "FY10", "FY11", "FY12", "FY13"), class = "factor"), ID = c(1, > 2, 2), TT = structure(c(2L, 2L, 2L), .Label = c("HC", "TER"), class = "factor")), .Names = c("FY", > "ID", "TT"), row.names = c(5L, 8L, 12L), class = "data.frame") >Here is an sqldf solution:> library(sqldf) > sqldf("select FY, ID, coalesce(t.TT, h.TT) TT from H_DF h left join T_DF t using(FY, ID)")FY ID TT 1 FY09 1 HC 2 FY10 1 HC 3 FY11 1 HC 4 FY12 1 TER 5 FY09 2 HC 6 FY10 2 TER 7 FY11 2 HC 8 FY12 2 HC 9 FY13 2 TER -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
Lopez, Dan
2013-Nov-13 21:29 UTC
[R] Update a variable in a dataframe based on variables in another dataframe of a different size
This is a great solution! Love the conciseness of your solution. And easy to understand. Thanks again. Dan -----Original Message----- From: arun [mailto:smartpink111 at yahoo.com] Sent: Monday, November 11, 2013 6:31 PM To: Lopez, Dan Subject: Re: [R] Update a variable in a dataframe based on variables in another dataframe of a different size Hi, You could use: H_DF[match(with(T_DF,paste(FY,ID,sep="_")), with(H_DF,paste(FY,ID,sep="_"))),3]<- "TER" A.K. On Monday, November 11, 2013 8:51 PM, "Lopez, Dan" <lopez235 at llnl.gov> wrote: Below is how I am currently doing this. Is there a more efficient way to do this? The scenario is that I have two dataframes of different sizes. I need to update one binary factor variable in one of those dataframes by matching on two variables. If there is no match keep as is otherwise update. Also the variable being update, TT in this case should remain a binary factor variable (levels='HC','TER') HTDF2<-merge(H_DF,T_DF,by=c("FY","ID"),all.x=T) HTDF2$TT<-factor(ifelse(is.na(HTDF2$TT.y),HTDF2$TT.x,HTDF2$TT.y),labels=c("HC","TER")) HTDF2<-HTDF2[,-(3:4)] # REPRODUCIBLE EXAMPLE DATA FOR ABOVE..> dput(H_DF)structure(list(FY = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L), .Label = c("FY09", "FY10", "FY11", "FY12", "FY13"), class = "factor"), ? ? ID = c(1, 1, 1, 1, 2, 2, 2, 2, 2), TT = structure(c(1L, 1L, ? ? 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("HC", "TER"), class = "factor")), .Names = c("FY", "ID", "TT"), class = "data.frame", row.names = c(1L, 2L, 3L, 4L, 6L, 7L, 9L, 10L, 11L))> dput(T_DF)structure(list(FY = structure(c(4L, 2L, 5L), .Label = c("FY09", "FY10", "FY11", "FY12", "FY13"), class = "factor"), ID = c(1, 2, 2), TT = structure(c(2L, 2L, 2L), .Label = c("HC", "TER"), class = "factor")), .Names = c("FY", "ID", "TT"), row.names = c(5L, 8L, 12L), class = "data.frame") Dan Lopez LLNL, HRIM - Workforce Analytics & Metrics ??? [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.