Dear R-ers, I feel I am close, but can't get it quite right. Thanks a lot for your help! Dimitri # I have 2 data frames: x<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80)) y<-data.frame(a2=c("aa","aa","ba","ba"),a3=c("ab","ab","bb","bb"),b=c(1:2,1:2),e1=c(100,200,300,400),e2=c(101,201,301,401)) (x);(y) # I'd like to merge them so that the result looks like this: desired<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80), e1=c(100,200,100,200,300,400,300,400),e2=c(101,201,101,201,301,401,301,401)) (desired) # In other words, I want column e1 and e2 entries from data frame y to be repeated based on matching of column a from x and columns a2 and then a3 from y. # I am trying step-by-step - first I am using column a2 from data frame y for merging: out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F) (out1) # looking good - half of the job is done # Step2 - does not work # next line produces columns e1 and e2 twice (in real life I have tons of columns like e1 and e2): merge(out1,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F) # next line also doesn't do the job: merge(out1,y[-1],by.x=c("a","b","e1","e2"),by.y=c("a3","b","e1","e2"),all.x=T,all.y=F) # Finally, I tried this approach: out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F) out2<-merge(x,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F) (out1); (out2) # Now I need to merge these 2 - however, the next line doubles the number of entries: merge(out1,out2,by=names(out1),all.x=T,all.y=T) -- Dimitri Liakhovitski marketfusionanalytics.com
Hi Dimitri, Try creating a key for "x" and "y" and then merging the result by that variable: x$key <- with(x, paste(a, b, sep = "/")) y$key <- with(y, paste(a2, b, sep = "/")) merge(x, y, by = 'key')[, c(2:4, 8:9)] HTH, Jorge.- On Wed, Jul 11, 2012 at 6:28 PM, Dimitri Liakhovitski <> wrote:> Dear R-ers, > > I feel I am close, but can't get it quite right. > Thanks a lot for your help! > > Dimitri > > # I have 2 data frames: > > > x<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80)) > > y<-data.frame(a2=c("aa","aa","ba","ba"),a3=c("ab","ab","bb","bb"),b=c(1:2,1:2),e1=c(100,200,300,400),e2=c(101,201,301,401)) > (x);(y) > > # I'd like to merge them so that the result looks like this: > > > desired<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80), > > e1=c(100,200,100,200,300,400,300,400),e2=c(101,201,101,201,301,401,301,401)) > (desired) > > # In other words, I want column e1 and e2 entries from data frame y to > be repeated based on matching of column a from x and columns a2 and > then a3 from y. > > # I am trying step-by-step - first I am using column a2 from data > frame y for merging: > out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F) > (out1) # looking good - half of the job is done > > # Step2 - does not work > > # next line produces columns e1 and e2 twice (in real life I have tons > of columns like e1 and e2): > merge(out1,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F) > > # next line also doesn't do the job: > > merge(out1,y[-1],by.x=c("a","b","e1","e2"),by.y=c("a3","b","e1","e2"),all.x=T,all.y=F) > > # Finally, I tried this approach: > out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F) > out2<-merge(x,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F) > (out1); (out2) > > # Now I need to merge these 2 - however, the next line doubles the > number of entries: > merge(out1,out2,by=names(out1),all.x=T,all.y=T) > > -- > Dimitri Liakhovitski > marketfusionanalytics.com > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Hello, About many columns like 'e1' and 'e2' I don't know but with the provided example the following does NOT depend on them, only on 'a', 'b' and 'a2' and 'a3'. z <- lapply(c("a2", "a3"), function(cc) merge(x, y, by.x=c("a", "b"), by.y=c(cc, "b"))) z <- lapply(seq_along(z), function(i) z[[i]][ -which(names(z[[i]]) %in% c("a2", "a3")) ]) z <- do.call(rbind, z) z <- z[order(z$a, z$b), ] rownames(z) <- seq_len(nrow(z)) all.equal(desired, z) Hope this helps, Rui Barradas Em 11-07-2012 23:28, Dimitri Liakhovitski escreveu:> Dear R-ers, > > I feel I am close, but can't get it quite right. > Thanks a lot for your help! > > Dimitri > > # I have 2 data frames: > > x<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80)) > y<-data.frame(a2=c("aa","aa","ba","ba"),a3=c("ab","ab","bb","bb"),b=c(1:2,1:2),e1=c(100,200,300,400),e2=c(101,201,301,401)) > (x);(y) > > # I'd like to merge them so that the result looks like this: > > desired<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80), > e1=c(100,200,100,200,300,400,300,400),e2=c(101,201,101,201,301,401,301,401)) > (desired) > > # In other words, I want column e1 and e2 entries from data frame y to > be repeated based on matching of column a from x and columns a2 and > then a3 from y. > > # I am trying step-by-step - first I am using column a2 from data > frame y for merging: > out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F) > (out1) # looking good - half of the job is done > > # Step2 - does not work > > # next line produces columns e1 and e2 twice (in real life I have tons > of columns like e1 and e2): > merge(out1,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F) > > # next line also doesn't do the job: > merge(out1,y[-1],by.x=c("a","b","e1","e2"),by.y=c("a3","b","e1","e2"),all.x=T,all.y=F) > > # Finally, I tried this approach: > out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F) > out2<-merge(x,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F) > (out1); (out2) > > # Now I need to merge these 2 - however, the next line doubles the > number of entries: > merge(out1,out2,by=names(out1),all.x=T,all.y=T) >
Possibly Parallel Threads
- help please: put output into dataframe
- [PATCH 3/6] Input: Update vmmouse.c to use the common VMW_PORT macros
- [PATCH 3/6] Input: Update vmmouse.c to use the common VMW_PORT macros
- [PATCH 3/6] Input: Update vmmouse.c to use the common VMW_PORT macros
- [PATCH 3/6] Input: Update vmmouse.c to use the common VMW_PORT macros