thr3ads.net - R help - [R] help with merging 2 data frames [Jul 2012]

If this information is useful, please help other people find it:
Share via:

Dimitri Liakhovitski

2012-Jul-11 22:28 UTC

[R] help with merging 2 data frames

Dear R-ers,

I feel I am close, but can't get it quite right.
Thanks a lot for your help!

Dimitri

# I have 2 data frames:

x<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80))
y<-data.frame(a2=c("aa","aa","ba","ba"),a3=c("ab","ab","bb","bb"),b=c(1:2,1:2),e1=c(100,200,300,400),e2=c(101,201,301,401))
(x);(y)

# I'd like to merge them so that the result looks like this:

desired<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80),
	e1=c(100,200,100,200,300,400,300,400),e2=c(101,201,101,201,301,401,301,401))
(desired)

# In other words, I want column e1 and e2 entries from data frame y to
be repeated based on matching of column a from x and columns a2 and
then a3 from y.

# I am trying step-by-step - first I am using column a2 from data
frame y for merging:
out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
(out1)   # looking good - half of the job is done

# Step2 - does not work

# next line produces columns e1 and e2 twice (in real life I have tons
of columns like e1 and e2):
merge(out1,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)

# next line also doesn't do the job:
merge(out1,y[-1],by.x=c("a","b","e1","e2"),by.y=c("a3","b","e1","e2"),all.x=T,all.y=F)

# Finally, I tried this approach:
out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
out2<-merge(x,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)
(out1); (out2)

# Now I need to merge these 2 - however, the next line doubles the
number of entries:
merge(out1,out2,by=names(out1),all.x=T,all.y=T)

-- 
Dimitri Liakhovitski
marketfusionanalytics.com

Jorge I Velez

2012-Jul-11 22:36 UTC

head link

[R] help with merging 2 data frames

Hi Dimitri,

Try creating a key for "x" and "y" and then merging the
result by that
variable:

x$key <- with(x, paste(a, b, sep = "/"))
y$key <- with(y, paste(a2, b, sep = "/"))
merge(x, y, by = 'key')[, c(2:4, 8:9)]

HTH,
Jorge.-


On Wed, Jul 11, 2012 at 6:28 PM, Dimitri Liakhovitski <> wrote:
> Dear R-ers,
>
> I feel I am close, but can't get it quite right.
> Thanks a lot for your help!
>
> Dimitri
>
> # I have 2 data frames:
>
>
>
x<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80))
>
>
y<-data.frame(a2=c("aa","aa","ba","ba"),a3=c("ab","ab","bb","bb"),b=c(1:2,1:2),e1=c(100,200,300,400),e2=c(101,201,301,401))
> (x);(y)
>
> # I'd like to merge them so that the result looks like this:
>
>
>
desired<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80),
>
>
e1=c(100,200,100,200,300,400,300,400),e2=c(101,201,101,201,301,401,301,401))
> (desired)
>
> # In other words, I want column e1 and e2 entries from data frame y to
> be repeated based on matching of column a from x and columns a2 and
> then a3 from y.
>
> # I am trying step-by-step - first I am using column a2 from data
> frame y for merging:
>
out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
> (out1)   # looking good - half of the job is done
>
> # Step2 - does not work
>
> # next line produces columns e1 and e2 twice (in real life I have tons
> of columns like e1 and e2):
>
merge(out1,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)
>
> # next line also doesn't do the job:
>
>
merge(out1,y[-1],by.x=c("a","b","e1","e2"),by.y=c("a3","b","e1","e2"),all.x=T,all.y=F)
>
> # Finally, I tried this approach:
>
out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
>
out2<-merge(x,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)
> (out1); (out2)
>
> # Now I need to merge these 2 - however, the next line doubles the
> number of entries:
> merge(out1,out2,by=names(out1),all.x=T,all.y=T)
>
> --
> Dimitri Liakhovitski
> marketfusionanalytics.com
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
	[[alternative HTML version deleted]]

Rui Barradas

2012-Jul-11 23:08 UTC

head link

[R] help with merging 2 data frames

Hello,

About many columns like 'e1' and 'e2' I don't know but with
the provided
example the following does NOT depend on them, only on 'a', 'b'
and 'a2'
and 'a3'.


z <- lapply(c("a2", "a3"), function(cc) merge(x, y,
by.x=c("a", "b"),
by.y=c(cc, "b")))
z <- lapply(seq_along(z), function(i)
         z[[i]][ -which(names(z[[i]]) %in% c("a2", "a3")) ])
z <- do.call(rbind, z)
z <- z[order(z$a, z$b), ]
rownames(z) <- seq_len(nrow(z))
all.equal(desired, z)


Hope this helps,

Rui Barradas

Em 11-07-2012 23:28, Dimitri Liakhovitski escreveu:> Dear R-ers,
>
> I feel I am close, but can't get it quite right.
> Thanks a lot for your help!
>
> Dimitri
>
> # I have 2 data frames:
>
>
x<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80))
>
y<-data.frame(a2=c("aa","aa","ba","ba"),a3=c("ab","ab","bb","bb"),b=c(1:2,1:2),e1=c(100,200,300,400),e2=c(101,201,301,401))
> (x);(y)
>
> # I'd like to merge them so that the result looks like this:
>
>
desired<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80),
> 
e1=c(100,200,100,200,300,400,300,400),e2=c(101,201,101,201,301,401,301,401))
> (desired)
>
> # In other words, I want column e1 and e2 entries from data frame y to
> be repeated based on matching of column a from x and columns a2 and
> then a3 from y.
>
> # I am trying step-by-step - first I am using column a2 from data
> frame y for merging:
>
out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
> (out1)   # looking good - half of the job is done
>
> # Step2 - does not work
>
> # next line produces columns e1 and e2 twice (in real life I have tons
> of columns like e1 and e2):
>
merge(out1,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)
>
> # next line also doesn't do the job:
>
merge(out1,y[-1],by.x=c("a","b","e1","e2"),by.y=c("a3","b","e1","e2"),all.x=T,all.y=F)
>
> # Finally, I tried this approach:
>
out1<-merge(x,y[-2],by.x=c("a","b"),by.y=c("a2","b"),all.x=T,all.y=F)
>
out2<-merge(x,y[-1],by.x=c("a","b"),by.y=c("a3","b"),all.x=T,all.y=F)
> (out1); (out2)
>
> # Now I need to merge these 2 - however, the next line doubles the
> number of entries:
> merge(out1,out2,by=names(out1),all.x=T,all.y=T)
>

Reasonably Related Threads

Search for more seemingly similar threads

R help - Jul 2012 - help with merging 2 data frames

[R] help with merging 2 data frames

[R] help with merging 2 data frames

[R] help with merging 2 data frames

Reasonably Related Threads