thr3ads.net - R help - [R] Merge data frames but prefer values in one [Sep 2009]

JiHO

2009-Sep-10 15:21 UTC

[R] Merge data frames but prefer values in one

Hello everyone,

My problem is better explained with an example:

 > x=data.frame(a=1:4,b=1:4,c=rnorm(4))
 > x
  a b          c
1 1 1 -0.8821089
2 2 2 -0.7082583
3 3 3 -0.5948835
4 4 4 -1.8571443
 > y=data.frame(a=c(1,3),b=3,c=rnorm(2))
 > y
  a b            c
1 1 3 -0.273155973
2 3 3  0.009517862

Now I want to merge x and y by columns a and b, hence creating a  
data.frame with all a:b combinations observed in x and y. That's  
easily done with merge:

 > merge(x,y,by=c("a","b"),all=T)
  a b        c.x          c.y
1 1 1 -0.8821089           NA
2 1 3         NA -0.273155973
3 2 2 -0.7082583           NA
4 3 3 -0.5948835  0.009517862
5 4 4 -1.8571443           NA

But rather than two c columns I would want the merge to:
- keep the value in x if there is no corresponding value in y
- keep the value in y if there is no corresponding value in x
- prefer the value in y when the a:b combination exists in both x and y

So basically I want my result to look like:
  a b          c
1 1 1 -0.8821089
2 1 3 -0.2731559
3 2 2 -0.7082583
4 3 3  0.0095178
5 4 4 -1.8571443

I can't find a combination of options for merge that does this. Is
there another function that would do that, or do I have to resort to
some post-processing after merge? It seems that it might be something
like a "right merge" for databases, but I don't know this world at
all. I would be happy to look into sqldf if it allows doing things
like that.

Thanks in advance. Sincerely,

JiHO
---
http://maururu.net

Henrique Dallazuanna

2009-Sep-10 17:20 UTC

[R] Merge data frames but prefer values in one

Try this:

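xy <- merge(x, y, by = c("a", "b"), all = TRUE)
# When both c.x and c.y are present, take c.y (column 2), as requested;
# otherwise rowSums() returns the single non-NA value.
xy$c <- ifelse(rowSums(!is.na(.x <- xy[, c("c.x", "c.y")])) > 1, .x[, 2],
               rowSums(.x, na.rm = TRUE))
xy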
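
The same post-processing can also be written with an explicit NA test, which additionally drops the two suffixed helper columns. This is only a minimal sketch: the fixed numbers stand in for the rnorm() draws so that the result is reproducible, and they are an assumption, not values from the thread.

```r
# Fixed values replace rnorm() so the example is reproducible (assumption).
x <- data.frame(a = 1:4, b = 1:4, c = c(-0.88, -0.71, -0.59, -1.86))
y <- data.frame(a = c(1, 3), b = 3, c = c(-0.27, 0.01))

xy <- merge(x, y, by = c("a", "b"), all = TRUE)

# Take c.y wherever it is present, otherwise fall back to c.x.
xy$c <- ifelse(is.na(xy$c.y), xy$c.x, xy$c.y)

# Keep only the key columns and the combined column.
xy <- xy[, c("a", "b", "c")]
xy
```

Note that a genuine NA stored in y for a matching a:b pair would also fall back to the value from x with this test.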

> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

Nandi

2009-Sep-14 19:00 UTC

[R] Merge data frames but prefer values in one

No, you cannot. You may want to write your own merge function with this
special capability, but there is no better way than the one suggested by
Henrique.
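
Such a wrapper could look roughly like the sketch below. The name merge_prefer() and its arguments are hypothetical (nothing like it exists in base R), the sample values are made up, and it is only meant for plain numeric or character columns, since ifelse() drops attributes such as factor levels.

```r
# merge_prefer() is a hypothetical helper, not part of base R.
# It merges x and y on the `by` columns and, for every other column the two
# data frames share, keeps y's value when present and x's value otherwise.
merge_prefer <- function(x, y, by) {
  shared <- setdiff(intersect(names(x), names(y)), by)
  out <- merge(x, y, by = by, all = TRUE, suffixes = c(".x", ".y"))
  for (col in shared) {
    from_x <- out[[paste0(col, ".x")]]
    from_y <- out[[paste0(col, ".y")]]
    out[[col]] <- ifelse(is.na(from_y), from_x, from_y)
    # Remove the suffixed helper columns once the combined column exists.
    out[[paste0(col, ".x")]] <- NULL
    out[[paste0(col, ".y")]] <- NULL
  }
  out
}

# Made-up sample data in the shape used in this thread.
x <- data.frame(a = 1:4, b = 1:4, c = c(-0.88, -0.71, -0.59, -1.86))
y <- data.frame(a = c(1, 3), b = 3, c = c(-0.27, 0.01))
res <- merge_prefer(x, y, by = c("a", "b"))
res
```

The merge itself is still followed by a post-processing step; the function only hides that step behind a single call.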

On Sep 14, 12:18 pm, JiHO <jo.li... at gmail.com> wrote:
> On 2009-September-11, at 13:55, wrote:
>
> > Maybe:
> >
> > do.call(rbind, lapply(with(xy <- rbind(x, y), split(xy, list(a, b),
> > drop = TRUE)), tail, 1))
> >
> > On Fri, Sep 11, 2009 at 3:45 AM, jo <jo.li... at gmail.com> wrote:
> > > Thanks for the post-processing ideas. But is there any way to do that
> > > in one step?
>
> Thanks but by "in one step" I meant within the merge, not in one
> post-processing step ;)
>
> JiHO
> ---
> http://maururu.net
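
The rbind-based idea quoted above keeps, for each a:b pair, the last row of the stacked data, so y wins because it is stacked after x. A shorter variant of the same idea can be sketched with duplicated(); it assumes that x and y have exactly the same columns, and the sample values are made up.

```r
# Made-up sample data in the shape used in this thread.
x <- data.frame(a = 1:4, b = 1:4, c = c(-0.88, -0.71, -0.59, -1.86))
y <- data.frame(a = c(1, 3), b = 3, c = c(-0.27, 0.01))

# Stack x first and y second, so y's rows come last.
xy <- rbind(x, y)

# fromLast = TRUE flags a row as duplicated when the same a:b pair occurs
# again further down, so negating it keeps the last row of each pair.
keep <- !duplicated(xy[, c("a", "b")], fromLast = TRUE)
res <- xy[keep, ]

# Sort by the key columns to match the ordering produced by merge().
res <- res[order(res$a, res$b), ]
res
```

Unlike the merge-based versions, this one never looks at NA values: whichever row comes last for a key is kept as is.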
