thr3ads.net - R help - [R] merge dataframes with conditions formulated as logical expressions [Jun 2006]

If this information is useful, please help other people find it:
Share via:

Wolfram Fischer

2006-Jun-14 14:55 UTC

[R] merge dataframes with conditions formulated as logical expressions

I have a data.frame df containing two variables:
    GRP: Factor
    VAL: num

I have a data.frame dp containing:
    GRP: Factor
    MIN.VAL: num
    MAX.VAL: num
    VAL2: num
with several rows per "GRP"
where dp[i-1, "MAX.VAL"] < dp[i, "MIN.VAL"]
within the same "GRP".

I want to create df[i, "VAL2"] <- dpp[z, "VAL2"] 
with    i along df 
and     dpp <- subset( dp, GRP = df[i, "GRP"] )
so that it is true for each i:
        df[i, "VAL"] > dpp[z, "MIN.VAL"]
   and  df[i, "VAL"] <= dpp[z, "MAX.VAL"]

Is there an easy/efficient way to do that?

Example:
df <- data.frame( GRP=c( "A", "A", "B" ),
VAL=c( 10, 100, 200 ) )
dp <- data.frame( GRP=c( "A", "A", "B",
"B" ),
    MIN.VAL=c( 1, 50, 1, 70 ), MAX.VAL=c( 49, 999, 59, 999 ), 
    VAL2=c( 1.1, 2.2, 3.3, 4.4 ) )

The result should be:
    df$VAL2 <- c( 1.1, 2.2, 4.4 )

Thanks - Wolfram

Adaikalavan Ramasamy

2006-Jun-14 16:17 UTC

head link

[R] merge dataframes with conditions formulated as logical expressions

You have discontinuity between your MIN.VAL and MAX.VAL for a given
group. If this is true in practise, then you may want to check and
report when VAL is in the discontinuous region.

Here is my solution that ignores that (and only uses MIN.VAL and
completely disrespecting MAX.VAL). Not very elegant but should do the
trick.


 df <- data.frame( GRP=c( "A", "A", "B" ),
VAL=c( 10, 100, 200 ) )
 dp <- data.frame( GRP=c( "A", "A", "B",
"B" ), MIN.VAL=c( 1, 50, 1,
70 ), MAX.VAL=c( 49, 999, 59, 999 ),  VAL2=c( 1.1, 2.2, 3.3, 4.4 ) )

 x <- split(df, df$GRP)
 y <- split(dp, dp$GRP)

 out <- NULL
 for(g in names(x)){

   xx <- x[[g]]
   yy <- y[[g]]

   w   <- cut(xx$VAL, breaks=c(yy$MIN.VAL, Inf), labels=F)
   tmp <- cbind(xx, yy[w, "VAL2"])
   colnames(tmp) <- c("GRP", "VAL", "VAL2")
   out <- rbind(out, tmp)
 } 
 out

Regards, Adai



On Wed, 2006-06-14 at 16:55 +0200, Wolfram Fischer
wrote:> I have a data.frame df containing two variables:
>     GRP: Factor
>     VAL: num
> 
> I have a data.frame dp containing:
>     GRP: Factor
>     MIN.VAL: num
>     MAX.VAL: num
>     VAL2: num
> with several rows per "GRP"
> where dp[i-1, "MAX.VAL"] < dp[i, "MIN.VAL"]
> within the same "GRP".
> 
> I want to create df[i, "VAL2"] <- dpp[z, "VAL2"] 
> with    i along df 
> and     dpp <- subset( dp, GRP = df[i, "GRP"] )
> so that it is true for each i:
>         df[i, "VAL"] > dpp[z, "MIN.VAL"]
>    and  df[i, "VAL"] <= dpp[z, "MAX.VAL"]
> 
> Is there an easy/efficient way to do that?
> 
> Example:
> df <- data.frame( GRP=c( "A", "A", "B" ),
VAL=c( 10, 100, 200 ) )
> dp <- data.frame( GRP=c( "A", "A", "B",
"B" ),
>     MIN.VAL=c( 1, 50, 1, 70 ), MAX.VAL=c( 49, 999, 59, 999 ), 
>     VAL2=c( 1.1, 2.2, 3.3, 4.4 ) )
> 
> The result should be:
>     df$VAL2 <- c( 1.1, 2.2, 4.4 )
> 
> Thanks - Wolfram
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
>

Wolfram Fischer

2006-Jun-15 05:06 UTC

head link

[R] merge dataframes with conditions formulated as logical expressions

--- Reply to: --->Date:    14.06.06 16:17 (+0000)
>From:    Adaikalavan Ramasamy <ramasamy at cancer.org.uk>
>Subject: Re: [R] merge dataframes with conditions formulated as logical
expressions
>
> You have discontinuity between your MIN.VAL and MAX.VAL for a given
> group. If this is true in practise, then you may want to check and
> report when VAL is in the discontinuous region.
Your solution without concerning discontinuity is better
because it is more general.
> Here is my solution that ignores that (and only uses MIN.VAL and
> completely disrespecting MAX.VAL). Not very elegant but should do
> the trick.
> 
> 
>  df <- data.frame( GRP=c( "A", "A", "B" ),
VAL=c( 10, 100, 200 ) )
>  dp <- data.frame( GRP=c( "A", "A", "B",
"B" ), MIN.VAL=c( 1, 50, 1,
> 70 ), MAX.VAL=c( 49, 999, 59, 999 ),  VAL2=c( 1.1, 2.2, 3.3, 4.4 ) )
> 
>  x <- split(df, df$GRP)
>  y <- split(dp, dp$GRP)
> 
>  out <- NULL
>  for(g in names(x)){
> 
>    xx <- x[[g]]
>    yy <- y[[g]]
> 
>    w   <- cut(xx$VAL, breaks=c(yy$MIN.VAL, Inf), labels=F)
>    tmp <- cbind(xx, yy[w, "VAL2"])
>    colnames(tmp) <- c("GRP", "VAL",
"VAL2")
>    out <- rbind(out, tmp)
>  } 
>  out
> 
> Regards, Adai
Thanks for this solution.

I did not yet try to program a conventional solution
because I thought there would be a nice shortcut in R
to solve the problem comparably elegantly as in SQL:
	select df.*, dp.VAL2
	from df, dp
	where df.GRP = dp.GRP
	  and df.VAL > dp.MIN_VAL
	  and df.VAL <= dp.MAX_VAL

Wolfram
> On Wed, 2006-06-14 at 16:55 +0200, Wolfram Fischer wrote:
> > I have a data.frame df containing two variables:
> >     GRP: Factor
> >     VAL: num
> > 
> > I have a data.frame dp containing:
> >     GRP: Factor
> >     MIN.VAL: num
> >     MAX.VAL: num
> >     VAL2: num
> > with several rows per "GRP"
> > where dp[i-1, "MAX.VAL"] < dp[i, "MIN.VAL"]
> > within the same "GRP".
> > 
> > I want to create df[i, "VAL2"] <- dpp[z,
"VAL2"]
> > with    i along df 
> > and     dpp <- subset( dp, GRP = df[i, "GRP"] )
> > so that it is true for each i:
> >         df[i, "VAL"] > dpp[z, "MIN.VAL"]
> >    and  df[i, "VAL"] <= dpp[z, "MAX.VAL"]
> > 
> > Is there an easy/efficient way to do that?
> > 
> > Example:
> > df <- data.frame( GRP=c( "A", "A",
"B" ), VAL=c( 10, 100, 200 ) )
> > dp <- data.frame( GRP=c( "A", "A",
"B", "B" ),
> >     MIN.VAL=c( 1, 50, 1, 70 ), MAX.VAL=c( 49, 999, 59, 999 ), 
> >     VAL2=c( 1.1, 2.2, 3.3, 4.4 ) )
> > 
> > The result should be:
> >     df$VAL2 <- c( 1.1, 2.2, 4.4 )
> > 
> > Thanks - Wolfram
> > 
> > ______________________________________________

R help - Jun 2006 - merge dataframes with conditions formulated as logical expressions

[R] merge dataframes with conditions formulated as logical expressions

[R] merge dataframes with conditions formulated as logical expressions

[R] merge dataframes with conditions formulated as logical expressions