Wolfram Fischer
2006-Jun-14 14:55 UTC
[R] merge dataframes with conditions formulated as logical expressions
I have a data.frame df containing two variables: GRP: Factor VAL: num I have a data.frame dp containing: GRP: Factor MIN.VAL: num MAX.VAL: num VAL2: num with several rows per "GRP" where dp[i-1, "MAX.VAL"] < dp[i, "MIN.VAL"] within the same "GRP". I want to create df[i, "VAL2"] <- dpp[z, "VAL2"] with i along df and dpp <- subset( dp, GRP = df[i, "GRP"] ) so that it is true for each i: df[i, "VAL"] > dpp[z, "MIN.VAL"] and df[i, "VAL"] <= dpp[z, "MAX.VAL"] Is there an easy/efficient way to do that? Example: df <- data.frame( GRP=c( "A", "A", "B" ), VAL=c( 10, 100, 200 ) ) dp <- data.frame( GRP=c( "A", "A", "B", "B" ), MIN.VAL=c( 1, 50, 1, 70 ), MAX.VAL=c( 49, 999, 59, 999 ), VAL2=c( 1.1, 2.2, 3.3, 4.4 ) ) The result should be: df$VAL2 <- c( 1.1, 2.2, 4.4 ) Thanks - Wolfram
Adaikalavan Ramasamy
2006-Jun-14 16:17 UTC
[R] merge dataframes with conditions formulated as logical expressions
You have discontinuity between your MIN.VAL and MAX.VAL for a given group. If this is true in practise, then you may want to check and report when VAL is in the discontinuous region. Here is my solution that ignores that (and only uses MIN.VAL and completely disrespecting MAX.VAL). Not very elegant but should do the trick. df <- data.frame( GRP=c( "A", "A", "B" ), VAL=c( 10, 100, 200 ) ) dp <- data.frame( GRP=c( "A", "A", "B", "B" ), MIN.VAL=c( 1, 50, 1, 70 ), MAX.VAL=c( 49, 999, 59, 999 ), VAL2=c( 1.1, 2.2, 3.3, 4.4 ) ) x <- split(df, df$GRP) y <- split(dp, dp$GRP) out <- NULL for(g in names(x)){ xx <- x[[g]] yy <- y[[g]] w <- cut(xx$VAL, breaks=c(yy$MIN.VAL, Inf), labels=F) tmp <- cbind(xx, yy[w, "VAL2"]) colnames(tmp) <- c("GRP", "VAL", "VAL2") out <- rbind(out, tmp) } out Regards, Adai On Wed, 2006-06-14 at 16:55 +0200, Wolfram Fischer wrote:> I have a data.frame df containing two variables: > GRP: Factor > VAL: num > > I have a data.frame dp containing: > GRP: Factor > MIN.VAL: num > MAX.VAL: num > VAL2: num > with several rows per "GRP" > where dp[i-1, "MAX.VAL"] < dp[i, "MIN.VAL"] > within the same "GRP". > > I want to create df[i, "VAL2"] <- dpp[z, "VAL2"] > with i along df > and dpp <- subset( dp, GRP = df[i, "GRP"] ) > so that it is true for each i: > df[i, "VAL"] > dpp[z, "MIN.VAL"] > and df[i, "VAL"] <= dpp[z, "MAX.VAL"] > > Is there an easy/efficient way to do that? > > Example: > df <- data.frame( GRP=c( "A", "A", "B" ), VAL=c( 10, 100, 200 ) ) > dp <- data.frame( GRP=c( "A", "A", "B", "B" ), > MIN.VAL=c( 1, 50, 1, 70 ), MAX.VAL=c( 49, 999, 59, 999 ), > VAL2=c( 1.1, 2.2, 3.3, 4.4 ) ) > > The result should be: > df$VAL2 <- c( 1.1, 2.2, 4.4 ) > > Thanks - Wolfram > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html >
Wolfram Fischer
2006-Jun-15 05:06 UTC
[R] merge dataframes with conditions formulated as logical expressions
--- Reply to: --->Date: 14.06.06 16:17 (+0000) >From: Adaikalavan Ramasamy <ramasamy at cancer.org.uk> >Subject: Re: [R] merge dataframes with conditions formulated as logical expressions > > You have discontinuity between your MIN.VAL and MAX.VAL for a given > group. If this is true in practise, then you may want to check and > report when VAL is in the discontinuous region.Your solution without concerning discontinuity is better because it is more general.> Here is my solution that ignores that (and only uses MIN.VAL and > completely disrespecting MAX.VAL). Not very elegant but should do > the trick. > > > df <- data.frame( GRP=c( "A", "A", "B" ), VAL=c( 10, 100, 200 ) ) > dp <- data.frame( GRP=c( "A", "A", "B", "B" ), MIN.VAL=c( 1, 50, 1, > 70 ), MAX.VAL=c( 49, 999, 59, 999 ), VAL2=c( 1.1, 2.2, 3.3, 4.4 ) ) > > x <- split(df, df$GRP) > y <- split(dp, dp$GRP) > > out <- NULL > for(g in names(x)){ > > xx <- x[[g]] > yy <- y[[g]] > > w <- cut(xx$VAL, breaks=c(yy$MIN.VAL, Inf), labels=F) > tmp <- cbind(xx, yy[w, "VAL2"]) > colnames(tmp) <- c("GRP", "VAL", "VAL2") > out <- rbind(out, tmp) > } > out > > Regards, AdaiThanks for this solution. I did not yet try to program a conventional solution because I thought there would be a nice shortcut in R to solve the problem comparably elegantly as in SQL: select df.*, dp.VAL2 from df, dp where df.GRP = dp.GRP and df.VAL > dp.MIN_VAL and df.VAL <= dp.MAX_VAL Wolfram> On Wed, 2006-06-14 at 16:55 +0200, Wolfram Fischer wrote: > > I have a data.frame df containing two variables: > > GRP: Factor > > VAL: num > > > > I have a data.frame dp containing: > > GRP: Factor > > MIN.VAL: num > > MAX.VAL: num > > VAL2: num > > with several rows per "GRP" > > where dp[i-1, "MAX.VAL"] < dp[i, "MIN.VAL"] > > within the same "GRP". > > > > I want to create df[i, "VAL2"] <- dpp[z, "VAL2"] > > with i along df > > and dpp <- subset( dp, GRP = df[i, "GRP"] ) > > so that it is true for each i: > > df[i, "VAL"] > dpp[z, "MIN.VAL"] > > and df[i, "VAL"] <= dpp[z, "MAX.VAL"] > > > > Is there an easy/efficient way to do that? > > > > Example: > > df <- data.frame( GRP=c( "A", "A", "B" ), VAL=c( 10, 100, 200 ) ) > > dp <- data.frame( GRP=c( "A", "A", "B", "B" ), > > MIN.VAL=c( 1, 50, 1, 70 ), MAX.VAL=c( 49, 999, 59, 999 ), > > VAL2=c( 1.1, 2.2, 3.3, 4.4 ) ) > > > > The result should be: > > df$VAL2 <- c( 1.1, 2.2, 4.4 ) > > > > Thanks - Wolfram > > > > ______________________________________________