Pete Pete
2010-Dec-07 16:30 UTC
[R] Creating binary variable depending on strings of two dataframes
Hi,
consider the following two data frames:

x1=c("232","3454","3455","342","13")
x2=c("1","1","1","0","0")
data1=data.frame(x1,x2)

y1=c("232","232","3454","3454","3455","342","13","13","13","13")
y2=c("E1","F3","F5","E1","E2","H4","F8","G3","E1","H2")
data2=data.frame(y1,y2)

I need a new column x3 in data1 that is 1 if data2 contains a row where y2 is "E1" and y1 equals x1, and 0 otherwise. The result for data1 should look like this:

    x1 x2 x3
1  232  1  1
2 3454  1  1
3 3455  1  0
4  342  0  0
5   13  0  1

I think a SQL command could help me, but I am too inexperienced with it to get there.

Thanks for your help!
Santosh Srinivas
2010-Dec-07 16:58 UTC
[R] Creating binary variable depending on strings of two dataframes
Your question is not clear to me, but your solution is a variation of

> data1$x.1 <- data1$x1 %in% data2$y1

You can play with your conditions to get the result you want.
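One possible reading of "play with your conditions", offered only as a sketch: restrict data2 to the rows where y2 is "E1" before testing membership. The helper name ids_with_E1 and the use of stringsAsFactors = FALSE are assumptions made for this illustration, not part of the original reply.

```r
# Data from the original post; stringsAsFactors = FALSE keeps the ids as
# plain character vectors (an assumption for this sketch).
data1 <- data.frame(x1 = c("232", "3454", "3455", "342", "13"),
                    x2 = c("1", "1", "1", "0", "0"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(y1 = c("232", "232", "3454", "3454", "3455",
                           "342", "13", "13", "13", "13"),
                    y2 = c("E1", "F3", "F5", "E1", "E2",
                           "H4", "F8", "G3", "E1", "H2"),
                    stringsAsFactors = FALSE)

# ids in data2 that have at least one row with y2 == "E1"
ids_with_E1 <- data2$y1[data2$y2 == "E1"]

# 1 if the id in data1 is among them, 0 otherwise
data1$x3 <- as.integer(data1$x1 %in% ids_with_E1)
```

Ids in data1 that never occur in data2 simply get 0 with this approach.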
Gabor Grothendieck
2010-Dec-07 17:11 UTC
[R] Creating binary variable depending on strings of two dataframes
On Tue, Dec 7, 2010 at 11:30 AM, Pete Pete <noxyport at gmail.com> wrote:
> I think a SQL command could help me but I am too inexperienced with it to
> get there.

Try this:

> library(sqldf)
> sqldf("select x1, x2, max(y2 = 'E1') x3 from data1 d1 left join data2 d2 on (x1 = y1) group by x1, x2 order by d1.rowid")
    x1 x2 x3
1  232  1  1
2 3454  1  1
3 3455  1  0
4  342  0  0
5   13  0  1

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
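The query groups the joined rows by id and takes the maximum of the comparison y2 = 'E1', which is 1 as soon as any row in the group matches. A rough base R analogue of that grouping step, as a sketch only (the name e1_by_id and stringsAsFactors = FALSE are assumptions for illustration):

```r
# Data from the original post, with ids kept as character vectors.
data1 <- data.frame(x1 = c("232", "3454", "3455", "342", "13"),
                    x2 = c("1", "1", "1", "0", "0"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(y1 = c("232", "232", "3454", "3454", "3455",
                           "342", "13", "13", "13", "13"),
                    y2 = c("E1", "F3", "F5", "E1", "E2",
                           "H4", "F8", "G3", "E1", "H2"),
                    stringsAsFactors = FALSE)

# Per id in data2: maximum of the logical comparison (TRUE counts as 1),
# mirroring max(y2 = 'E1') ... group by in the SQL.
e1_by_id <- tapply(data2$y2 == "E1", data2$y1, max)

# Look the ids of data1 up by name; this keeps the row order of data1.
data1$x3 <- as.integer(e1_by_id[as.character(data1$x1)])
```

An id that appears in data1 but not in data2 would come out as NA here, much as the left join yields NULL for it in the SQL version.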
David Winsemius
2010-Dec-07 17:30 UTC
[R] Creating binary variable depending on strings of two dataframes
On Dec 7, 2010, at 11:30 AM, Pete Pete wrote:
> I think a SQL command could help me but I am too inexperienced with
> it to get there.

> dat3 <- merge(data1, data2[data2$y2=="E1", ], by.x="x1", by.y="y1", all.x=TRUE)
> dat3$y2 <- 0 + (dat3$y2 %in% "E1")
> dat3
    x1 x2 y2
1   13  0  1
2  232  1  1
3  342  0  0
4 3454  1  1
5 3455  1  0

(Admittedly not in the original order, but in my hands the R merge operation doesn't lend itself well to maintaining the original order. I see that Grothendieck's solution is better in this respect, a typical occurrence in comparison of our respective efforts with R.)

--
David Winsemius, MD
West Hartford, CT
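One common workaround for the ordering issue, sketched here under the assumption that a temporary helper column is acceptable: record the row position before merging and sort by it afterwards. The column name ord and stringsAsFactors = FALSE are choices made for this illustration, not part of the reply above.

```r
# Data from the original post, with ids kept as character vectors.
data1 <- data.frame(x1 = c("232", "3454", "3455", "342", "13"),
                    x2 = c("1", "1", "1", "0", "0"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(y1 = c("232", "232", "3454", "3454", "3455",
                           "342", "13", "13", "13", "13"),
                    y2 = c("E1", "F3", "F5", "E1", "E2",
                           "H4", "F8", "G3", "E1", "H2"),
                    stringsAsFactors = FALSE)

# Remember the original row order (hypothetical helper column).
data1$ord <- seq_len(nrow(data1))

# Same merge as in the reply: keep every row of data1, join only E1 rows.
dat3 <- merge(data1, data2[data2$y2 == "E1", ],
              by.x = "x1", by.y = "y1", all.x = TRUE)

# Restore the original order, then build the 0/1 indicator.
dat3 <- dat3[order(dat3$ord), ]
dat3$x3 <- as.integer(dat3$y2 %in% "E1")
dat3 <- dat3[, c("x1", "x2", "x3")]
rownames(dat3) <- NULL
```

If an id had more than one "E1" row in data2, the merge would duplicate that id; de-duplicating the filtered data2 first would avoid this.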
noxyport at gmail.com
2011-May-10 13:49 UTC
[R] Creating binary variable depending on strings of two dataframes
On Tue, May 10, 2011 at 3:09 PM, David Winsemius <dwinsemius@comcast.net> wrote:
> On May 10, 2011, at 3:18 AM, noxyport@gmail.com wrote:
>> On Fri, May 6, 2011 at 7:41 PM, David Winsemius <dwinsemius@comcast.net> wrote:
>>> On May 6, 2011, at 11:35 AM, Pete Pete wrote:
>>>> Gabor Grothendieck wrote:
>>>>> Try this:
>>>>>
>>>>> > library(sqldf)
>>>>> > sqldf("select x1, x2, max(y2 = 'E1') x3 from data1 d1 left join data2 d2 on (x1 = y1) group by x1, x2 order by d1.rowid")
>>>>
>>>> That works pretty cool, but I need to automate this a bit more. Consider the following example:
>>>>
>>>> list1=c("A01","B04","A64","G84","F19")
>>>>
>>>> x1=c("232","3454","3455","342","13")
>>>> x2=c("1","1","1","0","0")
>>>> data1=data.frame(x1,x2)
>>>>
>>>> y1=c("232","232","3454","3454","3455","342","13","13","13","13")
>>>> y2=c("E13","B04","F19","A64","E22","H44","F68","G84","F19","A01")
>>>> data2=data.frame(y1,y2)
>>>>
>>>> I now want to create a loop which creates a new binary variable in data1 for every value in list1. The result should look like:
>>>>
>>>>   x1 x2 A01 B04 A64 G84 F19
>>>>  232  1   0   1   0   0   0
>>>> 3454  1   0   0   1   0   1
>>>> 3455  1   0   0   0   0   0
>>>>  342  0   0   0   0   0   0
>>>>   13  0   1   0   0   1   1
>>>
>>> Loops!?! We don't need no steenking loops!
>>>
>>> > xtb <- with(data2, table(y1,y2))
>>> > cbind(data1, xtb[match(data1$x1, rownames(xtb)), ] )
>>>        x1 x2 A01 A64 B04 E13 E22 F19 F68 G84 H44
>>> 232   232  1   0   0   1   1   0   0   0   0   0
>>> 3454 3454  1   0   1   0   0   0   1   0   0   0
>>> 3455 3455  1   0   0   0   0   1   0   0   0   0
>>> 342   342  0   0   0   0   0   0   0   0   0   1
>>> 13     13  0   1   0   0   0   0   1   1   1   0
>>>
>>> I am guessing that you were too ... er, busy? ... to complete the table?
>>>
>>> --
>>> David Winsemius, MD
>>> West Hartford, CT
>>
>> Thanks a lot! Pretty simple. I am so used to sqldf by now.
>>
>> So how would you handle more complicated strings like these:
>>
>> y1=c("232","232", "232", "3454","3454","3455","342","13","13","13","13")
>> y2=c("E13","B04 A01 F19","B04","F19","A64 G84 A05","E22","H44 C35","F68","G84","F19","A01")
>> data2=data.frame(y1,y2)
>>
>> where you want to extract, for instance, all "A01" from the strings?
>
> I think you need either to explain what you want in more words of the
> English language or to offer an example of the desired output. I suspect you
> did not want something as simple as this:
>
> > A01.instances <- grep("A01" , data2$y2)
> > A01.instances
> [1]  2 11
> > data2[A01.instances, ]
>     y1          y2
> 2  232 B04 A01 F19
> 11  13         A01
>
> Or maybe you did?
>
> --
> David Winsemius, MD
> West Hartford, CT

No, that was not my intention. Consider the following example:

list1=c("A01","B04","A64","G84","F19") # My "substrings" to screen for in data2

x1=c("232","3454","3455","342","13")
x2=c("1","1","1","0","0")
data1=data.frame(x1,x2) # Target data frame to which the 5 new binary variables (named after list1) are added

y1=c("232","232", "232", "3454","3454","3455","342","13","13","13","13")
y2=c("E133","B04 A01A F194","B04","F19","A642 G84 A05","E223","H44 C35","F68","G84","F19","A01")
data2=data.frame(y1,y2) # Data frame to be screened with list1

The result should look like this:

  x1 x2 A01 B04 A64 G84 F19
 232  1   1   1   0   0   0
3454  1   0   0   1   0   1
3455  1   0   0   0   0   0
 342  0   0   0   0   0   0
  13  0   1   0   0   1   1
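The archive shows no reply to this last example. One way to approach it, offered purely as a sketch, is to test each entry of list1 as a literal substring of y2 and flag every id in data1 that has at least one matching row. The names flags and result, the use of grepl(fixed = TRUE), and stringsAsFactors = FALSE are assumptions made for this illustration.

```r
list1 <- c("A01", "B04", "A64", "G84", "F19")

data1 <- data.frame(x1 = c("232", "3454", "3455", "342", "13"),
                    x2 = c("1", "1", "1", "0", "0"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(y1 = c("232", "232", "232", "3454", "3454", "3455",
                           "342", "13", "13", "13", "13"),
                    y2 = c("E133", "B04 A01A F194", "B04", "F19",
                           "A642 G84 A05", "E223", "H44 C35",
                           "F68", "G84", "F19", "A01"),
                    stringsAsFactors = FALSE)

# One 0/1 column per code: 1 if any data2 row with the same id
# contains the code as a literal substring of y2.
flags <- sapply(list1, function(code) {
  hit_ids <- data2$y1[grepl(code, data2$y2, fixed = TRUE)]
  as.integer(data1$x1 %in% hit_ids)
})

# sapply names the columns after list1; bind them onto data1.
result <- cbind(data1, flags)
```

With plain substring matching, "F194" also counts as a hit for F19 (id 232) and "A642 G84 A05" as a hit for G84 (id 3454), so two cells would differ from the table in the post; the poster would need to confirm which matching rule is actually intended.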