Thank you. Sorry, I forgot to turn off the HTML.

Below is a sample of my data. My original data frame has over 10,000 rows. I want to check each element of column B of my data frame (MyDF$B) to see whether it contains any element(s) of MyList. If so, I want to change the value of MyDF$C to the name of the list vector that has the match(es).

I solved this via loops and if statements, using %in%, but I am hoping for a more compact solution using the apply family of functions. I tried something like this, but it did not work:

lapply(strsplit(MyDF$B," "),function(x) lapply(MyList,function(y) if(sum(y %in% x)>0,x$Code==y[[1]]))

Thanks in advance--EK

Sample data

MyList <- list(X=c("a","ba","cc"),Y=c("abs","aa","BA","BB"),z=c("ab","bb","xy","zy","gh"))
MyDF <- data.frame(A=c(1,2,3,4,5),B=c("aa ab ac","bb bc bd","cc cf","dd","ee"), C= c(0,0,0,0,0), stringsAsFactors = FALSE)

> MyDF
  A        B C
1 1 aa ab ac 0
2 2 bb bc bd 0
3 3    cc cf 0
4 4       dd 0
5 5       ee 0

> MyList
$X
[1] "a"  "ba" "cc"

$Y
[1] "abs" "aa"  "BA"  "BB"

$z
[1] "ab" "bb" "xy" "zy" "gh"

Desired results.

> MyDF
  A        B C
1 1 aa ab ac Y
2 2 bb bc bd Y
3 3    cc cf X
4 4       dd 0
5 5       ee 0
I skipped pre-populating MyDF$C as unnecessary:

> MyDF <- data.frame(A=c(1,2,3,4,5),B=c("aa ab ac","bb bc bd","cc cf","dd","ee"),
+                    stringsAsFactors = FALSE)

## I think this does what you want:

> choices <- sapply(MyDF$B, strsplit, split = " +")
> nm <- names(MyList)
> MyDF$C <- nm[sapply(choices, function(x) match(TRUE, sapply(MyList, function(tbl) any(x %in% tbl))))]
> MyDF$C
[1] "Y" "z" "X" NA  NA

You could of course make this even more opaque by making it a one-liner. ;-)

Cheers,
Bert

On Sat, Apr 6, 2019 at 10:45 AM Ek Esawi <esawiek at gmail.com> wrote:
> [...]

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
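For readers who find the nested call hard to follow, the same idea can be unpacked step by step. This is only a sketch restating the approach in the reply above, not a different method. The helper name first_match and the variables hits and idx are made up for illustration, and vapply is swapped in for sapply so that the return type is fixed.

```r
MyList <- list(X = c("a", "ba", "cc"),
               Y = c("abs", "aa", "BA", "BB"),
               z = c("ab", "bb", "xy", "zy", "gh"))
MyDF <- data.frame(A = c(1, 2, 3, 4, 5),
                   B = c("aa ab ac", "bb bc bd", "cc cf", "dd", "ee"),
                   stringsAsFactors = FALSE)

# strsplit is already vectorised: this gives a list with one character
# vector of tokens per row of MyDF, split on runs of spaces.
choices <- strsplit(MyDF$B, " +")

# Hypothetical helper: position of the first element of MyList that shares
# at least one token with x, or NA if no element does.
first_match <- function(x) {
  # One TRUE/FALSE per list element: does any token of x occur in it?
  hits <- vapply(MyList, function(tbl) any(x %in% tbl), logical(1))
  # match(TRUE, hits) returns the index of the first TRUE, NA if there is none.
  match(TRUE, hits)
}

idx <- vapply(choices, first_match, integer(1))

# Indexing the names with NA yields NA, so unmatched rows stay NA.
MyDF$C <- names(MyList)[idx]
MyDF$C
```

Because match() stops at the first TRUE, a row whose tokens appear in several list elements gets the name of the earliest one in MyList; the first row matches both Y and z and is labelled "Y".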
Thank you, Bert. It did indeed work, and I put it in one line as well. The match(TRUE, ...) threw me off a little at the beginning, but I realized why it's there. Thanks for your continued comments on mine and many other posts.

EK

On Sat, Apr 6, 2019 at 5:07 PM Bert Gunter <bgunter.4567 at gmail.com> wrote:
> [...]
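One detail may be worth a second look: Bert's result gives "z" for the second row, whereas the desired output in the original post shows "Y". %in% is case-sensitive, so "bb" matches "bb" in z but not "BB" in Y. If case-insensitive matching was what was intended (an assumption; the thread does not say so), a sketch along the following lines would reproduce the desired column, including "0" for rows without a match. The helper name match_name and its arguments tables and nomatch are invented for illustration.

```r
MyList <- list(X = c("a", "ba", "cc"),
               Y = c("abs", "aa", "BA", "BB"),
               z = c("ab", "bb", "xy", "zy", "gh"))
MyDF <- data.frame(A = c(1, 2, 3, 4, 5),
                   B = c("aa ab ac", "bb bc bd", "cc cf", "dd", "ee"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: name of the first element of `tables` that shares a
# token with x, ignoring case; returns `nomatch` when nothing matches.
match_name <- function(x, tables, nomatch = "0") {
  hits <- vapply(tables,
                 function(tbl) any(tolower(x) %in% tolower(tbl)),
                 logical(1))
  if (any(hits)) names(tables)[which(hits)[1]] else nomatch
}

# One token vector per row, then one name (or "0") per row.
MyDF$C <- vapply(strsplit(MyDF$B, " +"), match_name, character(1),
                 tables = MyList)
MyDF$C
# [1] "Y" "Y" "X" "0" "0"
```

Note that C ends up as a character column here, so the fallback is the string "0" rather than the number 0.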