Hello again... sorry to be posting yet again, but I hadn't anticipated this problem.

I am now trying to put the names found in one column of data frame 1 (let's call it df.1[,1]) into a list, keeping only the rows where the values in df.1[,2] match the values in a column of another data frame (df.2[,3]).
I tried to write this function so that it puts the list of names (called Iffy) where the 2 criteria (df.1[141] and df.2[21]) match, but I think it's too complex for a beginner R-enthusiast:

ify<-function(x,y,a,b,c) if(x[[,a]]==y[[,b]]) {list(x[[,c]])} else {NULL}
Iffy<-apply( df.1, 1, FUN=ify, x=df.1, y=df.2, a=2, b=3, c=1 )

But this didn't work...

Error in FUN(newX[, i], ...) : unused argument(s) (newX[, i])

Here is a dataset that replicates the problem. You'll notice the "h" criterion values differ between the two data frames, so the result should be a list of the 9 letters where the two criteria columns match (a,b,c,d,e,f,g,i,j):

df.1<-data.frame(rep(letters[1:10]))
colnames(df.1)[1]<-("Letters")
set.seed(1)
df.1$numb1<-rnorm(10,1,1)
df.1$extra.col<-c(1,2,3,4,5,6,7,8,9,10)
df.1$id<-c("CG234","CG232","CG441","CG128","CG125","CG182","CG982","CG541","CG282","CG154")
df.1

df.2<-data.frame(rep(letters[1:10]))
colnames(df.2)[1]<-("Letters")
set.seed(1)
df.2$extra.col<-c(1,2,3,4,5,6,7,8,9,10)
df.2$numb1<-rnorm(10,1,1)
df.2$id<-c("CG234","CG232","CG441","CG128","CG125","CG182","CG982","CG541","CG282","CG154")
df.2[8,3]<-12

df.1
df.2

Your patience is much appreciated,
Rob
R. Michael Weylandt <michael.weylandt@gmail.com>
2011-Nov-16 13:34 UTC
[R] create list of names where two df contain == values
I'm not at a computer now, so I can't take a close look at it, but I think the match() function can be helpful here. I'll try to get back to you with a fuller answer later.

Michael

On Nov 16, 2011, at 8:03 AM, "Rob Griffin" <robgriffin247 at hotmail.com> wrote:

> Hello again... sorry to be posting yet again, but I hadn't anticipated this problem.
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
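Michael's fuller answer is not part of this thread, so the following is only a sketch of one way match() could be applied here. It assumes (the original post does not say so) that the id column is the key pairing rows of df.1 with rows of df.2. The sketch rebuilds Rob's data, reverses the row order of df.2 (df.2.rev is a hypothetical name) to show that match() pairs rows by id rather than by position, and then compares numb1 exactly with ==, which is safe here only because both columns come from the same rnorm() call.

```r
# Rebuild the example data from the original post
ids <- c("CG234", "CG232", "CG441", "CG128", "CG125",
         "CG182", "CG982", "CG541", "CG282", "CG154")

df.1 <- data.frame(Letters = letters[1:10], stringsAsFactors = FALSE)
set.seed(1)
df.1$numb1 <- rnorm(10, 1, 1)
df.1$extra.col <- 1:10
df.1$id <- ids

df.2 <- data.frame(Letters = letters[1:10], stringsAsFactors = FALSE)
set.seed(1)
df.2$extra.col <- 1:10
df.2$numb1 <- rnorm(10, 1, 1)   # same seed, so identical to df.1$numb1
df.2$id <- ids
df.2[8, 3] <- 12                # column 3 of df.2 is numb1; row 8 is "h"

# Reverse df.2 so that row positions no longer line up with df.1
df.2.rev <- df.2[10:1, ]

# Assumption: rows belong together when their id values are equal.
# idx[i] is the row of df.2.rev whose id equals df.1$id[i] (NA if none).
idx <- match(df.1$id, df.2.rev$id)

# Keep rows that have a partner in df.2.rev and an identical numb1 value;
# FALSE & NA is FALSE, so unmatched ids are dropped rather than kept as NA
same <- !is.na(idx) & df.1$numb1 == df.2.rev$numb1[idx]

Iffy <- as.list(df.1$Letters[same])
```

Because the pairing goes through match(), the same code keeps working if df.2 is sorted differently or has extra rows; a purely positional comparison would silently compare the wrong rows in that case.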
David Winsemius
2011-Nov-16 14:04 UTC
[R] create list of names where two df contain == values
On Nov 16, 2011, at 8:03 AM, Rob Griffin wrote:

> Hello again... sorry to be posting yet again, but I hadn't
> anticipated this problem.
>
> I am trying to now put the names found in one column in data frame 1
> (lets call it df.1[,1]) in to a list from the rows where the values
> in df.1[,2] match values in a column of another dataframe (df.2[3])
> I tried to write this function so that it put the list of names
> (called Iffy) where the 2 criteria (df.1[141] and df.2[21]) matched
> but I think its too complex for a beginner R-enthusiast
>
> ify<-function(x,y,a,b,c) if(x[[,a]]==y[[,b]]) {list(x[[,c]])} else
> {NULL}

When you are building a helper function for use with apply, you should realize that the function will be getting a vector, not a list. The construction "[[,a]]" looks pretty strange as well. Generally, column selection is done with one of "[[a]]" or "[ , a]". I am not absolutely sure that you cannot have "[[,]]", but I was under the impression you could not. AND you shouldn't be returning NULLs if what you really want are NAs.

> Iffy<-apply( df.1, 1, FUN=ify, x=df.1, y=df.2, a=2, b=3,
> c=1 )

So a single vector will be assigned to the x argument in the ify function, and the rest of the arguments will be populated from the other arguments. You do NOT need to supply an "x" argument in that list, and if you do so you will throw an error. Furthermore, you cannot expect the apply function to keep track of which row it's on for indexing a different data.frame. The mapply function might be used for this purpose, but I am going to suggest a much cleaner solution below.

> But this didn't work... Error in FUN(newX[, i], ...) : unused
> argument(s) (newX[, i])
>
> Here is a dataset that replicates the problem, you'll notice the "h"
> criteria values are different between the two dataframes and
> therefore it would produce a list of the 9 letters where the two
> criteria columns matched (a,b,c,d,e,f,g,i,j):

If you know that df.1 and df.2 have the same number of rows, then use the ifelse function, which is designed to work on vectors. The if()/else construct is NOT:

  > ifelse( df.1[,2] ==df.2[,3], {as.character(df.1[,1])} , {NA} )
  [1] "a" "b" "c" "d" "e" "f" "g" NA  "i" "j"

The reason as.character was needed lies in the fact that you constructed df.1[,1] as a factor variable. As I understand it, ifelse tries to make it numeric to match the datatype of the comparison. I've never understood this, frankly. Maybe someone can educate me.

If you wanted a function that allowed you to specify the columns and data frames, then consider this:

ret3.m1.eq.n2 <- function(df1, df2, col1, col2, col3){
    ifelse( df1[,col1] ==df2[,col2], {as.character(df1[,col3])} , {NA} )
}

David Winsemius, MD
West Hartford, CT
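David's three observations can be checked directly. The sketch below rebuilds Rob's data with data.frame()'s defaults and shows (1) that apply() hands each row to FUN as a character vector, because it first converts the mixed-type data frame with as.matrix(); (2) one way the mapply() route he mentions could look, using a hypothetical helper ify2 that returns NA rather than NULL; and (3) what ifelse() does with a factor. On that last point, a hedged answer to David's question: ifelse() starts its result as a copy of the logical test and fills it in by subassignment, and assigning a factor into a logical vector stores the factor's integer codes rather than its labels. Also note that since R 4.0.0, data.frame() no longer converts strings to factors by default, so with current R the as.character() call is harmless but not needed for this data.

```r
# Rebuild the example data with data.frame()'s defaults, as in the post.
# (Letters is a factor before R 4.0.0 and a character column from 4.0.0 on.)
ids <- c("CG234", "CG232", "CG441", "CG128", "CG125",
         "CG182", "CG982", "CG541", "CG282", "CG154")
df.1 <- data.frame(Letters = letters[1:10])
set.seed(1)
df.1$numb1 <- rnorm(10, 1, 1)
df.1$extra.col <- 1:10
df.1$id <- ids
df.2 <- data.frame(Letters = letters[1:10])
set.seed(1)
df.2$extra.col <- 1:10
df.2$numb1 <- rnorm(10, 1, 1)
df.2$id <- ids
df.2[8, 3] <- 12

# 1. apply() turns the data frame into a matrix first. With mixed column
#    types that matrix is character, so every row reaches FUN as a
#    character vector (numbers included), not as a list.
row.types <- apply(df.1, 1, class)

# 2. mapply() walks several vectors in parallel, so the helper receives one
#    value from each column per call and no row index is needed.
#    ify2 is a hypothetical helper that returns NA for non-matching rows.
ify2 <- function(name, v1, v2) if (v1 == v2) name else NA
Iffy <- mapply(ify2, as.character(df.1[, 1]), df.1[, 2], df.2[, 3],
               USE.NAMES = FALSE)
Iffy.list <- as.list(Iffy[!is.na(Iffy)])

# 3. ifelse() builds its result from the logical test, so a factor passed
#    as 'yes' contributes its integer codes rather than its labels.
cond <- c(TRUE, FALSE, TRUE)
f <- factor(c("a", "b", "c"))
res.factor <- ifelse(cond, f, NA)                # integer codes: 1, NA, 3
res.char   <- ifelse(cond, as.character(f), NA)  # labels: "a", NA, "c"
```

USE.NAMES = FALSE matters in the mapply() call: without it, mapply() would use the first (character) argument as names for the result.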
Dennis Murphy
2011-Nov-16 15:03 UTC
[R] create list of names where two df contain == values
Hi:

I think you're overthinking this problem. As is usually the case in R, a vectorized solution is clearer and provides more easily understood code. It's not obvious to me exactly what you want, so we'll try a couple of variations on the same idea.

Equality of floating point numbers is a difficult computational problem (see R FAQ 7.31), but if it makes sense to define a threshold difference between floating point numbers that practically equates to zero, then you're in business. In your example, the difference in numb1 for letter h in the two data frames is far from zero, so define 'equal' to be a difference < 10^(-6). Then:

# Return the entire matching data frame
df.1[abs(df.1$numb1 - df.2$numb1) < 0.000001, ]
   Letters     numb1 extra.col    id
1        a 0.3735462         1 CG234
2        b 1.1836433         2 CG232
3        c 0.1643714         3 CG441
4        d 2.5952808         4 CG128
5        e 1.3295078         5 CG125
6        f 0.1795316         6 CG182
7        g 1.4874291         7 CG982
9        i 1.5757814         9 CG282
10       j 0.6946116        10 CG154

# Return the matching letters only as a vector:
df.1[abs(df.1$numb1 - df.2$numb1) < 0.000001, 'Letters' ]

If you want the latter object to remain a data frame, use drop = FALSE as an extra argument after 'Letters'.

If you want to create a list object such that each letter comprises a different list component, then the following will do; the as.character() part coerces the factor Letters into a character object:

as.list(as.character(df.1[abs(df.1$numb1 - df.2$numb1) < 0.000001, 'Letters' ]))

HTH,
Dennis

On Wed, Nov 16, 2011 at 5:03 AM, Rob Griffin <robgriffin247 at hotmail.com> wrote:
> Hello again... sorry to be posting yet again, but I hadn't anticipated this
> problem.
> [...]
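Dennis's threshold test can also be wrapped in a small function that takes the data frames and columns as arguments, in the spirit of David's ret3.m1.eq.n2. This is a sketch under stated assumptions: names_where_equal and its tol argument are hypothetical names, rows are paired by position exactly as in Dennis's one-liner, and 1e-6 is his example threshold rather than a general recommendation. It also reproduces the FAQ 7.31 pitfall on a tiny example and uses which(), so that an NA in either column drops the row instead of producing an NA entry.

```r
# Rebuild the example data from the original post
ids <- c("CG234", "CG232", "CG441", "CG128", "CG125",
         "CG182", "CG982", "CG541", "CG282", "CG154")
df.1 <- data.frame(Letters = letters[1:10], stringsAsFactors = FALSE)
set.seed(1)
df.1$numb1 <- rnorm(10, 1, 1)
df.1$extra.col <- 1:10
df.1$id <- ids
df.2 <- data.frame(Letters = letters[1:10], stringsAsFactors = FALSE)
set.seed(1)
df.2$extra.col <- 1:10
df.2$numb1 <- rnorm(10, 1, 1)
df.2$id <- ids
df.2[8, 3] <- 12

# R FAQ 7.31 in one line: exact equality of doubles is fragile
exact <- (0.1 + 0.2 == 0.3)                  # FALSE
approx.equal <- abs((0.1 + 0.2) - 0.3) < 1e-6  # TRUE

# Hypothetical helper generalizing Dennis's one-liner. Rows are paired by
# position, so both data frames must have the same number of rows.
# Columns may be given by name or by number.
names_where_equal <- function(df1, df2, col1, col2, name_col, tol = 1e-6) {
  stopifnot(nrow(df1) == nrow(df2))
  # which() drops NA comparisons; a logical index containing NA would
  # instead put NA entries into the result
  hits <- which(abs(df1[[col1]] - df2[[col2]]) < tol)
  as.list(as.character(df1[[name_col]][hits]))
}

Iffy <- names_where_equal(df.1, df.2, "numb1", "numb1", "Letters")
Iffy.by.pos <- names_where_equal(df.1, df.2, 2, 3, 1)  # same columns by number

# With a missing value in df.2, row "a" is simply left out
df.2.na <- df.2
df.2.na$numb1[1] <- NA
Iffy.na <- names_where_equal(df.1, df.2.na, "numb1", "numb1", "Letters")
```

Using [[ ]] for column access lets the same helper accept either names ("numb1") or positions (2 and 3, as in Rob's a=2, b=3, c=1), which matches how the original question identified the columns.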