John Sorkin
2016-Apr-07 22:39 UTC
[R] Using a function with apply Error: undefined columns selected
I am trying to write a function that can be applied to process all the columns of a data.frame. If you run the code below, you will get the error message "undefined columns selected". I hope someone will be able to teach me what I am doing wrong.

Thank you,
John

# create data frame.
guppy

fract2 <- function(col,data) {
  cat("Prove we have passed the data frame\n")
  print(data)

  # Get the name of the column being processed.
  zz<-deparse(substitute(col))
  cat("Column being processed\n")
  print(zz)
  p<-sum(data[,zz]!="")/length(data[,zz])
  return(p)
}

apply(guppy,2,fract2,data=guppy)

John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
Baltimore VA Medical Center
Jim Lemon
2016-Apr-07 23:44 UTC
[R] Using a function with apply Error: undefined columns selected
Hi John,

First, apply isn't guaranteed to work on data frames. There are two easy ways to do something like this, but we had better have a data frame:

guppy<-data.frame(taste=rnorm(10,5),
 crunch=rnorm(10,5),satiety=rnorm(10,5))

If you just want to apply a function to all or a subset of columns of a data frame, a for loop can be used:

fract2.1<-function(col,data) {
 p<-sum(data[,col],na.rm=TRUE)/sum(!is.na(data[,col]))
 return(p)
}

for(col in 1:ncol(guppy)) print(fract2.1(col,guppy))

If you really do want to use an "*apply" function, then the function has to be written for each column, not the entire data frame:

fract2.2<-function(x) return(sum(x,na.rm=TRUE)/sum(!is.na(x)))
sapply(guppy,fract2.2)

and if you want a subset of the columns, you will have to do it before you let sapply get into it.

Jim
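The per-column approach can also be sketched against the quantity the original fract2 was computing, the fraction of entries that are not the empty string. This is only an illustration: guppy_chr and frac_nonempty are hypothetical names, and the data are made up because the original post does not show how guppy was created.

```r
# Hypothetical stand-in for guppy; the original post did not show its contents.
guppy_chr <- data.frame(
  taste  = c("sweet", "", "sour", "bitter"),
  crunch = c("", "", "high", "low"),
  stringsAsFactors = FALSE
)

# Fraction of entries in a single column that are not the empty string.
# The function receives the column values only, never the column name.
frac_nonempty <- function(x) sum(x != "") / length(x)

# sapply walks over the columns of the data frame and passes each one
# to frac_nonempty as a plain vector; the result is named by column.
result <- sapply(guppy_chr, frac_nonempty)
print(result)
```

Because sapply treats a data frame as a list of columns, no conversion to a matrix takes place, which is one reason it tends to be a safer choice than apply for this kind of task.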
David Winsemius
2016-Apr-08 02:45 UTC
[R] Using a function with apply Error: undefined columns selected
> On Apr 7, 2016, at 3:39 PM, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:
>
> apply(guppy,2,fract2,data=guppy)

At the point where the error is about to occur, while the first column is being processed, this is what had been printed:

Column being processed
[1] "newX[, i]"

So it should be no surprise that the actual error was:

Error in `[.data.frame`(data, , zz) : undefined columns selected

It occurred at one of the two points where you tried `data[,zz]`. You need to pass colnames(guppy) to fract2 and work with the character values. All of the *apply functions pass only values, so they do not pass the column names for testing. A possible exception might be the colnames of a vector being available when using apply with an index of 1.

--
David Winsemius
Alameda, CA, USA
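A minimal sketch of both points above, under stated assumptions: guppy_chr is a made-up stand-in for guppy (the original post does not show its contents), and show_arg and fract2_named are hypothetical helper names. The first part shows that deparse(substitute(col)) returns the text of whatever expression the caller wrote; the second passes the column names as character values, as suggested.

```r
# Hypothetical stand-in for guppy; the original post did not show its contents.
guppy_chr <- data.frame(
  taste  = c("sweet", "", "sour", "bitter"),
  crunch = c("", "", "high", "low"),
  stringsAsFactors = FALSE
)

# substitute() captures the expression used in the call, not a column name.
# Inside apply() that expression is apply's own internal indexing code,
# which is why the original function ended up with a string that names
# no column of the data frame.
show_arg <- function(col) deparse(substitute(col))
captured <- show_arg(guppy_chr[, 1])
print(captured)

# Pass the column name as a character value and index the data frame with it.
fract2_named <- function(col, data) {
  values <- data[, col]
  sum(values != "") / length(values)
}

# Iterate over the column names; data is forwarded as an extra argument.
by_name <- sapply(colnames(guppy_chr), fract2_named, data = guppy_chr)
print(by_name)
```

Iterating over colnames() keeps the name available inside the function, which is useful if the function needs to report or branch on which column it is processing.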