Dear R gurus,

I regularly come across a situation where I would like to apply a function to a subset of data in a dataframe, but I have not found an R function to facilitate exactly what I need. More specifically, I'd like my function to have a context of where the data it's analyzing came from. Here is an example:

### BEGIN ###
func<-function(x){
  m<-median(x$x)
  if(m > 2 & m < x$y){
    return(T)
  }
  return(F)
}

tmp<-data.frame(x=1:10,y=c(rep(34,3),rep(35,3),rep(34,4)),z=c(rep("a",3),rep("b",3),rep("c",4)))
res<-aggregate(tmp,list(z),func)
### END ###

The values in the example are trivial, but the problem is that only one column is passed to my function at a time, so I can't determine how 'm' relates to 'x$y'. Any tips/guidance is appreciated.

Mark T. W. Ebbert
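The behaviour described in the question (the function only ever sees one column at a time) can be checked with a small probe. This is a minimal sketch, not part of the original post: `probe` is a hypothetical helper name, and the `by = list(z = tmp$z)` form is used so that the grouping column is found.

```r
# Same example data as in the question
tmp <- data.frame(
  x = 1:10,
  y = c(rep(34, 3), rep(35, 3), rep(34, 4)),
  z = c(rep("a", 3), rep("b", 3), rep("c", 4))
)

# Hypothetical probe: report whether FUN was handed a data frame
probe <- function(piece) is.data.frame(piece)

# aggregate() splits each column by group and calls FUN on each piece
res <- aggregate(tmp[c("x", "y")], by = list(z = tmp$z), FUN = probe)
print(res)
```

Every entry in the `x` and `y` columns of `res` comes back FALSE: each call receives a plain vector from a single column, so `x$y` is not available inside the function.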
Hi:

Try this:

library(plyr)

# Take the two columns as separate arguments so both are visible per group
func <- function(x, y) {
  m <- median(x)
  if (m > 2 && m < mean(y)) ret <- TRUE else ret <- FALSE
  ret
}

ddply(tmp, .(z), summarise, r = func(x, y))
  z     r
1 a FALSE
2 b  TRUE
3 c  TRUE

HTH,
Dennis

On Wed, Sep 15, 2010 at 2:45 PM, Mark Ebbert <Mark.Ebbert@hci.utah.edu> wrote:
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
On Sep 15, 2010, at 5:45 PM, Mark Ebbert wrote:

> ### BEGIN ###
> func<-function(x){
>   m<-median(x$x)
>   if(m > 2 & m < x$y){
>     return(T)
>   }
>   return(F)
> }

The semantic question is: what are you trying to test when you say "m < x$y"? "m" is a scalar and x$y is a vector. In an if() condition only the first element of x$y would be used, with a warning (from R 4.2.0 onward this is an error), and the function is not actually callable in that manner anyway.

> tmp<-data.frame(x=1:10,y=c(rep(34,3),rep(35,3),rep(34,4)),z=c(rep("a",3),rep("b",3),rep("c",4)))
> res<-aggregate(tmp,list(z),func)

I see Dennis has tried to move you forward to the plyr strategy, but some of us are mired in the traditional ways:

?split   # returns a list of data frames, one per level of a factor

func <- function(x) {
  # x is the data frame for one group, so both columns are available
  m <- median(x[["x"]], na.rm = TRUE)
  # y is constant within each group here, so compare against its first element
  if (m > 2 && m < x[["y"]][1]) {
    return(TRUE)
  }
  return(FALSE)
}

tmp <- data.frame(x=1:10,y=c(rep(34,3),rep(35,3),rep(34,4)),z=c(rep("a",3),rep("b",3),rep("c",4)))
res <- lapply(split(tmp, list(tmp$z)), func)
res
$a
[1] FALSE

$b
[1] TRUE

$c
[1] TRUE

-- 
David Winsemius, MD
West Hartford, CT
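As a variation on the split() approach above, the per-group results can be collected into a named logical vector instead of a list. This is only a sketch under one assumption that the thread leaves open: the median is compared against the first `y` value of each group (`y` happens to be constant within groups in this example). The name `group_test` is made up for illustration.

```r
# Same example data as in the question
tmp <- data.frame(
  x = 1:10,
  y = c(rep(34, 3), rep(35, 3), rep(34, 4)),
  z = c(rep("a", 3), rep("b", 3), rep("c", 4))
)

# Hypothetical helper: d is the data frame for one level of z
group_test <- function(d) {
  m <- median(d$x, na.rm = TRUE)
  # Assumption: compare against the first y of the group
  m > 2 && m < d$y[1]
}

# vapply() checks that every group yields exactly one logical value
res <- vapply(split(tmp, tmp$z), group_test, logical(1))
print(res)
```

Using vapply() rather than lapply() makes a wrong-length result fail loudly, which guards against the scalar-versus-vector ambiguity raised above.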
I would approach this slightly differently. I would make func a function of x and y:

func <- function(x, y) {
  m <- median(x)
  # all() collapses the comparison with y to one logical value per group
  return(m > 2 && all(m < y))
}

Now generate tmp just as you have, then:

require(plyr)
res <- daply(tmp, .(z), summarise, res = func(x, y))

I believe this does the trick.

Abhijit

-- 
Abhijit Dasgupta, PhD
Director and Principal Statistician
ARAASTAT
Ph: 301.385.3067
E: adasgupta at araastat.com
W: http://www.araastat.com
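For readers without plyr, the same idea (a function of x and y, applied per group) can be sketched in base R by splitting both columns on z and walking the pieces in parallel with mapply(). This is an illustration rather than what was posted; it assumes the intended test is that the median of x exceeds 2 and lies below every y value in the group.

```r
# Same example data as in the question
tmp <- data.frame(
  x = 1:10,
  y = c(rep(34, 3), rep(35, 3), rep(34, 4)),
  z = c(rep("a", 3), rep("b", 3), rep("c", 4))
)

# Assumed reading of the test: one logical value per group
func <- function(x, y) {
  m <- median(x)
  m > 2 && all(m < y)
}

# split() each column by z, then pair up the pieces group by group
res <- mapply(func, split(tmp$x, tmp$z), split(tmp$y, tmp$z))
print(res)
```

The result is a logical vector named by the levels of z, so a single group can be looked up with `res["b"]`.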