Marius Hofert
2011-Aug-17 10:42 UTC
[R] How to apply a function to subsets of a data frame *and* obtain a data frame again?
Dear all,

First, let's create some data to play around:

set.seed(1)
(df <- data.frame(Group=rep(c("Group1","Group2","Group3"), each=10),
                  Value=c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30,30),])

## Now we need the empirical distribution function:
edf <- function(x) ecdf(x)(x) # empirical distribution function evaluated at x

## The big question is how one can apply the empirical distribution function to
## each subset of df determined by "Group", so how to apply it to Group1, then
## to Group2, and finally to Group3. You might suggest (?) to use tapply:

(edf. <- tapply(df$Value, df$Group, FUN=edf))

## That's correct. But typically, one would like to obtain not only the values,
## but a data.frame containing the original information and the new (edf-)values.
## What's a simple way to get this? (one would be required to first sort df
## according to Group, then paste the values computed by edf to the sorted df;
## seems a bit tedious).
## A solution I have is the following (but I would like to know if there is a
## simpler one):

(edf.. <- do.call("rbind", lapply(unique(df$Group), function(strg){
    subdata <- subset(df, Group==strg) # sub-data
    subdata <- cbind(subdata, edf=edf(subdata$Value))
})) )

Cheers,

Marius
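A possible base-R shortcut for the "sort, then paste" step described above, offered as a sketch and not as the accepted answer: split() the values by group, apply edf() to each piece, and let unsplit() put the results back in the original row order. The names pieces and df2 are introduced here purely for illustration.

```r
set.seed(1)
df <- data.frame(Group = rep(c("Group1", "Group2", "Group3"), each = 10),
                 Value = c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30, 30), ]
edf <- function(x) ecdf(x)(x)  # empirical distribution function evaluated at x

# Apply edf() within each group; the result is a list with one vector per group
pieces <- lapply(split(df$Value, df$Group), edf)

# unsplit() reverses split(), so the values line up with the rows of df again
# (no sorting of df is needed)
df2 <- cbind(df, edf = unsplit(pieces, df$Group))
head(df2)
```

Because unsplit() uses the same grouping vector that split() did, each value ends up next to the row it was computed from, and the rows of df stay in their shuffled order.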
Nick Sabbe
2011-Aug-17 11:24 UTC
[R] How to apply a function to subsets of a data frame *and* obtain a data frame again?
You might want to look at package plyr and use ddply.

HTH,

Nick Sabbe
--
ping: nick.sabbe at ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36

-- Do Not Disapprove
Marius Hofert
2011-Aug-17 11:51 UTC
[R] How to apply a function to subsets of a data frame *and* obtain a data frame again?
Dear all,

thanks a lot for the quick help. Below is what I built following Nick's hint.

Cheers,

Marius

library(plyr)
set.seed(1)
(df <- data.frame(Group=rep(c("Group1","Group2","Group3"), each=10),
                  Value=c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30,30),])
edf <- function(x) ecdf(x)(x)
ddply(df, .(Group), function(df.) cbind(df., edf=edf(df.$Value)))

On 2011-08-17, at 13:38 , Hadley Wickham wrote:

>> The following example does what you want using ddply:
>>
>> library(plyr)
>> edfPerGroup = ddply(df, .(Group), summarise, edf = edf(Value), Value = Value)
>
> Or slightly more succinctly:
>
> ddply(df, .(Group), mutate, edf = edf(Value))
>
> Hadley
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
Dimitris Rizopoulos
2011-Aug-17 12:06 UTC
[R] How to apply a function to subsets of a data frame *and* obtain a data frame again?
Have a look at function ave(), e.g.,

set.seed(1)
(df <- data.frame(Group=rep(c("Group1","Group2","Group3"), each=10),
                  Value=c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30,30),])
edf <- function(x) ecdf(x)(x)

df$edf <- with(df, ave(Value, Group, FUN = edf))
df

I hope it helps.

Best,
Dimitris

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/
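One practical difference from the rbind-based solution in the original question is row order: ave() returns values aligned with the rows of df as they stand, whereas the rbind approach returns the rows regrouped. The sketch below compares the two after matching on row names; the names by_ave and by_rbind are introduced here for illustration only.

```r
set.seed(1)
df <- data.frame(Group = rep(c("Group1", "Group2", "Group3"), each = 10),
                 Value = c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30, 30), ]
edf <- function(x) ecdf(x)(x)  # empirical distribution function evaluated at x

# ave(): one value per row, in the current (shuffled) row order of df
by_ave <- cbind(df, edf = ave(df$Value, df$Group, FUN = edf))

# rbind approach from the original question: rows come back grouped
by_rbind <- do.call("rbind", lapply(unique(df$Group), function(strg) {
  subdata <- subset(df, Group == strg)
  cbind(subdata, edf = edf(subdata$Value))
}))

# Row names survive both routes, so they can be used to align the results
all.equal(by_ave$edf, by_rbind[rownames(by_ave), "edf"])
```

If the original row order matters downstream (for example, when df is later joined to other data by position), the ave() route avoids a re-sorting step.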