thr3ads.net - R help - [R] How to apply a function to subsets of a data frame *and* obtain a data frame again? [Aug 2011]


Marius Hofert

2011-Aug-17 10:42 UTC

[R] How to apply a function to subsets of a data frame and obtain a data frame again?

Dear all,

First, let's create some data to play around with:

set.seed(1)
(df <- data.frame(Group=rep(c("Group1","Group2","Group3"), each=10),
                  Value=c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30,30),])

## Now we need the empirical distribution function:
edf <- function(x) ecdf(x)(x) # empirical distribution function evaluated at x
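To make the behaviour of edf() concrete, here is a small worked example (the vector x is made up for illustration). ecdf(x) builds a step function, and evaluating that function at x itself gives, for each element, the proportion of observations less than or equal to it. That is the same as the element's rank, with ties resolved upward, divided by n.

```r
edf <- function(x) ecdf(x)(x)   # same definition as in the post

x <- c(0.3, 2.1, 0.7, 1.5)      # illustrative values, not from the thread
edf(x)                          # 0.25 1.00 0.50 0.75

## Equivalent formulation: rank with ties resolved to the maximum, divided by n
all.equal(edf(x), rank(x, ties.method = "max") / length(x))  # TRUE
```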

## The big question is how one can apply the empirical distribution function to
## each subset of df determined by "Group", i.e., apply it to Group1, then
## to Group2, and finally to Group3. You might suggest using tapply:

(edf. <- tapply(df$Value, df$Group, FUN=edf))

## That's correct. But typically, one would like to obtain not only the values,
## but a data.frame containing the original information and the new
## (edf-)values. What's a simple way to get this? (One would have to first sort
## df according to Group, then paste the values computed by edf to the sorted
## df; that seems a bit tedious.)
## A solution I have is the following (but I would like to know if there is a
## simpler one):

(edf.. <- do.call("rbind", lapply(unique(df$Group), function(strg){
    subdata <- subset(df, Group==strg) # sub-data
    subdata <- cbind(subdata, edf=edf(subdata$Value))
})) )
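The "sort, then paste" route mentioned in the comments above turns out to need only two lines of base R. This is a sketch, not taken from the thread, and it rebuilds the data so it runs on its own. It relies on two properties: order() is stable, so rows keep their relative order within a group, and tapply() returns the groups in the order of the levels of Group, with each group's results in row order. Hence unlist() lines up with the sorted rows.

```r
set.seed(1)
df <- data.frame(Group = rep(c("Group1", "Group2", "Group3"), each = 10),
                 Value = c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30, 30), ]
edf <- function(x) ecdf(x)(x)

## Sort by Group (stable: within-group row order is preserved)
sorted <- df[order(df$Group), ]
## One edf vector per group, in level order; unlist() concatenates them
sorted$edf <- unlist(tapply(sorted$Value, sorted$Group, edf), use.names = FALSE)
head(sorted)
```

Like the rbind() solution, the result is grouped by Group instead of following the original row order.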


Cheers,

Marius

Nick Sabbe

2011-Aug-17 11:24 UTC



You might want to look at package plyr and use ddply.

HTH,


Nick Sabbe
--
ping: nick.sabbe at ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36

-- Do Not Disapprove



Marius Hofert

2011-Aug-17 11:51 UTC



Dear all,

Thanks a lot for the quick help.
Below is what I built using Nick's hint.

Cheers,

Marius


library(plyr)

set.seed(1)
(df <- data.frame(Group=rep(c("Group1","Group2","Group3"), each=10),
                  Value=c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30,30),])
edf <- function(x) ecdf(x)(x) 

ddply(df, .(Group), function(df.) cbind(df., edf=edf(df.$Value))) 


On 2011-08-17, at 13:38 , Hadley Wickham wrote:
>> The following example does what you want using ddply:
>> 
>> library(plyr)
>> edfPerGroup = ddply(df, .(Group), summarise, edf = edf(Value), Value = Value)
> 
> Or slightly more succinctly:
> 
> ddply(df, .(Group), mutate, edf = edf(Value))
> 
> Hadley
> 
> -- 
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
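For readers without plyr, the effect of Hadley's mutate() call can be approximated in base R. This is a sketch, not from the thread: split() the data frame by Group, add the column to each piece with transform(), and rbind() the pieces back together. As with ddply(), the rows come back grouped by Group rather than in their original order.

```r
set.seed(1)
df <- data.frame(Group = rep(c("Group1", "Group2", "Group3"), each = 10),
                 Value = c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30, 30), ]
edf <- function(x) ecdf(x)(x)

pieces <- split(df, df$Group)                                         # one data frame per group
pieces <- lapply(pieces, function(d) transform(d, edf = edf(Value)))  # add the edf column
res <- do.call(rbind, pieces)                                         # stack the groups again
rownames(res) <- NULL   # drop the "Group1.17"-style row names created by rbind()
head(res)
```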

Dimitris Rizopoulos

2011-Aug-17 12:06 UTC



Have a look at the function ave(), e.g.,

set.seed(1)
(df <- data.frame(Group=rep(c("Group1","Group2","Group3"), each=10),
                  Value=c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30,30),])

edf <- function(x) ecdf(x)(x)
df$edf <- with(df, ave(Value, Group, FUN = edf))
df
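Unlike the rbind() and ddply() solutions, ave() returns a vector aligned with the original rows, so the result can be attached directly as a new column without any reordering. Conceptually it splits the values by group, applies FUN to each piece, and puts the results back in place; the sketch below (not from the thread) checks that equivalence against an explicit split() / unsplit().

```r
set.seed(1)
df <- data.frame(Group = rep(c("Group1", "Group2", "Group3"), each = 10),
                 Value = c(rexp(10, 1), rexp(10, 4), rexp(10, 10)))[sample(1:30, 30), ]
edf <- function(x) ecdf(x)(x)

## ave(): one result per row, in the original row order
df$edf <- with(df, ave(Value, Group, FUN = edf))

## The same values via an explicit split / apply / unsplit
by_group <- lapply(split(df$Value, df$Group), edf)
all.equal(df$edf, unsplit(by_group, df$Group))  # TRUE
```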


I hope it helps.

Best,
Dimitris


-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/

Seemingly Similar Threads


R help - Aug 2011 - How to apply a function to subsets of a data frame *and* obtain a data frame again?

[R] How to apply a function to subsets of a data frame *and* obtain a data frame again?

[R] How to apply a function to subsets of a data frame *and* obtain a data frame again?

[R] How to apply a function to subsets of a data frame *and* obtain a data frame again?

[R] How to apply a function to subsets of a data frame *and* obtain a data frame again?
