thr3ads.net - R help - [R] aggregate, by, *apply [Sep 2010]

Mark Ebbert

2010-Sep-15 21:45 UTC

[R] aggregate, by, *apply

Dear R gurus,

I regularly come across a situation where I would like to apply a function to a
subset of data in a dataframe, but I have not found an R function to facilitate
exactly what I need. More specifically, I'd like my function to have a
context of where the data it's analyzing came from. Here is an example:

### BEGIN ###
func<-function(x){
	m<-median(x$x)
	if(m > 2 & m < x$y){
		return(T)
	}
	return(F)
}

tmp<-data.frame(x=1:10,y=c(rep(34,3),rep(35,3),rep(34,4)),z=c(rep("a",3),rep("b",3),rep("c",4)))
res<-aggregate(tmp,list(z),func)
### END ###

The values in the example are trivial, but the problem is that only one column
is passed to my function at a time, so I can't determine how 'm'
relates to 'x$y'. Any tips/guidance is appreciated.

Mark T. W. Ebbert
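
The behaviour described in this post can be reproduced with a minimal base-R sketch. It is not taken from the thread: the names `col_classes` and `group_check` are made up for illustration, and reducing `y` with `mean()` is an assumption about what the comparison is meant to do. `aggregate()` hands the function one column of one group at a time, while `by()` hands it the whole sub-data-frame for each group.

```r
# Same example data as in the post
tmp <- data.frame(x = 1:10,
                  y = c(rep(34, 3), rep(35, 3), rep(34, 4)),
                  z = c(rep("a", 3), rep("b", 3), rep("c", 4)))

# aggregate() applies FUN to a single column within each group,
# so FUN only ever receives a plain vector
col_classes <- aggregate(tmp[c("x", "y")], by = list(z = tmp$z),
                         FUN = function(v) class(v)[1])

# by() passes all rows of a group as a data frame,
# so the function can look at x and y together
group_check <- function(d) {
  m <- median(d$x)
  m > 2 && m < mean(d$y)  # assumption: compare against the group mean of y
}
res <- by(tmp, tmp$z, group_check)

print(col_classes)
print(res)
```

Here `res` holds one logical value per level of `z`, which is the shape of result the question asks for.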

Dennis Murphy

2010-Sep-15 23:13 UTC

[R] aggregate, by, *apply

Hi:

Try this:

library(plyr)
func <- function(x, y) {
    m <- median(x)
    # TRUE when the group median of x is above 2 and below the group mean of y
    m > 2 && m < mean(y)
}
ddply(tmp, .(z), summarise, r = func(x, y))
#   z     r
# 1 a FALSE
# 2 b  TRUE
# 3 c  TRUE

HTH,
Dennis


David Winsemius

2010-Sep-15 23:20 UTC

[R] aggregate, by, *apply

On Sep 15, 2010, at 5:45 PM, Mark Ebbert wrote:
> Dear R gurus,
>
> I regularly come across a situation where I would like to apply a  
> function to a subset of data in a dataframe, but I have not found an  
> R function to facilitate exactly what I need. More specifically, I'd  
> like my function to have a context of where the data it's analyzing  
> came from. Here is an example:
>

> ### BEGIN ###
> func<-function(x){
> 	m<-median(x$x)
> 	if(m > 2 & m < x$y){
> 		return(T)
> 	}
> 	return(F)
> }
>

The semantic question is what are you trying to test when you say "m <
x$y"? "m" is a scalar and x$y is a vector. By default only the first
element of x$y will be compared (not actually callable in that manner).

> tmp<-data.frame(x=1:10,y=c(rep(34,3),rep(35,3),rep(34,4)),z=c(rep("a",3),rep("b",3),rep("c",4)))
> res<-aggregate(tmp,list(z),func)

I see Dennis has tried to move you forward to the plyr strategy, but
some of us are mired in the traditional ways:

?split  # returns a list of data frames, one per level of a factor

 > func<-function(x){
+ 	m<-median(x$x, na.rm=TRUE)
+ 	if(m > 2 && m < x$y[1]){
+ 		return(TRUE)
+ 	}
+ 	return(FALSE)
+ }
 >
 > tmp<-data.frame(x=1:10,y=c(rep(34,3),rep(35,3),rep(34,4)),z=c(rep("a",3),rep("b",3),rep("c",4)))
 > res<-lapply(split(tmp,list(tmp$z)), func)
 > res
$a
[1] FALSE

$b
[1] TRUE

$c
[1] TRUE

> ### END ###
>
> The values in the example are trivial, but the problem is that only
> one column is passed to my function at a time, so I can't determine
> how 'm' relates to 'x$y'. Any tips/guidance is appreciated.

-- 

David Winsemius, MD
West Hartford, CT
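
A variation on the `split()` approach, offered as a sketch rather than as part of the reply above: `vapply()` returns a named logical vector instead of a list and checks that every group yields exactly one logical value. Comparing against the first `y` of each group is an assumption that only fits data where `y` is constant within a group, as it is in this example.

```r
# Same example data as in the thread
tmp <- data.frame(x = 1:10,
                  y = c(rep(34, 3), rep(35, 3), rep(34, 4)),
                  z = c(rep("a", 3), rep("b", 3), rep("c", 4)))

# One logical per group: is the median of x above 2 and below the group's y?
func <- function(d) {
  m <- median(d$x, na.rm = TRUE)
  m > 2 && m < d$y[1]  # assumption: y is constant within each group
}

# split() gives a list of data frames keyed by z;
# vapply() requires each result to be a length-one logical
res <- vapply(split(tmp, tmp$z), func, logical(1))
print(res)
```

If `func` ever returned something other than a single `TRUE` or `FALSE`, `vapply()` would stop with an error instead of silently building an odd result.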

Abhijit Dasgupta, PhD

2010-Sep-15 23:22 UTC

[R] aggregate, by, *apply

I would approach this slightly differently. I would make func a 
function of x and y.

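func <- function(x,y){
     m <- median(x)
     # a single TRUE/FALSE: median of x above 2 and below every y in the group
     return(m > 2 && all(m < y))
}

Now generate tmp just as you have. Then:

require(plyr)
# call func on each group's x and y columns; daply returns a named vector
res <- daply(tmp, .(z), function(d) func(d$x, d$y))

I believe this does the trick.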

Abhijit
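
For readers without plyr, the same idea of a two-argument function can be sketched in base R. This is an illustration, not part of the reply above: `x` and `y` are split by `z` separately and `mapply()` walks the pieces in parallel. Requiring the median to be below every `y` in the group (`all(m < y)`) is an assumption about the intended test.

```r
# Same example data as in the thread
tmp <- data.frame(x = 1:10,
                  y = c(rep(34, 3), rep(35, 3), rep(34, 4)),
                  z = c(rep("a", 3), rep("b", 3), rep("c", 4)))

# x and y are the per-group vectors of the two columns
func <- function(x, y) {
  m <- median(x)
  m > 2 && all(m < y)  # assumption: median must be below every y in the group
}

# Split each column by z, then call func on matching pieces
res <- mapply(func, split(tmp$x, tmp$z), split(tmp$y, tmp$z))
print(res)
```

The result is a logical vector named by the levels of `z`, since `mapply()` takes its names from the first split list.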

-- 

Abhijit Dasgupta, PhD
Director and Principal Statistician
ARAASTAT
Ph: 301.385.3067
E: adasgupta at araastat.com
W: http://www.araastat.com
