thr3ads.net - R help - [R] mean for subset [Jan 2010]

If this information is useful, please help other people find it:
Share via:

Geoffrey Smith

2010-Jan-05 18:29 UTC

[R] mean for subset

Hello, does anyone know how to take the mean for a subset of observations?
For example, suppose my data looks like this:

OBS     NAME   SCORE
1          Tom       92
2          Tom       88
3          Tom       56
4          James    85
5          James    75
6          James    32
7          Dawn     56
8          Dawn     91
9          Clara     95
10        Clara     84

Is there a way to get the mean of the SCORE variable by NAME but only when
the number of observations is equal to 3?  In other words, is there a way to
get the mean of the SCORE variable for Tom and James, but not for Dawn and
Clara?  Thank you.

-- 
Geoffrey Smith
Visiting Assistant Professor
Department of Finance
W. P. Carey School of Business
Arizona State University

	[[alternative HTML version deleted]]

Henrique Dallazuanna

2010-Jan-05 18:52 UTC

head link

[R] mean for subset

Try this:

with(split(DF, with(DF, ave(SCORE, NAME, FUN = length)))[['3']],
tapply(SCORE, NAME[,drop = TRUE], FUN = mean))

Or:

 with(DF, tapply(SCORE, NAME, mean))[table(DF$NAME) == 3]

On Tue, Jan 5, 2010 at 4:29 PM, Geoffrey Smith <gps at asu.edu>
wrote:> Hello, does anyone know how to take the mean for a subset of observations?
> For example, suppose my data looks like this:
>
> OBS ? ? NAME ? SCORE
> 1 ? ? ? ? ?Tom ? ? ? 92
> 2 ? ? ? ? ?Tom ? ? ? 88
> 3 ? ? ? ? ?Tom ? ? ? 56
> 4 ? ? ? ? ?James ? ?85
> 5 ? ? ? ? ?James ? ?75
> 6 ? ? ? ? ?James ? ?32
> 7 ? ? ? ? ?Dawn ? ? 56
> 8 ? ? ? ? ?Dawn ? ? 91
> 9 ? ? ? ? ?Clara ? ? 95
> 10 ? ? ? ?Clara ? ? 84
>
> Is there a way to get the mean of the SCORE variable by NAME but only when
> the number of observations is equal to 3? ?In other words, is there a way
to
> get the mean of the SCORE variable for Tom and James, but not for Dawn and
> Clara? ?Thank you.
>
> --
> Geoffrey Smith
> Visiting Assistant Professor
> Department of Finance
> W. P. Carey School of Business
> Arizona State University
>
> ? ? ? ?[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Henrique Dallazuanna
Curitiba-Paran?-Brasil
25? 25' 40" S 49? 16' 22" O

Gabor Grothendieck

2010-Jan-05 19:03 UTC

head link

[R] mean for subset

Have a look at this post and the rest of that thread:

https://stat.ethz.ch/pipermail/r-help/2010-January/223420.html

On Tue, Jan 5, 2010 at 1:29 PM, Geoffrey Smith <gps at asu.edu>
wrote:> Hello, does anyone know how to take the mean for a subset of observations?
> For example, suppose my data looks like this:
>
> OBS ? ? NAME ? SCORE
> 1 ? ? ? ? ?Tom ? ? ? 92
> 2 ? ? ? ? ?Tom ? ? ? 88
> 3 ? ? ? ? ?Tom ? ? ? 56
> 4 ? ? ? ? ?James ? ?85
> 5 ? ? ? ? ?James ? ?75
> 6 ? ? ? ? ?James ? ?32
> 7 ? ? ? ? ?Dawn ? ? 56
> 8 ? ? ? ? ?Dawn ? ? 91
> 9 ? ? ? ? ?Clara ? ? 95
> 10 ? ? ? ?Clara ? ? 84
>
> Is there a way to get the mean of the SCORE variable by NAME but only when
> the number of observations is equal to 3? ?In other words, is there a way
to
> get the mean of the SCORE variable for Tom and James, but not for Dawn and
> Clara? ?Thank you.
>
> --
> Geoffrey Smith
> Visiting Assistant Professor
> Department of Finance
> W. P. Carey School of Business
> Arizona State University
>
> ? ? ? ?[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Achim Zeileis

2010-Jan-05 19:09 UTC

head link

[R] mean for subset

On Tue, 5 Jan 2010, Geoffrey Smith wrote:
> Hello, does anyone know how to take the mean for a subset of observations?
> For example, suppose my data looks like this:
>
> OBS     NAME   SCORE
> 1          Tom       92
> 2          Tom       88
> 3          Tom       56
> 4          James    85
> 5          James    75
> 6          James    32
> 7          Dawn     56
> 8          Dawn     91
> 9          Clara     95
> 10        Clara     84
>
> Is there a way to get the mean of the SCORE variable by NAME but only when
> the number of observations is equal to 3?  In other words, is there a way
to
> get the mean of the SCORE variable for Tom and James, but not for Dawn and
> Clara?  Thank you.
You can use tapply() together with a custom function that returns NA if 
the condition is not satisfied, e.g.

## read data
dat <- read.table(textConnection("
OBS     NAME   SCORE
1          Tom       92
2          Tom       88
3          Tom       56
4          James    85
5          James    75
6          James    32
7          Dawn     56
8          Dawn     91
9          Clara     95
10        Clara     84
"), header = TRUE)

## use tapply() with custom function
with(dat,
   tapply(SCORE, NAME, function(x) if(length(x) == 3) mean(x) else NA)
)

Alternatively you could look at

mymean <-   with(dat, tapply(SCORE, NAME, mean))
mylength <- with(dat, tapply(SCORE, NAME, length))
mymean[mylength == 3]

etc.

hth,
Z
> -- 
> Geoffrey Smith
> Visiting Assistant Professor
> Department of Finance
> W. P. Carey School of Business
> Arizona State University
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

Duncan Murdoch

2010-Jan-05 19:11 UTC

head link

[R] mean for subset

On 05/01/2010 1:29 PM, Geoffrey Smith wrote:> Hello, does anyone know how to take the mean for a subset of observations?
> For example, suppose my data looks like this:
>
> OBS     NAME   SCORE
> 1          Tom       92
> 2          Tom       88
> 3          Tom       56
> 4          James    85
> 5          James    75
> 6          James    32
> 7          Dawn     56
> 8          Dawn     91
> 9          Clara     95
> 10        Clara     84
>
> Is there a way to get the mean of the SCORE variable by NAME but only when
> the number of observations is equal to 3?  In other words, is there a way
to
> get the mean of the SCORE variable for Tom and James, but not for Dawn and
> Clara?  Thank you.
>   
You probably want to do it in two steps:  first, find which names have 3 
observations, and take that subset of the dataset; then do the mean on 
all groups.  This is one way:

 > counts <- table(dataset$NAME)
 > keep <- names(counts)[counts == 3]
 > subset <- dataset[ dataset$NAME %in% keep,]
 > tapply(subset$SCORE, subset$NAME, mean)
   Clara     Dawn    James      Tom
      NA       NA 64.00000 78.66667

Duncan Murdoch

Gabor Grothendieck

2010-Jan-05 19:22 UTC

head link

[R] mean for subset

Here is the solution using sqldf which can do it in one statement:
> # read in data
> Lines <- "OBS     NAME   SCORE+ 1          Tom       92
+ 2          Tom       88
+ 3          Tom       56
+ 4          James    85
+ 5          James    75
+ 6          James    32
+ 7          Dawn     56
+ 8          Dawn     91
+ 9          Clara     95
+ 10        Clara     84">
> DF <- read.table(textConnection(Lines), header = TRUE)
>
> # run
> library(sqldf)
> sqldf("select NAME, avg(SCORE) from DF group by NAME having count(*) =
3")   NAME avg(SCORE)
1 James   64.00000
2   Tom   78.66667


On Tue, Jan 5, 2010 at 2:03 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:> Have a look at this post and the rest of that thread:
>
> https://stat.ethz.ch/pipermail/r-help/2010-January/223420.html
>
> On Tue, Jan 5, 2010 at 1:29 PM, Geoffrey Smith <gps at asu.edu>
wrote:
>> Hello, does anyone know how to take the mean for a subset of
observations?
>> For example, suppose my data looks like this:
>>
>> OBS ? ? NAME ? SCORE
>> 1 ? ? ? ? ?Tom ? ? ? 92
>> 2 ? ? ? ? ?Tom ? ? ? 88
>> 3 ? ? ? ? ?Tom ? ? ? 56
>> 4 ? ? ? ? ?James ? ?85
>> 5 ? ? ? ? ?James ? ?75
>> 6 ? ? ? ? ?James ? ?32
>> 7 ? ? ? ? ?Dawn ? ? 56
>> 8 ? ? ? ? ?Dawn ? ? 91
>> 9 ? ? ? ? ?Clara ? ? 95
>> 10 ? ? ? ?Clara ? ? 84
>>
>> Is there a way to get the mean of the SCORE variable by NAME but only
when
>> the number of observations is equal to 3? ?In other words, is there a
way to
>> get the mean of the SCORE variable for Tom and James, but not for Dawn
and
>> Clara? ?Thank you.
>>
>> --
>> Geoffrey Smith
>> Visiting Assistant Professor
>> Department of Finance
>> W. P. Carey School of Business
>> Arizona State University
>>
>> ? ? ? ?[[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

Maybe Matching Threads

Search for more reasonably related threads

R help - Jan 2010 - mean for subset

[R] mean for subset

[R] mean for subset

[R] mean for subset

[R] mean for subset

[R] mean for subset

[R] mean for subset

Maybe Matching Threads