Hello, does anyone know how to take the mean for a subset of observations? For example, suppose my data looks like this: OBS NAME SCORE 1 Tom 92 2 Tom 88 3 Tom 56 4 James 85 5 James 75 6 James 32 7 Dawn 56 8 Dawn 91 9 Clara 95 10 Clara 84 Is there a way to get the mean of the SCORE variable by NAME but only when the number of observations is equal to 3? In other words, is there a way to get the mean of the SCORE variable for Tom and James, but not for Dawn and Clara? Thank you. -- Geoffrey Smith Visiting Assistant Professor Department of Finance W. P. Carey School of Business Arizona State University [[alternative HTML version deleted]]
Try this: with(split(DF, with(DF, ave(SCORE, NAME, FUN = length)))[['3']], tapply(SCORE, NAME[,drop = TRUE], FUN = mean)) Or: with(DF, tapply(SCORE, NAME, mean))[table(DF$NAME) == 3] On Tue, Jan 5, 2010 at 4:29 PM, Geoffrey Smith <gps at asu.edu> wrote:> Hello, does anyone know how to take the mean for a subset of observations? > For example, suppose my data looks like this: > > OBS ? ? NAME ? SCORE > 1 ? ? ? ? ?Tom ? ? ? 92 > 2 ? ? ? ? ?Tom ? ? ? 88 > 3 ? ? ? ? ?Tom ? ? ? 56 > 4 ? ? ? ? ?James ? ?85 > 5 ? ? ? ? ?James ? ?75 > 6 ? ? ? ? ?James ? ?32 > 7 ? ? ? ? ?Dawn ? ? 56 > 8 ? ? ? ? ?Dawn ? ? 91 > 9 ? ? ? ? ?Clara ? ? 95 > 10 ? ? ? ?Clara ? ? 84 > > Is there a way to get the mean of the SCORE variable by NAME but only when > the number of observations is equal to 3? ?In other words, is there a way to > get the mean of the SCORE variable for Tom and James, but not for Dawn and > Clara? ?Thank you. > > -- > Geoffrey Smith > Visiting Assistant Professor > Department of Finance > W. P. Carey School of Business > Arizona State University > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Henrique Dallazuanna Curitiba-Paran?-Brasil 25? 25' 40" S 49? 16' 22" O
Have a look at this post and the rest of that thread: https://stat.ethz.ch/pipermail/r-help/2010-January/223420.html On Tue, Jan 5, 2010 at 1:29 PM, Geoffrey Smith <gps at asu.edu> wrote:> Hello, does anyone know how to take the mean for a subset of observations? > For example, suppose my data looks like this: > > OBS ? ? NAME ? SCORE > 1 ? ? ? ? ?Tom ? ? ? 92 > 2 ? ? ? ? ?Tom ? ? ? 88 > 3 ? ? ? ? ?Tom ? ? ? 56 > 4 ? ? ? ? ?James ? ?85 > 5 ? ? ? ? ?James ? ?75 > 6 ? ? ? ? ?James ? ?32 > 7 ? ? ? ? ?Dawn ? ? 56 > 8 ? ? ? ? ?Dawn ? ? 91 > 9 ? ? ? ? ?Clara ? ? 95 > 10 ? ? ? ?Clara ? ? 84 > > Is there a way to get the mean of the SCORE variable by NAME but only when > the number of observations is equal to 3? ?In other words, is there a way to > get the mean of the SCORE variable for Tom and James, but not for Dawn and > Clara? ?Thank you. > > -- > Geoffrey Smith > Visiting Assistant Professor > Department of Finance > W. P. Carey School of Business > Arizona State University > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
On Tue, 5 Jan 2010, Geoffrey Smith wrote:> Hello, does anyone know how to take the mean for a subset of observations? > For example, suppose my data looks like this: > > OBS NAME SCORE > 1 Tom 92 > 2 Tom 88 > 3 Tom 56 > 4 James 85 > 5 James 75 > 6 James 32 > 7 Dawn 56 > 8 Dawn 91 > 9 Clara 95 > 10 Clara 84 > > Is there a way to get the mean of the SCORE variable by NAME but only when > the number of observations is equal to 3? In other words, is there a way to > get the mean of the SCORE variable for Tom and James, but not for Dawn and > Clara? Thank you.You can use tapply() together with a custom function that returns NA if the condition is not satisfied, e.g. ## read data dat <- read.table(textConnection(" OBS NAME SCORE 1 Tom 92 2 Tom 88 3 Tom 56 4 James 85 5 James 75 6 James 32 7 Dawn 56 8 Dawn 91 9 Clara 95 10 Clara 84 "), header = TRUE) ## use tapply() with custom function with(dat, tapply(SCORE, NAME, function(x) if(length(x) == 3) mean(x) else NA) ) Alternatively you could look at mymean <- with(dat, tapply(SCORE, NAME, mean)) mylength <- with(dat, tapply(SCORE, NAME, length)) mymean[mylength == 3] etc. hth, Z> -- > Geoffrey Smith > Visiting Assistant Professor > Department of Finance > W. P. Carey School of Business > Arizona State University > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >
On 05/01/2010 1:29 PM, Geoffrey Smith wrote:> Hello, does anyone know how to take the mean for a subset of observations? > For example, suppose my data looks like this: > > OBS NAME SCORE > 1 Tom 92 > 2 Tom 88 > 3 Tom 56 > 4 James 85 > 5 James 75 > 6 James 32 > 7 Dawn 56 > 8 Dawn 91 > 9 Clara 95 > 10 Clara 84 > > Is there a way to get the mean of the SCORE variable by NAME but only when > the number of observations is equal to 3? In other words, is there a way to > get the mean of the SCORE variable for Tom and James, but not for Dawn and > Clara? Thank you. >You probably want to do it in two steps: first, find which names have 3 observations, and take that subset of the dataset; then do the mean on all groups. This is one way: > counts <- table(dataset$NAME) > keep <- names(counts)[counts == 3] > subset <- dataset[ dataset$NAME %in% keep,] > tapply(subset$SCORE, subset$NAME, mean) Clara Dawn James Tom NA NA 64.00000 78.66667 Duncan Murdoch
Here is the solution using sqldf which can do it in one statement:> # read in data > Lines <- "OBS NAME SCORE+ 1 Tom 92 + 2 Tom 88 + 3 Tom 56 + 4 James 85 + 5 James 75 + 6 James 32 + 7 Dawn 56 + 8 Dawn 91 + 9 Clara 95 + 10 Clara 84"> > DF <- read.table(textConnection(Lines), header = TRUE) > > # run > library(sqldf) > sqldf("select NAME, avg(SCORE) from DF group by NAME having count(*) = 3")NAME avg(SCORE) 1 James 64.00000 2 Tom 78.66667 On Tue, Jan 5, 2010 at 2:03 PM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:> Have a look at this post and the rest of that thread: > > https://stat.ethz.ch/pipermail/r-help/2010-January/223420.html > > On Tue, Jan 5, 2010 at 1:29 PM, Geoffrey Smith <gps at asu.edu> wrote: >> Hello, does anyone know how to take the mean for a subset of observations? >> For example, suppose my data looks like this: >> >> OBS ? ? NAME ? SCORE >> 1 ? ? ? ? ?Tom ? ? ? 92 >> 2 ? ? ? ? ?Tom ? ? ? 88 >> 3 ? ? ? ? ?Tom ? ? ? 56 >> 4 ? ? ? ? ?James ? ?85 >> 5 ? ? ? ? ?James ? ?75 >> 6 ? ? ? ? ?James ? ?32 >> 7 ? ? ? ? ?Dawn ? ? 56 >> 8 ? ? ? ? ?Dawn ? ? 91 >> 9 ? ? ? ? ?Clara ? ? 95 >> 10 ? ? ? ?Clara ? ? 84 >> >> Is there a way to get the mean of the SCORE variable by NAME but only when >> the number of observations is equal to 3? ?In other words, is there a way to >> get the mean of the SCORE variable for Tom and James, but not for Dawn and >> Clara? ?Thank you. >> >> -- >> Geoffrey Smith >> Visiting Assistant Professor >> Department of Finance >> W. P. Carey School of Business >> Arizona State University >> >> ? ? ? ?[[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> >