Pat Schmitz
2009-Jul-30 08:19 UTC
[R] What is the best method to produce means by categorical factors?
I am attempting to replicate some of my SAS experience in R, and I assume
there is a best method for using some combination of summary(), subset(), and
which() to produce a table of mean values by categorical or ordinal
factors.
Within SAS I would write
proc means mean data=dataset;
class factor1 factor2;
var variable1 variable2;
RUN;
producing an output with means for each variable by factor groupings as
below:
factor1   factor2      obs   variable    mean
Level A   treatmentA   3     variable1   10
                             variable2   22
          treatmentB   3     variable1   12
                             variable2   30
Level B   treatmentA   3     variable1   10
                             variable2   22
          treatmentB   3     variable1   12
                             variable2   30
What is the best way to go about this in R?
--
Patrick Schmitz
Graduate Student
Plant Biology
1206 West Gregory Drive
RM 1500
Petr PIKAL
2009-Jul-30 08:35 UTC
[R] Re: What is the best method to produce means by categorical factors?
Hi,
See ?aggregate, ?by, ?tapply and maybe also the doBy and plyr packages.
Something like
aggregate(dataset[, c("variable1", "variable2")], list(dataset$factor1, dataset$factor2), mean)
Best regards
Petr
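As a rough illustration of the aggregate() and tapply() suggestions, here is a minimal sketch. The data frame below is made up for this example (its values are chosen only so that the group means match the table in the question), and it assumes the column names used in the original post.

```r
# Hypothetical data shaped like the question: two factors, two measured variables.
dataset <- data.frame(
  factor1   = rep(c("A", "B"), each = 6),
  factor2   = rep(rep(c("treatmentA", "treatmentB"), each = 3), times = 2),
  variable1 = c(9, 10, 11, 11, 12, 13, 9, 10, 11, 11, 12, 13),
  variable2 = c(21, 22, 23, 29, 30, 31, 21, 22, 23, 29, 30, 31)
)

# Formula interface: cbind() of the responses on the left,
# grouping factors on the right.
means <- aggregate(cbind(variable1, variable2) ~ factor1 + factor2,
                   data = dataset, FUN = mean)
print(means)

# tapply() handles one variable at a time and returns a
# factor1 x factor2 matrix of means.
m1 <- tapply(dataset$variable1, list(dataset$factor1, dataset$factor2), mean)
print(m1)
```

aggregate() returns a data frame with one row per factor combination, which is usually easier to pass on to further steps; tapply() gives a compact cross-table when only one variable is of interest.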
ONKELINX, Thierry
2009-Jul-30 08:39 UTC
[R] What is the best method to produce means by categorical factors?
Dear Pat,
Have a look at recast from the reshape package.
library(reshape)
# Example data: 2 x 2 factor combinations, 3 replicates each
dataset <- expand.grid(factor1 = c("A", "B"), factor2 = c("C", "D"), Rep = 1:3)
dataset$variable1 <- rnorm(nrow(dataset))
dataset$variable2 <- rnorm(nrow(dataset), mean = 10)
# Melt on the id variables, then take the mean per factor1 x factor2 x variable
recast(dataset, factor1 + factor2 + variable ~ .,
       id.var = c("factor1", "factor2", "Rep"), fun.aggregate = mean)
HTH,
Thierry
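For readers who do not have the reshape package installed, the same long layout (one row per factor1, factor2 and variable, as in the PROC MEANS output) can probably be approximated in base R. This is only a sketch under that assumption, not the recast() implementation; the names measures, long and long_means are invented for the illustration.

```r
# Same layout as the example above; the values are random.
set.seed(1)
dataset <- expand.grid(factor1 = c("A", "B"), factor2 = c("C", "D"), Rep = 1:3)
dataset$variable1 <- rnorm(nrow(dataset))
dataset$variable2 <- rnorm(nrow(dataset), mean = 10)

measures <- c("variable1", "variable2")

# Stack the measured columns into long form:
# one row per observation and variable.
long <- data.frame(
  factor1  = rep(dataset$factor1, times = length(measures)),
  factor2  = rep(dataset$factor2, times = length(measures)),
  variable = rep(measures, each = nrow(dataset)),
  value    = unlist(dataset[measures], use.names = FALSE)
)

# One mean per factor1 x factor2 x variable cell, sorted like the SAS table.
long_means <- aggregate(value ~ factor1 + factor2 + variable,
                        data = long, FUN = mean)
long_means <- long_means[order(long_means$factor1,
                               long_means$factor2,
                               long_means$variable), ]
print(long_means)
```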
----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
John Kane
2009-Jul-30 23:09 UTC
[R] What is the best method to produce means by categorical factors?
The most common approach would be aggregate(). You can swap mean for other functions
(e.g. sum, length, median, or one you write yourself).
You would probably find Bob Muenchen's book, http://rforsasandspssusers.com/,
very useful, and there is a shorter PDF available on that page.
Example:
dd <- data.frame(factor1 = rep(letters[1:10], 2), factor2 = rep(LETTERS[1:5], 4),
                 var1 = rnorm(20, 10, 2), var2 = rnorm(20, 25, 5))
aggregate(dd[, 3:4], by = list(dd[, 1], dd[, 2]), mean)
aggregate(dd[, 3:4], by = list(dd[, 1], dd[, 2]), sum)
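To illustrate the "one you write yourself" option, here is a small sketch that returns both the number of observations and the mean per group, roughly like the obs and mean columns of the PROC MEANS output. The helper n_and_mean is a made-up name for this example, and the data are random.

```r
set.seed(42)
dd <- data.frame(
  factor1 = rep(letters[1:10], 2),
  factor2 = rep(LETTERS[1:5], 4),
  var1    = rnorm(20, 10, 2),
  var2    = rnorm(20, 25, 5)
)

# A user-written summary: number of non-missing observations and their mean.
n_and_mean <- function(x) c(n = sum(!is.na(x)), mean = mean(x, na.rm = TRUE))

res <- aggregate(cbind(var1, var2) ~ factor1 + factor2,
                 data = dd, FUN = n_and_mean)

# aggregate() stores a vector-valued summary as matrix columns; flatten them
# into ordinary columns named var1.n, var1.mean, var2.n, var2.mean.
res <- do.call(data.frame, res)
print(res)
```

In this example every factor1 value occurs with exactly one factor2 value, so there are 10 groups of 2 observations each.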