Aurélien PHILIPPOT
2011-Dec-04 20:32 UTC
[R] Group several variables and apply a function to the group
Dear R-experts,
I am struggling with the following problem, and I am looking for advice
from more experienced R-users: I have a data frame with 2 identifying
variables (comn and mi), and an output variable (x). comn is a variable for
a company and mi is a variable for a month.
comn <- c("abc", "abc", "abc", "abc", "abc", "abc", "xyz", "xyz", "xyz", "xyz")
mi <- c("1", "1", "1", "2", "2", "2", "1", "1", "3", "3")
x <- c("-0.0031", "0.0009", "-0.007", "0.1929", "0.0087", "0.099", "-0.089",
       "0.005", "-0.0078", "0.67")
df <- data.frame(comn=comn, mi=mi, x=x)
For each company, within a particular month, I would like to compute the
standard deviation of x: for example, for abc, I would like to compute the
sd of x for month1 (when mi=1) and for month2 (when mi=2).
In other languages (Stata for instance), I would create a grouping variable
(grouping by comn and mi) and then apply the sd function to each group.
However, I can't find an elegant way to do the same in R:
I was thinking about the following: I could subset my data frame by mi and
create one file per month, and then make a loop and in each file, use a
"by" operator for each comn. I am sure it would work, but I feel that it
would be like killing an ant with a tank.
I was wondering if anyone knew a more straightforward way to implement that
kind of operation?
Thanks a lot,
Best,
Aurelien
Felipe Carrillo
2011-Dec-04 20:51 UTC
[R] Group several variables and apply a function to the group
Like this?
library(plyr)
ddply(df, .(comn, mi), summarise, stDEV=sd(x))
Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA
http://www.fws.gov/redbluff/rbdd_jsmp.aspx
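One caveat about the example data as originally posted: x is entered as quoted strings, so sd() cannot be applied to it directly (depending on the R version, data.frame() stores such a column as a factor or as character). The following is a minimal base-R sketch of converting the column first and then computing the per-group standard deviation with tapply(). The column name x_num and the object name sd_by_group are made up for illustration and do not appear in the thread.

```r
# Data as posted in the question, with x still quoted
comn <- c("abc", "abc", "abc", "abc", "abc", "abc", "xyz", "xyz", "xyz", "xyz")
mi   <- c("1", "1", "1", "2", "2", "2", "1", "1", "3", "3")
x    <- c("-0.0031", "0.0009", "-0.007", "0.1929", "0.0087", "0.099", "-0.089",
          "0.005", "-0.0078", "0.67")
df <- data.frame(comn = comn, mi = mi, x = x)

# Go through as.character() so the conversion is also correct
# when x has been stored as a factor
df$x_num <- as.numeric(as.character(df$x))

# Standard deviation of x for every company/month combination;
# combinations with no observations appear as NA in the resulting matrix
sd_by_group <- tapply(df$x_num, list(df$comn, df$mi), sd)
print(sd_by_group)
```

The result is a company-by-month matrix, which is handy for a quick look; the data-frame style outputs suggested elsewhere in this thread are easier to merge back into other data.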
Pete Brecknock
2011-Dec-04 20:56 UTC
[R] Group several variables and apply a function to the group
One way would be to use the aggregate function.

# Your Data ...
# Note: I have removed the quotes from the output variable x
comn <- c("abc", "abc", "abc", "abc", "abc", "abc", "xyz", "xyz", "xyz", "xyz")
mi <- c("1", "1", "1", "2", "2", "2", "1", "1", "3", "3")
x <- c(-0.0031, 0.0009, -0.007, 0.1929, 0.0087, 0.099, -0.089, 0.005, -0.0078, 0.67)
df <- data.frame(comn=comn, mi=mi, x=x)

# Aggregate Function
aggregate(df$x, by=list(df$comn, df$mi), FUN=sd)

HTH

Pete
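As a possible variation on the aggregate() call, the formula interface keeps the original column names in the output instead of generic group labels. This is only a sketch using the numeric version of the data; the names res and sd_x are chosen here for illustration.

```r
# Numeric version of the example data (quotes removed from x)
df <- data.frame(
  comn = c("abc", "abc", "abc", "abc", "abc", "abc", "xyz", "xyz", "xyz", "xyz"),
  mi   = c("1", "1", "1", "2", "2", "2", "1", "1", "3", "3"),
  x    = c(-0.0031, 0.0009, -0.007, 0.1929, 0.0087, 0.099, -0.089,
           0.005, -0.0078, 0.67)
)

# One row per observed company/month combination, grouping columns keep their names
res <- aggregate(x ~ comn + mi, data = df, FUN = sd)

# Rename the summary column so it is not mistaken for the raw values
names(res)[names(res) == "x"] <- "sd_x"
print(res)
```

Only combinations that actually occur in the data produce a row, so there is no entry for abc in month 3 or xyz in month 2.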
John Kane
2011-Dec-04 21:07 UTC
[R] Group several variables and apply a function to the group
?aggregate should do it
aggregate(df$x,list(df$comn, df$mi), sd)
There are other ways of course
Using the reshape2 package
library(reshape2)
x1 <- melt(df, id=c("comn", "mi"))
dcast(x1, comn + mi ~ variable, sd)
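If the goal is closer to the Stata workflow mentioned in the question, where the group statistic is attached to every row rather than collapsed to one row per group, base R's ave() may be worth a look. This is a sketch under the assumption that x is numeric; the column name sd_x is made up for illustration.

```r
# Numeric version of the example data
df <- data.frame(
  comn = c("abc", "abc", "abc", "abc", "abc", "abc", "xyz", "xyz", "xyz", "xyz"),
  mi   = c("1", "1", "1", "2", "2", "2", "1", "1", "3", "3"),
  x    = c(-0.0031, 0.0009, -0.007, 0.1929, 0.0087, 0.099, -0.089,
           0.005, -0.0078, 0.67)
)

# ave() applies FUN within each comn/mi group and returns a vector
# as long as x, so every row carries its own group's standard deviation
df$sd_x <- ave(df$x, df$comn, df$mi, FUN = sd)
print(df)
```

Because the result has the same number of rows as the input, it can be used directly in later row-level calculations, for example to standardise x within each group.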