RaoulD
2010-Jun-26 12:17 UTC
[R] Calculating Summaries for each level of a Categorical variable
Hi,

I have a dataset which has a categorical variable "R", a count variable C (integer) and 4 or more numeric variables (A, T, W, H - integers) containing measures for "R". I would like to summarize each level of the variable R by the average for A, T, W and H.

I have written a function to calculate weighted averages using C as the weight and this is given below. The function works perfectly, but how do I add the additional dimension I require to this function?

Dataset: RT

R   A   T   W   H
R1  10  20  20  10
R2  60  20  50  10
R3  45  10  20  50
R4  68  50  20  10
R1  73  20  40  46
R3  25  30  10  54
R3  36  90  20  10
R2  29  10  30  30

# FUNCTION TO CALCULATE THE WEIGHTED AVERAGE FOR A WEIGHTED BY C
WA<-function(A,C) {
  sp_A<-c(A %*% C)
  sum_C<-sum(C)
  WA<-sp_A/sum_C
  return(WA)
}

I am trying to incorporate the additional step of calculating the weighted average of A, T, W and H for each level of R. Need help with this.

Thanks in advance!
Raoul
--
View this message in context: http://r.789695.n4.nabble.com/Calculating-Summaries-for-each-level-of-a-Categorical-variable-tp2269349p2269349.html
Sent from the R help mailing list archive at Nabble.com.
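As a quick sanity check on the posted function, the sketch below compares WA with base R's weighted.mean, which computes the same quantity. The values of A and C here are hypothetical, since the posted data does not list the C column.

```r
# WA as posted: weighted average of A using C as the weights
WA <- function(A, C) {
  sp_A <- c(A %*% C)   # inner product: sum of A[i] * C[i]
  sum_C <- sum(C)      # total weight
  sp_A / sum_C
}

# Hypothetical values for illustration only (C is not shown in the post)
A <- c(10, 60, 45, 68)
C <- c(2, 1, 4, 3)

wa_posted <- WA(A, C)
wa_base   <- weighted.mean(A, C)   # base R equivalent

print(wa_posted)
print(wa_base)
```

If the two agree on your own data, weighted.mean can stand in for WA in whatever grouping approach you end up using.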
Corey Sparks
2010-Jun-26 14:50 UTC
[R] Calculating Summaries for each level of a Categorical variable
Did you try tapply?

?tapply
tapply(RT, RT$R, fun=WA)

or something like that.

-----
Corey Sparks, PhD
Assistant Professor
Department of Demography and Organization Studies
University of Texas at San Antonio
501 West Durango Blvd
Monterey Building 2.270C
San Antonio, TX 78207
210-458-3166
corey.sparks 'at' utsa.edu
https://rowdyspace.utsa.edu/users/ozd504/www/index.htm
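For reference, here is a minimal sketch of a tapply call that runs on the posted data. It assumes tapply is given one numeric column at a time and uses the argument name FUN (upper case). It computes plain, unweighted means, because the C column is not shown in the post.

```r
# The dataset as posted (C is not listed, so it is left out here)
RT <- data.frame(
  R = c("R1", "R2", "R3", "R4", "R1", "R3", "R3", "R2"),
  A = c(10, 60, 45, 68, 73, 25, 36, 29),
  T = c(20, 20, 10, 50, 20, 30, 90, 10),
  W = c(20, 50, 20, 20, 40, 10, 20, 30),
  H = c(10, 10, 50, 10, 46, 54, 10, 30)
)

# One column at a time: mean of A within each level of R
mean_A <- tapply(RT$A, RT$R, FUN = mean)

# All four measures: repeat the same call for each column
means_all <- sapply(RT[c("A", "T", "W", "H")],
                    function(x) tapply(x, RT$R, FUN = mean))

print(mean_A)
print(means_all)
```

The result of the second call is a matrix with one row per level of R and one column per measure.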
Christos Argyropoulos
2010-Jun-26 15:08 UTC
[R] Calculating Summaries for each level of a Categorical variable
Look at the summary.formula function in the Hmisc package.

Christos
RaoulD
2010-Jun-27 04:46 UTC
[R] Calculating Summaries for each level of a Categorical variable
Hi Corey,

Thanks so much for this. However, I get this error for tapply:

"Error in tapply(RT, RT$R, fun=WA): arguments must have same length"

Any idea how to get around this?

Thanks again,
Raoul
RaoulD
2010-Jun-27 04:48 UTC
[R] Calculating Summaries for each level of a Categorical variable
Hi Christos,

Thanks for this. I had a look at summary.formula in the Hmisc package and it is extremely complicated for me. I am still trying to decipher how I could use it.

Regards,
Raoul
David Hajage
2010-Jun-27 07:42 UTC
[R] Calculating Summaries for each level of a Categorical variable
You could try the remix function in the remix package.

David
Christos Argyropoulos
2010-Jun-27 12:36 UTC
[R] Calculating Summaries for each level of a Categorical variable
Hi Raoul,

I presume you need these summaries for a table of descriptive statistics for a thesis/report/paper ("Table 1", as it is known informally by medical researchers). If this is the case, then specify method="reverse" to summary.formula.

In the following small example, I create 4 groups of patients, specify 2 characteristics per patient (age and gender), and use summary.formula to summarize the characteristics by group. Running the stats on patient characteristics by group is optional but is included for completeness.

If you are looking for something like this, I strongly advise you to spend some time fiddling around with summary.formula and to read:

Harrell FE (2004): Statistical tables and plots using S and LaTeX (available from http://biostat.mc.vanderbilt.edu/twiki/pub/Main/StatReport/summary.pdf)

The 2-3 hours you will need to familiarize yourself with this package are well worth spending (especially if you are going to call LaTeX on the output). If you are a Windows user, copy and paste the output of the print function into Excel or OpenOffice and use the Text to Columns facilities of the two programs to format the output into a table that can be used inside a manuscript.

Christos

## R-code follows
library(Hmisc)

## One baseline factor (e.g. patient group)
grp<-round(runif(20,1,4))
grp<-factor(grp,labels=paste("Group",1:4))

## Another factor (e.g. sex)
sex<-round(runif(20,1,2))
sex<-factor(sex,labels=c("Male","Female"))

## A continuous variable (e.g. age)
age<-rlnorm(20,4,.1)

## A data frame
data<-data.frame(age=age,grp=grp,sex=sex)

## Table 1
sm<-summary(grp~sex+age,method="reverse",overall=TRUE,test=TRUE)
print(sm,dig=2,exclude1=FALSE)

Descriptive Statistics by grp

+----------+------------------+------------------+------------------+------------------+------------------+----------------------------+
|          |Group 1           |Group 2           |Group 3           |Group 4           |Combined          |  Test                      |
|          |(N=3)             |(N=6)             |(N=8)             |(N=3)             |(N=20)            |Statistic                   |
+----------+------------------+------------------+------------------+------------------+------------------+----------------------------+
|sex : Male|          67% ( 2)|          67% ( 4)|          25% ( 2)|          67% ( 2)|          50% (10)|Chi-square=3.3 d.f.=3 P=0.34|
+----------+------------------+------------------+------------------+------------------+------------------+----------------------------+
|    Female|          33% ( 1)|          33% ( 2)|          75% ( 6)|          33% ( 1)|          50% (10)|                            |
+----------+------------------+------------------+------------------+------------------+------------------+----------------------------+
|age       |          60/62/65|          51/55/60|          46/51/57|          46/48/52|          49/54/60|   F=2.9 d.f.=3,16 P=0.068  |
+----------+------------------+------------------+------------------+------------------+------------------+----------------------------+
Greg Snow
2010-Jun-28 17:23 UTC
[R] Calculating Summaries for each level of a Categorical variable
The problem is that tapply expects a vector as its first argument. Your first argument is a list or data frame, so the length that tapply sees is the number of list elements (the columns of the data frame), not the number of rows. You need to either pass a single vector, or use functions like aggregate or the plyr package to work on all the columns in a data frame.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
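The length mismatch described above, and one way around it, can be sketched as follows. This is only an illustration under stated assumptions: the C column is hypothetical (the post does not list it), and the sketch uses base R's split and sapply with weighted.mean rather than aggregate or plyr.

```r
# Posted data plus a hypothetical C column (the weights are not shown in the post)
RT <- data.frame(
  R = c("R1", "R2", "R3", "R4", "R1", "R3", "R3", "R2"),
  C = c(2, 1, 4, 3, 3, 1, 5, 4),   # assumed values, for illustration only
  A = c(10, 60, 45, 68, 73, 25, 36, 29),
  T = c(20, 20, 10, 50, 20, 30, 90, 10),
  W = c(20, 50, 20, 20, 40, 10, 20, 30),
  H = c(10, 10, 50, 10, 46, 54, 10, 30)
)

# Why tapply(RT, RT$R, ...) fails: the two lengths differ
n_cols <- length(RT)     # a data frame is a list of columns
n_rows <- length(RT$R)   # the grouping vector has one entry per row

measures <- c("A", "T", "W", "H")

# Split the rows by level of R, then take the C-weighted mean of each measure
by_level <- split(RT, RT$R)
wavg <- t(sapply(by_level, function(d) {
  sapply(d[measures], weighted.mean, w = d$C)
}))

print(c(columns = n_cols, rows = n_rows))
print(wavg)
```

The result has one row per level of R and one column per measure. Splitting the whole data frame keeps each group's weights alongside its measures, which is the part a plain tapply on a single vector cannot do.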