I can think of several ways to blunt-force hard code what I want, but I imagine there is a command or two that can be easily combined to do this:

I have a data frame with about 23000 observations. The first variable is the group to which the observation belongs (about 500 different groups). The second variable is a response for each observation that is a 1, 2, 3, 4 or 5. I want to be able to calculate the percentage of each group that chose each response. For example, I want to know what percentage of group 1 (which may have a value of 34456) chose response 1, and so on.

Here is some code I wrote that generates a data frame like the one I have.

pop <- matrix(1:100000)
groupIDs <- sample(pop,500)
groupVar <- sample(groupIDs,23000,replace=TRUE)
responseVar <- sample(1:5,23000,replace=TRUE)

example.data <- data.frame(groupVar,responseVar)

Is there a fast way to calculate these percentages beyond writing loops to manually count the responses for each of the groups?

Thanks,

EG
On 05/23/2008 09:51 AM, Economics Guy wrote:
> Is there a fast way to calculate these percentages beyond writing loops to
> manually count the responses for each of the groups?

Using:

table(example.data)

will give you a cross tabulation of the counts of your responseVar by each groupVar.

prop.table(table(example.data), 1)

will give you a row-wise proportion (0 - 1) of the counts of responseVar for each groupVar.

If you want percentages (0 - 100):

prop.table(table(example.data), 1) * 100

See ?table and ?prop.table for more information.

HTH,

Marc Schwartz
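As a small illustration of the two calls above, here is a sketch on a made-up data frame (the name toy and its values are assumptions for illustration, not the poster's data). Because table() cross-tabulates both columns at once, a group that never chose a given response still gets a zero in that column, as long as some other group chose it.

```r
# Hypothetical toy data: two groups, responses on a 1-5 scale
toy <- data.frame(groupVar    = c(10, 10, 10, 10, 20, 20),
                  responseVar = c(1, 1, 2, 5, 3, 3))

counts <- table(toy)                    # rows = groupVar, columns = responseVar
pct    <- prop.table(counts, 1) * 100   # row-wise percentages

pct["10", "1"]   # percentage of group 10 that chose response 1
pct["20", "1"]   # group 20 never chose response 1
rowSums(pct)     # every row adds up to 100
```

Note that response 4 does not occur anywhere in the toy data, so it gets no column at all; only responses observed at least once in the whole data set appear.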
tapply(example.data$responseVar, example.data$groupVar, function(x){prop.table(table(x))})
Michael Conklin
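A sketch of what this tapply() call returns, again on a hypothetical toy data frame (toy and res are illustrative names): a one-dimensional array with one element per group, each element being that group's own proportion table. Each of those tables only has entries for the responses that actually occur in that group, so the elements can differ in length.

```r
# Hypothetical toy data: two groups, responses on a 1-5 scale
toy <- data.frame(groupVar    = c(10, 10, 10, 10, 20, 20),
                  responseVar = c(1, 1, 2, 5, 3, 3))

# One proportion table per group
res <- tapply(toy$responseVar, toy$groupVar,
              function(x) prop.table(table(x)))

res[["10"]]          # proportions for responses 1, 2 and 5 only
sapply(res, length)  # number of distinct responses seen in each group
```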
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Economics Guy
Sent: Friday, May 23, 2008 9:52 AM
To: r-help at stat.math.ethz.ch
Subject: [R] Percentages for categorical data by group
I appreciate all the help.

The trouble is that in my real data set, not every group has an observation for each response. As a result, some of the "rows" returned from prop.table() are shorter than others, so I get:

Warning message:
In function (..., deparse.level = 1) :
  number of columns of result is not a multiple of vector length (arg 8)

Is there a way to tell rbind() or do.call() to treat missing values as zero, or to make prop.table() include the zero proportions?

On Fri, May 23, 2008 at 1:59 PM, Phil Spector <spector@stat.berkeley.edu> wrote:
> EG -
>    Thanks for the reproducible example!
>
> When I run your code, and check the class of the result from tapply(), I
> see that it is an "array", and using dim(), I see it's an array
> of length 500. How big is each element?
>
> table(sapply(res,length))
>
>   5
> 500
>
> So each piece is the same length. That means we could
> make a 500x5 matrix as follows:
>
> do.call(rbind,res)
>
>                                        - Phil Spector
>                                          Statistical Computing Facility
>                                          Department of Statistics
>                                          UC Berkeley
>                                          spector@stat.berkeley.edu
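One possible way around the unequal lengths, sketched here on a hypothetical toy data frame rather than the real data: declare the response as a factor with all five levels before tabulating. table() then reports a zero count for any level a group never chose, so every per-group table has length 5 and do.call(rbind, ...) lines up. The names toy, res and props are illustrative only.

```r
# Hypothetical toy data: group 20 only ever chose response 3
toy <- data.frame(groupVar    = c(10, 10, 10, 10, 20, 20),
                  responseVar = c(1, 1, 2, 5, 3, 3))

# Declare all five possible responses, whether observed or not
toy$responseVar <- factor(toy$responseVar, levels = 1:5)

# Each per-group table now has one entry per level, zeros included
res <- tapply(toy$responseVar, toy$groupVar,
              function(x) prop.table(table(x)))

props <- do.call(rbind, res)   # one row per group, five columns
props
```

The earlier prop.table(table(example.data), 1) suggestion should avoid the problem as well, since the two-way table already holds a zero count for every group/response combination that occurs somewhere in the data.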