thr3ads.net - R help - [R] Percentages for categorical data by group [May 2008]


Economics Guy

2008-May-23 14:51 UTC

[R] Percentages for categorical data by group

I can think of several ways to blunt force hard code what I want but I
imagine there is a command or two that can be easily combined to do this:

I have a data frame with about 23000 observations. The first variable is
the group to which the observation belongs (about 500 different groups). The
second variable is a response for each observation that is a 1, 2, 3, 4 or 5. I
want to be able to calculate the percentage of each group that chose each
response. For example, I want to know what percentage of group 1 (which may
have a value of 34456) chose response 1, and so on.

Here is some code I wrote that generates a data frame like the one I have.

pop <- matrix(1:100000)
groupIDs <- sample(pop,500)
groupVar <- sample(groupIDs,23000,replace=TRUE)
responseVar <- sample(1:5,23000,replace=TRUE)

example.data <- data.frame(groupVar,responseVar)

Is there a fast way to calculate these percentages beyond writing loops to
manually count the responses for each of the groups?

Thanks,

EG


Marc Schwartz

2008-May-23 15:05 UTC


[R] Percentages for categorical data by group

Using:

   table(example.data)

will give you a cross tabulation of the counts of your responseVar by
each groupVar.

   prop.table(table(example.data), 1)

will give you a row-wise proportion (0 - 1) of the counts of responseVar
for each groupVar. If you want percentages (0 - 100):

    prop.table(table(example.data), 1) * 100


See ?table and ?prop.table for more information.
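
As a small sketch of the steps above, here is the same approach on a toy data frame (the values are made up for illustration, not taken from the original data). Because table() cross-tabulates both columns at once, a response that a particular group never chose shows up as 0 in that group's row, provided at least one other group chose it:

```r
# Toy data: group "a" never picks response 3 (illustrative values only)
toy <- data.frame(groupVar    = c("a", "a", "a", "a", "b", "b"),
                  responseVar = c(1, 1, 2, 2, 1, 3))

counts <- table(toy)                   # rows = groups, columns = responses
pct    <- prop.table(counts, 1) * 100  # row-wise percentages

round(pct, 1)
```

Each row of the result should sum to 100, since the proportions are taken within each group.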

HTH,

Marc Schwartz

Michael Conklin

2008-May-23 15:33 UTC


[R] Percentages for categorical data by group

tapply(example.data$responseVar,example.data$groupVar,function(x){prop.table(table(x))})

Michael Conklin


Economics Guy

2008-May-23 18:35 UTC


[R] Percentages for categorical data by group

I appreciate all the help.  The trouble is that in my real data set, not every
group has an observation for each response. As a result, some of the "rows"
returned from prop.table() are shorter than others, so I get:

Warning message:
In function (..., deparse.level = 1)  :
  number of columns of result is not a multiple of vector length (arg 8)

Is there a way to tell rbind() or do.call() to treat missing values as zero,
or to make prop.table() include the zero proportions?
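
One possible way around this, sketched here with made-up data rather than the real data set: convert the response to a factor with all five levels declared before tabulating. table() then reports a zero count for any level a group never used, so every group's result has the same length. The prop.table(table(example.data), 1) approach suggested earlier in the thread should also sidestep the issue, since a cross tabulation fills empty cells with 0.

```r
# Toy data in which group 10 never chooses responses 3-5 (illustrative only)
toy <- data.frame(groupVar    = c(10, 10, 10, 20, 20, 20, 20, 20),
                  responseVar = c(1, 1, 2, 1, 2, 3, 4, 5))

# Declare all possible responses so unused ones are tabulated as 0
toy$responseVar <- factor(toy$responseVar, levels = 1:5)

res <- tapply(toy$responseVar, toy$groupVar,
              function(x) prop.table(table(x)))

# Every element now has length 5, so rbind() lines up without a warning
pct <- do.call(rbind, res) * 100
pct
```

The levels = 1:5 argument is the assumption doing the work here; it would need to match whatever set of responses is possible in the real data.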



On Fri, May 23, 2008 at 1:59 PM, Phil Spector <spector@stat.berkeley.edu>
wrote:
> EG -
>    Thanks for the reproducible example!
>
>    When I run your code, and check the class of the result from tapply(),
> I see that it is an "array", and using dim(), I see it's an array
> of length 500.  How big is each element?
>
>  table(sapply(res,length))
>
>   5
> 500
>
> So each piece is the same length.  That means we could
> make a 500x5 matrix as follows:
>
> do.call(rbind,res)
>                                       - Phil Spector
>                                         Statistical Computing Facility
>                                         Department of Statistics
>                                         UC Berkeley
>                                         spector@stat.berkeley.edu
