thr3ads.net - R help - [R] using tapply with multiple variables [Apr 2011]

Kevin Burnham

2011-Apr-30 19:28 UTC

[R] using tapply with multiple variables

Hi All,

I have a long data file generated from a minimal pair test that I gave to
learners of Arabic before and after a phonetic training regime.  For each of
thirty some subjects there are 800 rows of data, from each of 400 items at
pre and posttest.  For each item the subject got correct, there is a 'C' in
the column 'Correct'.  The line:

tapply(ALLDATA$Correct, ALLDATA$Subject, function(x)sum(x=="C"))

gives me the sum of correct answers for each subject.

However, I would like to have that sum separated by Time (pre or post).  Is
there a simple way to do that?


What if I further wish to separate by Group (T or C)?

Thanks,
Kevin


jim holtman

2011-Apr-30 20:10 UTC

Since you did not provide a description of your data (e.g., at least
'str(ALLDATA)') so that we know its structure, I will take a guess:

tapply(ALLDATA$Correct, list(ALLDATA$Subject, ALLDATA$Time),
function(x)sum(x=="C"))
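
For the follow-up question about Group, the same pattern should extend by adding a third factor to the list. The sketch below runs on made-up data: the `toy` data frame, its size, and the assumption that ALLDATA has columns named Time and Group are all illustrative, since the real structure was not posted.

```r
# Toy stand-in for ALLDATA; the structure is assumed, not taken from the real file
set.seed(1)
toy <- data.frame(
  Subject = rep(1:4, each = 20),
  Time    = rep(rep(c("pre", "post"), each = 10), times = 4),
  Group   = rep(c("T", "C"), each = 40),
  Correct = sample(c("C", "I"), 80, replace = TRUE)
)

# Three grouping factors give a Subject x Time x Group array of counts
counts <- tapply(toy$Correct,
                 list(Subject = toy$Subject, Time = toy$Time, Group = toy$Group),
                 function(x) sum(x == "C"))

# Combinations that never occur (a subject in the other group) come back as NA
dim(counts)

# Long format: one row per Subject/Time/Group combination that actually occurs
long <- as.data.frame(as.table(counts), responseName = "NCorrect")
long <- long[!is.na(long$NCorrect), ]
long
```

Because every subject belongs to only one group, half of the cells in the array are empty, which is why the long format with the NA rows dropped may be easier to read.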


-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

Dennis Murphy

2011-May-01 05:03 UTC

Hi:

If you have R 2.11.x or later, you can use the formula version of aggregate():

aggregate(Correct ~ Subject + Group, data = ALLDATA,
          FUN = function(x) sum(x == 'C'))

A variety of contributed packages (plyr, data.table, doBy, sqldf and
remix, among others) have similar capabilities.
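
The aggregate() call above groups by Subject and Group. To split by Time as well, as the original question asked, another term can be added to the right-hand side of the formula. A minimal sketch on invented data follows; `toy` and its column layout are assumptions standing in for ALLDATA.

```r
# Toy stand-in for ALLDATA; sizes and column names are assumed
set.seed(2)
toy <- data.frame(
  Subject = rep(1:4, each = 20),
  Time    = rep(rep(c("pre", "post"), each = 10), times = 4),
  Group   = rep(c("T", "C"), each = 40),
  Correct = factor(sample(c("C", "I"), 80, replace = TRUE))
)

# One row per Subject/Time/Group combination that is present in the data
res <- aggregate(Correct ~ Subject + Time + Group, data = toy,
                 FUN = function(x) sum(x == "C"))
res
```

Unlike tapply(), which returns an array with a cell for every possible combination, aggregate() returns a data frame containing only the combinations that occur.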

If you want some additional summaries (e.g., percent correct), here is
an example function for a single subject/group that aggregate() can
use to propagate to all subgroups and subjects (I encourage you to
play with it):

f <- function(x) {
    Correct <- sum(x == 'C')
    Percent <- round(100 * Correct/length(x), 3)
    c(Number = Correct, Percent = Percent)
}
aggregate(Correct ~ Subject + Group, data = ALLDATA, FUN = f)

The particular function isn't as important as knowing you can do this
sort of thing. Several of the contributed packages indicated above
have similar, if not superior, capabilities, depending on the
situation.

Toy example to test the above:

dd <- data.frame(Subject = rep(1:5, each = 100),
                  Group = rep(rep(c('C', 'T'), each = 50), 5),
                  Correct = factor(rbinom(500, 1, 0.8), labels = c('I', 'C')))
aggregate(Correct ~ Subject + Group, data = dd,
          FUN = function(x) sum(x == 'C'))
   Subject Group Correct
1        1     C      40
2        2     C      36
3        3     C      39
4        4     C      37
5        5     C      41
6        1     T      43
7        2     T      45
8        3     T      37
9        4     T      45
10       5     T      36
aggregate(Correct ~ Subject + Group, data = dd, FUN = f)
   Subject Group Correct.Number Correct.Percent
1        1     C             40              80
2        2     C             36              72
3        3     C             39              78
4        4     C             37              74
5        5     C             41              82
6        1     T             43              86
7        2     T             45              90
8        3     T             37              74
9        4     T             45              90
10       5     T             36              72
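
One detail of the second result that is easy to miss: when FUN returns a vector, aggregate() stores the values in a single matrix column named Correct, and the Correct.Number and Correct.Percent headers are just how that matrix is printed. The sketch below rebuilds the toy example (the seed is an arbitrary choice, so the counts will differ from those shown above) and flattens the result into ordinary columns.

```r
# Rebuild the toy data; the seed is arbitrary and only makes the run repeatable
set.seed(3)
dd <- data.frame(Subject = rep(1:5, each = 100),
                 Group = rep(rep(c("C", "T"), each = 50), 5),
                 Correct = factor(rbinom(500, 1, 0.8), levels = 0:1,
                                  labels = c("I", "C")))

# Same summary function as in the post: count and percent correct
f <- function(x) {
  Correct <- sum(x == "C")
  Percent <- round(100 * Correct / length(x), 3)
  c(Number = Correct, Percent = Percent)
}

res <- aggregate(Correct ~ Subject + Group, data = dd, FUN = f)
ncol(res)               # three columns; the third is a two-column matrix
is.matrix(res$Correct)

# Flatten the matrix column into two regular columns
flat <- do.call(data.frame, res)
names(flat)
```

After flattening, `flat$Correct.Percent` can be used like any other column, whereas on the unflattened result the same values are reached with `res$Correct[, "Percent"]`.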

HTH,
Dennis


Jim Lemon

2011-May-01 12:29 UTC

Hi Kevin,
When I looked at this, I immediately thought of the brkdnNest function 
(which uses tapply internally). In order to get the counts with the 
current function, I had to create a new variable (newcorrect). However, 
the idea so attracted me that I programmed it into the code (thanks).
Here is a way to get your summary by Subject and Time:

ALLDATA<-data.frame(Subject=rep(1:30,each=800),
  Occasion=factor(rep(c("pre","post"),12000),levels=c("pre","post")),
  Correct=sample(c("C","I"),24000,TRUE))
tapply(ALLDATA$Correct,list(ALLDATA$Subject,ALLDATA$Occasion),
  function(x) sum(x=="C"))
library(plotrix)
brkdnNest(Correct~Subject+Occasion,ALLDATA,FUN="propbrk",trueval="C")
ALLDATA$newcorrect<-ALLDATA$Correct=="C"
brkdnNest(newcorrect~Subject+Occasion,ALLDATA,FUN="sum")

To get the three level breakdown, add another factor:

ALLDATA$Group<-rep(c("T","C"),each=12000)
brkdnNest(newcorrect~Group+Subject+Occasion,ALLDATA,FUN="sum")

Notice that this gives you all of the subjects for each Group, even if
they weren't in that Group. I'll work on that one, for I have just
switched to using "tapply" for this breakdown, as it doesn't discard NA
values (the cause of the minor bug in barNest).

Jim
