Hi all,

I have a long data file generated from a minimal pair test that I gave to learners of Arabic before and after a phonetic training regime. For each of thirty-some subjects there are 800 rows of data, one for each of 400 items at pretest and posttest. For each item the subject got correct, there is a 'C' in the column 'Correct'. The line:

    tapply(ALLDATA$Correct, ALLDATA$Subject, function(x) sum(x == "C"))

gives me the number of correct answers for each subject.

However, I would like to have that sum separated by Time (pre or post). Is there a simple way to do that?

What if I further wish to separate by Group (T or C)?

Thanks,
Kevin
Since you did not provide a description of your data (e.g., at least the output of 'str(ALLDATA)') so that we know its structure, I will take a guess:

    tapply(ALLDATA$Correct, list(ALLDATA$Subject, ALLDATA$Time), function(x) sum(x == "C"))

On Sat, Apr 30, 2011 at 3:28 PM, Kevin Burnham <kburnham at gmail.com> wrote:
> However, I would like to have that sum separated by Time (pre or post). Is
> there a simple way to do that?
>
> What if I further wish to separate by Group (T or C)?

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
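The same tapply() call takes a third grouping factor, which also covers the Group part of the question. Below is a minimal sketch on a made-up data frame: the object name `toy`, its sizes, and the column layout (Subject, Group, Time, Correct) are assumptions standing in for the real ALLDATA, whose structure was not posted.

```r
# Hypothetical stand-in for ALLDATA: 4 subjects, 2 per group,
# 10 items at each of two test occasions (sizes chosen for illustration only)
set.seed(1)
toy <- data.frame(
  Subject = rep(1:4, each = 20),
  Group   = rep(c("T", "C"), each = 40),
  Time    = rep(rep(c("pre", "post"), each = 10), times = 4),
  Correct = sample(c("C", "I"), 80, replace = TRUE)
)

# Two grouping factors: a Subject x Time matrix of counts
by_time <- tapply(toy$Correct, list(toy$Subject, toy$Time),
                  function(x) sum(x == "C"))

# Three grouping factors: a Subject x Time x Group array.
# Cells for Subject/Group combinations that never occur are NA.
by_time_group <- tapply(toy$Correct,
                        list(toy$Subject, toy$Time, toy$Group),
                        function(x) sum(x == "C"))

by_time
by_time_group
```

Because each subject belongs to only one group, half of the cells in the three-way array are NA; a long-format summary (one row per observed combination) may be easier to read in that case.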
Hi:

If you have R 2.11.x or later, you can use the formula version of aggregate():

    aggregate(Correct ~ Subject + Group, data = ALLDATA,
              FUN = function(x) sum(x == 'C'))

A variety of contributed packages (plyr, data.table, doBy, sqldf and remix, among others) have similar capabilities.

If you want some additional summaries (e.g., percent correct), here is an example function for a single subject/group that aggregate() can apply to all subgroups and subjects (I encourage you to play with it):

    f <- function(x) {
        Correct <- sum(x == 'C')
        Percent <- round(100 * Correct/length(x), 3)
        c(Number = Correct, Percent = Percent)
    }
    aggregate(Correct ~ Subject + Group, data = ALLDATA, FUN = f)

The particular function isn't as important as knowing you can do this sort of thing. Several of the contributed packages indicated above have similar, if not superior, capabilities, depending on the situation.

Toy example to test the above (no seed is set, so your counts will differ):

    dd <- data.frame(Subject = rep(1:5, each = 100),
                     Group = rep(rep(c('C', 'T'), each = 50), 5),
                     Correct = factor(rbinom(500, 1, 0.8), labels = c('I', 'C')))

    aggregate(Correct ~ Subject + Group, data = dd,
              FUN = function(x) sum(x == 'C'))
       Subject Group Correct
    1        1     C      40
    2        2     C      36
    3        3     C      39
    4        4     C      37
    5        5     C      41
    6        1     T      43
    7        2     T      45
    8        3     T      37
    9        4     T      45
    10       5     T      36

    aggregate(Correct ~ Subject + Group, data = dd, FUN = f)
       Subject Group Correct.Number Correct.Percent
    1        1     C             40              80
    2        2     C             36              72
    3        3     C             39              78
    4        4     C             37              74
    5        5     C             41              82
    6        1     T             43              86
    7        2     T             45              90
    8        3     T             37              74
    9        4     T             45              90
    10       5     T             36              72

HTH,
Dennis
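To split by Time as well, the formula passed to aggregate() can simply list another grouping variable. The sketch below assumes the real data has a column named Time, as described in the original question; the data frame `toy` and its sizes are invented for illustration.

```r
# Hypothetical stand-in for ALLDATA: 4 subjects, 2 per group,
# 10 items at each of two test occasions
set.seed(42)
toy <- data.frame(
  Subject = rep(1:4, each = 20),
  Group   = rep(c("T", "C"), each = 40),
  Time    = rep(rep(c("pre", "post"), each = 10), times = 4),
  Correct = factor(sample(c("C", "I"), 80, replace = TRUE))
)

# One row per observed Subject/Time/Group combination;
# combinations that never occur in the data are not listed
counts <- aggregate(Correct ~ Subject + Time + Group, data = toy,
                    FUN = function(x) sum(x == "C"))
counts
```

Unlike a multi-way array, this long format has no empty cells for subjects that are not in a given group, which may suit a design where each subject belongs to a single group.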
On 05/01/2011 05:28 AM, Kevin Burnham wrote:
> However, I would like to have that sum separated by Time (pre or post). Is
> there a simple way to do that?
>
> What if I further wish to separate by Group (T or C)?

Hi Kevin,
When I looked at this, I immediately thought of the brkdnNest function (which uses tapply internally). In order to get the counts with the current function, I had to create a new variable (newcorrect). However, the idea so attracted me that I programmed it into the code (thanks). Here is a way to get your summary by Subject and Time:

    # simulated stand-in for the real data: 30 subjects x 800 rows each;
    # every column is built to the full 24000 rows rather than relying on recycling
    ALLDATA<-data.frame(Subject=rep(1:30,each=800),
     Occasion=factor(rep(c("pre","post"),12000),levels=c("pre","post")),
     Correct=sample(c("C","I"),24000,TRUE))
    tapply(ALLDATA$Correct,list(ALLDATA$Subject,ALLDATA$Occasion),
     function(x) sum(x=="C"))
    library(plotrix)
    brkdnNest(Correct~Subject+Occasion,ALLDATA,FUN="propbrk",trueval="C")
    ALLDATA$newcorrect<-ALLDATA$Correct=="C"
    brkdnNest(newcorrect~Subject+Occasion,ALLDATA,FUN="sum")

To get the three level breakdown, add another factor:

    # subjects 1-15 in group T, subjects 16-30 in group C
    ALLDATA$Group<-rep(c("T","C"),each=12000)
    brkdnNest(newcorrect~Group+Subject+Occasion,ALLDATA,FUN="sum")

Notice that this gives you all of the subjects for each Group, even if they weren't in that Group. I'll work on that one, for I have just switched to using "tapply" for this breakdown, as it doesn't discard NA values (the cause of the minor bug in barNest).

Jim
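The logical recoding used above (newcorrect) is handy in base R too: summing a logical vector counts the TRUE values and taking its mean gives the proportion. The following is only a sketch of that idea with plain tapply(), not the plotrix implementation; the data frame `toy` and its sizes are invented for illustration.

```r
# Hypothetical stand-in data: 3 subjects, 10 items at each occasion
set.seed(7)
toy <- data.frame(
  Subject  = rep(1:3, each = 20),
  Occasion = factor(rep(rep(c("pre", "post"), each = 10), times = 3),
                    levels = c("pre", "post")),
  Correct  = sample(c("C", "I"), 60, replace = TRUE)
)

# TRUE where the answer was correct
toy$newcorrect <- toy$Correct == "C"

# Counts and proportions of correct answers by Subject and Occasion
n_correct <- tapply(toy$newcorrect, list(toy$Subject, toy$Occasion), sum)
p_correct <- tapply(toy$newcorrect, list(toy$Subject, toy$Occasion), mean)

n_correct
p_correct
```

Setting the factor levels explicitly keeps the columns in pre, post order rather than alphabetical order.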