I'm sorry for what I'm sure is a terribly simple question. I have a large data frame along these lines:

S <- 1:3
d <- data.frame(cbind(S=rep(paste('S',S,sep=""),each=30), trial=rep(1:3,each=10), FactorA=rep(paste('L',1,sep=""),each=30), Acc=c(rep(1,each=20),rep(0,each=10)), Sample=rep(1:10,3), DV=sample(runif(10),10)))

but where each trial has hundreds of samples and where there are several hundred trials per subject. I need to comb through the data and find, for example, how many trials are correct (Acc==1). I can figure it out with loops (as I did with the program I used to use), but I was hoping for a much faster/cleaner way to select out these trials -- the real data frame has several million rows.

Thanks in advance,
Matthew
Hi,

You should be able to do most of your summaries using tapply() or aggregate(). For your example:

tapply(d$Acc,list(d$Sample),table)

Here tapply() takes Acc, splits it by Sample, and then tabulates Acc within each group, which returns how many 0s and 1s were observed in Acc for each level of Sample.

HTH,
Daniel

-- View this message in context: http://r.789695.n4.nabble.com/summary-stats-on-continuous-data-tp2255496p2255516.html Sent from the R help mailing list archive at Nabble.com.
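For reference, here is a rough sketch of what the tapply() call above returns on the toy data. It assumes the garbled column in the original post was meant to be Acc=c(...), and it rebuilds the data frame without cbind() so the columns keep their numeric types; the DV values are placeholders and the names by_sample and rows_by_subject are made up for illustration.

```r
# Toy data rebuilt from the question (assumption: Acc = c(rep(1, 20), rep(0, 10))).
# 3 subjects x 3 trials x 10 samples = 90 rows; trials 1 and 2 are correct.
d <- data.frame(
  S       = rep(paste0("S", 1:3), each = 30),
  trial   = rep(rep(1:3, each = 10), times = 3),
  FactorA = "L1",
  Acc     = rep(c(rep(1, 20), rep(0, 10)), times = 3),
  Sample  = rep(1:10, times = 9),
  DV      = runif(90)  # placeholder values, not the original DV
)

# Split Acc by Sample and tabulate each piece; the result is a list-like
# array holding one table of 0/1 counts per level of Sample.
by_sample <- tapply(d$Acc, list(d$Sample), table)
by_sample[["1"]]

# The same idea per subject: a plain cross-tabulation of S against Acc.
# Note that this counts rows (samples), not trials.
rows_by_subject <- table(d$S, d$Acc)
rows_by_subject
```

Both summaries count rows, so a trial with more samples weighs more than a trial with fewer.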
Hi Daniel,

Thanks for your reply. Unfortunately, that is not doing what I need. In the example I sent, there are three subjects (S1, S2 and S3). Each subject has 3 trials' worth of data and each trial has 10 samples. What I want to return is the accuracy rate for each subject. The answer is 66.7% because, in this toy example, each subject has 2 correct trials and 1 incorrect trial.

An easy way is to subset the data frame into correct and incorrect sets for each subject and then take the number of correct trials, length(unique(d$trial)) on the correct subset, divided by the total number of trials for that subject. But there must be a less clumsy way? Especially since I need to do this many times, not just for accuracy, but for other variables/codes in the real dataset.

Thanks again for any help,
Matthew
You can define a function that does just that: sum the 1s in Acc and divide by the length of Acc. Then use tapply() to apply the function for each subject.

f=function(x){sum(as.numeric(as.character(x)))/length(x)}
tapply(d$Acc,list(d$S),f)

HTH,
Daniel
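The function above works on rows, so it matches the per-trial accuracy only when every trial contributes the same number of samples. A possible trial-level variant is sketched below. It assumes Acc is constant within a trial and is stored as a number (with the cbind() construction from the question the column would first need as.numeric(as.character(...))); the names trials, acc_by_trial and acc_by_row are hypothetical.

```r
# Toy data rebuilt from the question (assumption: Acc = c(rep(1, 20), rep(0, 10))).
d <- data.frame(
  S       = rep(paste0("S", 1:3), each = 30),
  trial   = rep(rep(1:3, each = 10), times = 3),
  FactorA = "L1",
  Acc     = rep(c(rep(1, 20), rep(0, 10)), times = 3),
  Sample  = rep(1:10, times = 9),
  DV      = runif(90)  # placeholder values, not the original DV
)

# Collapse to one row per subject/trial. This relies on Acc being
# constant within a trial, so unique() drops the repeated samples.
trials <- unique(d[, c("S", "trial", "Acc")])

# Proportion of correct trials per subject (mean of a 0/1 vector).
acc_by_trial <- tapply(trials$Acc, trials$S, mean)
acc_by_trial

# Row-level version from the reply above, for comparison.
f <- function(x) sum(as.numeric(as.character(x))) / length(x)
acc_by_row <- tapply(d$Acc, list(d$S), f)
acc_by_row
```

On this balanced toy data both give 2/3 for every subject; they can differ once trials have unequal numbers of samples. The same unique()-then-tapply() pattern should carry over to other per-trial codes in the real data.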