thr3ads.net - R help - [R] summary stats on continuous data [Jun 2010]

Matthew Finkbeiner

2010-Jun-15 07:42 UTC

[R] summary stats on continuous data

I'm sorry for what I'm sure is a terribly simple question.  I have a large
dataframe along these lines:

S<- 1:3
d<- data.frame(cbind(S=rep(paste('S',S,sep=""),each=30),
    trial=rep(1:3,each=10),
    FactorA=rep(paste('L',1,sep=""),each=30),
    Acc=c(rep(1,each=20),rep(0,each=10)),
    Sample=rep(1:10,3), DV= sample(runif(10),10)))

but where each trial has hundreds of samples and where there are several
hundred trials per subject.

I need to comb through the data and find, for example, how many trials are
correct (Acc==1).  I can figure it out with loops (as I did with the program
I used to use), but I was hoping for a much faster/cleaner way to select out
these trials -- the real data frame has several million rows.

thanks in advance

Matthew

Daniel Malter

2010-Jun-15 08:06 UTC

Hi, you should be able to do most of your summaries using tapply() or
aggregate().

For your example:

tapply(d$Acc,list(d$Sample),table)

Here tapply takes Acc, "splits" it by Sample, and then tables Acc (which
returns how many 0s/1s were observed in variable Acc for each stratum of
Sample).
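
Since aggregate() is mentioned but not shown, here is a minimal sketch of the same kind of per-stratum count with it. The data frame d2 is a stand-in for the toy example, and it assumes Acc is stored as a number rather than as the factor/character column that cbind() produces.

```r
# Stand-in for the toy data: 3 subjects x 3 trials x 10 samples.
# Assumption: Acc is numeric (1 = correct, 0 = incorrect).
d2 <- data.frame(
  S      = rep(paste0("S", 1:3), each = 30),
  trial  = rep(rep(1:3, each = 10), times = 3),
  Acc    = rep(c(rep(1, 20), rep(0, 10)), times = 3),
  Sample = rep(1:10, times = 9)
)

# For each stratum of Sample, count the rows where Acc == 1
counts <- aggregate(Acc ~ Sample, data = d2, FUN = function(x) sum(x == 1))
```

aggregate() returns a data frame (one row per value of Sample), which can be easier to merge with other summaries than the array that tapply() returns.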

HTH,
Daniel
-- 
View this message in context:
http://r.789695.n4.nabble.com/summary-stats-on-continuous-data-tp2255496p2255516.html
Sent from the R help mailing list archive at Nabble.com.

Matthew Finkbeiner

2010-Jun-15 10:51 UTC

Hi Daniel, thanks for your reply.  Unfortunately, that is not doing what I
need.  In the example I sent, there are three subjects (S1, S2 & S3).  Each
subject has 3 trials' worth of data and each trial has 10 samples.  What I
want to return is the accuracy rate for each subject.  The answer is 66.7%
because in this toy example, each subject has 2 correct trials and 1
incorrect trial.

An easy way is to subset the dataframe into correct and incorrect sets for
each subject and then just take the number of correct trials
(length(unique(d$trial))) divided by the total trials for that subject.  But
there must be a less clumsy way?  Especially since I need to do this many
times, not just for accuracy, but for other variables/codes in the real
dataset.
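
One possible way to get the per-subject rate at the trial level without explicit loops or manual subsetting is sketched below. It assumes Acc is numeric and constant within a trial; the names toy, trials and acc_rate are only illustrative.

```r
# Stand-in for the toy data: 3 subjects, 3 trials each, 10 samples per trial.
# Assumption: Acc is numeric and has a single value within each trial.
toy <- data.frame(
  S     = rep(paste0("S", 1:3), each = 30),
  trial = rep(rep(1:3, each = 10), times = 3),
  Acc   = rep(c(rep(1, 20), rep(0, 10)), times = 3)
)

# Collapse to one row per subject/trial combination
trials <- unique(toy[, c("S", "trial", "Acc")])

# Proportion of correct trials for each subject
acc_rate <- tapply(trials$Acc, trials$S, mean)
```

The same two steps (collapse with unique(), then summarise with tapply()) should carry over to other trial-level codes by swapping Acc for another column.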

thanks again for any help

Matthew


Daniel Malter

2010-Jun-15 19:03 UTC

You can define a function that does just that: sum the 1s in Acc and divide
by the length of Acc. Then use tapply to apply the function for each
subject.

f=function(x){sum(as.numeric(as.character(x)))/length(x)}
tapply(d$Acc,list(d$S),f)
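
A caveat that may matter for the real data: dividing the sum of Acc by its length gives the proportion of correct rows, which equals the proportion of correct trials only if every trial has the same number of samples. The sketch below uses a made-up subject with unequal trial lengths (and numeric Acc, as an assumption) to show the difference.

```r
# Hypothetical subject with trials of unequal length:
# trial 1 is correct (2 samples), trial 2 is incorrect (8 samples).
u <- data.frame(
  S     = "S1",
  trial = c(rep(1, 2), rep(2, 8)),
  Acc   = c(rep(1, 2), rep(0, 8))
)

# Row-level proportion: sum(Acc) / length(Acc) within each subject
row_rate <- tapply(u$Acc, u$S, mean)

# Trial-level proportion: reduce to one row per trial, then average
ut <- unique(u[, c("S", "trial", "Acc")])
trial_rate <- tapply(ut$Acc, ut$S, mean)
```

Here row_rate is 0.2 (2 of 10 rows) while trial_rate is 0.5 (1 of 2 trials), so the two only agree when trial lengths are equal, as they are in the toy example.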

HTH,
Daniel