Hamilton-Green, Matthew J
2010-Oct-06 14:46 UTC
[R] methodology question : is anova appropriate for these data?
Representative small sample of data:
algorithmID <- factor(c(rep('alg1',4),rep('alg2',4),rep('alg3',4)))
threshold <- factor(rep(c(.45,.50,.55,.60),times=3))
score <- c(30,32,31,30,10,12,13,14,22,21,20,24)
d <- data.frame(algorithmID,threshold,score)
algorithmID is the name of each algorithm; threshold is the value of a
parameter that the algorithm uses to produce the score; score is a number
that can take any integer value between 0 and 40.
I'd like to know whether different algorithms reliably produce different
scores. Each score comes from running one algorithm with the specified value
of 'threshold'. The value of threshold is fixed for a given run of each
algorithm, so I think (but I'm not sure) that it should be treated as a
fixed factor rather than a random factor.
I am tempted to try:
d.aov <- aov(score ~ algorithmID + Error(threshold/algorithmID), data = d)
but I am doubtful whether it is appropriate to treat 'threshold' in this
way.
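For comparison, here is a minimal sketch of two simpler ways the same data could be modelled, on the assumption that each threshold value acts as a block in which every algorithm is run exactly once. The names fit_fixed and fit_block are only illustrative. Neither model is offered as the correct answer; they just make the fixed-versus-random choice concrete.

```r
# Rebuild the sample data from the post so this block runs on its own.
algorithmID <- factor(rep(c("alg1", "alg2", "alg3"), each = 4))
threshold <- factor(rep(c(.45, .50, .55, .60), times = 3))
score <- c(30, 32, 31, 30, 10, 12, 13, 14, 22, 21, 20, 24)
d <- data.frame(algorithmID, threshold, score)

# Assumption: threshold is a fixed blocking factor with one run per cell,
# so no algorithm-by-threshold interaction can be estimated.
fit_fixed <- aov(score ~ algorithmID + threshold, data = d)
summary(fit_fixed)

# Alternative assumption: threshold is a random block, declared as an
# error stratum.
fit_block <- aov(score ~ algorithmID + Error(threshold), data = d)
summary(fit_block)
```

Because this layout is balanced, the test for algorithmID should come out the same in both fits, with threshold variation removed from the residual in each case. The choice matters more for how the result is interpreted than for the F statistic.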
I have two queries:
1. How should I determine whether ANOVA is an appropriate test of the null
hypothesis that mean score does not differ between levels of algorithmID?
2. It would also be possible to sample the values of threshold at random
from the range 0.01 to 0.99 rather than fixing them. Would that make any
difference to whether ANOVA is suitable?
Any advice gratefully received,
Matt
Research Assistant, University of Aberdeen
