Hi all,
I've been reading about aov() at
http://www.psych.upenn.edu/~baron/rpsych/rpsych.html and
http://davidmlane.com/hyperstat/intro_ANOVA.html and
I am trying to use this test on experiments with my simulator.
What I would like the ANOVA to tell me is whether the differences I see
when plotting the mean performance per method are significant,
and also whether this depends on the problem size (bigger is
more complex).
I would be very grateful if there's somebody more mathematically skilled
on this list who could tell me whether I'm drawing correct conclusions.
> data
   performance method problem
1  146780.0000     -f     960
2    4654.0000     -f     160
3   45840.0000     -f     320
4   54750.0000     -f     320
5   91750.0000     -f     480
6    7452.0000     -f     160
7    8866.0000     -f     160
8    8513.0000     -f     160
9  139520.0000     -f     960
10  85380.0000     -f     480
<snip>
> str(data)
`data.frame': 419 obs. of 3 variables:
$ performance: num 146780 4654 45840 54750 91750 ...
$ method : Factor w/ 7 levels "-f","-f -q","-h0 -r0",..: 1 1 1 1 1 1 1 1 1 1 ...
$ problem : int 960 160 320 320 480 160 160 160 960 480 ...
> summary(aov(performance ~ method * problem, data=data))
                Df     Sum Sq    Mean Sq F value    Pr(>F)
method           6 3.3185e+11 5.5308e+10  416.91 < 2.2e-16 ***
problem          1 5.7141e+11 5.7141e+11 4307.26 < 2.2e-16 ***
method:problem   6 9.8891e+10 1.6482e+10  124.24 < 2.2e-16 ***
Residuals      405 5.3728e+10 1.3266e+08
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
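A side note on the Df column: problem is stored as an int, so aov() fits it as a single numeric slope, which is why it gets 1 Df in the table above. Here is a minimal sketch on made-up data of how the Df column changes when problem is wrapped in factor(). The data frame toy and the method names A, B, C are purely illustrative assumptions, not the simulator output, and whether the numeric or the factor coding is the right one for this experiment is a separate question.

```r
# Toy data only: 3 hypothetical methods x 4 problem sizes x 5 replicates.
set.seed(1)
toy <- expand.grid(rep = 1:5,
                   method = c("A", "B", "C"),
                   problem = c(160L, 320L, 480L, 960L))
toy$method <- factor(toy$method)
toy$performance <- 100 * toy$problem + rnorm(nrow(toy), sd = 50)

# problem as integer: fitted as one numeric slope
fit_num <- aov(performance ~ method * problem, data = toy)
# problem as factor: one parameter per problem size beyond the first
fit_fac <- aov(performance ~ method * factor(problem), data = toy)

# Extract the Df column of each ANOVA table
# (rows: method, problem, method:problem, Residuals)
df_num <- summary(fit_num)[[1]][["Df"]]
df_fac <- summary(fit_fac)[[1]][["Df"]]
print(df_num)
print(df_fac)
```

With the numeric coding the interaction row tests whether the slope against problem size differs between methods; with the factor coding it tests whether the method differences change between the individual problem sizes.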
I interpret this output as follows:
-1- The performance depends on the chosen method.
If I compute the overall mean performance for each method, the resulting
numbers are significantly different. This means that the method
with the greatest mean is significantly better than at least some of the
other methods (and not worse than any other method).
-2- The performance depends on the problem complexity.
This is not so interesting. In my setting it is trivial that performance
is worse for more complex problems.
-3- There is an interaction between method and complexity. In other words,
one cannot simply order the methods from bad to good without taking the
problem complexity into account (for simple problems method A might be
the best, while for complex problems another method might be better).
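To make point -3- concrete, here is a minimal sketch on made-up numbers of what such a change in ordering looks like in a table of cell means. The data frame toy and the methods A and B are hypothetical, and which.max() assumes that a larger value counts as better; swap in which.min() if smaller is better.

```r
# Toy data only: made-up methods "A" and "B", not the simulator output.
toy <- data.frame(
  method      = factor(rep(c("A", "B"), each = 4)),
  problem     = rep(c(160L, 160L, 960L, 960L), times = 2),
  performance = c(10, 12, 90, 94,    # method A
                  14, 16, 70, 74)    # method B
)

# Mean performance for every method x problem-size cell
cell_means <- with(toy, tapply(performance, list(method, problem), mean))
print(cell_means)

# Method with the largest mean at each problem size
best <- apply(cell_means, 2, function(col) names(which.max(col)))
print(best)
```

If best names a different method for different problem sizes, the ordering of the methods depends on the problem size, which is the pattern described in -3-. An interaction can also be significant without the ordering changing, so the table of cell means is worth looking at either way.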
I have not used Error() in my call to aov().
I've seen this form being used: Error(subj/(shape * color))
But I do not have subjects. Or rather, I believe I have only one, which is
my simulator. Am I correct about that? Or should I use something like
Error(method * problem)?
Thanks in advance,
JeeBee.