Hi all,

If we wanted to study the effect on the mean of the hourly data based on the hours within a day...

and we wanted to do Anova analysis...

We have two choices:

Please see below:

Why are these two approaches giving very different p-values? And which one shall I use?

Thanks a lot!

1. treating the hours as double/floating numbers:

> anova(lm(hourlydata~as.double(hours_factors)))
                            Df Sum Sq    Mean Sq F value Pr(>F)
as.double(hours_factors)     1 0.0002 0.00019876  1.3425 0.2466
Residuals                14868 2.2013 0.00014806

2. treating the hours as factors:

> anova(lm(hourlydata~hours_factors))
                 Df  Sum Sq    Mean Sq F value Pr(>F)
hours_factors     9 0.00077 8.5979e-05  0.5806 0.8142
Residuals     14860 2.20072 1.4810e-04
And a further problem with both approaches: is 11pm (23) really that different from 1am?

On Thu, Dec 8, 2011 at 3:28 PM, Michael <comtech.usa at gmail.com> wrote:
> Hi all,
>
> If we wanted to study the effect on the mean of the hourly data based on
> the hours within a day...
>
> [...]

-- 
Sarah Goslee
http://www.functionaldiversity.org
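To illustrate the point about 11pm and 1am: one common way to respect the circular nature of clock time is to regress on a sine/cosine pair with a 24-hour period. This is only a sketch on simulated data, not something proposed in the thread; the data frame `d`, the variables `hour` and `y`, and the assumption of a full 0-23 hour range are all hypothetical.

```r
# Simulated stand-in data (hypothetical); the thread's data are not available.
set.seed(2)
d <- data.frame(hour = sample(0:23, 3000, replace = TRUE))
d$y <- 0.5 * cos(2 * pi * d$hour / 24) + rnorm(3000)

# One sine/cosine pair with a 24-hour period: hour 23 and hour 1 sit next to
# each other on the circle instead of 22 units apart on a line.
fit_cyclic <- lm(y ~ sin(2 * pi * hour / 24) + cos(2 * pi * hour / 24), data = d)
fit_null   <- lm(y ~ 1, data = d)

# 2-df F test for a daily cycle against a constant mean
cmp <- anova(fit_null, fit_cyclic)
print(cmp)

# Fitted means on either side of midnight
pred <- predict(fit_cyclic, newdata = data.frame(hour = c(23, 0, 1)))
print(pred)
```

Whether a single harmonic is adequate depends on the data; further harmonics (periods of 12 hours, 8 hours, and so on) can be added in the same way.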
On Dec 8, 2011, at 3:28 PM, Michael wrote:

> Hi all,
>
> If we wanted to study the effect on the mean of the hourly data based on
> the hours within a day...
>
> and we wanted to do Anova analysis...
>
> We have two choices:

Who is "we" and how were these constraints imposed?

> Please see below:
>
> Why are these two approaches giving very different p-values?

They are markedly different statistical models.

> And which one shall I use?

Without knowing your situation better and the eventual purposes of this analysis, it would be difficult to give sensible advice. I suspect the answer is "neither".

-- 
David.

> Thanks a lot!
>
> [...]

David Winsemius, MD
West Hartford, CT
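A small simulation may make "markedly different statistical models" concrete. The numeric version fits a single straight-line slope (1 df), while the factor version fits a separate mean for every hour (9 df for ten levels, matching the output in the original post). Note also that as.double() applied to a factor returns the internal level codes (1, 2, ...), not the hour labels. The data below are simulated stand-ins; the choice of hours 8 to 17 and the noise level are assumptions, not taken from the thread.

```r
# Simulated stand-in data (hypothetical); the thread's data are not available.
set.seed(1)
hours <- sample(8:17, 2000, replace = TRUE)   # assumed: ten distinct hours
hourlydata <- rnorm(2000, sd = 0.012)         # no true hour effect
hours_factors <- factor(hours)

# as.double() on a factor gives level codes, not the original hour values
codes <- as.double(hours_factors)
hour_values <- as.numeric(as.character(hours_factors))
print(range(codes))
print(range(hour_values))

fit_linear <- lm(hourlydata ~ hour_values)    # one slope: 1 df
fit_factor <- lm(hourlydata ~ hours_factors)  # one mean per hour: 9 df

a_linear <- anova(fit_linear)
a_factor <- anova(fit_factor)
print(a_linear)
print(a_factor)

# The straight-line model is nested in the factor model, so the two can be
# compared directly; the 8-df F test asks whether the hourly means depart
# from a straight line.
a_nested <- anova(fit_linear, fit_factor)
print(a_nested)
```

The two p-values in the original post answer different questions: the first tests for a linear trend across the level codes, the second tests for any difference at all among the hourly means.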