thr3ads.net - R help - [R] Help With ANOVA (corrected please ignore last email) [Jul 2010]

If this information is useful, please help other people find it:
Share via:

Amit Patel

2010-Jul-06 13:12 UTC

[R] Help With ANOVA (corrected please ignore last email)

Sorry i had a misprint in the appendix code in the last email


Hi I needed some help with ANOVA

I have a problem with My ANOVA
analysis. I have a dataset with a known ANOVA p-value, however I can
not seem to re-create it in R.

I have created a list (zzzanova) which contains
1)Intensity Values
2)Group Number (6 Different Groups)
3)Sample Number (54 different samples)
this is created by the script in Appendix 1

I then conduct ANOVA with the command> zzz.aov <- aov(Intensity ~ Group, data = zzzanova)
I get a p-value of
Pr(>F)1 
0.9483218 

The
expected p-value is 0.00490 so I feel I maybe using ANOVA incorrectly
or have put in a wrong formula. I am trying to do an ANOVA analysis
across all 6 Groups. Is there something wrong with my formula. But I think I
have made a mistake in the formula rather than anything else.




APPENDIX 1

datalist <- c(-4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517,
-4.60517, 3.003749, -4.60517,
    2.045314, 2.482557, -4.60517, -4.60517, -4.60517, -4.60517, 1.592743,
-4.60517,
    -4.60517, 0.91328, -4.60517, -4.60517, 1.827744, 2.457795, 0.355075,
-4.60517, 2.39127,
    2.016987, 2.319903, 1.146683, -4.60517, -4.60517, -4.60517, 1.846162,
-4.60517, 2.121427, 1.973118,
    -4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816, -4.60517, 
0.023703, -4.60517,
    2.043382, 1.070586, 2.768289, 1.085169, 0.959334, -0.02428, -4.60517,
1.371895, 1.533227)

"zzzanova" <-
structure(list(Intensity = datalist, 
Group = structure(c(1,1,1,1,1,1,1,1,1,
         2,2,2,2,2,2,2,2,
         3,3,3,3,3,3,3,3,3,
         4,4,4,4,4,4,4,4,4,4,
         5,5,5,5,5,5,5,5,5,
         6,6,6,6,6,6,6,6,6), .Label = c("Group1", "Group2",
"Group3", "Group4", "Group5", "Group6"),
class = "factor"),
    Sample = structure(c( 1, 2, 3, 4, 5, 6, 7, 8, 9, 
    10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 
    20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 
    30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40,41,42,43,44,45,46,47,48,49,50,51,52,53,54)
))
, .Names = c("Intensity", 
"Group", "Sample"), row.names = 
c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10",
"11", "12", "13", "14", "15",
"16", "17", "18", "19", "20",
"21", "22", "23", "24", "25",
"26", "27", "28", "29", "30",
"31", "32", "33", "34", "35",
"36", "37", "38", "39", "40",
"41", "42", "43", "44", "45",
"46", "47", "48", "49", "50",
"51", "52", "53", "54"),class =
"data.frame")

Joshua Wiley

2010-Jul-06 15:46 UTC

head link

[R] Help With ANOVA (corrected please ignore last email)

Hi Amit,

When I copy in your data and run

aov(Intensity ~ Group, data = zzzanova)

I get neither the p-value you showed nor the one you expected.  My
suggestions at things to look at would be

1) Where/How did you get the expected p-value?  Another statistics
program (e.g., SPSS or SAS)?  It helps to know where the expected
value came from.  If it was from another program, I would also
recommend searching the R-help archives (for instance for SPSS and
ANOVA)

2) Perhaps something is happening with the unbalanced design (e.g.,
aov() in R handles it differently than you expected)?

3) You only mention that the p-values do not match.  What about other
aspects?  Are the df the same?  I would try to be certain that the
data in R is the same as what was used to calculate the expected
p-value.  summary(zzzanova) will return some nice summary statistics
for each column of your dataframe.

Cheers,


Josh



On Tue, Jul 6, 2010 at 6:12 AM, Amit Patel <amitrhelp at yahoo.co.uk>
wrote:> Sorry i had a misprint in the appendix code in the last email
>
>
> Hi I needed some help with ANOVA
>
> I have a problem with My ANOVA
> analysis. I have a dataset with a known ANOVA p-value, however I can
> not seem to re-create it in R.
>
> I have created a list (zzzanova) which contains
> 1)Intensity Values
> 2)Group Number (6 Different Groups)
> 3)Sample Number (54 different samples)
> this is created by the script in Appendix 1
>
> I then conduct ANOVA with the command
>> zzz.aov <- aov(Intensity ~ Group, data = zzzanova)
>
> I get a p-value of
> Pr(>F)1
> 0.9483218
>
> The
> expected p-value is 0.00490 so I feel I maybe using ANOVA incorrectly
> or have put in a wrong formula. I am trying to do an ANOVA analysis
> across all 6 Groups. Is there something wrong with my formula. But I think
I
> have made a mistake in the formula rather than anything else.
>
>
>
>
> APPENDIX 1
>
> datalist <- c(-4.60517, -4.60517, -4.60517, -4.60517, -4.60517,
-4.60517, -4.60517, 3.003749, -4.60517,
> ? ?2.045314, 2.482557, -4.60517, -4.60517, -4.60517, -4.60517, 1.592743,
-4.60517,
> ? ?-4.60517, 0.91328, -4.60517, -4.60517, 1.827744, 2.457795, 0.355075,
-4.60517, 2.39127,
> ? ?2.016987, 2.319903, 1.146683, -4.60517, -4.60517, -4.60517, 1.846162,
-4.60517, 2.121427, 1.973118,
> ? ?-4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816, -4.60517,
?0.023703, -4.60517,
> ? ?2.043382, 1.070586, 2.768289, 1.085169, 0.959334, -0.02428, -4.60517,
1.371895, 1.533227)
>
> "zzzanova" <-
> structure(list(Intensity = datalist,
> Group = structure(c(1,1,1,1,1,1,1,1,1,
> ? ? ? ? 2,2,2,2,2,2,2,2,
> ? ? ? ? 3,3,3,3,3,3,3,3,3,
> ? ? ? ? 4,4,4,4,4,4,4,4,4,4,
> ? ? ? ? 5,5,5,5,5,5,5,5,5,
> ? ? ? ? 6,6,6,6,6,6,6,6,6), .Label = c("Group1",
"Group2", "Group3", "Group4", "Group5",
"Group6"), class = "factor"),
> ? ?Sample = structure(c( 1, 2, 3, 4, 5, 6, 7, 8, 9,
> ? ?10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
> ? ?20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
> ? ?30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40,41,42,43,44,45,46,47,48,49,50,51,52,53,54)
> ))
> , .Names = c("Intensity",
> "Group", "Sample"), row.names > c("1",
"2", "3", "4", "5", "6",
"7", "8", "9", "10",
> "11", "12", "13", "14",
"15", "16", "17", "18", "19",
"20",
> "21", "22", "23", "24",
"25", "26", "27", "28", "29",
"30",
> "31", "32", "33", "34",
"35", "36", "37", "38", "39",
"40",
> "41", "42", "43", "44",
"45", "46", "47", "48", "49",
"50",
> "51", "52", "53", "54"),class =
"data.frame")
>
>
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

Joshua Wiley

2010-Jul-06 18:18 UTC

head link

[R] Help With ANOVA (corrected please ignore last email)

Hello,

Are you saying that the -4.60517 values are supposed to be treated as
missing?  Unless you set them to NA in R, they will be treated as real
values.  This would make a huge difference.

I can tell you that your formula: aov(Intensity ~ Group, data zzzanova) is
treating the variable 'Intensity' as the outcome (or
dependent variable) and the variable 'Group' as the predictor (or
independent variable).  But of course, whether you are using an ANOVA
correctly depends on a lot more.  Oh, and yes, Pr(>F) is the p-value.

However, I may have found the source of your discrepancy.  Looking at
the data you provided:
> str(zzzanova)'data.frame':	54 obs. of  3 variables:
 $ Intensity: num  -4.61 -4.61 -4.61 -4.61 -4.61 ...
 $ Group    : Factor w/ 6 levels "Group1","Group2",..: 1 1 1
1 1 1 1 1 1 2 ...
 $ Sample   : num  1 2 3 4 5 6 7 8 9 10 ...

Notice that 'Group' is a factor with 6 levels.  That means that R is
not treating this variable's scale as interval or ratio variable.  In
fact, it is being treated as nominal.  Looking at the summary of the
ANOVA:
> summary(aov(Intensity ~ Group, data = zzzanova))            Df Sum Sq Mean Sq F value  Pr(>F)
Group        5  98.85  19.771  2.1469 0.07576 .
Residuals   48 442.03   9.209
---

indicates that the variable 'Group' has 5 degrees of freedom, which
makes sense given that it has 6 levels.  Now look what happens when I
convert the variable 'Group' from a factor to a numeric variable
(using as.numeric() on the variable in the formula).
> summary(aov(Intensity ~ as.numeric(zzzanova$Group), data = zzzanova))                           Df Sum Sq Mean Sq F value   Pr(>F)
as.numeric(zzzanova$Group)  1  80.18  80.182  9.0503 0.004042 **
Residuals                  52 460.70   8.860

Now Group only has 1 degree of freedom (because it is being treated as
a single predictor now instead of having multiple levels).  Also the
new p-value seems within rounding error of what you expected.


All that said, my guess is that you really do want to treat the
variable 'Group' as a factor.  Treated as a numeric variable what you
are basically testing is the relationship between Group and Intensity
as group increases from 1 to 6.  This is very different from testing
the relationship between 6 different groups and Intensity.

As a disclaimer, that is not really the best explanation of what is
going on, but every other way I tried to say it seemed worse, so if
you are unclear you might consult a statistics text or perhaps others
can give a more erudite explanation.


So to summarize, take a look at the original data for Group that you
pulled into R.  If you do not want it to be a factor you should either
convert it somehow, or read the data in differently.  This may not be
the actual issue, but it was the closest I could come to replicating
your p-value.

Best regards,

Josh


On Tue, Jul 6, 2010 at 10:09 AM, Amit Patel <amitrhelp at yahoo.co.uk>
wrote:> Hi Joshua
>
> Thanks very much for your help. I will take your advice and work on my
problem. I am getting my expected p-values from GeneSpring software. I thought
the problem lied in the treatment of the NA (-4.60517) values and maybe how they
are treated in the analysis (i.e ignored or -4.60517). Im not sure what your
ANOVA background is but I just wanted to check that I'm using ANOVA
correctly. Im using the assumption that the Pr(>F)1 value is the actual
p-value.
>
> Thanks again for your help
>
>
>
> ----- Original Message ----
> From: Joshua Wiley <jwiley.psych at gmail.com>
> To: Amit Patel <amitrhelp at yahoo.co.uk>
> Cc: r-help at r-project.org
> Sent: Tue, 6 July, 2010 16:46:30
> Subject: Re: [R] Help With ANOVA (corrected please ignore last email)
>
> Hi Amit,
>
> When I copy in your data and run
>
> aov(Intensity ~ Group, data = zzzanova)
>
> I get neither the p-value you showed nor the one you expected. ?My
> suggestions at things to look at would be
>
> 1) Where/How did you get the expected p-value? ?Another statistics
> program (e.g., SPSS or SAS)? ?It helps to know where the expected
> value came from. ?If it was from another program, I would also
> recommend searching the R-help archives (for instance for SPSS and
> ANOVA)
>
> 2) Perhaps something is happening with the unbalanced design (e.g.,
> aov() in R handles it differently than you expected)?
>
> 3) You only mention that the p-values do not match. ?What about other
> aspects? ?Are the df the same? ?I would try to be certain that the
> data in R is the same as what was used to calculate the expected
> p-value. ?summary(zzzanova) will return some nice summary statistics
> for each column of your dataframe.
>
> Cheers,
>
>
> Josh
>
>
>
> On Tue, Jul 6, 2010 at 6:12 AM, Amit Patel <amitrhelp at yahoo.co.uk>
wrote:
>> Sorry i had a misprint in the appendix code in the last email
>>
>>
>> Hi I needed some help with ANOVA
>>
>> I have a problem with My ANOVA
>> analysis. I have a dataset with a known ANOVA p-value, however I can
>> not seem to re-create it in R.
>>
>> I have created a list (zzzanova) which contains
>> 1)Intensity Values
>> 2)Group Number (6 Different Groups)
>> 3)Sample Number (54 different samples)
>> this is created by the script in Appendix 1
>>
>> I then conduct ANOVA with the command
>>> zzz.aov <- aov(Intensity ~ Group, data = zzzanova)
>>
>> I get a p-value of
>> Pr(>F)1
>> 0.9483218
>>
>> The
>> expected p-value is 0.00490 so I feel I maybe using ANOVA incorrectly
>> or have put in a wrong formula. I am trying to do an ANOVA analysis
>> across all 6 Groups. Is there something wrong with my formula. But I
think I
>> have made a mistake in the formula rather than anything else.
>>
>>
>>
>>
>> APPENDIX 1
>>
>> datalist <- c(-4.60517, -4.60517, -4.60517, -4.60517, -4.60517,
-4.60517, -4.60517, 3.003749, -4.60517,
>> ? ?2.045314, 2.482557, -4.60517, -4.60517, -4.60517, -4.60517,
1.592743, -4.60517,
>> ? ?-4.60517, 0.91328, -4.60517, -4.60517, 1.827744, 2.457795, 0.355075,
-4.60517, 2.39127,
>> ? ?2.016987, 2.319903, 1.146683, -4.60517, -4.60517, -4.60517,
1.846162, -4.60517, 2.121427, 1.973118,
>> ? ?-4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816, -4.60517,
?0.023703, -4.60517,
>> ? ?2.043382, 1.070586, 2.768289, 1.085169, 0.959334, -0.02428,
-4.60517, 1.371895, 1.533227)
>>
>> "zzzanova" <-
>> structure(list(Intensity = datalist,
>> Group = structure(c(1,1,1,1,1,1,1,1,1,
>> ? ? ? ? 2,2,2,2,2,2,2,2,
>> ? ? ? ? 3,3,3,3,3,3,3,3,3,
>> ? ? ? ? 4,4,4,4,4,4,4,4,4,4,
>> ? ? ? ? 5,5,5,5,5,5,5,5,5,
>> ? ? ? ? 6,6,6,6,6,6,6,6,6), .Label = c("Group1",
"Group2", "Group3", "Group4", "Group5",
"Group6"), class = "factor"),
>> ? ?Sample = structure(c( 1, 2, 3, 4, 5, 6, 7, 8, 9,
>> ? ?10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
>> ? ?20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
>> ? ?30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40,41,42,43,44,45,46,47,48,49,50,51,52,53,54)
>> ))
>> , .Names = c("Intensity",
>> "Group", "Sample"), row.names >>
c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10",
>> "11", "12", "13", "14",
"15", "16", "17", "18", "19",
"20",
>> "21", "22", "23", "24",
"25", "26", "27", "28", "29",
"30",
>> "31", "32", "33", "34",
"35", "36", "37", "38", "39",
"40",
>> "41", "42", "43", "44",
"45", "46", "47", "48", "49",
"50",
>> "51", "52", "53", "54"),class =
"data.frame")
>>
>>
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
>
>
>
>
>


-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

Dennis Murphy

2010-Jul-07 11:38 UTC

head link

[R] Help With ANOVA (corrected please ignore last email)

Hi:

I'd suggest looking at the following plot (data in original post, copied
below):

library(lattice)
stripplot(Intensity ~ Group, data = zzzanova)

Some things stand out in this plot that merit attention.

As Josh Wiley pointed out in an earlier reply, the concentration of -4.60517
values
in this data set should be a concern. Of your 54 responses, 26 of them have
this value, spread across all six groups - in addition, this value is
concentrated
in the first few groups and becomes rarer in the later groups. Consider:
> with(zzzanova, table(Group))Group
Group1 Group2 Group3 Group4 Group5 Group6
     9      8      9     10      9      9> with(subset(zzzanova, Intensity < -1), table(Intensity, Group))          Group
Intensity  Group1 Group2 Group3 Group4 Group5 Group6
  -4.60517      8      5      4      4      4      1

Also note that there is one other value with a negative intensity in group 6
(-0.024).
(Side note: Is negative intensity meaningful in the context of your
scientific problem?)

I'm curious about what this -4.60 value is supposed to represent. Is it a
missing
value code, as Josh inferred, a left-censoring value, or something else? The
reason
for asking is that its purpose could well have an impact on the type of
analysis
that is appropriate for these data:

    *  if -4.60 is meant to be a missing value code, its inclusion greatly
inflates the
       degrees of freedom actually present in the data. Moreover, its
presence also
       inflates the variability both within and between groups, which
reduces the
       sensitivity of tests. Also observe from the strip plot above that if
-4.60 is a
       missing value code, then your non-missing data appear to increase in
variance
       with increasing group numbers; moreover, the imbalance in
observations between
       groups is more severe, which in turn makes the p-values of tests less
reliable
       for the non-missing data. (Even worse, the imbalance and increasing
variance
       don't appear to be independent if -4.60 represents a missing value.)
    *  if -4.60 is meant to be something like a lower detection limit or
upper bound
       on a left-censored response, then one is artificially reducing
variability within
       groups and the p-values of tests turn out to be optimistic. There are
better ways
       to handle nondetects. One approach is outlined in Helsel's book
'Nondetects and
       Data Analysis', which is the reference of the R package NADA. Other
approaches
       to nondetect data are also available outside of R (e.g., the SCOUT
software at EPA).
   *   if -4.60 is meant to represent a lower bound on a (censored?)
response,
       then setting -4.60 as the response artificially increases the
variation in the
       data. In this case, one would be artificially left-truncating the
data, but for a
       different reason from that given in the immediately preceding point.

Josh also noted the difference in p-values of the between group test when
the groups
were factors as opposed to numeric. This is a common 'gotcha' in R - you
need to pay attention to
the classes of your inputs when fitting any statistical model. In the case
where the
groups comprise a factor, R performs a one-way ANOVA. [FWIW, I got the same
p-value as Josh for your 'complete' data (54 obs.)]. When the group
variable
is numeric,
you are fitting a simple linear regression analysis, which implies that the
numeric
values of the groups are meaningful. There is a big difference in the
interpretation of
the two types of models.

HTH,
Dennis


On Tue, Jul 6, 2010 at 6:12 AM, Amit Patel <amitrhelp@yahoo.co.uk> wrote:
> Sorry i had a misprint in the appendix code in the last email
>
>
> Hi I needed some help with ANOVA
>
> I have a problem with My ANOVA
> analysis. I have a dataset with a known ANOVA p-value, however I can
> not seem to re-create it in R.
>
> I have created a list (zzzanova) which contains
> 1)Intensity Values
> 2)Group Number (6 Different Groups)
> 3)Sample Number (54 different samples)
> this is created by the script in Appendix 1
>
> I then conduct ANOVA with the command
> > zzz.aov <- aov(Intensity ~ Group, data = zzzanova)
>
> I get a p-value of
> Pr(>F)1
> 0.9483218
>
> The
> expected p-value is 0.00490 so I feel I maybe using ANOVA incorrectly
> or have put in a wrong formula. I am trying to do an ANOVA analysis
> across all 6 Groups. Is there something wrong with my formula. But I think
> I
> have made a mistake in the formula rather than anything else.
>
>
>
>
> APPENDIX 1
>
> datalist <- c(-4.60517, -4.60517, -4.60517, -4.60517, -4.60517,
-4.60517,
> -4.60517, 3.003749, -4.60517,
>    2.045314, 2.482557, -4.60517, -4.60517, -4.60517, -4.60517, 1.592743,
> -4.60517,
>    -4.60517, 0.91328, -4.60517, -4.60517, 1.827744, 2.457795, 0.355075,
> -4.60517, 2.39127,
>    2.016987, 2.319903, 1.146683, -4.60517, -4.60517, -4.60517, 1.846162,
> -4.60517, 2.121427, 1.973118,
>    -4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816, -4.60517,
>  0.023703, -4.60517,
>    2.043382, 1.070586, 2.768289, 1.085169, 0.959334, -0.02428, -4.60517,
> 1.371895, 1.533227)
>
> "zzzanova" <-
> structure(list(Intensity = datalist,
> Group = structure(c(1,1,1,1,1,1,1,1,1,
>         2,2,2,2,2,2,2,2,
>         3,3,3,3,3,3,3,3,3,
>         4,4,4,4,4,4,4,4,4,4,
>         5,5,5,5,5,5,5,5,5,
>         6,6,6,6,6,6,6,6,6), .Label = c("Group1",
"Group2", "Group3",
> "Group4", "Group5", "Group6"), class =
"factor"),
>    Sample = structure(c( 1, 2, 3, 4, 5, 6, 7, 8, 9,
>    10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
>    20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
>    30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
> 40,41,42,43,44,45,46,47,48,49,50,51,52,53,54)
> ))
> , .Names = c("Intensity",
> "Group", "Sample"), row.names > c("1",
"2", "3", "4", "5", "6",
"7", "8", "9", "10",
> "11", "12", "13", "14",
"15", "16", "17", "18", "19",
"20",
> "21", "22", "23", "24",
"25", "26", "27", "28", "29",
"30",
> "31", "32", "33", "34",
"35", "36", "37", "38", "39",
"40",
> "41", "42", "43", "44",
"45", "46", "47", "48", "49",
"50",
> "51", "52", "53", "54"),class =
"data.frame")
>
>
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
	[[alternative HTML version deleted]]

Possibly Parallel Threads

Search for more possibly parallel threads

R help - Jul 2010 - Help With ANOVA (corrected please ignore last email)

[R] Help With ANOVA (corrected please ignore last email)

[R] Help With ANOVA (corrected please ignore last email)

[R] Help With ANOVA (corrected please ignore last email)

[R] Help With ANOVA (corrected please ignore last email)

Possibly Parallel Threads