Sorry i had a misprint in the appendix code in the last email Hi I needed some help with ANOVA I have a problem with My ANOVA analysis. I have a dataset with a known ANOVA p-value, however I can not seem to re-create it in R. I have created a list (zzzanova) which contains 1)Intensity Values 2)Group Number (6 Different Groups) 3)Sample Number (54 different samples) this is created by the script in Appendix 1 I then conduct ANOVA with the command> zzz.aov <- aov(Intensity ~ Group, data = zzzanova)I get a p-value of Pr(>F)1 0.9483218 The expected p-value is 0.00490 so I feel I maybe using ANOVA incorrectly or have put in a wrong formula. I am trying to do an ANOVA analysis across all 6 Groups. Is there something wrong with my formula. But I think I have made a mistake in the formula rather than anything else. APPENDIX 1 datalist <- c(-4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, 3.003749, -4.60517, 2.045314, 2.482557, -4.60517, -4.60517, -4.60517, -4.60517, 1.592743, -4.60517, -4.60517, 0.91328, -4.60517, -4.60517, 1.827744, 2.457795, 0.355075, -4.60517, 2.39127, 2.016987, 2.319903, 1.146683, -4.60517, -4.60517, -4.60517, 1.846162, -4.60517, 2.121427, 1.973118, -4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816, -4.60517, 0.023703, -4.60517, 2.043382, 1.070586, 2.768289, 1.085169, 0.959334, -0.02428, -4.60517, 1.371895, 1.533227) "zzzanova" <- structure(list(Intensity = datalist, Group = structure(c(1,1,1,1,1,1,1,1,1, 2,2,2,2,2,2,2,2, 3,3,3,3,3,3,3,3,3, 4,4,4,4,4,4,4,4,4,4, 5,5,5,5,5,5,5,5,5, 6,6,6,6,6,6,6,6,6), .Label = c("Group1", "Group2", "Group3", "Group4", "Group5", "Group6"), class = "factor"), Sample = structure(c( 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,41,42,43,44,45,46,47,48,49,50,51,52,53,54) )) , .Names = c("Intensity", "Group", "Sample"), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48", "49", "50", "51", "52", "53", "54"),class = "data.frame")
Joshua Wiley
2010-Jul-06 15:46 UTC
[R] Help With ANOVA (corrected please ignore last email)
Hi Amit, When I copy in your data and run aov(Intensity ~ Group, data = zzzanova) I get neither the p-value you showed nor the one you expected. My suggestions at things to look at would be 1) Where/How did you get the expected p-value? Another statistics program (e.g., SPSS or SAS)? It helps to know where the expected value came from. If it was from another program, I would also recommend searching the R-help archives (for instance for SPSS and ANOVA) 2) Perhaps something is happening with the unbalanced design (e.g., aov() in R handles it differently than you expected)? 3) You only mention that the p-values do not match. What about other aspects? Are the df the same? I would try to be certain that the data in R is the same as what was used to calculate the expected p-value. summary(zzzanova) will return some nice summary statistics for each column of your dataframe. Cheers, Josh On Tue, Jul 6, 2010 at 6:12 AM, Amit Patel <amitrhelp at yahoo.co.uk> wrote:> Sorry i had a misprint in the appendix code in the last email > > > Hi I needed some help with ANOVA > > I have a problem with My ANOVA > analysis. I have a dataset with a known ANOVA p-value, however I can > not seem to re-create it in R. > > I have created a list (zzzanova) which contains > 1)Intensity Values > 2)Group Number (6 Different Groups) > 3)Sample Number (54 different samples) > this is created by the script in Appendix 1 > > I then conduct ANOVA with the command >> zzz.aov <- aov(Intensity ~ Group, data = zzzanova) > > I get a p-value of > Pr(>F)1 > 0.9483218 > > The > expected p-value is 0.00490 so I feel I maybe using ANOVA incorrectly > or have put in a wrong formula. I am trying to do an ANOVA analysis > across all 6 Groups. Is there something wrong with my formula. But I think I > have made a mistake in the formula rather than anything else. > > > > > APPENDIX 1 > > datalist <- c(-4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, 3.003749, -4.60517, > ? ?2.045314, 2.482557, -4.60517, -4.60517, -4.60517, -4.60517, 1.592743, -4.60517, > ? ?-4.60517, 0.91328, -4.60517, -4.60517, 1.827744, 2.457795, 0.355075, -4.60517, 2.39127, > ? ?2.016987, 2.319903, 1.146683, -4.60517, -4.60517, -4.60517, 1.846162, -4.60517, 2.121427, 1.973118, > ? ?-4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816, -4.60517, ?0.023703, -4.60517, > ? ?2.043382, 1.070586, 2.768289, 1.085169, 0.959334, -0.02428, -4.60517, 1.371895, 1.533227) > > "zzzanova" <- > structure(list(Intensity = datalist, > Group = structure(c(1,1,1,1,1,1,1,1,1, > ? ? ? ? 2,2,2,2,2,2,2,2, > ? ? ? ? 3,3,3,3,3,3,3,3,3, > ? ? ? ? 4,4,4,4,4,4,4,4,4,4, > ? ? ? ? 5,5,5,5,5,5,5,5,5, > ? ? ? ? 6,6,6,6,6,6,6,6,6), .Label = c("Group1", "Group2", "Group3", "Group4", "Group5", "Group6"), class = "factor"), > ? ?Sample = structure(c( 1, 2, 3, 4, 5, 6, 7, 8, 9, > ? ?10, 11, 12, 13, 14, 15, 16, 17, 18, 19, > ? ?20, 21, 22, 23, 24, 25, 26, 27, 28, 29, > ? ?30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,41,42,43,44,45,46,47,48,49,50,51,52,53,54) > )) > , .Names = c("Intensity", > "Group", "Sample"), row.names > c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", > "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", > "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", > "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", > "41", "42", "43", "44", "45", "46", "47", "48", "49", "50", > "51", "52", "53", "54"),class = "data.frame") > > > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/
Joshua Wiley
2010-Jul-06 18:18 UTC
[R] Help With ANOVA (corrected please ignore last email)
Hello, Are you saying that the -4.60517 values are supposed to be treated as missing? Unless you set them to NA in R, they will be treated as real values. This would make a huge difference. I can tell you that your formula: aov(Intensity ~ Group, data zzzanova) is treating the variable 'Intensity' as the outcome (or dependent variable) and the variable 'Group' as the predictor (or independent variable). But of course, whether you are using an ANOVA correctly depends on a lot more. Oh, and yes, Pr(>F) is the p-value. However, I may have found the source of your discrepancy. Looking at the data you provided:> str(zzzanova)'data.frame': 54 obs. of 3 variables: $ Intensity: num -4.61 -4.61 -4.61 -4.61 -4.61 ... $ Group : Factor w/ 6 levels "Group1","Group2",..: 1 1 1 1 1 1 1 1 1 2 ... $ Sample : num 1 2 3 4 5 6 7 8 9 10 ... Notice that 'Group' is a factor with 6 levels. That means that R is not treating this variable's scale as interval or ratio variable. In fact, it is being treated as nominal. Looking at the summary of the ANOVA:> summary(aov(Intensity ~ Group, data = zzzanova))Df Sum Sq Mean Sq F value Pr(>F) Group 5 98.85 19.771 2.1469 0.07576 . Residuals 48 442.03 9.209 --- indicates that the variable 'Group' has 5 degrees of freedom, which makes sense given that it has 6 levels. Now look what happens when I convert the variable 'Group' from a factor to a numeric variable (using as.numeric() on the variable in the formula).> summary(aov(Intensity ~ as.numeric(zzzanova$Group), data = zzzanova))Df Sum Sq Mean Sq F value Pr(>F) as.numeric(zzzanova$Group) 1 80.18 80.182 9.0503 0.004042 ** Residuals 52 460.70 8.860 Now Group only has 1 degree of freedom (because it is being treated as a single predictor now instead of having multiple levels). Also the new p-value seems within rounding error of what you expected. All that said, my guess is that you really do want to treat the variable 'Group' as a factor. Treated as a numeric variable what you are basically testing is the relationship between Group and Intensity as group increases from 1 to 6. This is very different from testing the relationship between 6 different groups and Intensity. As a disclaimer, that is not really the best explanation of what is going on, but every other way I tried to say it seemed worse, so if you are unclear you might consult a statistics text or perhaps others can give a more erudite explanation. So to summarize, take a look at the original data for Group that you pulled into R. If you do not want it to be a factor you should either convert it somehow, or read the data in differently. This may not be the actual issue, but it was the closest I could come to replicating your p-value. Best regards, Josh On Tue, Jul 6, 2010 at 10:09 AM, Amit Patel <amitrhelp at yahoo.co.uk> wrote:> Hi Joshua > > Thanks very much for your help. I will take your advice and work on my problem. I am getting my expected p-values from GeneSpring software. I thought the problem lied in the treatment of the NA (-4.60517) values and maybe how they are treated in the analysis (i.e ignored or -4.60517). Im not sure what your ANOVA background is but I just wanted to check that I'm using ANOVA correctly. Im using the assumption that the Pr(>F)1 value is the actual p-value. > > Thanks again for your help > > > > ----- Original Message ---- > From: Joshua Wiley <jwiley.psych at gmail.com> > To: Amit Patel <amitrhelp at yahoo.co.uk> > Cc: r-help at r-project.org > Sent: Tue, 6 July, 2010 16:46:30 > Subject: Re: [R] Help With ANOVA (corrected please ignore last email) > > Hi Amit, > > When I copy in your data and run > > aov(Intensity ~ Group, data = zzzanova) > > I get neither the p-value you showed nor the one you expected. ?My > suggestions at things to look at would be > > 1) Where/How did you get the expected p-value? ?Another statistics > program (e.g., SPSS or SAS)? ?It helps to know where the expected > value came from. ?If it was from another program, I would also > recommend searching the R-help archives (for instance for SPSS and > ANOVA) > > 2) Perhaps something is happening with the unbalanced design (e.g., > aov() in R handles it differently than you expected)? > > 3) You only mention that the p-values do not match. ?What about other > aspects? ?Are the df the same? ?I would try to be certain that the > data in R is the same as what was used to calculate the expected > p-value. ?summary(zzzanova) will return some nice summary statistics > for each column of your dataframe. > > Cheers, > > > Josh > > > > On Tue, Jul 6, 2010 at 6:12 AM, Amit Patel <amitrhelp at yahoo.co.uk> wrote: >> Sorry i had a misprint in the appendix code in the last email >> >> >> Hi I needed some help with ANOVA >> >> I have a problem with My ANOVA >> analysis. I have a dataset with a known ANOVA p-value, however I can >> not seem to re-create it in R. >> >> I have created a list (zzzanova) which contains >> 1)Intensity Values >> 2)Group Number (6 Different Groups) >> 3)Sample Number (54 different samples) >> this is created by the script in Appendix 1 >> >> I then conduct ANOVA with the command >>> zzz.aov <- aov(Intensity ~ Group, data = zzzanova) >> >> I get a p-value of >> Pr(>F)1 >> 0.9483218 >> >> The >> expected p-value is 0.00490 so I feel I maybe using ANOVA incorrectly >> or have put in a wrong formula. I am trying to do an ANOVA analysis >> across all 6 Groups. Is there something wrong with my formula. But I think I >> have made a mistake in the formula rather than anything else. >> >> >> >> >> APPENDIX 1 >> >> datalist <- c(-4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, 3.003749, -4.60517, >> ? ?2.045314, 2.482557, -4.60517, -4.60517, -4.60517, -4.60517, 1.592743, -4.60517, >> ? ?-4.60517, 0.91328, -4.60517, -4.60517, 1.827744, 2.457795, 0.355075, -4.60517, 2.39127, >> ? ?2.016987, 2.319903, 1.146683, -4.60517, -4.60517, -4.60517, 1.846162, -4.60517, 2.121427, 1.973118, >> ? ?-4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816, -4.60517, ?0.023703, -4.60517, >> ? ?2.043382, 1.070586, 2.768289, 1.085169, 0.959334, -0.02428, -4.60517, 1.371895, 1.533227) >> >> "zzzanova" <- >> structure(list(Intensity = datalist, >> Group = structure(c(1,1,1,1,1,1,1,1,1, >> ? ? ? ? 2,2,2,2,2,2,2,2, >> ? ? ? ? 3,3,3,3,3,3,3,3,3, >> ? ? ? ? 4,4,4,4,4,4,4,4,4,4, >> ? ? ? ? 5,5,5,5,5,5,5,5,5, >> ? ? ? ? 6,6,6,6,6,6,6,6,6), .Label = c("Group1", "Group2", "Group3", "Group4", "Group5", "Group6"), class = "factor"), >> ? ?Sample = structure(c( 1, 2, 3, 4, 5, 6, 7, 8, 9, >> ? ?10, 11, 12, 13, 14, 15, 16, 17, 18, 19, >> ? ?20, 21, 22, 23, 24, 25, 26, 27, 28, 29, >> ? ?30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,41,42,43,44,45,46,47,48,49,50,51,52,53,54) >> )) >> , .Names = c("Intensity", >> "Group", "Sample"), row.names >> c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", >> "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", >> "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", >> "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", >> "41", "42", "43", "44", "45", "46", "47", "48", "49", "50", >> "51", "52", "53", "54"),class = "data.frame") >> >> >> >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > > > -- > Joshua Wiley > Ph.D. Student, Health Psychology > University of California, Los Angeles > http://www.joshuawiley.com/ > > > > >-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/
Dennis Murphy
2010-Jul-07 11:38 UTC
[R] Help With ANOVA (corrected please ignore last email)
Hi: I'd suggest looking at the following plot (data in original post, copied below): library(lattice) stripplot(Intensity ~ Group, data = zzzanova) Some things stand out in this plot that merit attention. As Josh Wiley pointed out in an earlier reply, the concentration of -4.60517 values in this data set should be a concern. Of your 54 responses, 26 of them have this value, spread across all six groups - in addition, this value is concentrated in the first few groups and becomes rarer in the later groups. Consider:> with(zzzanova, table(Group))Group Group1 Group2 Group3 Group4 Group5 Group6 9 8 9 10 9 9> with(subset(zzzanova, Intensity < -1), table(Intensity, Group))Group Intensity Group1 Group2 Group3 Group4 Group5 Group6 -4.60517 8 5 4 4 4 1 Also note that there is one other value with a negative intensity in group 6 (-0.024). (Side note: Is negative intensity meaningful in the context of your scientific problem?) I'm curious about what this -4.60 value is supposed to represent. Is it a missing value code, as Josh inferred, a left-censoring value, or something else? The reason for asking is that its purpose could well have an impact on the type of analysis that is appropriate for these data: * if -4.60 is meant to be a missing value code, its inclusion greatly inflates the degrees of freedom actually present in the data. Moreover, its presence also inflates the variability both within and between groups, which reduces the sensitivity of tests. Also observe from the strip plot above that if -4.60 is a missing value code, then your non-missing data appear to increase in variance with increasing group numbers; moreover, the imbalance in observations between groups is more severe, which in turn makes the p-values of tests less reliable for the non-missing data. (Even worse, the imbalance and increasing variance don't appear to be independent if -4.60 represents a missing value.) * if -4.60 is meant to be something like a lower detection limit or upper bound on a left-censored response, then one is artificially reducing variability within groups and the p-values of tests turn out to be optimistic. There are better ways to handle nondetects. One approach is outlined in Helsel's book 'Nondetects and Data Analysis', which is the reference of the R package NADA. Other approaches to nondetect data are also available outside of R (e.g., the SCOUT software at EPA). * if -4.60 is meant to represent a lower bound on a (censored?) response, then setting -4.60 as the response artificially increases the variation in the data. In this case, one would be artificially left-truncating the data, but for a different reason from that given in the immediately preceding point. Josh also noted the difference in p-values of the between group test when the groups were factors as opposed to numeric. This is a common 'gotcha' in R - you need to pay attention to the classes of your inputs when fitting any statistical model. In the case where the groups comprise a factor, R performs a one-way ANOVA. [FWIW, I got the same p-value as Josh for your 'complete' data (54 obs.)]. When the group variable is numeric, you are fitting a simple linear regression analysis, which implies that the numeric values of the groups are meaningful. There is a big difference in the interpretation of the two types of models. HTH, Dennis On Tue, Jul 6, 2010 at 6:12 AM, Amit Patel <amitrhelp@yahoo.co.uk> wrote:> Sorry i had a misprint in the appendix code in the last email > > > Hi I needed some help with ANOVA > > I have a problem with My ANOVA > analysis. I have a dataset with a known ANOVA p-value, however I can > not seem to re-create it in R. > > I have created a list (zzzanova) which contains > 1)Intensity Values > 2)Group Number (6 Different Groups) > 3)Sample Number (54 different samples) > this is created by the script in Appendix 1 > > I then conduct ANOVA with the command > > zzz.aov <- aov(Intensity ~ Group, data = zzzanova) > > I get a p-value of > Pr(>F)1 > 0.9483218 > > The > expected p-value is 0.00490 so I feel I maybe using ANOVA incorrectly > or have put in a wrong formula. I am trying to do an ANOVA analysis > across all 6 Groups. Is there something wrong with my formula. But I think > I > have made a mistake in the formula rather than anything else. > > > > > APPENDIX 1 > > datalist <- c(-4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, > -4.60517, 3.003749, -4.60517, > 2.045314, 2.482557, -4.60517, -4.60517, -4.60517, -4.60517, 1.592743, > -4.60517, > -4.60517, 0.91328, -4.60517, -4.60517, 1.827744, 2.457795, 0.355075, > -4.60517, 2.39127, > 2.016987, 2.319903, 1.146683, -4.60517, -4.60517, -4.60517, 1.846162, > -4.60517, 2.121427, 1.973118, > -4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816, -4.60517, > 0.023703, -4.60517, > 2.043382, 1.070586, 2.768289, 1.085169, 0.959334, -0.02428, -4.60517, > 1.371895, 1.533227) > > "zzzanova" <- > structure(list(Intensity = datalist, > Group = structure(c(1,1,1,1,1,1,1,1,1, > 2,2,2,2,2,2,2,2, > 3,3,3,3,3,3,3,3,3, > 4,4,4,4,4,4,4,4,4,4, > 5,5,5,5,5,5,5,5,5, > 6,6,6,6,6,6,6,6,6), .Label = c("Group1", "Group2", "Group3", > "Group4", "Group5", "Group6"), class = "factor"), > Sample = structure(c( 1, 2, 3, 4, 5, 6, 7, 8, 9, > 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, > 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, > 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, > 40,41,42,43,44,45,46,47,48,49,50,51,52,53,54) > )) > , .Names = c("Intensity", > "Group", "Sample"), row.names > c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", > "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", > "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", > "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", > "41", "42", "43", "44", "45", "46", "47", "48", "49", "50", > "51", "52", "53", "54"),class = "data.frame") > > > > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]