Dear R users,

I have a question about the lmPerm package. It provides aovp(), a modified
version of the aov() function that fits an analysis of variance (ANOVA) model
using permutation tests instead of normal-theory tests.

However, when I run the following code for a simple linear model:

library(lmPerm)

e$t_Downtime_per_Intervention_Successful %>%
  aovp(
    formula = `Downtime per Intervention[h]` ~ `Working Hours`,
    data = .
  ) %>%
  summary()

I obtain different p-values on each run!

With a regular ANOVA I obtain a constant F-statistic instead, but my data do
not satisfy the required normality assumptions.

So my questions are:

Would it still be possible to use the regular aov() by generating permutations
in advance (thereby obtaining an approximately normal distribution thanks to
the Central Limit Theorem), and then applying aov() afterwards? Does that make
sense?

Or could this issue be due to unbalanced classes? I also tried weighting
observations based on their proportions, but the function failed.

Is there any alternative way to perform a one-way ANOVA on non-normal data?

Thank you.

Juan
Dear Juan

I do not use the package, but if it performs permutation tests it presumably
uses random numbers, and since you are not setting the seed you will get
different values on each run.

Michael

On 03/09/2018 16:17, Juan Telleria Ruiz de Aguirre wrote:
> I obtain different p-values for each run!

-- 
Michael
http://www.dewey.myzen.co.uk/home.html
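[To illustrate Michael's point, a minimal sketch with toy data (not the
poster's), assuming aovp() draws its permutations from R's random number
generator so that set.seed() controls it:]

    # Toy data, two groups; perm = "Prob" forces the random-sampling method
    library(lmPerm)

    set.seed(42)
    d <- data.frame(
      y = c(rnorm(10, 0), rnorm(10, 1)),
      g = factor(rep(c("A", "B"), each = 10))
    )

    set.seed(123)
    s1 <- summary(aovp(y ~ g, data = d, perm = "Prob"))

    set.seed(123)
    s2 <- summary(aovp(y ~ g, data = d, perm = "Prob"))
    # same seed before each call -> identical p-values;
    # drop the set.seed() calls and the p-values will differ run to run
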
Juan,

Your question might be borderline for this list, as it is ultimately a
statistics question coming in R disguise.

Anyway, the short answer is that you should *expect* to get a different p
value from a permutation test unless you are able to do all possible
permutations and therefore use the so-called systematic reference set. That is
rarely feasible, and only for relatively small problems. Otherwise the
permutation test uses a random subset of all possible permutations, and given
this randomness you will get a different p value each run.

In order to get reproducible results you may specify a seed (?set.seed), yet
that is only reproducible within this environment: someone else with different
software and/or code might come out with a different p. The higher the number
of permutations used, however, the smaller the variation around the p values.
For most applications 1000 seems good enough to me, but sometimes I go higher
(in particular if the p value is borderline and I really need a strict
above/below-alpha decision).

The permutations do not create an implicit normal distribution, but rather a
null distribution that can be (and, depending on the non-normality of your
data, likely is) non-normal. So your respective proposal does not appeal to
me.

I don't think you need an alternative: the permutation test is just fine, and
recognizing the randomness in its execution means the (relatively small)
variability in p values is not a major issue.

You may want to have a look at the textbook by Edgington & Onghena for details
on permutation tests, and there are plenty of papers out there addressing them
in various contexts, which will help you understand *why* you observe what you
observe here.

HTH,
Michael

> -----Original Message-----
> From: R-help <r-help-bounces at r-project.org> On Behalf Of Juan Telleria
> Ruiz de Aguirre
> Sent: Monday, 3 September 2018 17:18
> To: R help Mailing list <r-help at r-project.org>
> Subject: [R] ANOVA Permutation Test
>
> I obtain different p-values for each run!
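[The mechanics Michael describes can be sketched in base R without lmPerm, on
toy data (not the poster's): shuffle the response, recompute the ordinary F
statistic, and take the Monte Carlo p value. Two runs with the same number of
permutations land close together but not identical:]

    # Monte Carlo permutation test for a one-way ANOVA, base R only
    perm_anova_p <- function(y, g, n_perm = 1000) {
      f_obs <- summary(aov(y ~ g))[[1]][["F value"]][1]   # observed F
      f_perm <- replicate(n_perm, {
        y_star <- sample(y)                               # shuffle responses
        summary(aov(y_star ~ g))[[1]][["F value"]][1]
      })
      # Monte Carlo p value, with the +1 correction so p is never exactly 0
      (sum(f_perm >= f_obs) + 1) / (n_perm + 1)
    }

    set.seed(1)
    y <- c(rnorm(8, 0), rnorm(8, 0.8))
    g <- factor(rep(c("A", "B"), each = 8))

    p1 <- perm_anova_p(y, g, 1000)
    p2 <- perm_anova_p(y, g, 1000)   # second run: close to p1, not identical

Increasing n_perm shrinks the run-to-run spread, which is why 1000 is a common
floor and borderline p values call for more.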
> This package uses a modified version of aov() function, which uses
> Permutation Tests
>
> I obtain different p-values for each run!

Could that be because you are defaulting to perm="Prob"? I am not familiar
with the package, but the manual is informative; you may have missed something
when reading it:

"...The Exact method will be used by default when the number of observations
is less than or equal to maxExact, otherwise Prob will be used. Prob:
Iterations terminate when the estimated standard error of the estimated
proportion p is less than p*Ca"

I would assume that probabilistic permutation is random and will change from
run to run. You could use set.seed() to stop that, but it's actually quite
useful to see how much the results change. If you want complete permutation,
you'd need to force Exact (unless that does not mean what it sounds like for
this package). It looks like that requires you to set maxExact to at least
your number of observations. But given that the number of permutations grows
combinatorially, that could take a _long_ time for a run; the example in the
help page does not complete in a useful time when maxExact is set to exceed
the number of data points. So I'd probably run it using Prob and simply note
the range of results over a handful of runs, to give you an indication of how
far to trust the answers.

> Would it still be possible use the regular aov() by generating permutations
> in advance (Obtaining therefore a Normal Distribution thanks to the Central
> Limit Theorem)? And applying the aov() function afterwards? Does it have
> sense?

As a chemist, I'd guess no. And you'd be even more limited in the number of
permutations.

> Or maybe this issue could be due to unbalanced classes? I also tried to
> weight observations based on proportions, but the function failed.

No, it's nothing to do with balance if the results change run to run with no
change in the model. I'd guess imbalance may exacerbate the permutation
variability somewhat, but it won't _cause_ it.

> Any alternative solution for performing a One-Way ANOVA Test over
> Non-Normal Data?

Yes; the traditional nonparametric test for one-way (balanced) data is the
Kruskal-Wallis test: see ?kruskal.test. Classical ANOVA on ranks can also be
defended as a general 'nonparametric' approach, though I gather it can also be
criticised.
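[Both nonparametric routes suggested above are in base R. A toy illustration
(not the poster's data) on deliberately skewed, non-normal groups:]

    set.seed(7)
    # Skewed (exponential) data in three groups; the third has a larger mean
    y <- c(rexp(10, rate = 1), rexp(10, rate = 1), rexp(10, rate = 0.3))
    g <- factor(rep(c("A", "B", "C"), each = 10))

    kw <- kruskal.test(y ~ g)   # rank-based one-way test, no normality needed
    kw$p.value

    ranked <- aov(rank(y) ~ g)  # classical ANOVA applied to the ranks
    summary(ranked)
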
Thank you all for your very good answers.

Using aovp(..., perm="Exact") seems to be the way to go for small datasets,
and I should also definitely try ?kruskal.test.

Juan