Kristen Ross
2013-Dec-17 04:56 UTC
[R] What is the formula of Pseudo-F statistic in capscale in vegan?
Dear R-help, We are conducting a distance-based redundancy analysis using capscale and then testing for statistical significance for six terms in the model for the constrained ordination using anova.cca in the vegan package. The significance test is sequential, i.e., testing for significance of a term only after accounting for all preceding terms. Could someone please provide us with either the actual formula for the pseudo-F statistic or a reference that provides this formula? We ask because we are doing the exact same analysis in another program (using distLM in PRIMER 6 v 6.1.13 and PERMANOVA+ v 1.0.3 from PRIMER-E), but we are getting very different pseudo-F ratios despite specifying the exact same order of model terms, using the same Bray-Curtis distance measure (distance matrices produced by the two programs are the same), and using the same sequential test for significance. Below is a table displaying the order of the model terms and the pseudo-F values computed by R and by PRIMER (we also ran the same analysis in CANOCO which showed the same results in pseudo-F values as PRIMER). We have not been able to figure out why we get very different pseudo-F values, leading us to believe that R calculates pseudo-F values differently than PRIMER for the sequential tests. Furthermore, constrained ordination outputs, i.e., eigenvalues, proportion of variability explained by each constrained axes, etc. appear to be identical between the two programs. Below we provide (1) the table showing the different pseudo-F values; (2) the formula used by PRIMER to calculate pseudo-F values that we think is the same being used in R, but need confirmation; and (3) the R code used for this analysis. (1) Table Model Social Variables PRIMER pseudo-F R pseudo-F SEQUENTIAL TESTS GroupSize 1.1904 1.5528 Board 1.5079 1.8872 MtgStyle 1.1326 1.4007 DmStyle 1.0971 1.3437 DifView 1.4892 1.7299 VolAuton 2.2923 2.2925 (2) pseudo-F formula We know that PRIMER uses the following formula to calculate the pseudo-F for a sequential test of significance (equation 4.3, Anderson, Gorley, and Clarke 2008, Chapter 4. Pg. 129, and based on pseudo-F equation in Legendre and Anderson (1999), Ecological Monographs vol. 69): F= (SSFull - SSReduced)/(qFull-qReduced) (SSTotal-SSFull)/(N - qFull - 1) (3) R code ## creating Bray-Curtis of Biodiversity data H.BC <- vegdist(H.Full [,14:211], "bray") ## Distance based redundancy analysis (dbRDA) m1<-capscale(H.BC ~ GroupSize + Board + MtgStyle + DmStyle + DifView + VolAuton, SScomp [,14:19], distance = "euclidean", add = TRUE) ### NOTE: pseudo-F values are the same with or without correcting for negative eigenvalues (although they are different from other programs). ## Sequential test for terms anova.cca(m1, by="terms", perm.max=1000, permu=500) Thank you very much for any help or insights that anyone can provide, Kristen Kristen A. Ross, PhD Post Doctoral Researcher University of Illinois at Chicago kristenrossUIC@gmail.com and Research Associate Green Mountain College One Brennan Circle Poultney, VT 05764 [[alternative HTML version deleted]]
Jari Oksanen
2013-Dec-17 09:10 UTC
[R] What is the formula of Pseudo-F statistic in capscale in vegan?
Dear Kristen Ross, Kristen Ross <guayabitogirl <at> gmail.com> writes:>tion; and (3) the R code used for this analysis.Sorry that I have to remove most of your original message: gmane won't allow me to post if I add too little compared to the cited text. This is now wild guessing, since there is nothing I would be able to reproduce, in particular as I cannot afford buying PRIMER licence and cannot even see its manual. I have one guess, though. Look at the first and last item of your table:> > (1) Table > > > > PRIMER pseudo-F > > R pseudo-F > > SEQUENTIAL TESTS > > GroupSize > > 1.1904 > > 1.5528 >...> VolAuton > > 2.2923 > > 2.2925 >I am not quite sure how to read this table, but I *assume* that one of the numbers comes PRIMER and one from vegan:::capscale (this whole answer is based on that assumption). As you see, the first two numbers are very different, and the last two numerically fairly equal. I cannot decipher the PRIMER formula for pseudo-F you give below, but I guess that PRIMER changes the denominator of the pseudo-F at every step. I guess it uses residual SS and residual df after the current term, and that would include variation explained by later variables in the sequential tests as well as their df's. In vegan we always use the same denominator in all cases with the same degrees of freedom. This denominator, or scale, is the residual variation and residual df's after all explanatory variables (constraints). The vegan way is similar to the one you get from ordinary anova of lm(). We refuse (and have refused earlier) to implement anything else. However, using add1(<capscale-result>, ..., test = "perm") can give you tests that pretend that later variables in the sequence are not in the model.> (2) pseudo-F formula > > We know that PRIMER uses the following formula to calculate the pseudo-F for > a sequential test of significance (equation 4.3, Anderson, Gorley, and > Clarke 2008, Chapter 4. Pg. 129, and based on pseudo-F equation in Legendre > and Anderson (1999), Ecological Monographs vol. 69): > > F= (SSFull - SSReduced)/(qFull-qReduced) > > (SSTotal-SSFull)/(N - qFull - 1) > > (3) R code > > ## creating Bray-Curtis of Biodiversity data > > H.BC <- vegdist(H.Full [,14:211], "bray") > > ## Distance based redundancy analysis (dbRDA) > > m1<-capscale(H.BC ~ GroupSize + Board + MtgStyle + DmStyle + DifView + > VolAuton, SScomp [,14:19], distance = "euclidean", add = TRUE) > > ### NOTE: pseudo-F values are the same with or without correcting for > negative eigenvalues (although they are different from other programs). >If you here claim that the pseudo-F values are the same in vegan:::capscale with add=FALSE and add=TRUE, I claim that you are wrong, or that you made an error. One source of error may be that in the example above you set 'distance = "euclidean"' in which case 'add' argument has no effect since you have no negative eigenvalues with Euclidean distances. However, the example above should *still* give you negative eigenvalues and change in pseudo-F, because you did two contradictory things: you supplied non-Euclidean dissimilarities in input, and you asked for Euclidean distances in the command. In this case, the 'distance' argument will be silently ignored and input dissimilarities will be used. Please check this. I have no idea what you mean with "correcting" for negative eigenvalues. You can have transformation that removes them, but I cannot see how that would be a "correction". Cheers, Jari Oksanen