Kristen Ross
2013-Dec-17 04:56 UTC
[R] What is the formula of Pseudo-F statistic in capscale in vegan?
Dear R-help,
We are conducting a distance-based redundancy analysis using capscale and
then testing for statistical significance for six terms in the model for the
constrained ordination using anova.cca in the vegan package. The
significance test is sequential, i.e., testing for significance of a term
only after accounting for all preceding terms. Could someone please provide
us with either the actual formula for the pseudo-F statistic or a reference
that provides this formula?
We ask because we are doing the exact same analysis in another program
(using distLM in PRIMER 6 v 6.1.13 and PERMANOVA+ v 1.0.3 from PRIMER-E),
but we are getting very different pseudo-F ratios despite specifying the
exact same order of model terms, using the same Bray-Curtis distance measure
(distance matrices produced by the two programs are the same), and using the
same sequential test for significance. Below is a table displaying the order
of the model terms and the pseudo-F values computed by R and by PRIMER (we
also ran the same analysis in CANOCO which showed the same results in
pseudo-F values as PRIMER). We have not been able to figure out why we get
very different pseudo-F values, leading us to believe that R calculates
pseudo-F values differently than PRIMER for the sequential tests.
Furthermore, constrained ordination outputs, i.e., eigenvalues, proportion
of variability explained by each constrained axes, etc. appear to be
identical between the two programs. Below we provide (1) the table showing
the different pseudo-F values; (2) the formula used by PRIMER to calculate
pseudo-F values that we think is the same being used in R, but need
confirmation; and (3) the R code used for this analysis.
(1) Table
Model
Social Variables
PRIMER pseudo-F
R pseudo-F
SEQUENTIAL TESTS
GroupSize
1.1904
1.5528
Board
1.5079
1.8872
MtgStyle
1.1326
1.4007
DmStyle
1.0971
1.3437
DifView
1.4892
1.7299
VolAuton
2.2923
2.2925
(2) pseudo-F formula
We know that PRIMER uses the following formula to calculate the pseudo-F for
a sequential test of significance (equation 4.3, Anderson, Gorley, and
Clarke 2008, Chapter 4. Pg. 129, and based on pseudo-F equation in Legendre
and Anderson (1999), Ecological Monographs vol. 69):
F= (SSFull - SSReduced)/(qFull-qReduced)
(SSTotal-SSFull)/(N - qFull - 1)
(3) R code
## creating Bray-Curtis of Biodiversity data
H.BC <- vegdist(H.Full [,14:211], "bray")
## Distance based redundancy analysis (dbRDA)
m1<-capscale(H.BC ~ GroupSize + Board + MtgStyle + DmStyle + DifView +
VolAuton, SScomp [,14:19], distance = "euclidean", add = TRUE)
### NOTE: pseudo-F values are the same with or without correcting for
negative eigenvalues (although they are different from other programs).
## Sequential test for terms
anova.cca(m1, by="terms", perm.max=1000, permu=500)
Thank you very much for any help or insights that anyone can provide,
Kristen
Kristen A. Ross, PhD
Post Doctoral Researcher
University of Illinois at Chicago
kristenrossUIC@gmail.com
and
Research Associate
Green Mountain College
One Brennan Circle
Poultney, VT 05764
[[alternative HTML version deleted]]
Jari Oksanen
2013-Dec-17 09:10 UTC
[R] What is the formula of Pseudo-F statistic in capscale in vegan?
Dear Kristen Ross, Kristen Ross <guayabitogirl <at> gmail.com> writes:>tion; and (3) the R code used for this analysis.Sorry that I have to remove most of your original message: gmane won't allow me to post if I add too little compared to the cited text. This is now wild guessing, since there is nothing I would be able to reproduce, in particular as I cannot afford buying PRIMER licence and cannot even see its manual. I have one guess, though. Look at the first and last item of your table:> > (1) Table > > > > PRIMER pseudo-F > > R pseudo-F > > SEQUENTIAL TESTS > > GroupSize > > 1.1904 > > 1.5528 >...> VolAuton > > 2.2923 > > 2.2925 >I am not quite sure how to read this table, but I *assume* that one of the numbers comes PRIMER and one from vegan:::capscale (this whole answer is based on that assumption). As you see, the first two numbers are very different, and the last two numerically fairly equal. I cannot decipher the PRIMER formula for pseudo-F you give below, but I guess that PRIMER changes the denominator of the pseudo-F at every step. I guess it uses residual SS and residual df after the current term, and that would include variation explained by later variables in the sequential tests as well as their df's. In vegan we always use the same denominator in all cases with the same degrees of freedom. This denominator, or scale, is the residual variation and residual df's after all explanatory variables (constraints). The vegan way is similar to the one you get from ordinary anova of lm(). We refuse (and have refused earlier) to implement anything else. However, using add1(<capscale-result>, ..., test = "perm") can give you tests that pretend that later variables in the sequence are not in the model.> (2) pseudo-F formula > > We know that PRIMER uses the following formula to calculate the pseudo-F for > a sequential test of significance (equation 4.3, Anderson, Gorley, and > Clarke 2008, Chapter 4. Pg. 129, and based on pseudo-F equation in Legendre > and Anderson (1999), Ecological Monographs vol. 69): > > F= (SSFull - SSReduced)/(qFull-qReduced) > > (SSTotal-SSFull)/(N - qFull - 1) > > (3) R code > > ## creating Bray-Curtis of Biodiversity data > > H.BC <- vegdist(H.Full [,14:211], "bray") > > ## Distance based redundancy analysis (dbRDA) > > m1<-capscale(H.BC ~ GroupSize + Board + MtgStyle + DmStyle + DifView + > VolAuton, SScomp [,14:19], distance = "euclidean", add = TRUE) > > ### NOTE: pseudo-F values are the same with or without correcting for > negative eigenvalues (although they are different from other programs). >If you here claim that the pseudo-F values are the same in vegan:::capscale with add=FALSE and add=TRUE, I claim that you are wrong, or that you made an error. One source of error may be that in the example above you set 'distance = "euclidean"' in which case 'add' argument has no effect since you have no negative eigenvalues with Euclidean distances. However, the example above should *still* give you negative eigenvalues and change in pseudo-F, because you did two contradictory things: you supplied non-Euclidean dissimilarities in input, and you asked for Euclidean distances in the command. In this case, the 'distance' argument will be silently ignored and input dissimilarities will be used. Please check this. I have no idea what you mean with "correcting" for negative eigenvalues. You can have transformation that removes them, but I cannot see how that would be a "correction". Cheers, Jari Oksanen