Neil Griffin
2007-Jul-23 17:03 UTC
[R] cca and cca.predict in vegan-what sort of prediction is possible
Hi All

I am not clear quite how one could use cca from package vegan and the associated predict.cca to predict species abundance from environmental data (or if this is possible in a generalised way). In other words, can one derive a cca object based on known community data and use that to predict e.g. species abundances in a different number of samples based on environmental data? The help notes show that prediction is possible, but it seems that the number of samples is constrained to that in the original, "training" set.

If this is possible, a reference or example would be much, much appreciated.

Thanks
Neil

Neil Griffin
Unilever Centre for Environmental Water Quality
Institute for Water Research, Rhodes University
PO Box 94, Grahamstown, 6140, South Africa
Email: neil at iwr.ru.ac.za
http://www.rhodes.ac.za/institutes/iwr/
Tel: +27 46 622 2428
Fax: +27 46 622 9427
Jari Oksanen
2007-Aug-06 11:23 UTC
[R] cca and cca.predict in vegan-what sort of prediction is possible
> I am not clear quite how one could use cca from package vegan and the associated
> predict.cca to predict species abundance from environmental data (or if this is possible
> in a generalised way). In other words, can one derive a cca object based on known
> community data and use that to predict e.g. species abundances in a different number
> of samples based on environmental data? The help notes show that prediction is
> possible, but it seems that the number of samples is constrained to that in the
> original, "training" set.
>
> If this is possible, a reference or example would be much, much appreciated.

This is not possible with the current predict.cca. It seems that you want to use CCA to approximate your original data (type = "response" in predict.cca), and that ignores the 'newdata' argument.

However, this type of prediction is doable, and simply looking at the code shows you how to do it. For the data approximation you only need the linear combination scores (u in the code), the species scores (v), the eigenvalues, and the row and column totals. You can use predict.cca to get the linear combination scores (u) for new sites by supplying their environmental data as 'newdata', and then you can use these scores in the predict.cca code. You also need to supply totals (sums) for the new rows.

This is all pretty technical and tedious. In principle, the function could be changed to accept optional arguments for linear combination scores, weighted averages scores and species scores, but then it would also need matching arguments for row and column sums, making the usage tedious (the change would be easy, the usage difficult). I think it is better to look at the code and follow its example if you really are in need of more complicated analyses.

Another issue is that CCA is not good at predicting species composition: it is only weighted linear regression. You will see, for instance, that the method happily gives you negative abundances, which some ecologists find very disturbing.
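The recipe described above (linear combination scores u, species scores v, eigenvalues, row and column totals) can be sketched in base R without calling vegan at all. This is a minimal illustration of the underlying algebra under stated assumptions, not vegan's own code: the simulated data, the variable names (`env`, `Y`, `B`, `xbar`) and the helper `predict_abund()` are all hypothetical, and the CCA is computed by hand as a weighted regression of the chi-square standardized data on the environmental variables followed by an SVD.

```r
set.seed(1)

# Hypothetical training data: 20 sites, 5 species, 2 environmental variables
n   <- 20
env <- cbind(ph = rnorm(n, 7, 1), depth = runif(n, 0, 10))
mu  <- exp(outer(env[, "ph"] - 7, c(-1, -0.5, 0, 0.5, 1)) + 1.5)
Y   <- matrix(rpois(n * 5, mu), n, 5) + 1   # +1 keeps all row and column totals positive

# Chi-square standardization used in correspondence analysis
N    <- sum(Y)
P    <- Y / N
r    <- rowSums(P)                           # row (site) weights
cs   <- colSums(P)                           # column (species) weights
Qbar <- (P - outer(r, cs)) / sqrt(outer(r, cs))

# Weighted centring of the constraints, then weighted linear regression
xbar <- colSums(env * r)                     # weighted means of env variables
Xw   <- sweep(env, 2, xbar) * sqrt(r)
Qhat <- Xw %*% solve(crossprod(Xw), crossprod(Xw, Qbar))

# SVD of the fitted values: singular values are sqrt(eigenvalues)
sv <- svd(Qhat)
k  <- sum(sv$d > sv$d[1] * 1e-8)             # number of constrained axes
d  <- sv$d[seq_len(k)]
U  <- sv$u[, seq_len(k), drop = FALSE]
v  <- sv$v[, seq_len(k), drop = FALSE] / sqrt(cs)   # species scores

# Coefficients mapping centred environmental variables to LC scores (u)
B <- solve(crossprod(Xw), crossprod(Xw, U))

# Hypothetical helper: approximate abundances for new sites.
# 'totals' are the row sums you must supply for the new sites.
predict_abund <- function(newenv, totals) {
  u <- sweep(newenv, 2, xbar) %*% B          # LC scores for the new sites
  (1 + u %*% (d * t(v))) * outer(totals, cs)
}

newenv <- cbind(ph = c(6, 7.5, 11), depth = c(2, 5, 9))
pred   <- predict_abund(newenv, totals = c(100, 100, 100))
round(pred, 1)
rowSums(pred)    # reproduces the supplied totals
any(pred < 0)    # nothing prevents negative "abundances"
```

Applied to the training sites with their own row totals, `predict_abund(env, rowSums(Y))` returns the fitted values of the constrained model. Because the prediction is linear in the environmental variables, sites far outside the training range (such as the pH 11 site here) are where negative values are most likely to appear.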
If you really want to predict species composition from environmental data, I suggest nonlinear regression (mgcv::gam with an appropriate family, for instance) or some fancier methods.

Please note that this kind of specific question should not be sent to the R News, but to more specialized mailing lists or to the package author directly (although the author was not reading email in July).

Best wishes,
Jari Oksanen
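As a rough illustration of the regression alternative suggested in this reply, the sketch below fits a Poisson GAM for a single species with mgcv and predicts its abundance at new sites. The data are simulated and the names (`train`, `newsites`, `ph`, `count`) are hypothetical; with real community data one would fit one such model per species and choose the family to suit the abundance measure.

```r
library(mgcv)

set.seed(1)

# Hypothetical training data: counts of one species along a pH gradient
train <- data.frame(ph = runif(100, 5, 9))
train$count <- rpois(100, exp(2 - (train$ph - 7)^2))   # unimodal response

# Smooth response to pH, log link via the Poisson family
fit <- gam(count ~ s(ph), family = poisson, data = train)

# Predict expected abundance at new sites from environmental data alone
newsites <- data.frame(ph = c(5.5, 7, 8.5))
pred <- predict(fit, newdata = newsites, type = "response")
pred
```

Because the predictions are back-transformed from the log scale, they are always positive, and the number of new sites is not tied to the number of sites in the training data.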