Dear all,

I was looking for methods in R that allow assessing the number of significant principal coordinates. Unfortunately I was not very successful. I expanded my search to the web and Current Contents; however, the information I found is very limited. Therefore, I tried to write code for doing a randomization. I would highly appreciate it if somebody could comment on the following approach. I am neither a statistician nor an R expert... The data matrix I used has 72 species (columns) and 167 samples (rows).

Many thanks in advance,
Christian

    # Focus on ~80% of all the eigenvalues
    nEigen <- round(ncol(Data) * 0.8)

    # Calculate weights for principal coordinates analysis
    Total <- apply(Data, 1, sum)
    Weight <- round(Total / max(Total) * 1000)

    # Calculate chord distance
    library(vegan)
    Chord <- vegdist(decostand(Data, "norm"), "euclidean")

    # Calculate principal coordinates, including distance matrix row weights
    library(ade4)
    PCoord.Eigen <- dudi.pco(Chord, row.w = Weight, scannf = FALSE, full = TRUE)$eig[1:nEigen]

    # Randomization of principal coordinates analysis
    library(labdsv)
    for (i in 1:99) {
      Data.random <- rndtaxa(Data, species = TRUE, plots = TRUE)
      Total.random <- apply(Data.random, 1, sum)
      Weight.random <- round(Total.random / max(Total.random) * 1000)
      Chord.random <- vegdist(decostand(Data.random, "norm"), "euclidean")
      PCoord.Eigen.random <- dudi.pco(Chord.random, row.w = Weight.random,
                                      scannf = FALSE, full = TRUE)$eig[1:nEigen]
      PCoord.Eigen <- cbind.data.frame(PCoord.Eigen, PCoord.Eigen.random)
    }

    # Plot scree diagram with original eigenvalues and 95% quantiles of
    # eigenvalues from randomized principal coordinates analysis
    plot(1:nEigen, PCoord.Eigen[, 1], type = "b")
    lines(1:nEigen, apply(PCoord.Eigen[, -1], 1, quantile, probs = 0.95), col = "red")

Christian Kamenik
Institute of Plant Sciences
University of Bern
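For readers who want to try the randomization idea without ade4 and labdsv, here is a minimal base-R sketch of a permutation scree test. It is not Christian's exact procedure: it assumes unweighted PCoA (stats::cmdscale instead of dudi.pco with row weights) and shuffles each species column independently instead of calling rndtaxa. The helper names chord(), pcoa_eig() and perm_scree() are hypothetical. Because the chord transformation divides by each row's length, the sketch assumes no sample ends up with all-zero abundances, so the usage example relies on simulated, strictly positive counts.

```r
# Minimal sketch of a permutation scree test in base R (no row weights).
# Assumption: each species column is shuffled independently, which keeps
# column totals but destroys co-occurrence patterns between species.

# Hypothetical helper: chord transformation, scaling each sample (row) to
# unit length; assumes no row is all zeros
chord <- function(x) x / sqrt(rowSums(x^2))

# Hypothetical helper: first k PCoA eigenvalues of the chord distance matrix
pcoa_eig <- function(x, k) {
  d <- dist(chord(x))  # Euclidean distance on chord-transformed data
  cmdscale(d, k = k, eig = TRUE)$eig[seq_len(k)]
}

# Hypothetical helper: compare observed eigenvalues with the prob-quantile
# of eigenvalues from nperm column-shuffled copies of the data
perm_scree <- function(x, k, nperm = 99, prob = 0.95) {
  observed <- pcoa_eig(x, k)
  # k x nperm matrix: one column of eigenvalues per permutation
  null <- matrix(replicate(nperm, pcoa_eig(apply(x, 2, sample), k)), nrow = k)
  critical <- apply(null, 1, quantile, probs = prob)
  list(observed = observed, critical = critical,
       exceeds = observed > critical)
}

# Usage on simulated counts (a stand-in for a samples x species table)
set.seed(42)
counts <- matrix(rpois(60 * 10, lambda = 5) + 1, nrow = 60)
res <- perm_scree(counts, k = 5)
res$exceeds
```

The observed and critical vectors can be drawn with plot() and lines() in the same way as the scree diagram above.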
On Mon, 2005-03-14 at 18:32 +0100, Christian Kamenik wrote:
> I was looking for methods in R that allow assessing the number of
> significant principal coordinates.

Earlier this year (Sat, 29 Jan 2005) Jérôme Lemaître asked something similar here under the subject "Bootstrapped eigenvector" (but the code I posted then had one bug I know of, and perhaps some I don't know of!). Some ecologists (Donald Jackson, Peres-Neto) have indeed tried to develop methods for PCA, and these could easily be modified for PCoA, which is about the same method, in particular with Euclidean distances like the ones you used. So the following two solutions are practically identical (within 2e-15 in the case I tried):

    x <- decostand(x, "norm")  # in vegan
    chordis <- dist(x)         # Euclidean is the default, so this is chord distance
    pcoa <- cmdscale(chordis)
    pca <- prcomp(x)

Verify this with:

    procrustes(pcoa, pca, choices = 1:2)  # in vegan

PCoA with row weights is something different, but I really don't know why you would want to do this.

I really don't understand what people mean by "significant" eigenvalues, unless they are doing Factor Analysis. In PCA, you rotate your data, and you can find low-rank approximations of your data, but how these rotations could be "significant" is beyond my imagination. Further, resampling with replacement seems poorly suited to multivariate analysis: it duplicates some rows, and so it makes it easier to find similar rows, which is the ultimate task in PC rotation.
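The PCoA/PCA equivalence mentioned earlier in this reply can also be checked in base R alone. This is a minimal sketch on simulated data: it assumes a manual chord transformation in place of vegan's decostand(x, "norm"), and instead of procrustes() it compares the site scores directly, allowing each axis to flip sign. It also checks that the PCoA eigenvalues equal (n - 1) times the PCA variances, since prcomp() reports standard deviations scaled by n - 1.

```r
# Check that PCoA of Euclidean distances equals PCA (up to axis signs)
set.seed(1)
x <- matrix(rpois(20 * 6, lambda = 4) + 1, nrow = 20)  # simulated counts
x <- x / sqrt(rowSums(x^2))   # chord transformation (rows scaled to unit length)

pcoa <- cmdscale(dist(x), k = 2, eig = TRUE)  # PCoA on Euclidean distances
pca  <- prcomp(x)                             # centred, unscaled PCA

# Site scores agree up to the sign of each axis
score_diff <- max(abs(abs(pcoa$points) - abs(pca$x[, 1:2])))
# Eigenvalues agree after rescaling: eig = (n - 1) * sdev^2
eig_diff <- max(abs(pcoa$eig[1:2] - (nrow(x) - 1) * pca$sdev[1:2]^2))
c(scores = score_diff, eigenvalues = eig_diff)
```

Both differences should be of the order of rounding error, in line with the 2e-15 reported above.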
It seems that Monte Carlo results are systematically "better" than any original data (this is not disturbing only when the number of rows is much lower than the number of columns). Also, resampling or shuffling species tends to create communities that are fundamentally different from any real community we have: instead of a single or a few abundant species, they may have several or none. With a total abundance constraint you can hide the traces of anarchistic community assembly, but not its fundamental fault.

So I do think that (1) you cannot use resampling in assessing PCA and its kin, (2) you cannot say what it means to be "significant" in this case, and (3) even here the number of "significant" axes would only be a function of sample size.

Now my hope is that some guru out there gets so irritated that (s)he chastises me for writing such pieces of stupidity, and sends a correct solution here with accompanying code and references to the literature. Let's hope so.

The old truth is that most data sets have 2.5 dimensions (Kruskal): the two that you can show in a printed plot, and the half dimension that you must explain away in the text. Wouldn't that be a sufficient solution?

cheers, jari oksanen
--
Jari Oksanen <jarioksa at sun3.oulu.fi>
Seemingly Similar Threads
- Principal coordinates analysis
- The RV coinertia coefficient to interpret multivariate analysis plots
- Ordination Plotting: Warning: Species scores not available
- number of analogs in significance test of MAT reconstructions using randomTF from palaeoSig
- save 3dplot to file