thr3ads.net - R help - [R] Significance of Principal Coordinates [Mar 2005]

Christian Kamenik

2005-Mar-14 17:32 UTC

[R] Significance of Principal Coordinates

Dear all,

I was looking for methods in R for assessing the number of
significant principal coordinates. Unfortunately, I was not very
successful. I expanded my search to the web and Current Contents,
but the information I found is very limited.
Therefore, I tried to write code for a randomization test. I would
highly appreciate it if somebody could comment on the following
approach. I am neither a statistician nor an R expert... The data
matrix I used has 72 species (columns) and 167 samples (rows).

Many thanks in advance, Christian

# Focus on ~80% of the eigenvalues (0.8 times the number of species columns)
nEigen <- round(ncol(Data) * 0.8)

# Calculate weights for Principal Coordinates Analysis
Total <- apply(Data, 1, sum)
Weight <- round(Total / max(Total) * 1000)

# Calculate chord distance
library(vegan)
Chord <- vegdist(decostand(Data, "norm"), "euclidean")

# Calculate principal coordinates, including distance matrix row weights
library(ade4)
PCoord.Eigen <- dudi.pco(Chord, row.w = Weight, scannf = FALSE,
                         full = TRUE)$eig[1:nEigen]

# Randomization of Principal Coordinates Analysis
library(labdsv)
for (i in 1:99) {
    Data.random <- rndtaxa(Data, species = TRUE, plots = TRUE)
    Total.random <- apply(Data.random, 1, sum)
    Weight.random <- round(Total.random / max(Total.random) * 1000)
    Chord.random <- vegdist(decostand(Data.random, "norm"), "euclidean")
    PCoord.Eigen.random <- dudi.pco(Chord.random, row.w = Weight.random,
                                    scannf = FALSE, full = TRUE)$eig[1:nEigen]
    # Append the randomized eigenvalues as a new column
    PCoord.Eigen <- cbind.data.frame(PCoord.Eigen, PCoord.Eigen.random)
}

# Plot scree diagram with original eigenvalues and 95% quantiles of
# eigenvalues from the randomized principal coordinates analyses
plot(1:nEigen, PCoord.Eigen[, 1], type = "b")
lines(1:nEigen, apply(PCoord.Eigen[, -1], 1, quantile, probs = 0.95),
      col = "red")

Christian Kamenik
Institute of Plant Sciences
University of Bern

Jari Oksanen

2005-Mar-15 16:38 UTC

[R] Significance of Principal Coordinates

On Mon, 2005-03-14 at 18:32 +0100, Christian Kamenik wrote:
> Dear all,
> 
> I was looking for methods in R for assessing the number of
> significant principal coordinates. Unfortunately, I was not very
> successful. I expanded my search to the web and Current Contents,
> but the information I found is very limited.
> Therefore, I tried to write code for a randomization test. I would
> highly appreciate it if somebody could comment on the following
> approach. I am neither a statistician nor an R expert... The data
> matrix I used has 72 species (columns) and 167 samples (rows).

Earlier this year (Sat, 29 Jan 2005) Jérôme Lemaître asked something
similar here under the subject "Bootstrapped eigenvector" (but the code
I posted then had one bug I know of and perhaps some I don't!). Some
ecologists (Donald Jackson, Peres-Neto) have indeed tried to develop
such methods for PCA, and they could easily be modified for PCoA, which
is essentially the same method, in particular with Euclidean distances
like the ones you used. So the following two solutions are practically
identical (within 2e-15 in the case I tried):

x <- decostand(x, "norm") # in vegan
chordis <- dist(x) # Euclidean is the default, so this is chord distance
pcoa <- cmdscale(chordis)
pca <- prcomp(x)

Verify this with:

procrustes(pcoa, pca, choices=1:2) # in vegan
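
The same equivalence can also be checked without vegan. The following is a minimal base-R sketch, not the code from the earlier thread: the abundance table is simulated (an assumption for illustration), and the row normalization is written out by hand in place of decostand(x, "norm"). Because each axis may come out reflected, the comparison uses absolute values.

```r
set.seed(1)

# Simulated abundance table (assumption): 20 samples x 6 species,
# with +1 so that no row is all zeros
x <- matrix(rpois(20 * 6, lambda = 5) + 1, nrow = 20)

# Chord transformation: scale every row to unit length
x_norm <- x / sqrt(rowSums(x^2))

chordis <- dist(x_norm)            # Euclidean distance on normalized rows
pcoa <- cmdscale(chordis, k = 2)   # principal coordinates (first two axes)
pca <- prcomp(x_norm)$x[, 1:2]     # principal component scores (first two)

# Axes are only defined up to sign, so compare absolute values
max_diff <- max(abs(abs(pcoa) - abs(pca)))
print(max_diff)
```

The printed difference should be at the level of floating-point rounding, which is the sense in which the two solutions are "practically identical".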

PCoA with row weights is something different, but I really don't know
why you would want to do this. I really don't understand what people
mean by "significant" eigenvalues, unless they are doing Factor
Analysis. In PCA, you rotate your data, and you can find low-rank
approximations of your data, but how these rotations could be
"significant" is beyond my imagination. Further, resampling with
replacement seems poorly suited to multivariate analysis: it duplicates
some rows, and so makes it easier to find similar rows, which is the
ultimate task in PC rotation. It seems that Monte Carlo results are
systematically "better" than any original data (this is not disturbing
only if the number of rows is much lower than the number of columns).
Also, resampling or shuffling species tends to create communities that
are fundamentally different from any real community we have: instead of
a single or a few abundant species, they may have several or none. With
a total abundance constraint you can hide the traces of anarchistic
community assembly, but not its fundamental fault. So I do think that
(1) you cannot use resampling in assessing PCA and its kin, (2) you
cannot say what is the meaning of being "significant" in this case, and
(3) the number of "significant" axes would only be a function of sample
size even here.
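
The point about duplicated rows can be explored with a small simulation. This is only a sketch under stated assumptions: the data are unstructured random normal values rather than a community table, and the helper first_share() is a hypothetical name introduced here. It compares the share of variance on the first principal component in the original data with the same share in bootstrap resamples of the rows.

```r
set.seed(42)

# Simulated data with no real structure (assumption): 30 samples x 10 variables
n <- 30
p <- 10
x <- matrix(rnorm(n * p), nrow = n)

# Hypothetical helper: share of total variance carried by the first PC
first_share <- function(m) {
  ev <- prcomp(m)$sdev^2
  ev[1] / sum(ev)
}

observed <- first_share(x)

# Bootstrap: resample rows with replacement, which duplicates some rows
boot <- replicate(200, first_share(x[sample(n, replace = TRUE), , drop = FALSE]))

# Number of distinct rows in one bootstrap sample (at most n)
n_unique <- length(unique(sample(n, replace = TRUE)))

print(c(observed = observed, bootstrap_mean = mean(boot)))
print(n_unique)
```

If the argument above holds, the bootstrap shares would tend to sit at or above the observed one, since each resample contains fewer distinct rows. The sketch illustrates the mechanism only and proves nothing about real community data.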

Now my hope is that some guru over there gets so irritated that (s)he
chastises me for writing such pieces of stupidity, and sends a correct
solution here with accompanying code and references to the literature.
Let's hope so.

The old truth is that most data sets have 2.5 dimensions (Kruskal):
those two that you can show in a printed plot, and that half a dimension
that you must explain away in the text. Wouldn't that be a sufficient
solution?

cheers, jari oksanen
-- 
Jari Oksanen <jarioksa at sun3.oulu.fi>
