Dear R folk:

Perhaps I'm just dense today, but I am having trouble reproducing the principal components plotted and summarized by clusplot. Here is a brief example using the pluton dataset. clusplot reports that the first two principal components explain 99.7% of the variability. But this is not what princomp is reporting. I would greatly appreciate any advice.

With best regards,
-- Tom

> R.version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    2
minor    0.1
year     2004
month    11
day      15
language R
> require("cluster")
[1] TRUE
> pluton.agnes <- agnes(pluton)
> clusters <- cutree(as.hclust(pluton.agnes), h=4.00)
> clusplot(pluton, clusters, lines=0)
> pca <- princomp(pluton, cor=TRUE)
> loadings(pca)

Loadings:
      Comp.1 Comp.2 Comp.3 Comp.4
Pu238  0.521  0.348  0.714  0.313
Pu239 -0.540                0.837
Pu240  0.418 -0.835         0.353
Pu241  0.512  0.418 -0.698  0.277

               Comp.1 Comp.2 Comp.3 Comp.4
SS loadings      1.00   1.00   1.00   1.00
Proportion Var   0.25   0.25   0.25   0.25
Cumulative Var   0.25   0.50   0.75   1.00
Thomas M. Parris writes:

> clusplot reports that the first two principal components explain
> 99.7% of the variability.
[...]
>> loadings(pca)
[...]
>                Comp.1 Comp.2 Comp.3 Comp.4
> SS loadings      1.00   1.00   1.00   1.00
> Proportion Var   0.25   0.25   0.25   0.25
> Cumulative Var   0.25   0.50   0.75   1.00

This has nothing to do with how much of the variability of the original data is captured by each component; it merely summarises the squared coefficients of each loading vector. Since princomp standardises the loading vectors to length one, every column gives 1.00, and dividing by the four variables gives 0.25 for each component.

What you want to look at is pca$sdev, for instance something like

totvar <- sum(pca$sdev^2)
rbind("explained var"     = pca$sdev^2,
      "prop. expl. var"   = pca$sdev^2/totvar,
      "cum.prop.expl.var" = cumsum(pca$sdev^2)/totvar)

                     Comp.1    Comp.2      Comp.3       Comp.4
explained var     3.4093746 0.5785399 0.011560142 0.0005252824
prop. expl. var   0.8523437 0.1446350 0.002890036 0.0001313206
cum.prop.expl.var 0.8523437 0.9969786 0.999868679 1.0000000000

And as you can see, two comps "explain" 99.7%. :-)

--
Bjørn-Helge Mevik
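To check both points directly, here is a minimal sketch, assuming the cluster package (which supplies the pluton data) is installed. It confirms that every loading vector has unit length, which is why loadings() reports 1.00 and 0.25 throughout, and then computes the explained-variance proportions from pca$sdev. summary() on a princomp fit prints the same proportions as the rbind() shown above.

```r
library(cluster)   # provides the pluton data set
data(pluton)

pca <- princomp(pluton, cor = TRUE)

## Sum of squared coefficients in each loading vector ("SS loadings").
## princomp returns unit-length vectors, so each entry should be 1.
ss <- colSums(unclass(loadings(pca))^2)
print(round(ss, 6))

## Variance actually explained by each component, taken from sdev.
prop <- pca$sdev^2 / sum(pca$sdev^2)
print(round(cumsum(prop), 4))

## Same proportions, printed as "Proportion of Variance" and
## "Cumulative Proportion".
summary(pca)
```

The second printed vector should agree with the cum.prop.expl.var row above, up to rounding.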