If I perform PCA on the 'eurodist' data, should I get an accurate geographic layout of the cities with biplot? (Barring inversions, i.e. there is no way to define north, but you get the idea.)

I have a complex distance matrix, and I am thinking about how to cluster it and how to visualize the quality of the resulting clusters. If I could 'see' the clusters in space I could understand how / what the cluster algorithms were doing. Can I use PCA over the distance matrix to do that?

Sorry for the dumb questions.

Dan.

On Wed, 13 Oct 2004, Dan Bolser wrote:

> If I perform PCA on the 'eurodist' data, should I get an accurate
> geographic layout of the cities with biplot?

No, but a good approximation.

> (barring inversions, i.e. there is no way to define north.. but you get
> the idea...)
>
> I have a complex distance matrix, and I am thinking about how to cluster
> it and how to visualize the quality of the resulting clusters.

Using PCA and plotting the first two components is classical multi-dimensional scaling, as implemented by cmdscale(). Look up MDS somewhere (e.g. in MASS). It is exact if the distances are Euclidean in 2D. However, eurodist gives road distances on the surface of a sphere.

Classic examples for the illustration of MDS are departements of France based on proximity data and cities in the UK based on road distances.

There is a minor point as to what you mean `with biplot', covered in MASS4: it depends on the exact definition of biplot (and biplot.princomp has a parameter -- this is not by default done in S-PLUS in a way that makes your statement correct).

> If I could 'see' the clusters in space I could understand how / what the
> cluster algorithms were doing.

A standard topic for MDS: see e.g. two of my books (MASS and my Pattern Recognition and Neural Networks) for extensive examples.

> Can I use PCA over the distance matrix to do that?
>
> Sorry for the dumb questions.

Please do some homework: suggestions above and in the posting guide.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

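To make the points above concrete, here is a minimal sketch in R. It is an illustration added alongside the thread, not code from any of the posters. It assumes only base R (the stats package and the built-in eurodist data); the simulated points `pts` and the variable names are hypothetical. It lays out the cities with cmdscale(), then checks on simulated planar points that classical MDS reproduces 2D Euclidean distances exactly and matches the PCA scores of the coordinates up to the sign of each axis. Note that the PCA here is run on coordinates, not on the distance matrix itself.

```r
# Classical MDS on the built-in road distances between 21 European cities.
loc <- cmdscale(eurodist, k = 2)       # 21 x 2 matrix, rows named by city

# The orientation of an MDS solution is arbitrary. Reflecting the second
# axis is the convention used in the cmdscale help page for this data set.
x <- loc[, 1]
y <- -loc[, 2]
plot(x, y, type = "n", asp = 1, xlab = "", ylab = "",
     main = "cmdscale(eurodist)")
text(x, y, rownames(loc), cex = 0.7)

# Agreement between the 2D layout and the road distances (approximate only).
approx_cor <- cor(as.vector(eurodist), as.vector(dist(loc)))

# Hypothetical check: for points that really lie in a plane, the
# reconstruction is exact up to rotation, reflection and translation.
set.seed(1)
pts <- matrix(rnorm(20), ncol = 2)     # 10 simulated points in 2D
d   <- dist(pts)                       # Euclidean distances
rec <- cmdscale(d, k = 2)
max_err <- max(abs(as.vector(dist(rec)) - as.vector(d)))

# PCA of the coordinates gives the same configuration up to axis signs.
scores <- prcomp(pts)$x
same_up_to_sign <- isTRUE(all.equal(abs(unname(rec)), abs(unname(scores)),
                                    tolerance = 1e-6))
```

Here `max_err` should be zero up to rounding error and `same_up_to_sign` should be TRUE, whereas `approx_cor` for eurodist will be high but below 1, because road distances are not Euclidean.
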
On Wed, 2004-10-13 at 09:51, Prof Brian Ripley wrote:

> On Wed, 13 Oct 2004, Dan Bolser wrote:
>
> > I have a complex distance matrix, and I am thinking about how to cluster
> > it and how to visualize the quality of the resulting clusters.
>
> Using PCA and plotting the first two components is classical
> multi-dimensional scaling, as implemented by cmdscale(). Look up MDS
> somewhere (e.g. in MASS). It is exact if the distances are Euclidean in
> 2D. However, eurodist gives road distances on the surface of sphere.
>
> Classic examples for the illustration of MDS are departements of France
> based on proximity data and cities in the UK based on road distances.
>

These road distances seem to be very non-Euclidean indeed (even non-metric). It seems to be 2282km from Athens to Milan if you go directly, but if you go via Rome it is only 1403km:

> trip <- c("Athens", "Rome", "Milan")
> as.matrix(eurodist)[trip, trip]
       Athens Rome Milan
Athens      0  817  2282
Rome      817    0   586
Milan    2282  586     0
> 817 + 586
[1] 1403

I thought that the world is non-Euclidean, but not that obviously.

cheers, jari oksanen
--
Jari Oksanen <jarioksa at sun3.oulu.fi>

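The Athens, Rome, Milan check above can be extended to every triple of cities. The following is a rough sketch added for illustration, assuming only base R and the built-in eurodist data; the names `violations` and `eig` are hypothetical. It lists each pair whose tabulated direct distance is longer than a route through some third city, and then looks at the eigenvalues that cmdscale() computes, since distances that violate the triangle inequality cannot be Euclidean and so must produce at least one negative eigenvalue.

```r
m      <- as.matrix(eurodist)
cities <- rownames(m)
n      <- length(cities)

# For every unordered pair (i, j) and every third city k, record the cases
# where the detour i -> k -> j is shorter than the tabulated distance i -> j.
viol <- list()
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    for (k in seq_len(n)) {
      if (k != i && k != j && m[i, k] + m[k, j] < m[i, j]) {
        viol[[length(viol) + 1]] <- data.frame(
          from   = cities[i],
          via    = cities[k],
          to     = cities[j],
          direct = m[i, j],
          detour = m[i, k] + m[k, j],
          stringsAsFactors = FALSE
        )
      }
    }
  }
}
violations <- do.call(rbind, viol)

# Eigenvalues from classical MDS; negative values signal that the
# distances cannot be embedded exactly in any Euclidean space.
eig <- cmdscale(eurodist, k = 2, eig = TRUE)$eig
has_negative_eig <- min(eig) < 0
```

Printing `violations` shows the offending triples, including the Athens, Rome, Milan case quoted above.
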
On 13 Oct 2004, Jari Oksanen wrote:

>On Wed, 2004-10-13 at 09:51, Prof Brian Ripley wrote:
>> On Wed, 13 Oct 2004, Dan Bolser wrote:
>
>> > I have a complex distance matrix, and I am thinking about how to cluster
>> > it and how to visualize the quality of the resulting clusters.
>>
>> Using PCA and plotting the first two components is classical
>> multi-dimensional scaling, as implemented by cmdscale(). Look up MDS
>> somewhere (e.g. in MASS). It is exact if the distances are Euclidean in
>> 2D. However, eurodist gives road distances on the surface of sphere.
>>
>> Classic examples for the illustration of MDS are departements of France
>> based on proximity data and cities in the UK based on road distances.
>>
>These road distances seem to be very non-Euclidean indeed (even
>non-metric). It seems to be 2282km from Athens to Milan if you go
>directly, but if you go via Rome it is only 1403km:

All roads lead to Rome? Apparently that is true if you ever try to get out of the place in rush hour.

>> trip <- c("Athens", "Rome", "Milan")
>> as.matrix(eurodist)[trip, trip]
>       Athens Rome Milan
>Athens      0  817  2282
>Rome      817    0   586
>Milan    2282  586     0
>> 817 + 586
>[1] 1403
>
>I thought that World is non-Euclidean, but not that obviously.

Yes, especially not Europe on its own. My geography is worse than my statistics, but it looked a bit mangled up even to me.

Thanks very much both again,
Dan.

>cheers, jari oksanen