Hi,

I have a large dataset containing a wide range of binary variables. First I would like to compute a Jaccard matrix, then do a PCA on this matrix, so that I can finally do a hierarchical clustering on the principal components.

My problem is that I don't know how to compute the Jaccard dissimilarity matrix in R. Which package should I use, and so on? Can anybody help me?

Alternatively, I'm searching for another way to explore the clusters present in my data.

Another problem is that I have cases with missing values on different variables.

Jacob

--
View this message in context: http://r.789695.n4.nabble.com/Jaccard-dissimilarity-matrix-for-PCA-tp3165982p3165982.html
Sent from the R help mailing list archive at Nabble.com.

Flabbergaster <jlunding <at> gmail.com> writes:

> My problem is, that I don't know how to compute the jaccard dissimilarity
> matrix in R? Which package to use, and so on...

http://rss.acs.unt.edu/Rdoc/library/arules/html/dissimilarity.html
http://cc.oulu.fi/~jarioksa/softhelp/vegan/html/vegdist.html

Jacob,

You might have a look at the vegan package. Its vegdist function computes the Jaccard distance (method = "jaccard"), and the package has some other tools that you might be interested in.

Dave

From: Flabbergaster <jlunding@gmail.com>
To: r-help@r-project.org
Date: 12/28/2010 08:26 AM
Subject: [R] Jaccard dissimilarity matrix for PCA
Sent by: r-help-bounces@r-project.org

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
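
For readers who want to see what the packages mentioned above compute, here is a minimal sketch in base R. It assumes cases are in rows and binary variables in columns; the toy matrix x and the helper jaccard_dissim are made up for illustration and are not from the thread. Base R's dist() with method = "binary" gives the Jaccard dissimilarity between rows and leaves out, pair by pair, any variable that is missing in either case, which may be one way to approach the missing-values problem.

```r
# Toy data (made up): 4 cases in rows, 5 binary variables in columns.
# Case d has a missing value on the third variable.
x <- rbind(
  a = c(1, 1, 0,  0, 1),
  b = c(1, 0, 0,  0, 1),
  c = c(0, 0, 1,  1, 0),
  d = c(1, 1, NA, 0, 1)
)

# Hypothetical helper: Jaccard dissimilarity of two 0/1 vectors,
# 1 - |A and B| / |A or B|, using only variables observed in both.
# (Returns NaN if neither case has any 1 at all.)
jaccard_dissim <- function(u, v) {
  ok <- !is.na(u) & !is.na(v)
  u <- u[ok] == 1
  v <- v[ok] == 1
  1 - sum(u & v) / sum(u | v)
}

# Base R: the "binary" method of dist() is the same measure,
# computed between all pairs of rows.
d  <- dist(x, method = "binary")
dm <- as.matrix(d)

# Compare one pair computed both ways.
print(dm["b", "d"])
print(jaccard_dissim(x["b", ], x["d", ]))

# With the vegan package installed and complete data, a comparable call
# would be (not run here):
# d_vegan <- vegan::vegdist(x, method = "jaccard", binary = TRUE)
```

The result d is a "dist" object, which is the form that hclust() and similar functions expect. How missing values should really be treated depends on why they are missing, so pairwise omission is only an assumption here.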

jaccard in package prabclus computes a Jaccard matrix for you.

By the way, if you want to do hierarchical clustering, running PCA first doesn't seem like a good idea to me. Why not cluster the dissimilarity matrix directly, without the information loss caused by PCA? (I should not make overly general statements here, because how to cluster data always depends on the aim of the clustering, the cluster concept you are interested in, etc.)

prabclus also contains clustering methods for such data; have a look at the functions prabclust and hprabclust. (However, they are documented as functions for clustering species distribution ranges, so if your application is different, you may have to think about whether and how to adapt them.)

Hope this helps,
Christian

On Tue, 28 Dec 2010, Flabbergaster wrote:

> My problem is, that I don't know how to compute the jaccard dissimilarity
> matrix in R? Which package to use, and so on...

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
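
The suggestion to cluster the dissimilarity matrix directly could look roughly like the sketch below. It uses base R only (dist, hclust, cutree) rather than the prabclus functions named in the reply; the toy data, the average linkage and the choice of two clusters are all assumptions made for illustration. The last step shows classical multidimensional scaling (cmdscale), which is one common way to get a low-dimensional picture from a dissimilarity matrix if that is what the PCA step was meant to provide.

```r
# Toy binary data (made up): cases a-c share variables 1-3,
# cases d-f share variables 4-6.
x <- rbind(
  a = c(1, 1, 1, 0, 0, 0),
  b = c(1, 1, 0, 0, 0, 0),
  c = c(1, 0, 1, 0, 0, 0),
  d = c(0, 0, 0, 1, 1, 1),
  e = c(0, 0, 0, 1, 1, 0),
  f = c(0, 0, 0, 0, 1, 1)
)

# Jaccard dissimilarity between rows (base R calls this method "binary").
d <- dist(x, method = "binary")

# Hierarchical clustering on the dissimilarities themselves,
# with no PCA step in between. Average linkage is an arbitrary choice.
hc <- hclust(d, method = "average")

# Cut the tree into two groups (k = 2 is an assumption for this toy data).
groups <- cutree(hc, k = 2)
print(groups)

# Optional: coordinates for plotting via classical MDS on d.
coords <- cmdscale(d, k = 2)

# plot(hc)       # dendrogram
# plot(coords)   # cases in the first two MDS dimensions
```

Which linkage method and how many clusters are appropriate depends, as the reply says, on the aim of the clustering.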