All,

How can we find a distance matrix for categorical data? For example, given the CSV below:

             var1   var2  var3  var4
element1-1   yes    x     a     k
element1-2   no     y     b     l
element1-3   maybe  y     c     m

How can I compute the distance matrix between all the elements? I need it in order to create clusters on top of it.

Thanks & Regards
Kapil
see ?daisy in the library cluster

Cheers
Joris

On Thu, Jun 10, 2010 at 6:12 PM, kapil mahant <kapil_mahant at yahoo.com> wrote:
> How can we find a distance matrix for categorical data
> [...]
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
Joris.Meys at Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
Correction: see ?daisy in the PACKAGE cluster. *slaps head*

Cheers
Joris

On Thu, Jun 10, 2010 at 7:02 PM, Joris Meys <jorismeys at gmail.com> wrote:
> see ?daisy in the library cluster
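For readers following the pointer to ?daisy, here is a minimal sketch of what that call might look like on the table from the original question. The data frame is typed in by hand (the name `dat` is just an illustration), and it assumes the columns should be treated as unordered factors, which the thread does not state explicitly.

```r
library(cluster)  # recommended package shipped with R; provides daisy()

# The example table from the question, entered by hand as unordered factors.
dat <- data.frame(
  var1 = factor(c("yes", "no", "maybe")),
  var2 = factor(c("x", "y", "y")),
  var3 = factor(c("a", "b", "c")),
  var4 = factor(c("k", "l", "m")),
  row.names = c("element1-1", "element1-2", "element1-3")
)

# Gower's coefficient: for purely nominal variables the dissimilarity is
# the fraction of variables on which two rows disagree.
d  <- daisy(dat, metric = "gower")
dm <- as.matrix(d)   # full symmetric matrix with row/column labels
print(round(dm, 2))
```

In this table element1-2 and element1-3 agree only on var2, so their dissimilarity is 3/4; every other pair disagrees on all four variables and gets a dissimilarity of 1.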
Thanks guys,

I am able to generate the distance matrix for mixed column values (categorical and ordinal) using the daisy function. But can anyone tell me how to generate clusters out of it? The point is that I don't know the number of clusters beforehand.

Let me give an overview of the problem I am trying to solve. Given a dataset, something like the one below:

             var1   var2  var3     Size
element1-1   yes    x     present  100
element1-2   no     y     absent   294
element1-3   maybe  x     absent   45

The first three variables are categorical and the last one is ordinal. I need to do the following:

1) Generate clusters out of it (let's say they are "training clusters"). I am able to compute the distance matrix (using daisy), but I am not sure how to create an unknown number of clusters. Does dbscan work on a distance matrix?
2) Once that is done, I want to spread some new data points in the above plot space (let's say these are "test points").
3) Find out which "test points" lie within the boundary of any of the training clusters discovered above.

If anyone knows how to get this done, please let me know. It's for an academic project and I am unable to make any progress.

Thanks and Regards
K

________________________________
From: Ingmar Visser <i.visser@uva.nl>
Sent: Fri, 11 June, 2010 2:19:33 PM
Subject: Re: [R] Finding distance matrix for categorical data

Latent class analysis may be more appropriate, depending on your hypotheses.

best,
Ingmar
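One possible way to approach steps 1 to 3, offered only as a sketch and not as an answer anyone in the thread gave: instead of dbscan it uses pam() from the cluster package and picks the number of clusters by average silhouette width. The toy rows, the candidate range 2:4, and the "boundary" rule (a test point counts as inside a cluster if it is no farther from the medoid than the farthest training member) are all assumptions made for illustration. Size is treated as numeric here; converting it to an ordered factor first would make daisy() treat it as ordinal. Because Gower scaling depends on the range of the data, the dissimilarities are computed once over training and test rows together.

```r
library(cluster)  # daisy(), pam()

# Invented toy data in the shape described above (values are made up).
lv1 <- c("yes", "no", "maybe"); lv2 <- c("x", "y"); lv3 <- c("present", "absent")
train <- data.frame(
  var1 = factor(c("yes", "yes", "yes", "no", "no", "no", "maybe", "maybe"), levels = lv1),
  var2 = factor(c("x", "x", "x", "y", "y", "y", "x", "y"), levels = lv2),
  var3 = factor(c("present", "present", "present", "absent", "absent",
                  "absent", "absent", "present"), levels = lv3),
  Size = c(100, 110, 95, 294, 300, 280, 45, 150)
)
test <- data.frame(
  var1 = factor(c("yes", "no"), levels = lv1),
  var2 = factor(c("x", "y"), levels = lv2),
  var3 = factor(c("present", "absent"), levels = lv3),
  Size = c(105, 290)
)

# One Gower dissimilarity matrix over all rows, so scaling is consistent.
full <- as.matrix(daisy(rbind(train, test), metric = "gower"))
tr <- seq_len(nrow(train))
te <- nrow(train) + seq_len(nrow(test))
d_train <- as.dist(full[tr, tr])

# Step 1: try several k and keep the one with the best average silhouette.
ks <- 2:4
avg_sil <- sapply(ks, function(k) pam(d_train, k = k, diss = TRUE)$silinfo$avg.width)
best_k <- ks[which.max(avg_sil)]
fit <- pam(d_train, k = best_k, diss = TRUE)
med <- fit$id.med                      # row indices of the medoids
med_cluster <- fit$clustering[med]     # cluster label of each medoid

# Hypothetical boundary: farthest training member from its own medoid.
radius <- sapply(seq_along(med), function(j)
  max(full[tr[fit$clustering == med_cluster[j]], med[j]]))

# Steps 2 and 3: nearest medoid for each test point, then the boundary check.
to_med   <- full[te, med, drop = FALSE]
nearest  <- apply(to_med, 1, which.min)
assigned <- unname(med_cluster[nearest])
inside   <- to_med[cbind(seq_along(nearest), nearest)] <= radius[nearest]

print(data.frame(test, cluster = assigned, inside = inside))
```

The same dissimilarity object could also be handed to hclust() and cut at a chosen height if a tree-based view is preferred; the boundary rule would need to be redefined in that case.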