All,
How can we find a distance matrix for categorical data?
For example, given the CSV below:
            var1   var2  var3  var4
element1-1  yes    x     a     k
element1-2  no     y     b     l
element1-3  maybe  y     c     m
How can I compute the distance matrix between all the elements?
I need it in order to build clusters on top of it.
Thanks & Regards
Kapil
See ?daisy in the library cluster.
Cheers
Joris
On Thu, Jun 10, 2010 at 6:12 PM, kapil mahant <kapil_mahant at yahoo.com> wrote:
> How can we find a distance matrix for categorical data
--
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
Joris.Meys at Ugent.be
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
Correction: see ?daisy in the PACKAGE cluster. *slaps head*
Cheers
Joris
On Thu, Jun 10, 2010 at 7:02 PM, Joris Meys <jorismeys at gmail.com> wrote:
> see ?daisy in the library cluster
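A minimal sketch of the daisy() suggestion, applied to the example data from the original post. It assumes the columns are read in as factors, so that daisy() treats them as nominal variables; the object names `d` and `dm` are made up for illustration.

```r
library(cluster)  # recommended package, normally installed with R

# Example data from the original post. stringsAsFactors = TRUE turns every
# column into a factor, which daisy() handles as a nominal variable.
d <- data.frame(
  var1 = c("yes", "no", "maybe"),
  var2 = c("x", "y", "y"),
  var3 = c("a", "b", "c"),
  var4 = c("k", "l", "m"),
  row.names = c("element1-1", "element1-2", "element1-3"),
  stringsAsFactors = TRUE
)

# Gower dissimilarity: for nominal variables this is the fraction of
# variables on which two rows disagree.
dm <- daisy(d, metric = "gower")
m <- as.matrix(dm)
print(m)
```

With only nominal columns, each entry is simply the share of mismatching variables. For instance, element1-2 and element1-3 agree on var2 only, so their dissimilarity is 3/4.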
Thanks, guys.
I am able to generate the distance matrix for mixed column values (categorical
and ordinal) using the daisy function.
But can anyone tell me how to generate clusters from it? The catch is that I
don't know the number of clusters beforehand.
Here is an overview of the problem I am trying to solve.
Given a dataset, something like the one below:
            var1   var2  var3     Size
element1-1  yes    x     present  100
element1-2  no     y     absent   294
element1-3  maybe  x     absent   45
The first three variables are categorical and the last one is ordinal.
I need to do the following:
1) Generate clusters from it (let's call them "training clusters").
   I am able to compute the distance matrix (using daisy), but I am not sure
   how to create an unknown number of clusters; dbscan can work on a distance matrix.
2) Once that is done, I want to place some new data points in the same
   space (let's call these "test points").
3) Find out which "test points" lie within the boundary of any of the
   training clusters discovered above.
If anyone knows how to get this done, please let me know.
It's for an academic project and I am unable to make any progress.
Thanks and Regards
K
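One possible way to approach the three steps above, sketched under several assumptions. The data values, the cut height of 0.3, and the rule "a test point belongs to a cluster if its nearest training point is closer than the cut height" are all hypothetical choices for illustration. Hierarchical clustering with a height cut is swapped in here instead of dbscan, because it needs only base R plus the cluster package and does not require fixing the number of clusters in advance.

```r
library(cluster)

# Hypothetical training data in the shape of the example above.
train <- data.frame(
  var1 = c("yes", "no", "maybe", "yes", "no", "no"),
  var2 = c("x", "y", "x", "x", "y", "y"),
  var3 = c("present", "absent", "absent", "present", "absent", "absent"),
  Size = c(100, 294, 45, 110, 300, 280),
  stringsAsFactors = TRUE
)
rownames(train) <- paste0("train", seq_len(nrow(train)))

# Hypothetical test points.
test <- data.frame(
  var1 = c("yes", "maybe"),
  var2 = c("x", "y"),
  var3 = c("present", "present"),
  Size = c(105, 900),
  stringsAsFactors = TRUE
)
rownames(test) <- paste0("test", seq_len(nrow(test)))

# Gower dissimilarity over training and test rows together, so that both
# use the same scaling of Size (note: the test rows influence that range).
all_pts <- rbind(train, test)
dm <- as.matrix(daisy(all_pts, metric = "gower"))
n_train <- nrow(train)

# Step 1: cluster the training rows; the number of clusters follows from
# the cut height rather than being chosen up front.
train_dm <- dm[seq_len(n_train), seq_len(n_train)]
hc <- hclust(as.dist(train_dm), method = "average")
cut_height <- 0.3  # hypothetical threshold, tune for real data
train_cluster <- cutree(hc, h = cut_height)

# Steps 2 and 3: for each test row, find the nearest training row and
# accept its cluster only if the dissimilarity is within the threshold.
cross <- dm[-seq_len(n_train), seq_len(n_train), drop = FALSE]
nearest <- apply(cross, 1, which.min)
nearest_dist <- cross[cbind(seq_len(nrow(cross)), nearest)]
test_cluster <- ifelse(nearest_dist <= cut_height, train_cluster[nearest], NA)
names(test_cluster) <- rownames(test)

print(train_cluster)
print(test_cluster)  # NA means "outside every training cluster"
```

The "boundary" here is only a nearest-neighbour radius, which is one of several reasonable definitions. If a density-based method is preferred, the dbscan function in the fpc package can, as far as I know, take a dissimilarity matrix via method = "dist"; that is worth checking against its help page before relying on it.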
________________________________
From: Ingmar Visser <i.visser@uva.nl>
Sent: Fri, 11 June, 2010 2:19:33 PM
Subject: Re: [R] Finding distance matrix for categorical data
Latent class analysis may be more appropriate, depending on your hypotheses.
Best, Ingmar