thr3ads.net - R help - [R] cluster analysis in R [Nov 2012]


KitKat

2012-Nov-15 18:14 UTC

[R] cluster analysis in R

I have two issues. 

1-I am trying to use morphology to identify gender. I have 9 variables, both
continuous and categorical. I was using two-step cluster analysis in SPSS
because two-step could deal with different types of variables. But the
output only tells me that an animal is in cluster 1 or 2; it does not give me a
probability (e.g. 0.70 for cluster 2). I also did not want to specify that I
want two clusters; I wanted to see if the analysis would naturally give me two
clusters. These were all advantages to using SPSS but now I'm having
trouble.

Does cluster analysis in R give probabilities?
Which type of cluster analysis in R is best to use? I did not think
hierarchical analysis was a great choice, but maybe I'm wrong. I don't want
to create the average variable, I want the analysis to do it on its own.
I'm also new to R so would have to figure out the right codes to enter, etc.

2-I was also told to analyze each variable on its own before including it in
cluster analysis. I had first included them all then teased out which ones
were not important, but now have been asked to do the reverse. I cannot do
cluster analysis on one variable -for example, one variable is either
present or absent on an individual so of course cluster analysis gives me
two clusters, one representing present and one representing absent. I was
told to use regression, but how can regression also not give the same
result? I feel like it would give me a line connecting a bunch of 0s to 1s.
I don't know what to use, or if I can analyze each variable like this before
putting them into cluster analysis. I ultimately want to only use the
smallest number of variables necessary to identify gender. 
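
One way to read the regression suggestion, assuming sex is already known for a reference subset of animals (the post does not say whether it is): a logistic regression of sex on a single variable returns an estimated probability of each sex rather than a line through 0s and 1s. The sketch below uses simulated data; the names ref, trait and wing are hypothetical stand-ins, not the poster's variables.

```r
# Simulated reference sample with known sex (hypothetical data and names).
set.seed(1)
n <- 200
sex <- factor(sample(c("F", "M"), n, replace = TRUE))
# Present/absent trait, assumed here to be more common in males.
trait <- factor(rbinom(n, 1, ifelse(sex == "M", 0.8, 0.3)),
                levels = c(0, 1), labels = c("absent", "present"))
# Continuous measurement, assumed here to be larger in males on average.
wing <- rnorm(n, mean = ifelse(sex == "M", 52, 48), sd = 3)
ref <- data.frame(sex, trait, wing)

# One logistic regression per candidate variable.
# With a factor response, glm() models the probability of the second level ("M").
fit_trait <- glm(sex ~ trait, data = ref, family = binomial)
fit_wing  <- glm(sex ~ wing,  data = ref, family = binomial)

# Estimated P(male) when the trait is absent and when it is present.
new_trait <- data.frame(trait = factor(c("absent", "present"),
                                       levels = levels(ref$trait)))
p_trait <- predict(fit_trait, newdata = new_trait, type = "response")

# Estimated P(male) for every animal from the continuous measurement.
p_wing <- predict(fit_wing, type = "response")

# A lower AIC suggests the variable separates the sexes better on its own.
aic <- c(trait = AIC(fit_trait), wing = AIC(fit_wing))
print(round(p_trait, 2))
print(round(aic, 1))
```

For a present/absent variable the fitted probabilities are simply the proportion of males among "absent" and among "present" animals, which is what makes the variable comparable with the continuous ones before any clustering is done.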

I have tried reading manuals etc and talking to people at my school, but
nothing has helped. If anyone has any insight, that would be much
appreciated
Thank you!



--
View this message in context:
http://r.789695.n4.nabble.com/cluster-analysis-in-R-tp4649635.html
Sent from the R help mailing list archive at Nabble.com.

Ingmar Visser

2012-Nov-15 21:10 UTC

[R] cluster analysis in R

Dear KitKat,

After installing R and reading some introductory material on getting
started with R, you may want to check the CRAN task view on cluster analysis:
http://cran.r-project.org/web/views/Cluster.html
It has many useful references to all kinds and flavors of clustering
techniques, hierarchical or not, including selecting the number of clusters
based on some model selection statistic, et cetera.

hth, Ingmar


Jose Iparraguirre

2012-Nov-15 23:34 UTC

[R] cluster analysis in R

Have a look at the package mclust.
Jose
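
A minimal sketch of what mclust offers here, using simulated stand-in data (mydat and its column names are hypothetical). Mclust() fits Gaussian mixture models, so it applies to the continuous variables only. It compares several numbers of clusters by BIC and returns a membership probability for every animal.

```r
# install.packages("mclust")   # run once if the package is not installed
library(mclust)

# Simulated stand-in for two continuous measurements from two groups.
set.seed(42)
mydat <- data.frame(
  wing   = c(rnorm(60, 48, 2), rnorm(60, 55, 2)),
  tarsus = c(rnorm(60, 20, 1), rnorm(60, 23, 1))
)

# Try 1 to 5 clusters; the model with the best BIC is kept.
fit <- Mclust(mydat, G = 1:5)

print(fit$G)                    # number of clusters chosen by BIC
print(head(round(fit$z, 2)))    # membership probabilities, one column per cluster
print(head(fit$classification)) # most probable cluster for each animal
print(head(fit$uncertainty))    # 1 minus the largest membership probability
```

Each row of fit$z sums to 1, so a value such as 0.70 in the second column is the kind of "0.70 cluster 2" statement asked about in the original post.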

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hennig, Christian

2012-Nov-16 11:03 UTC

[R] cluster analysis in R

Dear Katherine,

The function flexmixedruns in package fpc may do what you want; it fits mixtures
with continuous and categorical variables, can use the BIC to choose the
number of mixture components, and also gives you posterior probabilities for
cases to belong to components.
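
A rough sketch of a call to flexmixedruns, modelled on the example in the fpc documentation and run on simulated data. The data, the column names and the particular argument values (simruns, n.cluster) are assumptions chosen for illustration. The function expects the continuous columns first and the categorical columns after them.

```r
# install.packages("fpc")   # run once if the package is not installed
library(fpc)

# Simulated stand-in data: two continuous and two categorical variables.
set.seed(776655)
n   <- 100
grp <- rep(1:2, each = n / 2)                  # hidden groups, used only to simulate
len <- rnorm(n, mean = c(48, 55)[grp], sd = 2)
wt  <- rnorm(n, mean = c(20, 23)[grp], sd = 1)
crest  <- rbinom(n, 1, c(0.2, 0.8)[grp]) + 1   # present/absent coded as 1, 2
colour <- sample(1:3, n, replace = TRUE)       # uninformative 3-level category

# Continuous columns first, categorical columns last.
mixdat <- cbind(len, wt, crest, colour)

# Fit mixtures with 1 to 3 components; the best BIC decides the number.
fr <- flexmixedruns(mixdat, continuous = 2, discrete = 2,
                    simruns = 2, n.cluster = 1:3, allout = FALSE)

print(fr$optimalk)                    # number of components chosen by BIC
post <- fr$flexout@posterior$scaled   # posterior probabilities per case
print(head(round(post, 2)))
print(head(fr$flexout@cluster))       # most probable component per case
```

With allout = FALSE only the fit for the chosen number of components is kept in fr$flexout. A larger simruns value gives more random restarts per number of components, at the cost of run time.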

Note that generally finding the right cluster analysis method is a complicated
task and depends crucially on your application, what use you want to make of the
clusters etc., so what's best cannot be conclusively said on a mailing list.
The same holds for whether and how to select variables. Certainly it's not
wrong in general to use all the variables that you have but whether it's
better otherwise depends on what meaning your variables have and how this
relates to the aim of clustering, what to do with the variables afterwards etc.

You may have a look at 
http://www.rss.org.uk/site/cms/contentviewarticle.asp?article=866#Link%20to%20Nov.%202012%20paper
where I discuss a number of related issues.

Best regards,
Christian


*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
c.hennig at ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche

KitKat

2012-Nov-21 18:36 UTC

[R] cluster analysis in R

Thank you for replying!
I made a new post asking whether there are any websites or files on how to
download package mclust (or other Bayesian cluster analysis packages) and
the appropriate R functions. Sorry, I don't know how this forum works yet.
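
Packages on CRAN are installed from within R itself. A minimal sketch, assuming an internet connection and using a CRAN mirror URL chosen here for illustration:

```r
# Packages mentioned in this thread.
pkgs <- c("mclust", "fpc")

# Find the ones that are not installed yet.
installed   <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_present <- pkgs[!installed]

# Install only what is missing (done once per R installation).
if (length(not_present) > 0) {
  install.packages(not_present, repos = "https://cloud.r-project.org")
}

# After installation, library(mclust) attaches a package for the current
# session, and help(package = "mclust") lists its functions and help pages.
```

Installing is needed once; library() has to be called again in every new R session before the package's functions can be used.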



--
View this message in context:
http://r.789695.n4.nabble.com/cluster-analysis-in-R-tp4649635p4650341.html
Sent from the R help mailing list archive at Nabble.com.

KitKat

2012-Nov-22 18:32 UTC

[R] cluster analysis in R

These are the errors I've been having. I have been trying 3 different things

1- Mclust:
This is the example I have been following:
# Model Based Clustering
library(mclust)
fit <- Mclust(mydata)
plot(fit, mydata) # plot results 
print(fit) # display the best model 
 
What I have done:
> fit <- Mclust(mydat)
> plot(fit, mydat) # plot results
Error in match.arg(what, c("BIC", "classification", "uncertainty", "density"),  : 
  'arg' must be NULL or a character vector
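
This error is consistent with the data being passed where the plot method expects the name of a plot: in mclust version 4 and later, the second argument of plot() for an Mclust fit is what, not the data. The example being followed appears to have been written for an older version of the package. A sketch of the newer call, on simulated stand-in data (the columns of mydat here are hypothetical):

```r
library(mclust)

# Simulated stand-in for the real measurements.
set.seed(42)
mydat <- data.frame(
  wing   = c(rnorm(60, 48, 2), rnorm(60, 55, 2)),
  tarsus = c(rnorm(60, 20, 1), rnorm(60, 23, 1))
)

fit <- Mclust(mydat)

# Name the plot explicitly instead of passing the data again.
plot(fit, what = "BIC")             # BIC for each model and number of clusters
plot(fit, what = "classification")  # individuals coloured by assigned cluster

print(summary(fit))                 # summary of the model chosen by BIC
```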

2- Mclust, following different instructions (from CRAN)
This is the example:
> mydatMclust <- Mclust(mydat)
> summary(mydatMclust)
> summary(mydatMclust, parameters = TRUE)
> plot(mydatMclust)
There are a couple of other steps but the plot is the problem. I get two plots;
there should be four. One should be plotting all my individuals but it's
plotting my variables instead. It's also taking a very long time. R
at this point says: "Waiting to confirm page change..."
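
"Waiting to confirm page change..." is R's prompt before it replaces the plot currently on screen; clicking in the plot window or pressing Enter in the console normally moves on to the next plot. With more than two variables the classification plot is drawn as a grid of variable pairs, so the panels are labelled by variable, but the points inside each panel are still the individuals. One way to avoid the prompt, sketched here on simulated stand-in data with a hypothetical output file name, is to write each plot to a PDF:

```r
library(mclust)

# Simulated stand-in for the real measurements.
set.seed(42)
mydat <- data.frame(
  wing   = c(rnorm(60, 48, 2), rnorm(60, 55, 2)),
  tarsus = c(rnorm(60, 20, 1), rnorm(60, 23, 1))
)
mydatMclust <- Mclust(mydat)

# Write the four plot types to one PDF, one page each.
out_file <- file.path(tempdir(), "mclust_plots.pdf")
pdf(out_file)
for (w in c("BIC", "classification", "uncertainty", "density")) {
  plot(mydatMclust, what = w)
}
dev.off()

print(out_file)   # location of the PDF
```

With many variables the pairwise plots can take a while to draw; the dimens argument of the plot method can restrict the classification and uncertainty plots to a chosen pair of columns.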

3- mcclust
Instructions from CRAN:
data(cls.draw2)
# sample of 500 clusterings from a Bayesian cluster model
tru.class <- rep(1:8,each=50)
# the true grouping of the observations
psm2 <- comp.psm(cls.draw2)
# posterior similarity matrix
# optimize criteria based on PSM
mbind2 <- minbinder(psm2)
mpear2 <- maxpear(psm2)
# Relabelling
k <- apply(cls.draw2,1, function(cl) length(table(cl)))
max.k <- as.numeric(names(table(k))[which.max(table(k))])
relab2 <- relabel(cls.draw2[k==max.k,])
# compare clusterings found by different methods with true grouping
arandi(mpear2$cl, tru.class)
arandi(mbind2$cl, tru.class)
arandi(relab2$cl, tru.class)

I called my data mydat, so I changed that where appropriate. I cannot get
past one early step, psm2 <- comp.psm(cls.draw2). The error reads:
Error: could not find function "comp.psm"

I think I have all appropriate packages installed. I don't know what more to
do on these three errors.  Any help would be great! Thank you
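
"could not find function" usually means that the package providing the function has not been attached in the current session. comp.psm() comes from mcclust, which is a separate package from mclust. A minimal sketch using the example data shipped with mcclust:

```r
# install.packages("mcclust")   # run once if the package is not installed
library(mcclust)                # attaches comp.psm(), minbinder(), maxpear(), ...

data(cls.draw2)                 # example data: sampled clusterings, one per row
print(dim(cls.draw2))           # draws x observations

psm2 <- comp.psm(cls.draw2)     # posterior similarity matrix
print(dim(psm2))                # observations x observations
```

Note that comp.psm() takes a matrix of clusterings sampled from a Bayesian cluster model, with one sampled clustering per row, as its input. Replacing cls.draw2 with a table of raw measurements such as mydat is therefore unlikely to give a meaningful result, even once the function is found.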




--
View this message in context:
http://r.789695.n4.nabble.com/cluster-analysis-in-R-tp4649635p4650466.html
Sent from the R help mailing list archive at Nabble.com.
