I have two issues.

1-I am trying to use morphology to identify gender. I have 9 variables, both continuous and categorical. I was using two-step cluster analysis in SPSS because two-step could deal with different types of variables. But the output tells me that an animal is in cluster 1 or 2, it does not give me a probability (ex. 0.70 cluster 2). I also did not want to specify that I want two clusters, I wanted to see if analysis would naturally give me two clusters. These were all advantages to using SPSS but now I'm having trouble.

Does cluster analysis in R give probabilities?
Which type of cluster analysis in R is best to use? I did not think hierarchical analysis was a great choice, but maybe I'm wrong. I don't want to create the average variable, I want the analysis to do it on its own. I'm also new to R so would have to figure out the right codes to enter, etc.

2-I was also told to analyze each variable on its own before including it in cluster analysis. I had first included them all then teased out which ones were not important, but now have been asked to do the reverse. I cannot do cluster analysis on one variable -for example, one variable is either present or absent on an individual so of course cluster analysis gives me two clusters, one representing present and one representing absent. I was told to use regression, but how can regression also not give the same result? I feel like it would give me a line connecting a bunch of 0s to 1s. I don't know what to use, or if I can analyze each variable like this before putting them into cluster analysis. I ultimately want to only use the smallest number of variables necessary to identify gender.

I have tried reading manuals etc and talking to people at my school, but nothing has helped. If anyone has any insight, that would be much appreciated.
Thank you!

--
View this message in context: http://r.789695.n4.nabble.com/cluster-analysis-in-R-tp4649635.html
Sent from the R help mailing list archive at Nabble.com.
Dear KitKat,

After installing R and reading some introductory material on getting started with R you may want to check the CRAN task view on cluster analysis:

http://cran.r-project.org/web/views/Cluster.html

which has many useful references to all kinds and flavors of clustering techniques, hierarchical or not, selecting the nr of clusters based on some model selection statistic, et cetera.

hth,
Ingmar

On Thu, Nov 15, 2012 at 7:14 PM, KitKat <katherinewright@trentu.ca> wrote:
> I have two issues.
> [...]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Have a look at the package mclust.

Jose

________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Ingmar Visser [i.visser at uva.nl]
Sent: 15 November 2012 21:10
To: KitKat
Cc: r-help at r-project.org
Subject: Re: [R] cluster analysis in R

[...]
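As a rough illustration of the mclust suggestion (a sketch, not part of the reply above): Mclust() fits Gaussian mixture models, compares several numbers of components by BIC, and stores each observation's posterior membership probabilities in the `z` component of the result. mclust models continuous variables only, so the categorical variables would need separate handling. The two simulated measurements below (`body_length`, `head_width`) are made-up stand-ins, not the poster's data.

```r
# Sketch only: simulated data standing in for real morphological measurements.
library(mclust)   # install.packages("mclust") if it is not yet installed

set.seed(1)
n <- 100
# Hypothetical continuous traits for two groups
morph <- data.frame(
  body_length = c(rnorm(n, mean = 50, sd = 3), rnorm(n, mean = 62, sd = 3)),
  head_width  = c(rnorm(n, mean = 10, sd = 1), rnorm(n, mean = 13, sd = 1))
)

# Let BIC choose the number of components among 1 to 5 (not fixed at 2)
fit <- Mclust(morph, G = 1:5)

fit$G                      # number of clusters selected by BIC
head(round(fit$z, 2))      # posterior probability of each cluster, per animal
head(fit$classification)   # hard assignment (most probable cluster)
head(fit$uncertainty)      # 1 minus the largest posterior probability
```

Each row of `fit$z` sums to 1, so a value such as 0.70 in the second column would be read as a 0.70 posterior probability of belonging to cluster 2.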
Dear Katherine,

function flexmixedruns in package fpc may do what you want; it fits mixtures with continuous and categorical variables, can use the BIC for giving you the number of mixture components and also gives you posterior probabilities for cases to belong to components.

Note that generally finding the right cluster analysis method is a complicated task and depends crucially on your application, what use you want to make of the clusters etc., so what's best cannot be conclusively said on a mailing list.

The same holds for whether and how to select variables. Certainly it's not wrong in general to use all the variables that you have but whether it's better otherwise depends on what meaning your variables have and how this relates to the aim of clustering, what to do with the variables afterwards etc.

You may have a look at
http://www.rss.org.uk/site/cms/contentviewarticle.asp?article=866#Link%20to%20Nov.%202012%20paper
where I discuss a number of related issues.

Best regards,
Christian

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
c.hennig at ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche

________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] on behalf of KitKat [katherinewright at trentu.ca]
Sent: 15 November 2012 18:14
To: r-help at r-project.org
Subject: [R] cluster analysis in R

[...]
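A minimal sketch of how flexmixedruns might be called for mixed continuous and categorical data, assuming the interface described in the fpc help page (continuous columns first, categorical columns coded as integers). All variable names and the simulated data are hypothetical; check ?flexmixedruns for the installed version before relying on the argument names or the structure of the returned object.

```r
# Sketch only: simulated mixed-type data; all variable names are hypothetical.
library(fpc)      # flexmixedruns() lives here; it relies on the flexmix package

set.seed(2)
n <- 100
grp <- rep(1:2, each = n)   # hidden groups, used only to simulate the data
# Two continuous traits, then two categorical traits coded as integers
length_mm <- rnorm(2 * n, mean = c(50, 62)[grp], sd = 3)
width_mm  <- rnorm(2 * n, mean = c(10, 13)[grp], sd = 1)
crest     <- ifelse(runif(2 * n) < c(0.1, 0.9)[grp], 2, 1)  # 1 = absent, 2 = present
colour    <- sample(1:3, 2 * n, replace = TRUE)
x <- cbind(length_mm, width_mm, crest, colour)  # continuous columns come first

# Fit mixtures with 1 to 4 components; BIC decides between them
fr <- flexmixedruns(x, continuous = 2, discrete = 2,
                    n.cluster = 1:4, simruns = 3,
                    allout = FALSE, verbose = FALSE)

fr$optimalk                           # number of components with the best BIC
post <- fr$flexout@posterior$scaled   # posterior membership probabilities
head(round(post, 2))
```

With `allout = FALSE` only the fit for the BIC-optimal number of components is kept in `fr$flexout`; a larger `simruns` gives more random restarts at the cost of running time.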
Thank you for replying! I made a new post asking whether there are any websites or files that explain how to download the package mclust (or other Bayesian cluster analysis packages) and which R functions to use. Sorry, I don't know how this forum works yet.

--
View this message in context: http://r.789695.n4.nabble.com/cluster-analysis-in-R-tp4649635p4650341.html
Sent from the R help mailing list archive at Nabble.com.
These are the errors I've been having. I have been trying 3 different things.

1- Mclust:

This is the example I have been following:

    # Model Based Clustering
    library(mclust)
    fit <- Mclust(mydata)
    plot(fit, mydata) # plot results
    print(fit) # display the best model

What I have done:

    > fit <- Mclust(mydat)
    > plot(fit, mydat) #plot results
    Error in match.arg(what, c("BIC", "classification", "uncertainty", "density"), :
      'arg' must be NULL or a character vector

2- Mclust using different website (cran-r) instructions

This is the example:

    > mydatMclust <- Mclust(mydat)
    > summary(mydatMclust)
    > summary(mydatMclust, parameters = TRUE)
    > plot(mydatMclust)

There are a couple other steps but the plot is the problem. I get two plots, there should be four. One should be plotting all my individuals but it's plotting my variables instead. It's also taking a very long time. R script at this point says: "Waiting to confirm page change? "

3- Mcclust

Instructions from cran-r:

    data(cls.draw2) # sample of 500 clusterings from a Bayesian cluster model
    tru.class <- rep(1:8,each=50) # the true grouping of the observations
    psm2 <- comp.psm(cls.draw2) # posterior similarity matrix
    # optimize criteria based on PSM
    mbind2 <- minbinder(psm2)
    mpear2 <- maxpear(psm2)
    # Relabelling
    k <- apply(cls.draw2,1, function(cl) length(table(cl)))
    max.k <- as.numeric(names(table(k))[which.max(table(k))])
    relab2 <- relabel(cls.draw2[k==max.k,])
    # compare clusterings found by different methods with true grouping
    arandi(mpear2$cl, tru.class)
    arandi(mbind2$cl, tru.class)
    arandi(relab2$cl, tru.class)

I called my data: mydat so I changed that where appropriate. I cannot get past one early step, psm2 <- comp.psm(cls.draw2).. the error reads: "Error: could not find function "comp.psm""

I think I have all appropriate packages installed. I don't know what more to do on these three errors. Any help would be great!
Thank you -- View this message in context: http://r.789695.n4.nabble.com/cluster-analysis-in-R-tp4649635p4650466.html Sent from the R help mailing list archive at Nabble.com.
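One possible reading of errors 1 and 3 (an assumption, since the thread does not say which package versions were installed): in current versions of mclust the second argument of plot() for an Mclust object is `what`, not the data, so passing the data there would produce the match.arg error shown above. The "could not find function" error usually means the package providing the function has not been loaded; comp.psm() belongs to mcclust, a separate package from mclust that post-processes MCMC samples of clusterings rather than clustering raw data. The message "Waiting to confirm page change" is R waiting for a key press or click in the plot window before it draws the next plot. The sketch below uses the built-in iris measurements as a stand-in for mydat.

```r
# Sketch only: iris stands in for the poster's mydat (continuous columns only).
library(mclust)

mydat <- iris[, 1:4]
fit <- Mclust(mydat)

# Name the plot type explicitly instead of passing the data as second argument
plot(fit, what = "BIC")
plot(fit, what = "classification")

summary(fit)

# comp.psm() is provided by mcclust (two c's), a different package from mclust.
# It can be found only after that package has been installed and loaded:
# install.packages("mcclust")
# library(mcclust)
```

The classification plot for more than two variables is a matrix of pairwise scatterplots, one panel per pair of variables, with each individual drawn as a point in every panel.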