thr3ads.net - similar to: "Simple clustering help"

Displaying 20 results from an estimated 9000 matches similar to: "Simple clustering help"

Inversions in hierarchical clustering where they shouldn't be

2011 Jul 27

Hi, I'm using heatmap.2 to cluster my data, using the centroid method for clustering and the maximum method for calculating the distance matrix: library("gplots") library("RColorBrewer") test <- matrix(c(0.96, 0.07, 0.97, 0.98, 0.50, 0.28, 0.29, 0.77, 0.08, 0.96, 0.51, 0.51, 0.14, 0.19, 0.41, 0.51), ncol=4, byrow=TRUE)

hierarchical clustering with pearson's coefficient

2013 Mar 28

Hello, I want to use Pearson's correlation as the distance between observations and then use any centroid-based linkage distance (e.g. Ward's distance). When linkage distances are formed with the Lance-Williams recursive formulation, they require only the initial distances between observations. See here: http://en.wikipedia.org/wiki/Ward%27s_method It is said that you have to use euclidean

Clustering

2007 Nov 28

Hello all! I am performing some clustering analysis on microarray data using agnes{cluster} and I have created my own dissimilarity matrix according to a distance measure different from "euclidean" or "manhattan" etc. My question is, if I choose for example method="complete", how are the distances between the elements calculated? Are they taken from the dissimilarity

clustering with hclust

2014 Jul 25

Hi everybody, I have a problem with a cluster analysis. I am trying to use hclust, method=ward. The Ward method works with SQUARED Euclidean distances. Hclust demands "a dissimilarity structure as produced by dist". Yet, dist does not seem to produce a table of squared euclidean distances, starting from cosines. In fact, computing manually the squared euclidean distances from cosines

weighted clustering (was: Re: problems with a large data set)

2001 Apr 27

kmeans and clara work great. Thank you for the tip. I have another question: Is it possible to weight the observations in a cluster analysis? I haven't found any mention of this in the kmeans or clara help texts. Moritz Lennert Chargé de recherche IGEAT - ULB tél: 32-2-650.65.16 fax: 32-2-650.50.92 email: mlennert at ulb.ac.be > On Wed, 25 Apr 2001, Moritz Lennert wrote: >

pam() clustering for large data sets

2011 May 16

Hello everyone, I need to do k-medoids clustering for data which consists of 50,000 observations. I have computed distances between the observations separately and tried to use those with pam(). I got the "cannot allocate vector of length" error and I realize this job is too memory intensive. I am at a bit of a loss on what to do at this point. I can't use clara(), because I

setting distance matrix and clustering methods in heatmap.2

2011 Jul 24

heatmap.2 defaults to dist for calculating the distance matrix and hclust for clustering. Does anyone know how I can set dist to use the euclidean method and hclust to use the centroid method? I provided runnable sample code below. I tried: distfun = dist(method = "euclidean"), but that doesn't work. Any ideas? library("gplots") library("RColorBrewer") test

-means, hybrid clustering or similar implementations on R

2003 May 07

Hi, I would like to know if someone knows an extended implementation of k-means in R to find an appropriate number of clusters for given k-dimensional data. Also, I am working on clustering for forecasting; if someone is interested or has knowledge of implementation details please mail me, I would appreciate it. Regards Skanda Kallur "Cogito, ergo sum" (I think, therefore I

How to perform clustering without removing rows where NA is present in R

2013 Dec 07

I have data that contain some NA values. What I want to do is to **perform clustering without removing rows** where the NA is present. I understand that the `gower` distance measure in `daisy` allows this. But why doesn't my code below work? __BEGIN__ # plot heat map with dendrogram together. library("gplots") library("cluster")

Hierarchical clustering using own distance matrices

2010 May 25

Hey Everyone! I wanted to carry out hierarchical clustering using distance matrices I have calculated (instead of euclidean distance etc.). I understand as.dist is the function for this, but the distances in the dendrogram I got by using the following script(1) were not the distances defined in my distance matrices. script: var<-read.table("the distance matrix i calculated",

simple-rss caching

2006 Jun 14

The index page of my rails app grabs an rss feed from a neighboring news site. Unfortunately, the process of grabbing that feed seems to be slowing down the initial load time of my site to the point where it takes about 10-12 seconds to respond and render. I'd like to speed that up somehow (for 8-10 seconds it looks like my server is not responding at all..) Any suggestions? I

PAM Clustering

2017 Aug 17

Sorry, I never use pam. In the help, you can see that pam requires a data frame OR a dissimilarity matrix. If diss=FALSE then "euclidean" is used. So I interpret that a dissimilarity matrix is generated automatically. The problem may be in your data. Indeed pam(ruspini, 4)$diss writes a dissimilarity matrix while pam(MYdata,10)$diss writes NULL 2017-08-17 16:03 GMT+02:00 Sema Atasever

K MEANS clustering

2016 Jul 27

Hey Parth, Thanks for the reply. I am considering implementing a cosine distance metric too, along with Euclidean distance, because of the dimensionality issue that comes with K-Means and the Euclidean distance metric. That does help when we deal with sparse vectors for documents. The particular problem I'm having is representing centroids in an efficient way. For example, when we find the mean

PAM Clustering

2017 Aug 17

Dear Germano, Thank you for your fast reply. In the above code, MYData is the actual data set. Don't we need to convert MYData to a dissimilarity matrix using the line pam(as.dist(MYData), k = 10, diss = TRUE)? Regards. On Thu, Aug 17, 2017 at 2:58 PM, Germano Rossi <germano.rossi at gmail.com> wrote: > try this > > MYdata <-

Plotting K-means clustering results on an MDS

2010 Aug 18

Hello All, I'm having some trouble figuring out the clearest way to plot my k-means clustering result on my existing MDS. First I performed MDS on my distance matrix (note: I performed k-means on the MDS coordinates because applying a euclidean distance measure to my raw data would have been inappropriate) canto.MDS<-cmdscale(canto) I then figured out what would be my optimum

Formatted Data File Question for Clustering -Quickie Project

2007 Jun 13

I am trying to learn how to format ASCII data files for scan or read into R. For a quickie project, I found some code (at the end of this email) that does exactly what I need: cluster and graph a dendrogram using package (stats). I am stuck on how to format a text file to run the script. I looked at the dataset USArrests (which would be replaced by my data and labels) using UltraEdit. That

Simple Samba connection question to new Active Directory

2004 Dec 13

Hello all! I currently have a small Windows NT 4 domain (named OLD_NETWORK). All files are stored on a UNIX server (running Solaris) running Samba 2.2. Runs perfect. No problems. Samba's only job in my network is JUST TO STORE AND SERVE OUT FILES to PCs. Samba does not run as a PDC. Merely validates valid users to get their files off UNIX server. I believe this is the simplest possible

Plotting Clustering Groups Separately

2002 Jul 18

Hi, As a beginner with R I have been trying to plot dendrograms for individual groups after using cutree. The example in the help files appears to work fine for Euclidean distances using the "average" clustering method. However, when I use the "Ward" method the reprocessed subgroup does not appear to have the same structure as it did when the whole dataset was processed. Is

Hierarchical clustering with centroid method

2005 Jul 26

Dear everybody! In the function hclust, at each stage distances between clusters are recomputed by the Lance-Williams dissimilarity update formula according to the particular clustering method being used. With the "centroid" method, the Lance-Williams recurrence formula works properly only for Euclidean distance. How can the centroid method be used properly with Manhattan distance?
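
For reference, the update formula this question refers to can be written out as follows. This is a sketch of the standard textbook form, not something quoted from the thread: the symbols (clusters i and j being merged, any other cluster k, cluster sizes n_i and n_j) are notation assumed here, and the centroid coefficients presume that d holds squared Euclidean distances.

```latex
% General Lance-Williams update after merging clusters i and j
d(i \cup j,\, k) = \alpha_i\, d(i,k) + \alpha_j\, d(j,k) + \beta\, d(i,j) + \gamma\, \lvert d(i,k) - d(j,k) \rvert

% Coefficients for the centroid method (d = squared Euclidean distance)
\alpha_i = \frac{n_i}{n_i + n_j}, \qquad
\alpha_j = \frac{n_j}{n_i + n_j}, \qquad
\beta = -\frac{n_i\, n_j}{(n_i + n_j)^2}, \qquad
\gamma = 0
```

With these coefficients the recurrence reproduces the squared distance between cluster centroids, which is why the poster notes that it only behaves as intended when the input dissimilarities are Euclidean.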

Re: clustering polypeptide sequences

2003 Sep 08

Hi Peter, You didn't give a very specific example, but it seems to me that what you wish to do is not really complicated. I suppose you have created a table of sequences vs. say hydrophobicity, charge, etc..., something like... seq hydroph arom b0001 0.104762 0.000000 b0002 0.035122 0.065854 b0003 0.024193 0.070968 b0004 -0.096729 0.084112 b0005 -0.973469 0.091837 b0006
