similar to: hclust, does order of data matter?

Displaying 20 results from an estimated 7000 matches similar to: "hclust, does order of data matter?"

2011 Jun 09
1
k-nn hierarchical clustering
Hi there, is there any R-function for k-nearest neighbour agglomerative hierarchical clustering? By this I mean standard agglomerative hierarchical clustering as in hclust or agnes, but with the k-nearest neighbour distance between clusters used on the higher levels where there are at least k>1 distances between two clusters (single linkage is 1-nearest neighbour clustering)? Best regards,
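No base-R function implements a general k-nearest-neighbour linkage, but the k=1 case mentioned above is ordinary single linkage; a minimal sketch of that baseline, assuming a dist object d:
d  <- dist(USArrests)                  # any dissimilarity object
hc <- hclust(d, method = "single")     # single linkage = 1-nearest-neighbour linkage
plot(hc)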
2011 Aug 10
4
Clustering Large Applications..sort of
Hello all, I am using the clustering functions in R to work with large masses of binary time series data; however, the clustering functions do not seem able to handle a problem of this size. The hclust function is good (though it may be subpar for a problem of this size, thus doubly poor for this application) in that I do not want to make assumptions about the number of
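For binary data, one workable route (a sketch, not the poster's code) is a binary dissimilarity followed by hclust; dist() with method = "binary" gives a Jaccard-type distance between rows:
x <- matrix(rbinom(200 * 50, 1, 0.3), nrow = 200)  # toy data: 200 binary series of length 50
d <- dist(x, method = "binary")                    # asymmetric binary (Jaccard-type) dissimilarity
hc <- hclust(d, method = "average")
plot(hc, labels = FALSE)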
2005 Oct 19
1
clustering algorithm detail
Hi all, I wanted to run the hclust (or any other clustering algorithm) on a distance matrix. I have formed the distance matrix as:
distmat:
     a    b    c    d    e
a 0.00 0.96 1.60 1.60 1.68
b 0.96 0.00 0.96 1.80 2.64
c 1.60 0.96 0.00 0.84 1.80
d 1.60 1.80 0.84 0.00 0.96
e 1.68 2.64 1.80 0.96 0.00
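A precomputed symmetric matrix like this can be coerced with as.dist() and passed straight to hclust; a sketch reproducing the matrix above:
m <- matrix(c(0.00, 0.96, 1.60, 1.60, 1.68,
              0.96, 0.00, 0.96, 1.80, 2.64,
              1.60, 0.96, 0.00, 0.84, 1.80,
              1.60, 1.80, 0.84, 0.00, 0.96,
              1.68, 2.64, 1.80, 0.96, 0.00),
            nrow = 5, dimnames = list(letters[1:5], letters[1:5]))
hc <- hclust(as.dist(m), method = "complete")  # as.dist() keeps the lower triangle
plot(hc)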
2003 Dec 11
1
cutree with agnes
Hi, this is a (presumed) bug report rather than a question, because I can solve my personal statistical problem by working with hclust instead of agnes. I have done a complete linkage clustering on a dist object dm with 30 objects with agnes (R 1.8.0 on RedHat) and I want to obtain the partition that results from a cut at height=0.4. I run > cl1a <- agnes(dm, method="complete")
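cutree expects an hclust-like tree, so the usual workaround (a sketch, not necessarily what the thread settled on) is to convert the agnes result with as.hclust first:
library(cluster)
cl1a <- agnes(dm, method = "complete")     # dm is the dist object from the post
part <- cutree(as.hclust(cl1a), h = 0.4)   # partition from a cut at height 0.4
table(part)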
2010 Feb 11
1
cluster/distance large matrix
Hi all, I've stumbled upon some memory limitations for the analysis that I want to run. I have a matrix of distances between 38000 objects. These distances were calculated outside of R. I want to cluster these objects. For smaller sets (e.g. n=100) this is how I proceed: A<-matrix(scan(file, n=100*100),100,100, byrow=TRUE) ad<-as.dist(A)
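For 38000 objects the lower-triangle dist object alone holds 38000*37999/2, roughly 7.2e8 doubles (about 5.8 GB), so 64-bit R and ample RAM are required; the small-set workflow would continue as in this sketch:
A  <- matrix(scan(file, n = 100 * 100), 100, 100, byrow = TRUE)
ad <- as.dist(A)
hc <- hclust(ad, method = "average")
grp <- cutree(hc, k = 5)   # the choice of 5 groups is purely illustrative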
2009 Apr 15
1
clustering, don't understand this error
Hello, I am using the Dunn metric, but something is wrong and I don't understand what this error means. Please can you help me with this? The instructions are: # Dunn index disbupa=dist(bupa[,1:6]) a=hclust(disbupa) cluster.stats(disbupa,a,bupa[,7])$dunn And the error is: Error in max(clustering) : invalid 'type' (list) of argument Thank you so much. Ana Maria
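The error arises because cluster.stats expects an integer membership vector, while hclust returns a whole tree object (a list); a sketch of the corrected call, assuming the same bupa data:
library(fpc)
disbupa <- dist(bupa[, 1:6])
tree <- hclust(disbupa)
cl <- cutree(tree, k = 2)                    # integer cluster memberships, not the tree itself
cluster.stats(disbupa, cl, bupa[, 7])$dunn   # compare against the known classes in column 7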
2012 May 11
1
How to re-order clusters of hclust output?
Hello, The heatmap function conveniently has a "reorder.dendrogram" function so that clusters follow a certain logic. It seems that the hclust function doesn't have such a feature. I can use the "reorder" function on the dendrogram obtained from hclust, but this does not modify the hclust object itself. I understand that the answer should be within the "heatmap"
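reorder() is indeed defined for dendrograms rather than hclust objects, so one route (a sketch; the weights wts are whatever ordering criterion is wanted) is to round-trip through as.dendrogram and as.hclust:
hc    <- hclust(dist(USArrests))
dend  <- as.dendrogram(hc)
dend2 <- reorder(dend, wts = 1:nrow(USArrests), agglo.FUN = mean)  # weights drive the new leaf order
hc2   <- as.hclust(dend2)   # back to an hclust object with the reordered $order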
2010 Jun 13
1
Finding an order for an hclust (dendrogram) object without intersections
Hello all, I manually created an hclust object. Now I am looking to reorder the leaves so they won't intersect with each other, and would be happy for advice on how to do that. Here is an example code: #------------------------------------- a <- list() # initialize empty object # define merging pattern: # negative numbers are leaves, # positive are merged clusters (defined by row
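For reference, a minimal hand-built hclust object (a toy sketch, not the poster's full code) needs consistent $merge, $height, $order and $labels plus the class attribute; $order must list the leaves in a left-to-right sequence compatible with $merge so that no branches cross:
a <- list()
a$merge <- matrix(as.integer(c(-1, -2,    # step 1: join leaves 1 and 2
                               -3,  1)),  # step 2: join leaf 3 with the cluster from step 1
                  ncol = 2, byrow = TRUE)
a$height <- c(1, 2)   # merge heights, non-decreasing
a$order  <- 1:3       # left-to-right leaf order consistent with $merge
a$labels <- c("A", "B", "C")
class(a) <- "hclust"
plot(a)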
2006 Aug 07
5
kmeans and incomplete distance matrix concern
Hi there, I have been using R to perform kmeans on a dataset. The data is fed in using read.table and then a matrix (x) is created, i.e.: mat <- matrix(0, nlevels(DF$V1), nlevels(DF$V2), dimnames = list(levels(DF$V1), levels(DF$V2))) mat[cbind(DF$V1, DF$V2)] <- DF$V3 This matrix is then taken and a distance matrix (y) created using dist() before performing the kmeans clustering. My query
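Note that kmeans() works on raw coordinates, not on a distance matrix; feeding it a dist-derived matrix makes it treat rows of distances as features. A sketch of the direct call on the cross-tabulated data (3 centres is purely illustrative):
km <- kmeans(mat, centers = 3, nstart = 25)  # rows of mat are the observations
km$cluster                                   # cluster membership for each level of DF$V1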
2010 Sep 07
1
own distance
Is it possible to implement my own distance and mean for k-means clustering in any clustering package in R? Just looking for a simple way, without creating a new package. karsar
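Base kmeans() is hard-wired to Euclidean geometry, but pam() in the cluster package accepts any precomputed dissimilarity, which covers the custom-distance part of the question (a sketch; mydist is a hypothetical user-supplied distance function):
library(cluster)
n <- nrow(x)
D <- as.dist(outer(1:n, 1:n,
                   Vectorize(function(i, j) mydist(x[i, ], x[j, ]))))  # custom pairwise dissimilarities
fit <- pam(D, k = 4, diss = TRUE)   # k-medoids with the custom dissimilarity
fit$clustering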
2008 Jun 13
3
cluster.stats
Dear list, I just tried to use the function cluster.stats in the package fpc. I just have a couple of questions about the syntax: cluster.stats(d,clustering,alt.clustering=NULL, silhouette=TRUE,G2=FALSE,G3=FALSE) 1) is the distance object (d) an object obtained by the function dist() on my own original matrix? 2) is clustering the cluster membership vector resulting from one of the many clustering methods?
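To both points: yes, d is the dist() object computed on the original data, and clustering is an integer membership vector (e.g. from cutree or kmeans()$cluster); a sketch comparing two clusterings of the same data via alt.clustering:
library(fpc)
d   <- dist(USArrests)
cl1 <- cutree(hclust(d, method = "complete"), k = 3)
cl2 <- kmeans(USArrests, centers = 3)$cluster
cluster.stats(d, cl1, alt.clustering = cl2, silhouette = TRUE, G2 = FALSE, G3 = FALSE)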
2005 Jul 08
5
Help with Mahalanobis
Dear R list, I'm trying to calculate Mahalanobis distances for 'Species' of 'iris' data as obtained below:
Squared Distance to Species
From Species     Setosa   Versicolor   Virginica
Setosa            0         89.86419   179.38471
Versicolor       89.86419    0          17.20107
Virginica       179.38471   17.20107     0
These distances were obtained with proc 'CANDISC'
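A sketch reproducing pairwise squared Mahalanobis distances between the group means using the pooled within-group covariance (the quantity proc CANDISC reports, up to rounding):
X <- iris[, 1:4]
g <- iris$Species
means <- as.matrix(aggregate(X, list(g), mean)[, -1])   # 3 x 4 matrix of group means
# pooled within-group covariance matrix
S  <- Reduce(`+`, lapply(split(X, g), function(z) cov(z) * (nrow(z) - 1))) / (nrow(X) - nlevels(g))
Si <- solve(S)
D2 <- outer(1:3, 1:3, Vectorize(function(i, j) {
  d <- means[i, ] - means[j, ]
  drop(t(d) %*% Si %*% d)            # squared Mahalanobis distance between group means
}))
dimnames(D2) <- list(levels(g), levels(g))
round(D2, 5)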
2005 Sep 29
5
Regression slope confidence interval
Hi list, is there any direct way to obtain confidence intervals for the regression slope from lm, predict.lm or the like? (If not, is there any reason? This is also missing in some other statistics software, and I thought this would be quite a standard application.) I know that it's easy to implement, but it's for explanation to people who faint if they have to do their own programming...
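Yes: confint() on a fitted lm gives confidence intervals for the coefficients directly (predict.lm's intervals are for the fitted mean, not the slope); a minimal sketch with the built-in cars data:
fit <- lm(dist ~ speed, data = cars)
confint(fit, level = 0.95)   # rows: (Intercept) and speed; columns: 2.5 % and 97.5 %
confint(fit, "speed")        # just the slope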
2012 Jul 04
1
Error in hclust?
Dear R users, I have noted a difference in the merge distances given by hclust using the centroid method. For the following data: x<-c(1009.9,1012.5,1011.1,1011.8,1009.3,1010.6) and using Euclidean distance, hclust with the centroid method gives the following results:
> x.dist<-dist(x)
> x.aah<-hclust(x.dist,method="centroid")
> x.aah$merge
     [,1] [,2]
[1,]   -3   -6
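?hclust notes that the "centroid" and "median" methods are meant to be used with squared Euclidean distances, and that their merge heights can show inversions; a sketch of the usual adjustment for the data above:
x <- c(1009.9, 1012.5, 1011.1, 1011.8, 1009.3, 1010.6)
x.aah <- hclust(dist(x)^2, method = "centroid")  # squared Euclidean distances
x.aah$merge
sqrt(x.aah$height)                               # heights back on the original scale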
2005 Aug 08
2
selecting outliers
Hi everybody, I'd like to know if there's an easy way to extract outlier records from a dataset, in order to perform further analysis on them. Thanks Alessandro
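For a single numeric variable, boxplot.stats() returns the points beyond the whiskers, which can then be used to pull out the matching records; a sketch assuming a data frame dat with a numeric column value (both names hypothetical):
out      <- boxplot.stats(dat$value)$out    # values flagged by the 1.5 * IQR rule
outliers <- dat[dat$value %in% out, ]       # records to analyse further
clean    <- dat[!dat$value %in% out, ]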
2002 Feb 20
1
plot.hclust: strange behaviour with "manufactured" hclust object
I've been trying to get plot.hclust to work with an hclust object I created and have not had much success. It seems that there is some "hidden" characteristic of an hclust object that I can't see. This is most easily seen in the following example, where plot.hclust works on one object, but when this object is "dumped" and then re-read, plot.hclust no longer works. Is
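A frequent culprit with hand-made or dump()-ed hclust objects is that $merge and $order come back with the wrong storage mode or the class attribute is lost; this is an assumption about the cause, not a confirmed diagnosis, but these checks on the re-read object are cheap:
str(a)                              # compare against str() of a working hclust object
class(a)                            # should be "hclust"
storage.mode(a$merge) <- "integer"  # coerce in case merge was re-read as double
storage.mode(a$order) <- "integer"
plot(a)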
2009 Nov 17
1
hclust too slow?
Hi, I am new to clustering in R and I have a dataset with approximately 17,000 rows and 8 columns, with each data point a numeric value with three decimal places. I would like to cluster the 8 columns so that I get a dendrogram as an output. So, I am simply creating a distance matrix of my data, using the 'hclust' function, and then plotting the results (see below; my data is
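Since the goal is a dendrogram of the 8 columns rather than the 17,000 rows, transposing first shrinks the distance computation from ~17,000^2 to 8^2 pairs; a sketch assuming the data sit in a numeric matrix dat:
d  <- dist(t(scale(dat)))       # distances between the 8 columns, on a common scale
hc <- hclust(d, method = "average")
plot(hc)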
2005 Sep 12
4
Document clustering for R
I'm working on a project related to document clustering. I know that R has clustering algorithms such as clara, but it only supports two distance metrics, Euclidean and Manhattan, which are not very useful for clustering documents. I was wondering how easy it would be to extend the clustering package in R to support other distance metrics, such as cosine distance, or if there was an API for
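No package extension is needed just to swap the metric: any dissimilarity matrix can be wrapped in as.dist() and handed to hclust or pam. A sketch of a cosine dissimilarity between the document rows of a term matrix tdm (name assumed):
nrm    <- sqrt(rowSums(tdm^2))
cosSim <- tcrossprod(tdm) / outer(nrm, nrm)   # cosine similarity between document rows
d  <- as.dist(1 - cosSim)                     # similarity turned into a dissimilarity
hc <- hclust(d, method = "average")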
2005 Jun 03
1
Hclust question
Hey, I am running hclust on several different distance matrices and I have a question that's more about labeling. I've been looking for a way to label the edges on the graph with the distances between them. I've been looking through the documentation and I haven't found anything yet. Anyone know if there is a way to plot 'hclust' graphs with such edge values? Or
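plot.hclust has no built-in argument for this, but the merge heights can be written at each internal node by computing node x-positions recursively (a sketch; x.hc stands for any hclust object):
x.hc <- hclust(dist(USArrests[1:8, ]))
plot(x.hc)
leafx <- order(x.hc$order)              # x position of each leaf in the plotted order
xpos  <- numeric(nrow(x.hc$merge))
for (i in seq_len(nrow(x.hc$merge))) {  # each node sits midway between its two children
  kids <- x.hc$merge[i, ]
  xpos[i] <- mean(sapply(kids, function(k) if (k < 0) leafx[-k] else xpos[k]))
}
text(xpos, x.hc$height, labels = round(x.hc$height, 2), pos = 3, cex = 0.8)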