similar to: Massive clustering job?

k-nn hierarchical clustering

2011 Jun 09

1

Hi there, is there any R-function for k-nearest neighbour agglomerative hierarchical clustering? By this I mean standard agglomerative hierarchical clustering as in hclust or agnes, but with the k-nearest neighbour distance between clusters used on the higher levels where there are at least k>1 distances between two clusters (single linkage is 1-nearest neighbour clustering)? Best regards,
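A minimal base-R sketch of the linkage described above, assuming `d` is a `dist` object. The distance between clusters A and B is taken as the k-th smallest pairwise distance between their members, so `k = 1` reproduces single linkage. `knn_agglomerate()` is a hypothetical helper. When a pair of clusters has fewer than k distances, the question does not say what should happen; using the largest available distance is an assumption. The naive search is roughly O(n^3), so it only suits small data.

```r
# Hypothetical helper: naive k-th-nearest-neighbour agglomerative clustering.
# Linkage(A, B) = k-th smallest distance between a member of A and a member of B.
# ASSUMPTION: if A and B have fewer than k pairs, the largest pair distance is used.
knn_agglomerate <- function(d, k = 2, n_clusters = 1) {
  stopifnot(k >= 1, n_clusters >= 1)
  D <- as.matrix(d)
  n <- nrow(D)
  clusters <- as.list(seq_len(n))          # every observation starts as a singleton
  heights <- numeric(0)                    # linkage value at each merge
  linkage <- function(a, b) {
    pd <- sort(as.vector(D[a, b, drop = FALSE]))
    pd[min(k, length(pd))]
  }
  while (length(clusters) > n_clusters) {
    m <- length(clusters)
    best <- c(NA_integer_, NA_integer_)
    best_h <- Inf
    for (i in 1:(m - 1)) {
      for (j in (i + 1):m) {
        h <- linkage(clusters[[i]], clusters[[j]])
        if (h < best_h) {                  # strict '<': ties go to the first pair found
          best_h <- h
          best <- c(i, j)
        }
      }
    }
    clusters[[best[1]]] <- c(clusters[[best[1]]], clusters[[best[2]]])
    clusters[[best[2]]] <- NULL            # drop the absorbed cluster
    heights <- c(heights, best_h)
  }
  membership <- integer(n)
  for (g in seq_along(clusters)) membership[clusters[[g]]] <- g
  list(membership = membership, heights = heights)
}

x <- c(0, 1, 2, 10, 11, 12)
res <- knn_agglomerate(dist(x), k = 2, n_clusters = 2)
res$membership   # 1 1 1 2 2 2
res$heights      # 1 1 2 2
```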

cluster analysis

2003 Aug 11

2

I'd like to do cluster analysis using the Mahalanobis distance. Could you tell me how to do it?
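One common approach, sketched here under the assumption that a single covariance matrix estimated from all observations is appropriate, is to whiten the data with the Cholesky factor of that covariance. Ordinary Euclidean distances on the whitened data then equal Mahalanobis distances on the original data, so any method that accepts a `dist` object (here `hclust()`) can be used. The `iris` measurements and the choice of average linkage with three groups are only for illustration.

```r
X <- as.matrix(iris[, 1:4])     # numeric columns only
S <- cov(X)                     # ASSUMPTION: one pooled covariance for all points
R <- chol(S)                    # S = t(R) %*% R
Y <- X %*% solve(R)             # whitened data: cov(Y) is the identity matrix
d <- dist(Y)                    # Euclidean on Y equals Mahalanobis on X
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 3)
table(groups, iris$Species)

# Check one pair against stats::mahalanobis(), which returns the squared distance
all.equal(as.matrix(d)[1, 2]^2, mahalanobis(X[1, ], X[2, ], S))
```

If clusters are expected to have different shapes, a pooled covariance can blur them; this sketch does not address that.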

-means, hybrid clustering or similar implementations on R

2003 May 07

1

Hi, I would like to know if someone knows of an extended implementation of k-means in R that finds the appropriate number of clusters for given k-dimensional data. Also, I am working on clustering for forecasting; if someone is interested or has knowledge of the implementation details, please mail me, I would appreciate it. Regards Skanda Kallur "Cogito, ergo sum" (I think, therefore I
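As a simpler stand-in (not an extended k-means algorithm itself), one can run ordinary `kmeans()` over a range of k and keep the k with the largest average silhouette width, using `silhouette()` from the recommended `cluster` package. The simulated three-group data and the range `2:8` are assumptions for illustration. The silhouette is undefined for k = 1, so this rule cannot report "no structure".

```r
library(cluster)   # silhouette()

set.seed(1)
X <- rbind(cbind(rnorm(50, 0, 0.3), rnorm(50, 0, 0.3)),
           cbind(rnorm(50, 3, 0.3), rnorm(50, 3, 0.3)),
           cbind(rnorm(50, 0, 0.3), rnorm(50, 3, 0.3)))
d <- dist(X)

ks <- 2:8
avg_sil <- sapply(ks, function(k) {
  km <- kmeans(X, centers = k, nstart = 25)
  mean(silhouette(km$cluster, d)[, "sil_width"])   # average silhouette width
})
best_k <- ks[which.max(avg_sil)]
best_k
```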

Cluster analysis, defining center seeds or number of clusters

2009 Jun 11

1

I use kmeans to classify spectral events in high and low 1/3 octave bands:

#Do cluster analysis
CyclA<-data.frame(LlowA,LhghA)
CntrA<-matrix(c(0.9,0.8,0.8,0.75,0.65,0.65), nrow = 3, ncol=2, byrow=TRUE)
ClstA<-kmeans(CyclA,centers=CntrA,nstart=50,algorithm="MacQueen")

This works well when the actual data shows 1, 2 or 3 groups that are not "too close" in a cross plot.

more clustering questions

2004 Dec 09

1

Sorry to bother you kind folks again with my questions. I am trying to learn as much as I can about all this, and I will admit that I don't have the proper background, but I hope that someone can at least point me in the correct direction. I have created a test matrix for what I want to do:

    s1  s2  s3  s4  s5
s1  10   5   0   8   7
s2   5  10   0   0   5
s3   0   0  10   0   0
s4   8   0   0  10   0
s5   7
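If the entries are similarities on a 0 to 10 scale with 10 on the diagonal (an assumption read off the test matrix), they can be turned into dissimilarities by subtracting them from 10 and passed to `hclust()`. Because the s5 row is cut off above, this sketch uses only the complete s1 to s4 block.

```r
sim <- matrix(c(10,  5,  0,  8,
                 5, 10,  0,  0,
                 0,  0, 10,  0,
                 8,  0,  0, 10),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("s", 1:4), paste0("s", 1:4)))
d <- as.dist(10 - sim)                 # similarity -> dissimilarity
hc <- hclust(d, method = "average")
cutree(hc, k = 2)                      # s1, s2, s4 together; s3 on its own
```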

non-uniqueness in cluster analysis

2003 Dec 03

3

Hi, I'm clustering objects defined by categorical variables with a hierarchical algorithm (average linkage). My distance matrix (general dissimilarity coefficient) includes several distances with exactly the same values. As I see it, a standard agglomerative procedure ignores this problem, simply selecting, among equal distances, the one that comes first. For this reason the analysis in output
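A small check of order dependence, assuming base R's `hclust()` with average linkage: when two candidate merges tie, one of them has to be chosen, so presenting the same points in a different order may change the resulting partition. Comparing co-membership matrices tests whether two labelings describe the same grouping, regardless of the cluster numbers used.

```r
x <- c(a = 0, b = 1, c = 2)            # b is equally close to a and to c: a tie

g_orig <- cutree(hclust(dist(x), method = "average"), k = 2)
g_rev  <- cutree(hclust(dist(rev(x)), method = "average"), k = 2)[names(x)]

# TRUE when both labelings put the same objects together
same_partition <- identical(outer(g_orig, g_orig, "=="),
                            outer(g_rev, g_rev, "=="))
g_orig
g_rev
same_partition
```

Running the analysis over several random permutations of the input and tabulating the partitions gives a rough picture of how much the ties matter.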

pam() clustering for large data sets

2011 May 16

1

Hello everyone, I need to do k-medoids clustering for data which consists of 50,000 observations. I have computed distances between the observations separately and tried to use those with pam(). I got the "cannot allocate vector of length" error and I realize this job is too memory intensive. I am at a bit of a loss on what to do at this point. I can't use clara(), because I
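The reason clara() is ruled out is cut off above. If it is that only a custom dissimilarity is available, one workaround in the spirit of clara() is to run `pam()` on a random subsample (whose full dissimilarity matrix fits in memory) and then assign every observation to its nearest medoid. `sample_pam()` and `my_dist()` are hypothetical helpers; `my_dist()` is a Manhattan placeholder to be replaced by the real dissimilarity. The result is an approximation that depends on the sample drawn.

```r
library(cluster)   # pam()

# Hypothetical placeholder: dissimilarities from one row `a` to every row of `B`.
my_dist <- function(a, B) rowSums(abs(sweep(B, 2, a)))

# Hypothetical helper: PAM on a subsample, then nearest-medoid assignment.
sample_pam <- function(X, k, n_sample = 2000, dist_fun = my_dist) {
  X <- as.matrix(X)
  idx <- sample.int(nrow(X), min(n_sample, nrow(X)))
  Xs <- X[idx, , drop = FALSE]
  ns <- nrow(Xs)
  # Full dissimilarity matrix for the subsample only
  Ds <- vapply(seq_len(ns), function(i) dist_fun(Xs[i, ], Xs), numeric(ns))
  fit <- pam(as.dist(Ds), k = k)
  medoids <- Xs[fit$id.med, , drop = FALSE]
  # n x k matrix of dissimilarities to the medoids; pick the closest per row
  Dm <- vapply(seq_len(k), function(j) dist_fun(medoids[j, ], X), numeric(nrow(X)))
  list(medoid_rows = idx[fit$id.med],
       cluster = max.col(-Dm, ties.method = "first"))
}

set.seed(1)
X <- rbind(matrix(rnorm(200), ncol = 2), matrix(rnorm(200, mean = 10), ncol = 2))
res <- sample_pam(X, k = 2, n_sample = 100)
table(res$cluster)
```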

data(eurodist) and PCA ??

2004 Oct 13

3

If I perform PCA on the 'eurodist' data, should I get an accurate geographic layout of the cities with biplot? (Barring inversions, i.e. there is no way to define north, but you get the idea.) I have a complex distance matrix, and I am thinking about how to cluster it and how to visualize the quality of the resulting clusters. If I could 'see' the clusters in space I could
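PCA expects a data matrix rather than a distance matrix; the distance-based counterpart is classical multidimensional scaling (principal coordinates analysis), available in base R as `cmdscale()`. Its help page uses `eurodist` as the example and flips the second axis so that north points up, because the sign of each axis is arbitrary. Since `eurodist` holds road distances rather than straight-line distances, the map is only approximately geographic. Colouring the points by a clustering (here an arbitrary four-group average-linkage cut) gives a quick visual check of cluster quality.

```r
loc <- cmdscale(eurodist, k = 2)      # classical MDS on the 21 cities
x <- loc[, 1]
y <- -loc[, 2]                         # axis signs are arbitrary; flip so north is up
groups <- cutree(hclust(eurodist, method = "average"), k = 4)
plot(x, y, type = "n", asp = 1, xlab = "", ylab = "", main = "cmdscale(eurodist)")
text(x, y, rownames(loc), cex = 0.6, col = groups)
```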

Cluster Analysis: Density-Based Method

2004 Oct 21

5

Hi people, does anybody know of a density-based clustering method implemented in R? Thanks, Fernando Prass
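CRAN packages provide density-based methods (for example, `fpc` has a `dbscan()` function). To show the idea without a dependency, here is a minimal base-R sketch of DBSCAN: points with at least `min_pts` neighbours within radius `eps` are core points, clusters grow outward from core points through their neighbourhoods, and anything unreachable is labelled 0 (noise). `dbscan_sketch()` is a hypothetical name; it builds the full distance matrix, so it is meant for small data.

```r
# Hypothetical helper: minimal DBSCAN (0 = noise).
dbscan_sketch <- function(X, eps, min_pts) {
  D <- as.matrix(dist(X))
  n <- nrow(D)
  # eps-neighbourhood of each point (includes the point itself)
  neighbours <- lapply(seq_len(n), function(i) which(D[i, ] <= eps))
  is_core <- lengths(neighbours) >= min_pts
  cluster <- integer(n)                 # 0 = noise / not yet assigned
  current <- 0L
  for (i in seq_len(n)) {
    if (cluster[i] != 0L || !is_core[i]) next
    current <- current + 1L             # start a new cluster from core point i
    cluster[i] <- current
    queue <- neighbours[[i]]
    while (length(queue) > 0) {
      j <- queue[1]
      queue <- queue[-1]
      if (cluster[j] == 0L) {
        cluster[j] <- current           # border or core point joins the cluster
        if (is_core[j]) queue <- c(queue, neighbours[[j]])  # only core points expand
      }
    }
  }
  cluster
}

set.seed(3)
X <- rbind(cbind(rnorm(20, 0, 0.1), rnorm(20, 0, 0.1)),
           cbind(rnorm(20, 5, 0.1), rnorm(20, 5, 0.1)),
           c(2.5, 2.5))                 # one isolated point
cl <- dbscan_sketch(X, eps = 0.5, min_pts = 4)
table(cl)
```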

Document clustering for R

2005 Sep 12

4

I'm working on a project related to document clustering. I know that R has clustering algorithms such as clara, but it only supports two distance metrics, Euclidean and Manhattan, which are not very useful for clustering documents. I was wondering how easy it would be to extend the clustering package in R to support other distance metrics, such as cosine distance, or if there was an API for
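`clara()` works on the raw data with a fixed set of built-in metrics, but `pam()`, `agnes()`, `diana()` and `hclust()` also accept a precomputed dissimilarity object, so a cosine dissimilarity can be computed separately and passed in. A sketch, assuming a document-term matrix with documents in rows and nonnegative term counts; `cosine_dissim()` and the toy counts are hypothetical.

```r
# Hypothetical helper: 1 - cosine similarity between the rows of M.
cosine_dissim <- function(M) {
  U <- M / sqrt(rowSums(M^2))          # scale each document (row) to unit length
  S <- tcrossprod(U)                   # cosine similarities, U %*% t(U)
  S[S > 1] <- 1                        # guard against rounding just above 1
  as.dist(1 - S)                       # rows of all zeros would give NaN
}

dtm <- rbind(d1 = c(2, 1, 0, 0),
             d2 = c(4, 2, 0, 0),
             d3 = c(0, 0, 3, 1),
             d4 = c(0, 1, 3, 1))
d <- cosine_dissim(dtm)
hc <- hclust(d, method = "average")
cutree(hc, k = 2)                      # d1, d2 vs d3, d4
```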

bug (?!) in "pam()" clustering from fpc package ?

2008 Dec 17

1

Hello all. I wish to run k-means with "manhattan" distance. Since this is not supported by the function "kmeans", I turned to the "pam" function in the "fpc" package. Yet, when I tried to have the algorithm run with different starting points, I found that pam ignores this and keeps starting the algorithm from the same starting points (medoids). For my
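For reference, `pam()` is provided by the `cluster` package (fpc's `pamk()` builds on it). Its BUILD phase chooses initial medoids deterministically, which would explain identical starting points, but `pam()` accepts user-supplied initial medoids through its `medoids` argument, so random restarts can be scripted. Note that PAM is k-medoids, not k-means with a Manhattan metric. A sketch under those assumptions, with `pam_restarts()` as a hypothetical helper that keeps the run with the lowest total dissimilarity to the medoids:

```r
library(cluster)   # pam()

# Hypothetical helper: PAM with random initial medoids, best of n_starts runs.
pam_restarts <- function(x, k, n_starts = 10) {
  d <- dist(x, method = "manhattan")
  D <- as.matrix(d)
  best <- NULL
  best_cost <- Inf
  for (s in seq_len(n_starts)) {
    init <- sample.int(nrow(D), k)                             # random initial medoids
    fit <- pam(d, k = k, medoids = init)
    cost <- sum(apply(D[, fit$id.med, drop = FALSE], 1, min))  # total dissimilarity
    if (cost < best_cost) {
      best <- fit
      best_cost <- cost
    }
  }
  best
}

set.seed(1)
x <- rbind(matrix(rnorm(60), ncol = 2), matrix(rnorm(60, mean = 6), ncol = 2))
fit <- pam_restarts(x, k = 2, n_starts = 5)
fit$id.med
```

Because PAM's SWAP phase already searches locally until no swap improves the objective, restarts often end at the same medoids; they mainly guard against an unlucky local optimum.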

estimating number of clusters ("Null or more")

2003 Apr 24

1

Hi all, once more about the old subject :-) My data come from too many different distribution families, and for every particular experiment I just need to decide whether the data are "quite homogeneous" or have two or more clusters. I've revisited the following libraries: amap, clust, cclust, mclust, multiv, normix, survey. And I didn't find any ready-to-use general
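One option that can return "one cluster" is the gap statistic, implemented as `clusGap()` in the recommended `cluster` package. It compares the within-cluster dispersion for k = 1, ..., K.max against uniform reference data, and `maxSE()` picks k. The sketch below uses `kmeans()` as the clustering function (an assumption; any function returning a `$cluster` component works) on simulated data, and `est_k()` is a hypothetical wrapper. Being based on within-cluster sums of squares, it favours compact, roughly spherical groups, which may not suit every distribution family.

```r
library(cluster)   # clusGap(), maxSE()

# Hypothetical wrapper: estimated number of clusters via the gap statistic.
est_k <- function(x, K.max = 5, B = 50) {
  gs <- clusGap(x, FUNcluster = kmeans, K.max = K.max, B = B, nstart = 10)
  maxSE(gs$Tab[, "gap"], gs$Tab[, "SE.sim"], method = "firstSEmax")
}

set.seed(2)
one_group  <- matrix(rnorm(200), ncol = 2)
two_groups <- rbind(matrix(rnorm(200), ncol = 2),
                    matrix(rnorm(200, mean = 6), ncol = 2))
est_k(one_group)
est_k(two_groups)
```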

spdep question

2004 May 04

1

Dear list, (also sent to Roger Bivand, but perhaps one of you can also help me) I am trying to use package spdep to fit an SAR model with errorsarlm. However, I am not sure how to make a valid nb object out of my neighborhood. As far as I have seen, there is no documentation for nb.object. I have done the following: class(pschmid$nb) <- "nb" # pschmid is a prab object as

selecting outliers

2005 Aug 08

2

Hi everybody, I'd like to know if there's an easy way to extract outlier records from a dataset, in order to perform further analysis on them. Thanks Alessandro
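One simple rule, assuming "outlier" means a value outside the boxplot whiskers as computed by `boxplot.stats()` (more than 1.5 times the hinge spread beyond the hinges), is to flag every row that has such a value in any numeric column and then subset the data frame with that flag. `outlier_rows()` and the toy data are hypothetical; for multivariate outliers, a Mahalanobis-distance cut-off would be a common alternative.

```r
# Hypothetical helper: TRUE for rows with a boxplot outlier in any numeric column.
outlier_rows <- function(df, coef = 1.5) {
  num <- df[vapply(df, is.numeric, logical(1))]      # numeric columns only
  flags <- vapply(num, function(col) {
    col %in% boxplot.stats(col, coef = coef)$out     # TRUE where col is outside the whiskers
  }, logical(nrow(df)))
  rowSums(flags) > 0                                 # any flagged column
}

df <- data.frame(id = letters[1:8],
                 x  = c(10, 11, 12, 11, 10, 12, 11, 50),
                 y  = c(5, 6, 5, 6, 5, 6, 5, 6))
flag <- outlier_rows(df)
outliers <- df[flag, ]      # the record with x = 50
rest     <- df[!flag, ]
```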

Validation of clustering

2003 Jan 30

2

Hi, I'm using the library cluster to cluster a set of figures (method CLARA). Could somebody who works with clustering tell me how to evaluate the clustering? Many thanks, Francisco. Francisco Júnior, Computer Science - UFPE-Brazil "One life is worth more than the whole world"
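Two common checks, sketched with the `iris` measurements as stand-in data: the average silhouette width (internal validity; values near 1 indicate compact, well-separated clusters) and, when reference labels exist, a cross-tabulation against them (external validity). Both use the recommended `cluster` package; three clusters is an assumption for the example.

```r
library(cluster)   # clara(), silhouette()

X <- as.matrix(iris[, 1:4])
fit <- clara(X, k = 3, samples = 50)

# Internal: average silhouette width over all observations
sil <- silhouette(fit$clustering, dist(X))
avg_sil <- mean(sil[, "sil_width"])
avg_sil

# External: agreement with known labels, when available
table(cluster = fit$clustering, species = iris$Species)
```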

clustering

2003 Apr 23

1

Dear R-users, I have a two-dimensional data set which needs to be clustered into groups: I'm searching for groups of points which show a positive correlation (in a two-dimensional plot of the data set), but I do not have any knowledge about how many groups there might be. Do you know of a clustering algorithm in R (or in general) which can use a-priori information about the cluster's

Clustering of Binary data in R

2005 Mar 04

2

Good afternoon! I would like to ask about similarity measures and clustering in R for binary data. Could you please tell me which R commands to use? Thanks in advance for your kind attention. I look forward to hearing from you as soon as possible. Best regards, Sima
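Base R's `dist()` has a `"binary"` method: among the positions where at least one of two rows is nonzero, it returns the proportion where exactly one is (the Jaccard distance), and the result can go straight into `hclust()`. The 0/1 matrix below is made up for illustration. For asymmetric binary or mixed variables, `daisy()` in the `cluster` package offers a Gower coefficient with per-variable types.

```r
B <- rbind(a = c(1, 1, 0, 0, 1),
           b = c(1, 1, 0, 0, 0),
           c = c(0, 0, 1, 1, 0),
           d = c(0, 0, 1, 1, 1))
d <- dist(B, method = "binary")   # a vs b: 1 mismatch among 3 "on" positions = 1/3
hc <- hclust(d, method = "average")
cutree(hc, k = 2)                 # a, b vs c, d
```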

Distances between two datasets of x and y co-ordinates

2008 Mar 12

4

Hi all, I am trying to determine the distances between two datasets of x and y points. The number of points in dataset One is very small, perhaps 5-10. The number of points in dataset Two is likely to be very large, e.g. 20,000-30,000. My initial approach was to append the first dataset to the second and then carry out the calculation: dists <- as.matrix(dist(gis data from 2 * datasets))
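Appending the two sets and calling `dist()` computes every pair, most of them inside the large set: with 30,000 points that is about 4.5 x 10^8 distances (roughly 3.6 GB as doubles) before `as.matrix()` doubles the storage. Only the n1 x n2 cross distances are needed, and `outer()` builds those directly. A sketch assuming each set is a two-column matrix of x and y coordinates; `cross_dist()` is a hypothetical helper.

```r
# Hypothetical helper: Euclidean distances between every row of A and every row of B.
cross_dist <- function(A, B) {
  # A: n x 2, B: m x 2 matrices of (x, y); returns an n x m matrix
  sqrt(outer(A[, 1], B[, 1], "-")^2 + outer(A[, 2], B[, 2], "-")^2)
}

A <- rbind(c(0, 0), c(3, 4))          # small set
B <- rbind(c(0, 0), c(6, 8))          # large set (tiny here)
D <- cross_dist(A, B)
D                                      # rows: points in A; columns: points in B
apply(D, 2, which.min)                 # nearest A point for each B point
```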

cluster.stats

2008 Jun 13

3

Dear list, I just tried to use the function cluster.stats in the package fpc. I have a couple of questions about the syntax: cluster.stats(d,clustering,alt.clustering=NULL, silhouette=TRUE,G2=FALSE,G3=FALSE) 1) Is the distance object (d) an object obtained by the function dist() on my own original matrix? 2) Is clustering the vector of cluster memberships resulting from one of the many clustering methods?
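As the fpc documentation describes them, `d` is a distance object such as the output of `dist()` on the original data (or a distance matrix), and `clustering` is an integer vector with one cluster number per observation, numbered from 1, such as the output of `cutree()` or the `$cluster` component of `kmeans()`. A sketch of that workflow; the `fpc` call runs only if the package is installed, and the data and three-group cut are arbitrary choices.

```r
X <- scale(iris[, 1:4])
d <- dist(X)                                                # 1) distance object from the data
clustering <- cutree(hclust(d, method = "average"), k = 3)  # 2) integer labels 1..k
if (requireNamespace("fpc", quietly = TRUE)) {
  cs <- fpc::cluster.stats(d, clustering)
  print(c(avg.silwidth = cs$avg.silwidth, dunn = cs$dunn))
}
```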

Clustering quality measure

2003 Jun 17

2

Hi all, I am running a series of experiments where, after manipulating my data, I run several clustering algorithms (agnes, diana and a clustering method of my own) on the data. I wanted to determine which clustering method did the best job, so I defined my own quality measure using two criteria: compactness of the data within the clusters themselves and the amount of separation
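An established index that combines the same two criteria is the Dunn index (a swapped-in standard measure, not the poster's own): the smallest distance between points in different clusters divided by the largest within-cluster diameter, where higher is better. A base-R sketch with `dunn_index()` as a hypothetical name. Labels from `agnes()` or `diana()` can be obtained with `cutree(as.hclust(fit), k)` and scored on the same `dist` object.

```r
# Hypothetical helper: Dunn index = min between-cluster distance / max cluster diameter.
dunn_index <- function(d, cl) {
  D <- as.matrix(d)
  # Compactness: largest distance between two members of the same cluster
  diam <- max(sapply(unique(cl), function(k) max(D[cl == k, cl == k])))
  # Separation: smallest distance between members of different clusters
  sep <- min(D[outer(cl, cl, "!=")])
  sep / diam
}

x <- c(0, 1, 10, 11)
dunn_index(dist(x), c(1, 1, 2, 2))     # separation 9 / diameter 1 = 9
```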