thr3ads.net - search: "agglomer"

Displaying 20 results from an estimated 36 matches for "agglomer".


agglomerative coefficient in agnes (cluster)

2005 Jan 25

I haven't read the book, but could anyone explain more about this parameter? help(agnes) says that ac measures the amount of clustering structure found. From the definition given in help(agnes.object), however, it seems that as long as the dissimilarity of the merger in the final step of the algorithm is large enough, the ac value will be close to 1. So what does ac really mean? Thank
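To make the definition quoted from help(agnes.object) concrete, here is a minimal base-R sketch that computes the coefficient by hand. The helper name `agglo_coef` and the three-point merge history are made up for illustration, and the merge matrix is assumed to follow the hclust-style convention in which negative entries denote single observations.

```r
# Hypothetical helper (not part of the cluster package): compute the
# agglomerative coefficient from an hclust-style merge history.
# merge:  (n-1) x 2 matrix; negative entries are single observations.
# height: dissimilarity of the merger at each step.
agglo_coef <- function(merge, height) {
  n <- nrow(merge) + 1
  first_height <- numeric(n)
  for (step in seq_len(nrow(merge))) {
    # observations that are merged for the first time at this step
    singles <- -merge[step, merge[step, ] < 0]
    first_height[singles] <- height[step]
  }
  final_height <- height[length(height)]
  # m(i) = first merge height / final merge height; ac = mean(1 - m(i))
  mean(1 - first_height / final_height)
}

# Illustrative data: observations 1 and 2 merge first, then 3 joins them.
merge <- matrix(c(-1, -2,
                  -3,  1), ncol = 2, byrow = TRUE)

ac_small_gap <- agglo_coef(merge, c(0.5, 1))    # final merger at height 1
ac_large_gap <- agglo_coef(merge, c(0.5, 100))  # final merger at height 100
print(c(ac_small_gap, ac_large_gap))
```

In this toy case, stretching only the final merger raises the value from 1/3 towards 2/3, which is the behaviour the question describes: the coefficient rewards a large final dissimilarity relative to the early mergers.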

Introduction and Doubts

2016 Mar 09

...ferent is different from what is taught in theory. I am also working on R&D on "Hybrid Techniques for Intrusion Detection using Data Mining and Clustering on Newer Datasets". Taking an initial look at the docsim folder in xapian-core, these are my insights: the clustering used is single-link agglomerative hierarchical clustering. Its time complexity is O(n^2) for n = number of documents. At first, choosing K-means seems to be a viable solution, as K-means has O(n) time complexity. But it has various shortcomings: 1) The learning algorithm requires a priori specification of the number of cluster centers...

Dendrogram for agglomerative hierarchical clustering result

2008 Feb 11

Hey group, I have a problem drawing a dendrogram from the result of my program written in C. My algorithm is an approximation algorithm for the single linkage method. As a result I will get the following data:

[Average distance] [cluster A] [cluster B]

For example:

42.593141 1 26
42.593141 4 6
42.593141 123 124
42.593141 4 113
74.244206 1 123
74.244206 4 133
74.244206 1 36

So far I have used C to

k-nn hierarchical clustering

2011 Jun 09

Hi there, is there any R-function for k-nearest neighbour agglomerative hierarchical clustering? By this I mean standard agglomerative hierarchical clustering as in hclust or agnes, but with the k-nearest neighbour distance between clusters used on the higher levels where there are at least k>1 distances between two clusters (single linkage is 1-nearest neig...

difference between trees in R?

2001 Aug 21

Hi. I am wondering if anybody has studied and/or written code in R to calculate the distance between 2 "trees". For example, if one does a hierarchical agglomerative clustering and, say, a hierarchical divisive clustering (represented as trees) and wishes to compute a metric on them. I am thinking of something like the symmetric difference as mentioned in Margush and McMorris (1982). My application is actually a bit different than that above so I'll de...

Introduction and Doubts

2016 Mar 10

...o xapian project. Sorry if that was against the rules. The algorithm was not developed by me, but after much research on various clustering techniques I found that there is a new algorithm called CLUBS (Clustering Using Binary Splitting) which gives better results than kmeans++ and hierarchical agglomerative clustering. It is faster and produces good results based on various metrics of cluster quality. The algorithm works in the following way: the first phase of the algorithm is divisive, as the original data set (in this case, the set of search documents to cluster) is split recursively into miniclusters...

non-uniqueness in cluster analysis

2003 Dec 03

Hi, I'm clustering objects defined by categorical variables with a hierarchical algorithm - average linkage. My distance matrix (general dissimilarity coefficient) includes several distances with exactly the same values. As I see it, a standard agglomerative procedure ignores this problem, simply selecting, among equal distances, the one that comes first. For this reason the output of the analysis depends strongly on the ordering of the objects within the raw data matrix. Is there a standard procedure to deal with this? Thanks Bruno

Time-Ordered Clustering

2009 Mar 12

...orms constraint-based clusters? Ideally the package could perform "Time-Ordered Clustering", a technique applied in a recent journal article by Runger, Nelson, Harnish (using MS Excel). Quote, "in our specific implementation of constrained clustering, the clustering algorithm remains agglomerative and hierarchical, but observations or clusters are constrained to only join if they are adjacent in time." CRAN searches using variants of "cluster" and/or "constraint" and/or "time" etc. didn't yield anything I could recognize. Thank you, Paul Paul P...

creating dendrogram from cluster hierarchy

2006 Feb 28

Dear R users, I have created data for hierarchical agglomerative cluster analysis which consists of the merging pairs and the agglomeration heights, e.g. something like

my.merge <- matrix(c(-1,-2,-3,1), ncol=2, byrow=TRUE)
my.height <- c(0.5, 1)

I'd like to plot a corresponding dendrogram but I don't know how to convert my data to achieve...
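One way this kind of conversion is often approached is to assemble an object of class "hclust" by hand from the merge matrix and heights. The sketch below assumes the data follow the hclust merge convention (negative entries are single observations, positive entries refer to earlier rows); the `order` vector and the labels `obs1` to `obs3` are assumptions added for illustration and are not part of the original post.

```r
my.merge  <- matrix(c(-1, -2, -3, 1), ncol = 2, byrow = TRUE)
my.height <- c(0.5, 1)

# hclust stores the merge matrix as integers
storage.mode(my.merge) <- "integer"

# Assemble a minimal "hclust" object by hand.
# order:  left-to-right leaf order for plotting (assumed here: 3, 1, 2)
# labels: illustrative names, one per observation
hc <- structure(
  list(merge  = my.merge,
       height = my.height,
       order  = c(3L, 1L, 2L),
       labels = paste0("obs", 1:3)),
  class = "hclust"
)

plot(hc)                    # dendrogram via plot.hclust
dend <- as.dendrogram(hc)   # or convert for further manipulation
print(cutree(hc, k = 2))    # cluster membership when cut into two groups
```

The leaf order has to be chosen so that branches do not cross; for larger trees it would need to be derived from the merge matrix rather than typed in.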

Cluster prediction from factor/numeric datasets

2007 Jul 23

Hi all, I have a dataset with numeric and factor columns of data which I developed a Gower Dissimilarity Matrix for (Daisy) and used Agglomerative Nesting (Agnes) to develop 20 clusters. I would like to use the 20 clusters to determine cluster membership for a new dataset (using predict) but cannot find a way to do this (no way to "predict" in the cluster package). I know I can use "predict" in cclust, kcca, and fle...

How to plot the dendrogram or tree for kmeans ?

2008 Mar 20

Hi, how can I plot a dendrogram or tree for kmeans, like we do for hclust?

hierarchical clustering within a size limit

2011 May 11

Hello List, I am trying to implement hierarchical clustering using hclust with the agglomerative single linkage method, with a small wrinkle. I would like to cluster a set of numbers on a number line only if they are within a distance of 500. I would then like to print out the members of this list. So far I can put a vector: > x<-c(2,10,200,300,600,700) into a distance matrix: >...
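A minimal sketch of one possible reading of this question, using only base R: build the single-linkage tree with hclust() and cut it at height 500 with cutree(). Treating "within a distance of 500" as a cut height is an assumption; under single linkage it limits the gap between chained neighbours, not the overall width of a cluster. The second threshold (150) is purely illustrative.

```r
x <- c(2, 10, 200, 300, 600, 700)

# Single-linkage tree on the pairwise distances of the numbers.
hc <- hclust(dist(x), method = "single")

# Cut at height 500: points end up together whenever they can be chained
# through gaps of at most 500 (assumed interpretation of the question).
groups_500 <- cutree(hc, h = 500)
print(split(x, groups_500))

# A smaller, illustrative threshold that yields several groups.
groups_150 <- cutree(hc, h = 150)
print(split(x, groups_150))
```

With the vector from the post, all six numbers fall into one group at 500, because the largest gap between neighbouring values is 300. If the intent is instead to cap the total spread of each cluster at 500, complete linkage cut at the same height would be the closer match.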

GSoC 2017 Project Proposal

2017 Mar 09

Hello devs. I would like to propose how I plan to go about improving and getting a system that can be integrated into Xapian in this GSoC for the clustering branch. I have identified three areas of work which were not touched last time. 1) Automated Performance Analysis I had roughly implemented 2 evaluation techniques previously (Distance b/w document and centroids within clusters and

doubts about Silhouette

2007 Oct 16

...to try to explain myself. I have fitted a spline to my data and filled in the missing data by replicating the spline coefficients associated with the last node. I obtained a number of dendrograms from different combinations of distance and link method by calling DIST and AGNES. The agglomerative coefficient is very high (~ 0.99) for some combinations, and is generally around 0.5 for the remaining cases. As recommended, I ran the SILHOUETTE at different cuts (CUTREE) for some of the cases. Regardless of the AC value, the highest silhouette width I get is ~ 0.4 or lower, which is to...

Clustering with 'agnes'

2004 Feb 04

...was wondering if anyone knew how I can identify cluster points after running the agnes function. For example, I created a dataset with points randomly scattered around (0,0), (0,1) and (1,0). After clustering, the dendrogram shows all the clustered points and I get the ordering and height and the agglomerative coefficient. But nowhere do I see the three actual points listed. Although agnes clusters until there is one main cluster, it is clear that at three clusters, each of the clusters consists of points around the three main points. I was wondering if there was any way in which I can have R give me...

x/y coordinates of dendrogram branches

2005 Nov 02

Dear R-users, I need some help concerning the plotting of dendrograms for hierarchical agglomerative clustering. The agglomeration level of each step should be displayed at the branches of the dendrogram. For this I need the x/y coordinates of the branch agglomerations of the dendrogram. The y-values are known (the heights of the agglomeration), but how can I get the x-values? > myda...

agnes clustering and NAs

2011 Jan 27

Hello, In the documentation for agnes in the package 'cluster', it says that NAs are allowed, and sure enough it works for a small example like:

> m <- matrix(c( 1, 1, 1, 2, 1, NA, 1, 1, 1, 2, 2, 2), nrow = 3, byrow = TRUE)
> agnes(m)
Call: agnes(x = m)
Agglomerative coefficient: 0.1614168
Order of objects:
[1] 1 2 3
Height (summary):
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.155   1.247   1.339   1.339   1.431   1.524
Available components:
[1] "order" "height" "ac" "merge" "diss" "...

Error: cannot allocate vector of size 1.8 Gb

2008 Dec 22

> dim(data)
[1] 22283 19
> dm=dist(data, method = "euclidean", diag = FALSE, upper = FALSE, p = 2)
Error: cannot allocate vector of size 1.8 Gb

Hi Guys, thank you in advance for helping. :-D Recently I ran into the "cannot allocate vector of size 1.8GB" error. I am pretty sure this is not a hardware limitation because it happens no matter whether I ran the R code in a

cluster analyses

2002 Apr 29

...ather large data sets and would like to cut the dendrograms to get a better view of specific components. I calculate the dissimilarity matrix using daisy() because I have a mixture of variable types: factors, ordered factors and numerical variables. If I want one dendrogram, I use agnes() for the agglomerative nesting and pltree() to draw the dendrogram. That way, I get the row names as labels, but I can't cut the tree. Alternatively, I use hclust() on the dissimilarity matrix from daisy(). This allows me to cut the dendrogram with cutree(), but I lose the labels, so that isn't much use....

Question about AGNES by Rousseeuw et al. in the package "cluster": How many clusters?

2007 Nov 14

...am no stat wiz and I am just trying to use the AGNES algorithm at my very modest level of statistical understanding. I have difficulties understanding the output from AGNES. My question is: how to interpret the output, especially how do I know which cluster solution is the best? In SPSS, an Agglomeration Schedule table is produced and I used to look at the biggest jump between the error coefficients for each agglomerative step in order to find where to stop clustering. But with the Agnes output I don't know what I should be looking at. Thanks so much for your help, Aude Aude Plontz Ref...
