thr3ads.net - similar to: "[cluster package question] What is the "sum of the dissimilarities" in the pam command ?"

Displaying 20 results from an estimated 8000 matches similar to: "[cluster package question] What is the "sum of the dissimilarities" in the pam command ?"

bug (?!) in "pam()" clustering from fpc package ?

2008 Dec 17

Hello all. I wish to run k-means with "manhattan" distance. Since this is not supported by the function "kmeans", I turned to the "pam" function in the "fpc" package. Yet, when I tried to have the algorithm run with different starting points, I found that pam ignores them and keeps starting the algorithm from the same starting points (medoids). For my

passing known medoids to clara() in the cluster package

2006 Apr 10

Greetings, I have had good success using the clara() function to perform a simple cluster analysis on a large dataset (1 million+ records with 9 variables). Since the clara function is a wrapper around pam(), which will accept known medoid data, I am wondering whether this is also possible with clara() ... The documentation does not suggest that this is possible. Essentially I am trying to

Looping and Pasting

2008 Feb 22

Hello R-community: Much of the time I want to use loops to look at graphs, etc. For example, I have 25 plots, for which the names are m.1$medoids, m.2$medoids, ..., m.25$medoids. I want to index the object number (1:25) as below (just to show the concept). for (i in 1:25){ plot(m.i$medoids) } I've tried the following, with negative results for ...
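One common way to handle numbered object names like these is to build each name as a string and look the object up with base R's mget() (or get() for a single name). The sketch below is only an illustration: m.1, m.2 and m.3 are hypothetical stand-in lists with a medoids matrix, not real clustering results, and only three objects are used instead of 25.

```r
# Hypothetical stand-ins: each mimics a clustering result with a $medoids matrix.
set.seed(42)
m.1 <- list(medoids = matrix(rnorm(6), ncol = 2))
m.2 <- list(medoids = matrix(rnorm(6), ncol = 2))
m.3 <- list(medoids = matrix(rnorm(6), ncol = 2))

# Build the names "m.1", "m.2", "m.3" and fetch the objects into a named list.
fit_names <- paste0("m.", 1:3)
fits <- mget(fit_names, envir = environment())

# Each list element is the full object, so $medoids works as usual.
for (nm in names(fits)) {
  plot(fits[[nm]]$medoids, main = nm)
}
```

If the objects are created in a loop in the first place, storing them directly in a list (for example with lapply) avoids the name lookup altogether.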

Amount of data (cantidad de datos)

2015 Apr 29

Hi. Instead of using cluster analyses that involve distances, I would try k-means or pam (partitioning around medoids), but using samples; the clara function in the cluster library can help you. I am pasting the Details section from the help for 'clara': Details clara is fully described in chapter 3 of Kaufman and Rousseeuw (1990). Compared to other partitioning methods such as pam,

pam() clustering for large data sets

2011 May 16

Hello everyone, I need to do k-medoids clustering for data which consists of 50,000 observations. I have computed distances between the observations separately and tried to use those with pam(). I got the "cannot allocate vector of length" error and I realize this job is too memory intensive. I am at a bit of a loss on what to do at this point. I can't use clara(), because I

Any functions to manipulate (merge, cut, remove) hclust objects? (maybe through phylo?)

2010 Dec 27

Hello all, I'm now working with hclust objects and was hoping to perform some basic editing on them like: - Joining = the merging of two hclust objects (so they will share one root) - Splicing = So to cut/extract a branch out of an hclust object - that by itself will be an hclust object. I noticed I could extract one element of an hclust object by turning it into a dendrogram,

Amount of data (cantidad de datos)

2015 Apr 29

The drawback of k-means is that the number of segments has to be predefined, but that is something I do not have. Javier's solution seems to me to be the only option. Regards, Ricardo Alva Valiente -----Original message----- From: R-help-es [mailto:r-help-es-bounces en r-project.org] On behalf of javier.ruben.marcuzzi en gmail.com Sent: Wednesday, 29 April

give PAM my own medoids

2004 Jun 29

Hello, When using PAM (partitioning around medoids), I would like to skip the build step and give the function my own medoids. Do you know if it is possible, and how? Thank you very much. Isabel

Document clustering for R

2005 Sep 12

I'm working on a project related to document clustering. I know that R has clustering algorithms such as clara, but it only supports two distance metrics, Euclidean and Manhattan, which are not very useful for clustering documents. I was wondering how easy it would be to extend the clustering package in R to support other distance metrics, such as cosine distance, or if there was an API for

Specifying medoids in PAM?

2005 Jun 07

I am using the PAM algorithm in the CLUSTER library. When I allow PAM to seed the medoids using the default "build" algorithm, things work well: > pam(stats.table, metric="euclidean", stand=TRUE, k=5) But I have some clusters from a hierarchical analysis that I would like to use as seeds for the PAM algorithm. I can't figure out what the medoids argument wants. When I put in the
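As a hedged illustration of seeding PAM from a hierarchical clustering: in current versions of the cluster package, the medoids argument of pam() takes a length-k vector of integer row indices (in 1:n) rather than coordinates. The sketch below uses made-up toy data x (not the poster's stats.table), and the rule for picking a seed per group (the member with the smallest total distance to the others) is an assumption chosen for the example.

```r
library(cluster)

# Toy data (hypothetical): two well separated groups in two dimensions.
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))

# Hierarchical clustering, cut into k = 2 groups.
d <- as.matrix(dist(x))
groups <- cutree(hclust(dist(x), method = "average"), k = 2)

# For each group, pick the member with the smallest total distance to the
# other members of that group; its row index becomes the initial medoid.
seed_ids <- unname(vapply(
  split(seq_len(nrow(x)), groups),
  function(idx) idx[which.min(rowSums(d[idx, idx, drop = FALSE]))],
  integer(1)
))

# Supplying medoids replaces the "build" step; the swap phase still runs.
fit <- pam(x, k = 2, medoids = seed_ids, metric = "euclidean")
print(fit$id.med)  # row indices of the final medoids
```

Passing coordinates or cluster labels instead of row indices is a typical source of errors with this argument.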

Amount of data (cantidad de datos)

2015 Apr 29

Great contribution, excellent!! Regards, Ricardo Alva Valiente From: Jose Luis Cañadas Reche [mailto:canadasreche en gmail.com] Sent: Wednesday, 29 April 2015 12:51 PM To: Alva Valiente, Ricardo (RIAV); 'javier.ruben.marcuzzi en gmail.com'; R-help-es en r-project.org Subject: Re: [R-es] cantidad de datos You could run several k-means with different numbers of clusters and check how

How to access to sum of dissimilarities in CLARA

2005 May 30

Dear all, since dissimilarity is one of the quality measures in clustering, I'm trying to access the sum of dissimilarities as an overall measure. But after running my data through CLARA I obtain:

1128 dissimilarities, summarized :
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.033155 0.934630 2.257000 2.941600 4.876600 8.943700

But I cannot find the sum of dissimilarities. How can I
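A minimal sketch of one way to obtain such a total, assuming Euclidean distance and toy data x invented for the example: compute each observation's distance to the medoid of its own cluster and add them up. As far as I understand, the objective component of a clara result is the average of these distances rather than their sum, so it is printed alongside for comparison instead of being relied on.

```r
library(cluster)

# Toy data (hypothetical): 400 observations in two groups.
set.seed(1)
x <- rbind(matrix(rnorm(400, mean = 0), ncol = 2),
           matrix(rnorm(400, mean = 4), ncol = 2))

fit <- clara(x, k = 2, metric = "euclidean", samples = 5)

# Coordinates of the medoid assigned to each observation (one row per observation).
assigned <- fit$medoids[fit$clustering, , drop = FALSE]

# Euclidean distance from every observation to its own medoid.
d_to_medoid <- sqrt(rowSums((x - assigned)^2))

total_diss <- sum(d_to_medoid)   # sum of dissimilarities over the whole dataset
mean_diss  <- mean(d_to_medoid)  # average dissimilarity per observation

print(c(total = total_diss, mean = mean_diss, objective = fit$objective))
```

For Manhattan distance the same idea applies with rowSums(abs(x - assigned)) in place of the Euclidean formula.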

Clustering

2007 Nov 28

Hello all! I am performing some clustering analysis on microarray data using agnes{cluster} and I have created my own dissimilarity matrix according to a distance measure different from "euclidean" or "manhattan" etc. My question is, if I choose for example method="complete", how are the distances between the elements calculated? Are they taken from the dissimilarity

Is there an R implementation for the "Barnard's exact test" (a substitute for fisher.test) ?

2009 Jul 26

Hello R help members. Today I came across an article on Barnard's exact test (http://www.cytel.com/Papers/twobinomials.pdf), which is supposed to be more powerful than fisher.test because it doesn't assume that the row and column totals are known in advance. Any pointers to such a function? Thanks, Tal -- ---------------------------------------------- My contact information:

Are there any bloggers among us going to useR 2009 ?

2009 Jul 01

(Note: This is an R community question, not a statistical or coding question. Since this is my first time writing such a post, I hope no one will take offence at it.) Hello all, I will be attending useR 2009 next week, and was wondering if any of you are bloggers intending to participate and report on useR 2009. If so, I would love to know your blog's URL so I can follow you.

variable/model selection (step/stepAIC) for biglm ?

2009 Feb 21

Hello dear R mailing list members. I have recently become curious about the possibility of applying model selection algorithms (even as simple as AIC) to regressions on large datasets. I searched as best I could, but couldn't find any reference or wrapper for using step or stepAIC with packages such as biglm. Any ideas or directions on how to implement such a concept? Best, Tal --

Bug in "seq" (or a "feature") ?

2009 Aug 10

(I use R 2.9.1 on Windows XP.) If I run this code: seq(-0.1,.9, by = .05)[seq(-0.1,.9, by = .05) <= 0.5] I get this output: [1] -0.10 -0.05 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 Why is 0.50 not in the results? (It seems that it gives a number slightly bigger than 0.5, but I don't understand why it does that.) Whereas if I try: seq(-0.1,.9, by = .05)[seq(-0.1,.9, by = .05) <=
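The behaviour described is typical of binary floating point arithmetic: 0.05 and 0.1 have no exact binary representation, so a sequence element that prints as 0.50 may be stored as a value a hair above or below 0.5. A small sketch of how to inspect this and how to compare with a tolerance follows; the tolerance of 1e-9 is an arbitrary choice for the example, not a recommended constant.

```r
s <- seq(-0.1, 0.9, by = 0.05)

# Show more digits than the default print to reveal the stored value
# of the element that displays as 0.50.
print(s[13], digits = 17)

# Exact comparison: whether this is TRUE depends on rounding in the last bit.
print(s[13] <= 0.5)

# Tolerant filtering: accept values within a small tolerance of the cutoff.
tol <- 1e-9
kept <- s[s <= 0.5 + tol]
print(kept)

# all.equal() compares with a tolerance and is the usual way to test
# "equal up to floating point error".
print(isTRUE(all.equal(s[13], 0.5)))
```

An alternative that sidesteps the problem is to generate the grid from integers, for example (-2:18) / 20, and filter on the integer index.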

Index-G1 error

2009 Feb 18

I am using some functions from package clusterSim to evaluate the best cluster layout. Here is the feature vector I am using to cluster 12 signals:
> alpha.vec
[1] 0.8540039 0.8558350 0.8006592 0.8066406 0.8322754 0.8991699 0.8212891
[8] 0.8815918 0.9050293 0.9174194 0.8613281 0.8425293
In the following I pasted an excerpt of my program:

cclust causes R to crash when using manhattan kmeans

2006 Apr 07

Dear R users, When I run the following code, R crashes: require(cclust) x <- matrix(c(0,0,0,1.5,1,-1), ncol=2, byrow=TRUE) cclust(x, centers=x[2:3,], dist="manhattan", method="kmeans") While this works: cclust(x, centers=x[2:3,], dist="euclidean", method="kmeans") I'm posting this here because I am not sure if it is a bug. I've been searching

simple randomization question: How to perform "sample" in chunks

2009 Aug 20

Hello dear R-help group. My task looks simple, but I can't seem to find a "smart" (i.e., non-loop) solution to it. Task: I wish to randomize a data.frame by one column, while keeping the inner order in the second column as is. So for example, let's say I have the following data.frame: xx <-data.frame(a= c(1,2,2,3,3,3,4,4,4,4) , b =