Displaying 20 results from an estimated 8000 matches similar to: "[cluster package question] What is the "sum of the dissimilarities" in the pam command ?"
2008 Dec 17
1
bug (?!) in "pam()" clustering from fpc package ?
Hello all.
I wish to run k-means with "manhattan" distance.
Since this is not supported by the function "kmeans", I turned to the "pam"
function in the "fpc" package.
Yet, when I tried to have the algorithm run with different starting points,
I found that pam ignores them and keeps starting the algorithm from the same
starting points (medoids).
For my
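pam() itself is provided by the cluster package. It takes a `medoids` argument: a vector of k distinct row indices that replaces the default "build" initialisation. Different starting points therefore have to be passed in explicitly. The sketch below runs several random starts and keeps the run with the lowest final objective. The data, k and the number of restarts are made up for illustration.

```r
library(cluster)

set.seed(1)
# Hypothetical data: 60 points in 2-D drawn around three centres
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 4), ncol = 2),
           matrix(rnorm(40, mean = 8), ncol = 2))
k <- 3

# Run pam() from several random sets of initial medoids (row indices);
# supplying `medoids` replaces the default "build" initialisation
fits <- lapply(1:10, function(r) {
  pam(x, k = k, metric = "manhattan", medoids = sample(nrow(x), k))
})

# The last element of `objective` is the value after the swap phase;
# keep the run with the smallest one
final_obj <- sapply(fits, function(f) unname(tail(f$objective, 1)))
best <- fits[[which.min(final_obj)]]
best$id.med  # row indices of the chosen medoids
```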
2006 Apr 10
2
passing known medoids to clara() in the cluster package
Greetings,
I have had good success using the clara() function to perform a simple cluster
analysis on a large dataset (1 million+ records with 9 variables).
Since the clara function is a wrapper around pam(), which will accept known
medoids, I am wondering whether this is also possible with clara(). The
documentation does not suggest that it is.
Essentially I am trying to
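As the poster notes, the clara() documentation offers no way to pass in medoids. If the medoids are already known, one workaround is to skip clustering altogether and assign each record to its nearest given medoid. This is plain nearest-medoid assignment, not clara(), and it does no swap optimisation. The sketch assumes Euclidean distance; `x` and `medoids` are hypothetical.

```r
# Hypothetical data set (rows = records, 9 variables)
set.seed(2)
x <- matrix(rnorm(9000), ncol = 9)

# Hypothetical known medoids, e.g. from an earlier run (k x 9 matrix)
medoids <- x[c(10, 500, 900), ]

# Squared Euclidean distance from every row to every medoid (n x k matrix);
# t(x) is 9 x n, so the medoid vector is recycled down each column
d2 <- sapply(seq_len(nrow(medoids)), function(j) {
  colSums((t(x) - medoids[j, ])^2)
})

# Assign each record to its nearest medoid
clustering <- max.col(-d2, ties.method = "first")
table(clustering)
```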
2008 Feb 22
2
Looping and Pasting
Hello R-community: Much of the time I want to use loops to look at graphs,
etc. For example,
I have 25 plots, for which the names are m.1$medoids, m.2$medoids, ...,
m.25$medoids.
I want to index the object number (1:25) as below (just to show concept).
for (i in 1:25){
plot(m.i$medoids)
}
I've tried the following, with negative results
for ...
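In R, `m.i` is parsed as one name, so the loop looks for an object literally called m.i. The sketch below builds each name as a string with paste0() (R >= 2.15; paste(..., sep = "") works in older versions) and fetches the object with get(). The objects are hypothetical stand-ins for m.1, ..., m.25.

```r
# Hypothetical stand-ins for the poster's objects m.1, ..., m.25
for (i in 1:3) {
  assign(paste0("m.", i), list(medoids = matrix(rnorm(4), ncol = 2)))
}

# get() looks an object up by its name, built here with paste0()
for (i in 1:3) {
  obj <- get(paste0("m.", i))
  print(obj$medoids)   # replace with plot(obj$medoids) for the graphs
}

# Often easier: collect the objects in a named list once and index by number
m <- mget(paste0("m.", 1:3))
meds <- lapply(m, function(fit) fit$medoids)
```

Keeping results in a list from the start (for example `m <- vector("list", 25)` and `m[[i]] <- ...`) usually avoids the need for get() at all.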
2015 Apr 29
2
amount of data
Hi.
Instead of using cluster analyses that involve pairwise distances, I
would try k-means or pam (partitioning around medoids), but on samples;
the clara function from the cluster package can help you. Here is the
Details section of the 'clara' help page:
Details
clara is fully described in chapter 3 of Kaufman and Rousseeuw (1990).
Compared to other partitioning methods such as pam,
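A sketch of the sampling approach suggested in this reply, assuming the cluster package. clara() runs pam() on several random subsets and then assigns every row to the best medoids found, so the full n x n dissimilarity matrix is never built. The data, k and the `samples`/`sampsize` values are illustrative only.

```r
library(cluster)

set.seed(3)
# Hypothetical large data set: 100,000 rows, 5 variables
x <- matrix(rnorm(5e5), ncol = 5)

# pam() is run on `samples` random subsets of `sampsize` rows; each row
# of x is then assigned to the nearest medoid of the best subset
fit <- clara(x, k = 4, metric = "manhattan", samples = 50, sampsize = 200)
fit$clusinfo  # size, max and average dissimilarity per cluster
```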
2011 May 16
1
pam() clustering for large data sets
Hello everyone,
I need to do k-medoids clustering for data which consists of 50,000
observations. I have computed distances between the observations
separately and tried to use those with pam().
I got the "cannot allocate vector of length" error and I realize this
job is too memory intensive. I am at a bit of a loss on what to do at
this point.
I can't use clara(), because I
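The memory arithmetic behind the error: a "dist" object for n observations stores n(n - 1)/2 doubles at 8 bytes each. That is before counting any working memory pam() itself needs.

```r
n <- 50000
n_pairs <- n * (n - 1) / 2          # entries in a "dist" object
bytes   <- n_pairs * 8              # one double per entry
gib     <- bytes / 2^30
gib                                 # roughly 9.3 GiB for the dissimilarities alone
```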
2010 Dec 27
1
Any functions to manipulate (merge, cut, remove) hclust objects? (maybe through phylo?)
Hello all,
I'm now working with hclust objects and was hoping to perform some basic
editing on them like:
- Joining = the merging of two hclust objects (so they will share one
root)
- Splicing = cutting/extracting a branch out of an hclust object, so that
the branch is itself an hclust object.
I noticed I could extract one element of an hclust object by turning it into
a dendrogram,
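A sketch of the dendrogram route for "splicing", using only base stats functions. cut() on a dendrogram returns the branches below a given height as separate dendrograms, and order.dendrogram() gives the original observation numbers in each branch. The data and cut height are hypothetical. Converting a branch back into an hclust object is not shown here.

```r
set.seed(4)
x <- matrix(rnorm(40), ncol = 2)          # hypothetical data, 20 points
hc <- hclust(dist(x), method = "complete")
dend <- as.dendrogram(hc)

# Cut between the 2nd and 3rd highest merge heights -> 3 branches
h <- mean(sort(hc$height, decreasing = TRUE)[2:3])
parts <- cut(dend, h = h)

length(parts$lower)                       # number of extracted branches
sub1 <- parts$lower[[1]]                  # one branch, itself a dendrogram
order.dendrogram(sub1)                    # original row numbers in that branch

# dend[[1]] and dend[[2]] give the two branches directly below the root
```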
2015 Apr 29
2
amount of data
The drawback of k-means is that you have to predefine the number of segments, and that is something I don't have. Javier's solution seems to me to be the only option.
Regards,
Ricardo Alva Valiente
-----Original Message-----
From: R-help-es [mailto:r-help-es-bounces at r-project.org] On behalf of javier.ruben.marcuzzi at gmail.com
Sent: Wednesday, 29 April
2004 Jun 29
1
give PAM my own medoids
Hello,
When using PAM (partitioning around medoids), I would like to skip the
build step and give the function my own medoids.
Do you know whether this is possible, and how?
Thank you very much.
Isabel
2005 Sep 12
4
Document clustering for R
I'm working on a project related to document clustering. I know that R
has clustering algorithms such as clara, but they only support two distance
metrics, Euclidean and Manhattan, which are not very useful for
clustering documents. I was wondering how easy it would be to extend the
clustering package in R to support other distance metrics, such as
cosine distance, or if there was an API for
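One approach, assuming the cluster package, is to compute a cosine dissimilarity matrix yourself and pass it to pam(). pam() accepts a "dist" object instead of raw data, so the choice of metric is up to you. clara() cannot be used this way because it works only from the data matrix. The document-term matrix below is hypothetical.

```r
library(cluster)

set.seed(5)
# Hypothetical document-term count matrix: 30 documents x 50 terms
X <- matrix(rpois(30 * 50, lambda = 1), nrow = 30)
X <- X[rowSums(X) > 0, , drop = FALSE]   # cosine is undefined for empty rows

# Cosine similarity via row-normalised cross-products
Xn <- X / sqrt(rowSums(X^2))
S  <- tcrossprod(Xn)

# Cosine dissimilarity; clamp tiny negative round-off to zero
D <- as.dist(pmax(1 - S, 0))

# pam() takes the "dist" object directly (diss = TRUE is inferred)
fit <- pam(D, k = 3)
table(fit$clustering)
```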
2005 Jun 07
1
Specifying medoids in PAM?
I am using the PAM algorithm in the CLUSTER library.
When I allow PAM to seed the medoids using the default *build*
algorithm, things work well:
> pam(stats.table, metric="euclidean", stand=TRUE, k=5)
But I have some clusters from a Hierarchical analysis that I would
like to use as seeds for the PAM algorithm. I can't figure out what the
medoids argument wants. When I put in the
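In the cluster package, pam()'s `medoids` argument expects k distinct observation numbers (indices in 1:n), not cluster labels or coordinates. The sketch below turns groups from a hierarchical analysis into such indices. For each group it picks the member with the smallest total dissimilarity to the rest of its group, which is one reasonable choice among several. The data and the use of hclust() + cutree() are assumptions.

```r
library(cluster)

set.seed(6)
x <- matrix(rnorm(100), ncol = 2)     # hypothetical data, 50 rows
d <- dist(x)
k <- 5

# Groups from a hierarchical analysis
groups <- cutree(hclust(d, method = "average"), k = k)

# For each group, pick the member with the smallest total dissimilarity
# to the other members: that row index is the group's medoid
dm <- as.matrix(d)
seed_medoids <- sapply(seq_len(k), function(g) {
  idx <- which(groups == g)
  idx[which.min(rowSums(dm[idx, idx, drop = FALSE]))]
})

# `medoids` takes row indices (not coordinates); the build step is skipped
fit <- pam(d, k = k, medoids = seed_medoids)
```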
2015 Apr 29
2
amount of data
Good contribution, excellent!!
Regards,
Ricardo Alva Valiente
From: Jose Luis Cañadas Reche [mailto:canadasreche at gmail.com]
Sent: Wednesday, April 29, 2015 12:51 PM
To: Alva Valiente, Ricardo (RIAV); 'javier.ruben.marcuzzi at gmail.com'; R-help-es at r-project.org
Subject: Re: [R-es] amount of data
You could run k-means several times with different numbers of clusters and check how
2005 May 30
2
How to access to sum of dissimilarities in CLARA
Dear All ,
Since dissimilarity is one of the quality measures in clustering, I'm trying to get the sum of dissimilarities as an overall measure. But after running CLARA on my data I obtain:
1128 dissimilarities, summarized :
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.033155 0.934630 2.257000 2.941600 4.876600 8.943700
But I cannot find the sum of dissimilarities. How can I
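The summary quoted above covers 1128 = 48 x 47 / 2 pairwise dissimilarities. These appear to be the dissimilarities among the 48 observations of one sample, not the distances from each observation to its medoid. The sum of observation-to-medoid dissimilarities for the final clustering can be computed directly. The sketch below assumes Euclidean distance and the default stand = FALSE. The same total should also be recoverable as the sum over clusters of size times av_diss from `clusinfo`. The data are hypothetical.

```r
library(cluster)

set.seed(7)
x <- matrix(rnorm(2000), ncol = 2)       # hypothetical data, 1000 rows
fit <- clara(x, k = 4)                    # Euclidean, stand = FALSE by default

# Distance from every observation to the medoid of its own cluster
own_medoid <- fit$medoids[fit$clustering, , drop = FALSE]
d_own <- sqrt(rowSums((x - own_medoid)^2))
total <- sum(d_own)                        # sum of dissimilarities

# Same quantity from the per-cluster summary (size * average dissimilarity);
# should agree with `total` up to rounding
total_info <- sum(fit$clusinfo[, "size"] * fit$clusinfo[, "av_diss"])
```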
2007 Nov 28
2
Clustering
Hello all!
I am performing some clustering analysis on microarray data using
agnes{cluster} and I have created my own dissimilarity matrix according to a
distance measure different from "euclidean" or "manhattan" etc. My question
is, if I choose for example method="complete", how are the distances
between the elements calculated? Are they taken from the dissimilarity
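When agnes() receives a "dist" object, it works only from those dissimilarities and never uses raw coordinates. With method = "complete", the distance between two clusters is the largest dissimilarity between a member of one and a member of the other, read from the supplied matrix. The check below uses a hypothetical random dissimilarity matrix in place of the custom measure. It confirms that the final merge height equals the largest entry and that the heights match hclust()'s complete linkage.

```r
library(cluster)

set.seed(8)
n <- 8
m <- matrix(runif(n * n), n, n)
m <- (m + t(m)) / 2                       # symmetric, hypothetical dissimilarities
diag(m) <- 0
d <- as.dist(m)

ag <- agnes(d, method = "complete")       # diss = TRUE is inferred from the "dist" class
hc <- hclust(d, method = "complete")

# Complete linkage: the last merge joins the two final clusters at the
# largest cross-cluster dissimilarity, which here is the largest entry of d
all.equal(max(ag$height), max(d))
all.equal(sort(ag$height), sort(hc$height))
```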
2009 Jul 26
1
Is there an R implementation for the "Barnard's exact test" (a substitute for fisher.test) ?
Hello R help members. Today I came across an article on Barnard's exact
test (http://www.cytel.com/Papers/twobinomials.pdf), which is supposed to
be more powerful than fisher.test because it doesn't assume that the
row and column totals are known in advance. Any pointers to such a function?
Thanks, Tal
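This is not a pointer to an existing package. It is a minimal sketch of the idea, assuming two independent binomial samples (row totals fixed, column totals free). All possible 2x2 tables are ordered by the pooled z statistic. The tail probability is then maximised over a grid of values for the common success probability. Because of the grid, the result only approximates the supremum. The function name `barnard_sketch` is hypothetical.

```r
# Hypothetical helper: Barnard-style unconditional exact test for a 2x2 table
# with x1 successes out of n1 and x2 successes out of n2 (two-sided)
barnard_sketch <- function(x1, n1, x2, n2,
                           grid = seq(0.001, 0.999, length.out = 999)) {
  # Pooled (score) z statistic for tables with a successes in row 1, b in row 2
  zstat <- function(a, b) {
    p  <- (a + b) / (n1 + n2)
    se <- sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    ifelse(se == 0, 0, (a / n1 - b / n2) / se)
  }
  tabs <- expand.grid(a = 0:n1, b = 0:n2)
  z <- zstat(tabs$a, tabs$b)
  # Tables at least as extreme as the observed one (small slack for rounding)
  extreme <- abs(z) >= abs(zstat(x1, x2)) - 1e-10
  # Tail probability under a common success probability p, maximised over p
  tail_prob <- sapply(grid, function(p) {
    sum(dbinom(tabs$a[extreme], n1, p) * dbinom(tabs$b[extreme], n2, p))
  })
  max(tail_prob)
}

barnard_sketch(7, 12, 2, 12)
```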
2009 Jul 01
1
Are there any bloggers among us going to useR 2009?
(*Note*: This is an R community question, not a statistical or coding
question. Since this is my first time writing such a post, I hope no one
will take offence at it.)
Hello all,
I will be attending useR 2009 next week, and was wondering whether any of you
are *bloggers* intending to participate in and report on useR 2009.
If so, I would love to know your blog URLs so that I can follow you.
2009 Feb 21
1
variable/model selection (step/stepAIC) for biglm?
Hello dear R mailing list members.
I have recently become curious about the possibility of applying model
selection algorithms (even ones as simple as AIC) to regressions on large
datasets. I searched as best I could, but couldn't find any
reference to, or wrapper for, using step or stepAIC with packages such as
biglm.
Any ideas or directions on how to implement such a concept?
Best,
Tal
2009 Aug 10
3
Bug in "seq" (or a "feature") ?
(I use R 2.9.1 with win XP)
If I run this code:
seq(-0.1,.9, by = .05)[seq(-0.1,.9, by = .05) <= 0.5]
I get this output:
[1] -0.10 -0.05 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45
Why is 0.50 not in the results?
(It seems that it gives a number slightly bigger than 0.5, but I don't
understand why it does that.)
Whereas if I try:
seq(-0.1,.9, by = .05)[seq(-0.1,.9, by = .05) <=
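The likely cause is floating-point representation. Neither 0.05 nor 0.1 has an exact binary representation, and seq() computes its values as from + i * by, so the 13th element can land a hair above 0.5. The sketch below shows two common workarounds. One compares against the limit with a small tolerance. The other builds the sequence from integers, so that 0.5 is computed as 10/20, which is exact.

```r
x <- seq(-0.1, 0.9, by = 0.05)

# Inspect the element that "should" be 0.5 at full precision
sprintf("%.20f", x[13])

# Workaround 1: compare with a tolerance well below the step size
eps <- 1e-8
y1 <- x[x <= 0.5 + eps]

# Workaround 2: generate integers and divide, so 0.5 is exactly 10/20
x2 <- seq(-2, 18) / 20
y2 <- x2[x2 <= 0.5]
```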
2009 Feb 18
0
Index-G1 error
I am using some functions from package clusterSim to evaluate the best cluster layout.
Here is the feature vector I am using to cluster 12 signals:
> alpha.vec
[1] 0.8540039 0.8558350 0.8006592 0.8066406 0.8322754 0.8991699 0.8212891
[8] 0.8815918 0.9050293 0.9174194 0.8613281 0.8425293
In the following I pasted an excerpt of my program:
2006 Apr 07
2
cclust causes R to crash when using manhattan kmeans
Dear R users,
When I run the following code, R crashes:
require(cclust)
x <- matrix(c(0,0,0,1.5,1,-1), ncol=2, byrow=TRUE)
cclust(x, centers=x[2:3,], dist="manhattan", method="kmeans")
While this works:
cclust(x, centers=x[2:3,], dist="euclidean", method="kmeans")
I'm posting this here because I am not sure if it is a bug.
I've been searching
2009 Aug 20
4
simple randomization question: How to perform "sample" in chunks
Hello dear R-help group.
My task looks simple, but I can't seem to find a "smart" (i.e., non-loop)
solution to it.
Task: I wish to randomize a data.frame by one column, while keeping the
inner order in the second column as is.
So for example, let's say I have the following data.frame:
xx <-data.frame(a= c(1,2,2,3,3,3,4,4,4,4) ,
b =
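One loop-free approach assumes the goal is to shuffle the order of the groups defined by column `a` while keeping the rows within each group in their original order. The poster's `b` column is cut off, so `b` below is a hypothetical placeholder.

```r
xx <- data.frame(a = c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4),
                 b = 1:10)                  # hypothetical b column

set.seed(9)
# Random new order for the distinct values of a
new_order <- sample(unique(xx$a))

# order() is stable, so rows sharing a value of a keep their relative order
shuffled <- xx[order(match(xx$a, new_order)), ]
rownames(shuffled) <- NULL
shuffled
```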