Displaying 20 results from an estimated 2000 matches similar to: "Using kmeans given cluster centroids and data with NAs"
2008 Jul 03
1
Optimal initial centroid in kmeans
Hello there. I am using kmeans from the base package to cluster my customers. As
the result of kmeans depends on the initial centroids, may I know:
1) how can we specify the centroids in the R function? (I don't want a random
starting point)
2) how to determine the optimal (if not, a good) centroids to start with? (I
am not after the fixed-seed solution, as it only ensures that the
2004 May 28
6
distance in the function kmeans
Hi,
I want to know which distance is used in the function kmeans
and whether we can change this distance.
Indeed, in the function pam, we can pass a distance matrix as a
parameter (by the line "pam<-pam(dist(matrixdata),k=7)" ) but
we can't do it in the function kmeans, where we have to pass the
matrix of data directly ...
Thanks in advance,
Nicolas BOUGET
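A minimal sketch of the contrast described in this post, for illustration only. The toy data and the choice of the Manhattan metric are assumptions, not part of the original question. kmeans() from stats works on the data matrix and minimises within-cluster sums of squares (i.e. Euclidean distance), while pam() from the cluster package accepts a dist object, so the metric can be chosen in dist().

```r
library(cluster)  # provides pam()

set.seed(1)
# Toy data (an assumption for illustration): two well-separated groups in 2-D
matrixdata <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
                    matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2))

# kmeans() takes the data matrix directly; the distance is not configurable
km <- kmeans(matrixdata, centers = 2, nstart = 10)

# pam() accepts a dissimilarity object, so the metric is chosen in dist()
d  <- dist(matrixdata, method = "manhattan")
pm <- pam(d, k = 2)

# Cross-tabulate the two partitions (cluster labels are arbitrary)
print(table(kmeans = km$cluster, pam = pm$clustering))
```

If a non-Euclidean distance is essential, a medoid-based method such as pam() is usually the simpler route than trying to adapt kmeans().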
2006 Jul 09
2
distance in kmeans algorithm?
Hello.
Is it possible to choose the distance in the kmeans algorithm?
I have m vectors of n components and I want to cluster them using kmeans
algorithm but I want to use the Mahalanobis distance or another distance.
How can I do it in R?
If I use kmeans, I have no option to choose the distance.
Thanks in advance,
Arnau.
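One possible workaround, sketched under stated assumptions: kmeans() has no distance argument, but Euclidean distance on suitably transformed data equals Mahalanobis distance on the original data. The sketch below whitens the data with the Cholesky factor of the overall sample covariance (a single global covariance, which is an assumption; it is not a per-cluster Mahalanobis distance) and then runs ordinary kmeans(). The toy data are invented for illustration.

```r
set.seed(42)
# Toy data (illustrative assumption): m = 60 vectors with n = 3 components
x <- rbind(matrix(rnorm(90), ncol = 3),
           matrix(rnorm(90, mean = 4), ncol = 3))
x[, 2] <- x[, 1] + 0.5 * x[, 2]   # make two components correlated

S <- cov(x)          # global sample covariance
R <- chol(S)         # upper triangular, t(R) %*% R == S
z <- x %*% solve(R)  # whitened data: Euclidean on z == Mahalanobis on x

cl <- kmeans(z, centers = 2, nstart = 10)

# Sanity check on one pair of points: both values are squared distances
d_euclid <- sum((z[1, ] - z[2, ])^2)
d_mahal  <- as.numeric(mahalanobis(x[1, ], center = x[2, ], cov = S))
print(all.equal(d_euclid, d_mahal))
```

Note that the resulting cluster centres live in the transformed space; multiplying cl$centers by R maps them back to the original coordinates.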
2006 Aug 07
5
kmeans and incomplete distance matrix concern
Hi there
I have been using R to perform kmeans on a dataset. The data is fed in using read.table and then a matrix (x) is created
i.e:
mat <- matrix(0, nlevels(DF$V1), nlevels(DF$V2),
dimnames = list(levels(DF$V1), levels(DF$V2)))
mat[cbind(DF$V1, DF$V2)] <- DF$V3
This matrix is then taken and a distance matrix (y) created using dist() before performing the kmeans clustering.
My query
2016 Aug 19
2
KMeans - Evaluation Results
On 18 Aug 2016, at 23:59, Richhiey Thomas <richhiey.thomas at gmail.com> wrote:
> I've currently added a few classes which don't really belong to the public API (currently) into private headers and used PIMPL with the Cluster class.
I'm having difficulty reading your changes, because you aren't keeping to one complete change per commit. So for instance you've added a
2016 Aug 18
3
KMeans - Evaluation Results
> Actually, you're doing something slightly unusual there: making the
> internal member public. Protected would be better, and private is I think
> most usual; library clients aren't going to have access to the Internal
> class declaration, so they can't call things on it. This means it's
> actually difficult right now to subclass Feature.
>
> I
2016 Aug 15
2
KMeans - Evaluation Results
Hello,
I've recently finished with an implementation of KMeans with two
initialization techniques, random initialization and KMeans++. I would like
to share my findings after evaluating the same.
I have tested this implementation of KMeans with a BBC news article
dataset. I am currently working on evaluating the same with FIRE datasets.
Currently, clustering more than 500 documents
2011 Feb 01
1
kmeans: number of cluster centres must lie between 1 and nrow(x)
Dear R,
Can't I cluster a dataset into k clusters where k is exactly the number of
observations? I have version 12.2 installed. See this example
> a <- matrix(1:100, 20)
> kmeans(a, 20)
Error: number of cluster centres must lie between 1 and nrow(x)
This is a bit ad-hoc but I know that R from version 2.12 allows the number of
clusters to be one. So I guess allowing number of clusters to be
2003 Jun 03
1
kmeans
Dear helpers
I was working with kmeans from package mva and found some strange situations. When I run the kmeans algorithm several times with the same dataset I get the same partition. I simulated a little example with 6 observations and ran kmeans, giving the centers and making just one iteration. I expected the algorithm to just allocate the observations to the nearest center, but think this
2003 Nov 10
1
kmeans error (bug?)
Hello,
I have been getting the following intermittent error from kmeans:
>str(cavint.p.r)
num [1:1967, 1:13] 0.691 0.123 0.388 0.268 0.485 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:1967] "6" "49" "87" "102" ...
..$ : chr [1:13] "HYD" "NEG" "POS" "OXY" ...
> set.seed(34)
>
2007 Dec 05
1
Information criteria for kmeans
Hello,
How is, for example, the Schwarz criterion defined for kmeans? It should
be something like:
k <- 2
vars <- 4
nobs <- 100
dat <- rbind(matrix(rnorm(nobs, sd = 0.3), ncol = vars),
matrix(rnorm(nobs, mean = 1, sd = 0.3), ncol = vars))
colnames(dat) <- paste("var",1:4)
(cl <- kmeans(dat, k))
schwarz <- sum(cl$withinss)+ vars*k*log(nobs)
Thanks
2006 Mar 25
2
pairwise combinations of variables
Dear WizaRds,
although this might be a trivial question to the community, I was unable to
find anything solving my problem in the help files on CRAN. Please help.
Suppose I have 4 variables and want to use all possible combinations:
1,2
1,3
1,4
2,3
2,4
3,4
for a further kmeans partitioning.
I tried permutations() of package e1071, but this is not what I need. Thank you
for your help and
2013 May 21
1
keep the centre fixed in K-means clustering
Dear R users
I have the matrix of the centres of some clusters, e.g. 20 clusters each
with 100 dimensions, so this matrix contains 20 rows * 100 columns of numeric
values.
I have collected new data (each with 100 numeric values) and would like to
keep the above 20 centres fixed/'unmoved' whilst just seeing how my new data
fit in this grouping system, e.g. if the data is close to cluster 1
2016 Jul 26
3
K MEANS clustering
Hello,
I've been working on the KMeans clustering algorithm recently and for the
past week, I have been stuck on a problem which I'm not able to find a
solution to.
Since we are representing documents as Tf-idf vectors, they are really
sparse vectors (a usual corpus can have around 5000 terms). So it gets
really difficult to represent these sparse vectors in a way that would be
2011 Apr 06
2
Help in kmeans
Hi All,
I was using the following command for performing kmeans for Iris dataset.
Kmeans_model<-kmeans(dataFrame[,c(1,2,3,4)],centers=3)
This was giving proper results for me. But, in my application we generate
the R commands dynamically and there was a requirement that the column names
will be sent instead of column indices to the R commands. Hence, to
incorporate this, I tried using the R
2012 Feb 27
2
kmeans: how to retrieve clusters
Hello,
I'd like to classify data with kmeans algorithm. In my case, I should get 2
clusters in output. Here is my data
  colCandInd colCandMed
1         82     2950.5
2         83     1831.5
3       1192     2899.0
4       1193     2103.5
The first cluster is the two first lines
the 2nd cluster is the two last lines
Here is the code:
x = colCandList$colCandInd
y = colCandList$colCandMed
m = matrix(c(x, y),
2016 Aug 17
2
KMeans - Evaluation Results
On Wed, Aug 17, 2016 at 7:23 PM, James Aylett <james-xapian at tartarus.org>
wrote:
> >> How long does 200-300 documents take to cluster? How does it grow as
> more documents are included in the MSet? We'd expect an MSet of 1000
> documents to take longer to cluster than one with 100, but the important
> thing is _how_ the time increases as the number of documents
2003 Jun 05
1
kmeans (again)
Regarding a previous question concerning the kmeans function I've tried the
same example and I also get a strange result (at least according to what is
said in the help of the function kmeans). Apparently, the function is
disregarding the initial cluster centers one gives it. According to the help
of the function:
centers: Either the number of clusters or a set of initial cluster
2003 Aug 10
3
high memory allocation
Hello,
I have trouble with my cluster analysis using package "cluster". "diana" and "agnes" both seem to try to allocate memory directly, so I cannot use the virtual memory of my Windows 2000 operating system.
I do have 320 MB of memory, but they ask for about 600 MB. Do I have a chance to do the analysis with my amount of memory?
Thanks for all comments, I did not find a
2006 Mar 23
0
kmeans Clustering
Dear WizaRds,
My goal is to program the VS-KM algorithm by Brusco and Cradit 01 and I have
come to a complete stop in my efforts. Maybe anybody is willing to follow my
thoughts and offer some help.
In a first step, I want to use a single variable for the partitioning process.
As the center-matrix I use the objects that belong to the cluster I found with
the hierarchical Ward algorithm. Then,