thr3ads.net - similar to: "Using kmeans given cluster centroids and data with NAs"

Displaying 20 results from an estimated 2000 matches similar to: "Using kmeans given cluster centroids and data with NAs"

Optimal initial centroid in kmeans

2008 Jul 03

Hello there. I am using kmeans from the base package to cluster my customers. As the result of kmeans depends on the initial centroids, may I know: 1) how can we specify the centroids in the R function? (I don't want a random starting point) 2) how to determine the optimal (or, failing that, a good) centroid to start with? (I am not after the fixed-seed solution, as it only ensures that the
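A minimal sketch for the first question above: `kmeans()` in the stats package accepts a matrix of starting centres through its `centers` argument, one row per cluster. The toy data and the choice of starting rows (`init`) are made up for illustration and do not come from the original post.

```r
# Toy data: two well-separated groups in two dimensions (illustrative only).
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2))

# Hypothetical choice of starting centroids: one observation from each group.
init <- x[c(1, 21), , drop = FALSE]

# Passing a matrix to `centers` fixes the starting points instead of
# drawing them at random; the number of rows gives the number of clusters.
fit <- kmeans(x, centers = init)

fit$centers          # final centroids after the iterations
table(fit$cluster)   # cluster sizes
```

This only removes the randomness of the start. Whether a given start is a good one is a separate question; a common alternative is to keep random starts and let the `nstart` argument try several of them.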

distance in the function kmeans

2004 May 28

Hi, I want to know which distance is used in the function kmeans and whether we can change it. In the function pam we can pass a distance matrix as a parameter (with the line "pam<-pam(dist(matrixdata),k=7)"), but we can't do that in the function kmeans; we have to pass the data matrix directly ... Thanks in advance, Nicolas BOUGET

distance in kmeans algorithm?

2006 Jul 09

Hello. Is it possible to choose the distance in the kmeans algorithm? I have m vectors of n components and I want to cluster them using the kmeans algorithm, but with the Mahalanobis distance or another distance. How can I do it in R? If I use kmeans, I have no option to choose the distance. Thanks in advance, Arnau.

kmeans and incomplete distance matrix concern

2006 Aug 07

Hi there, I have been using R to perform kmeans on a dataset. The data is fed in using read.table and then a matrix (x) is created, i.e.: mat <- matrix(0, nlevels(DF$V1), nlevels(DF$V2), dimnames = list(levels(DF$V1), levels(DF$V2))) mat[cbind(DF$V1, DF$V2)] <- DF$V3 This matrix is then taken and a distance matrix (y) created using dist() before performing the kmeans clustering. My query

KMeans - Evaluation Results

2016 Aug 19

On 18 Aug 2016, at 23:59, Richhiey Thomas <richhiey.thomas at gmail.com> wrote: > I've currently added a few classes which don't really belong to the public API (currently) into private headers and used PIMPL with the Cluster class. I'm having difficulty reading your changes, because you aren't keeping to one complete change per commit. So for instance you've added a

KMeans - Evaluation Results

2016 Aug 18

> Actually, you're doing something slightly unusual there: making the internal member public. Protected would be better, and private is I think most usual; library clients aren't going to have access to the Internal class declaration, so they can't call things on it. This means it's actually difficult right now to subclass Feature. > > I

KMeans - Evaluation Results

2016 Aug 15

Hello, I've recently finished with an implementation of KMeans with two initialization techniques, random initialization and KMeans++. I would like to share my findings after evaluating the same. I have tested this implementation of KMeans with a BBC news article dataset. I am currently working on evaluating the same with FIRE datasets. Currently, clustering more than 500 documents

kmeans: number of cluster centres must lie between 1 and nrow(x)

2011 Feb 01

Dear R, Can't I cluster a dataset into k clusters where k is exactly the number of observations? I have version 12.2 installed. See this example: > a <- matrix(1:100, 20) > kmeans(a, 20) Error: number of cluster centres must lie between 1 and nrow(x) This is a bit ad hoc, but I know that R from version 2.12 allows the number of clusters to be one. So I guess allowing number of clusters to be

kmeans

2003 Jun 03

Dear helpers I was working with kmeans from package mva and found some strange situations. When I run several times the kmeans algorithm with the same dataset I get the same partition. I simulated a little example with 6 observations and run kmeans giving the centers and making just one iteration. I expected that the algorithm just allocated the observations to the nearest center but think this

kmeans error (bug?)

2003 Nov 10

Hello, I have been getting the following intermittent error from kmeans: >str(cavint.p.r) num [1:1967, 1:13] 0.691 0.123 0.388 0.268 0.485 ... - attr(*, "dimnames")=List of 2 ..$ : chr [1:1967] "6" "49" "87" "102" ... ..$ : chr [1:13] "HYD" "NEG" "POS" "OXY" ... > set.seed(34) >

Information criteria for kmeans

2007 Dec 05

Hello, how is, for example, the Schwarz criterion defined for kmeans? It should be something like: k <- 2 vars <- 4 nobs <- 100 dat <- rbind(matrix(rnorm(nobs, sd = 0.3), ncol = vars), matrix(rnorm(nobs, mean = 1, sd = 0.3), ncol = vars)) colnames(dat) <- paste("var",1:4) (cl <- kmeans(dat, k)) schwarz <- sum(cl$withinss)+ vars*k*log(nobs) Thanks

pairwise combinations of variables

2006 Mar 25

Dear WizaRds, although this might be a trivial question to the community, I was unable to find anything solving my problem in the help files on CRAN. Please help. Suppose I have 4 variables and want to use all possible combinations: 1,2 1,3 1,4 2,3 2,4 3,4 for a further kmeans partitioning. I tried permutations() of package e1071, but this is not what I need. Thank you for your help and

keep the centre fixed in K-means clustering

2013 May 21

Dear R users, I have the matrix of the centres of some clusters, e.g. 20 clusters each with 100 dimensions, so this matrix contains 20 rows * 100 columns of numeric values. I have collected new data (each with 100 numeric values) and would like to keep the above 20 centres fixed/'unmoved' whilst just seeing how my new data fit in this grouping system, e.g. if the data is close to cluster 1
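One way to read this request is as a plain nearest-centre assignment with no kmeans iterations at all, so the centres never move. A minimal sketch under that assumption, using Euclidean distance; the small `centres` and `newdata` matrices are hypothetical stand-ins for the 20 x 100 matrix and the new observations in the post.

```r
# Hypothetical fixed centres: 3 clusters in 4 dimensions, one centre per row.
centres <- rbind(c(0, 0, 0, 0),
                 c(5, 5, 5, 5),
                 c(10, 10, 10, 10))

# Hypothetical new observations, one per row.
newdata <- rbind(c(0.2, -0.1, 0.3, 0.0),
                 c(9.5, 10.2, 9.9, 10.4),
                 c(4.8, 5.1, 5.3, 4.9))

# For each new row, compute the squared Euclidean distance to every centre
# and keep the index of the closest one. t(centres) has one column per
# centre, so subtracting `row` recycles it down each column.
nearest <- apply(newdata, 1, function(row) {
  which.min(colSums((t(centres) - row)^2))
})

nearest   # row index into `centres` for each new observation
```

The centres are only read, never updated, which is what distinguishes this from calling `kmeans()` with the old centres as a starting point.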

K MEANS clustering

2016 Jul 26

Hello, I've been working on the KMeans clustering algorithm recently and, for the past week, I have been stuck on a problem which I'm not able to find a solution to. Since we are representing documents as Tf-idf vectors, they are really sparse vectors (a usual corpus can have around 5000 terms). So it gets really difficult to represent these sparse vectors in a way that would be

Help in kmeans

2011 Apr 06

Hi All, I was using the following command to perform kmeans on the Iris dataset. Kmeans_model<-kmeans(dataFrame[,c(1,2,3,4)],centers=3) This was giving proper results for me. But in my application we generate the R commands dynamically, and there was a requirement that column names be sent to the R commands instead of column indices. Hence, to incorporate this, I tried using the R

kmeans: how to retrieve clusters

2012 Feb 27

Hello, I'd like to classify data with kmeans algorithm. In my case, I should get 2 clusters in output. Here is my data colCandInd colCandMed 1 82 2950.5 2 83 1831.5 3 1192 2899.0 4 1193 2103.5 The first cluster is the two first lines the 2nd cluster is the two last lines Here is the code: x = colCandList$colCandInd y = colCandList$colCandMed m = matrix(c(x, y),

KMeans - Evaluation Results

2016 Aug 17

On Wed, Aug 17, 2016 at 7:23 PM, James Aylett <james-xapian at tartarus.org> wrote: >> How long does 200–300 documents take to cluster? How does it grow as more documents are included in the MSet? We'd expect an MSet of 1000 documents to take longer to cluster than one with 100, but the important thing is _how_ the time increases as the number of documents

kmeans (again)

2003 Jun 05

Regarding a previous question concerning the kmeans function I've tried the same example and I also get a strange result (at least according to what is said in the help of the function kmeans). Apparently, the function is disregarding the initial cluster centers one gives it. According to the help of the function: centers: Either the number of clusters or a set of initial cluster

high memory allocation

2003 Aug 10

Hello, I have trouble with my cluster analysis using package "cluster". "diana" and "agnes" both seem to try to allocate memory directly, so I cannot use the virtual memory of my Windows 2000 operating system. I have 320 MB of memory, but they claim about 600 MB. Do I have a chance to do the analysis with my amount of memory? Thanks for all comments, I did not find a

kmeans Clustering

2006 Mar 23

Dear WizaRds, My goal is to program the VS-KM algorithm by Brusco and Cradit 01 and I have come to a complete stop in my efforts. Maybe anybody is willing to follow my thoughts and offer some help. In a first step, I want to use a single variable for the partitioning process. As the center-matrix I use the objects that belong to the cluster I found with the hierarchical Ward algorithm. Then,
