thr3ads.net - similar to: "kmeans.big.matrix"

Displaying 20 results from an estimated 2000 matches similar to: "kmeans.big.matrix"

2009 Jul 18

Building a big.matrix using foreach

Hi there! I have become a big fan of the 'foreach' package allowing me to do a lot of stuff in parallel. For example, evaluating the function f on all elements in a vector x is easily accomplished: foreach(i=1:length(x),.combine=c) %dopar% f(x[i]) Here the .combine=c option tells foreach to combine output using the c()-function. That is, to return it as a vector. Today I discovered the
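
The .combine=c behaviour described in this post can be sketched as follows. This is only an illustration, not the poster's code: the function f and the vector x are hypothetical stand-ins, and the sequential backend is registered so the snippet runs without a parallel setup (a parallel backend could be registered instead).

```r
library(foreach)

# Register the sequential backend so %dopar% runs without a warning.
# Assumption: no parallel backend is available in this sketch.
registerDoSEQ()

f <- function(v) v^2      # hypothetical stand-in for the poster's f
x <- c(1, 2, 3, 4)        # hypothetical input vector

# .combine = c concatenates the per-iteration results into one vector
res <- foreach(i = seq_along(x), .combine = c) %dopar% f(x[i])
```

With these stand-ins, res is an ordinary numeric vector with one element per element of x.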

2010 Feb 24

Sparse KMeans/KDE/Nearest Neighbors?

Hi, I have a dataset (the Netflix dataset) with roughly 18k columns and a variable number of rows; let's assume 25 thousand for now. The dataset is very sparse. I was wondering how to do kmeans/nearest neighbors or kernel density estimation on it. I tried using the spMatrix function in the "Matrix" package. I think I'm able to create the matrix but as soon as I pass

2012 Jan 18

kmeans clustering on large but sparse matrix

Hi, I have a 60k*600k matrix, which exceeds the vector length limit of 2^31-1. But it's rather sparse; only 0.02% of the entries have a value. So I saved it as a Matrix Market (mm) file, which is about 300M in size. I use readMM in the Matrix package to read it in. If I do so, the data type becomes dgTMatrix from the 'Matrix' package instead of the common matrix type. The problem is, if I run k-means only on part of

2009 Jun 02

bigmemory - extracting submatrix from big.matrix object

I am using library(bigmemory) to handle large datasets, say 1 GB, and am facing the following problems. Any hints would be helpful. Problem 1: I am using the "read.big.matrix" function to create a filebacked big matrix of my data and get the following warning: > x = read.big.matrix("/home/utkarsh.s/data.csv",header=T,type="double",shared=T,backingfile

2013 Jul 26

Variation in k-means results (Alfredo Alvarez)

Good morning, I'm not sure whether I'm using the list correctly; this is my first time. If I'm doing it wrong, please correct me. Regarding your comment, Pedro, thank you very much. What I understand from your set.seed suggestion is that it fixes the results, but I'm not sure whether another grouping might work better. That is, I'm interested in a clustering method that produces the "best" grouping, and since the

2010 Dec 17

[Fwd: adding more columns in big.matrix object of bigmemory package]

Hi, With reference to the mail below, I have large datasets, coming from various sources, which I can read into a filebacked big.matrix using the bigmemory library. I want to merge them all into one 'big.matrix' object. (Later, I want to run a regression using the 'biglm' library.) I have been trying unsuccessfully to do this for quite some time now. Can you please

2011 Sep 29

efficient coding with foreach and bigmemory

I recently learned about the bigmemory and foreach packages and am trying to use them to help me create a very large matrix. Without those packages, I can create the type of matrix that I want with 10 columns and 5e6 rows. I would like to be able to scale up to 5e9 rows, or more, if possible. I have created a simplified example of what I'm trying to do, below. The first part of the

2010 Apr 23

bigmemory package woes

I have pretty big data sizes, like matrices of 0.5 to 1.5 GB, so once I need to juggle several of them I need a disk cache. I am trying to use the bigmemory package but am running into problems that are hard to understand: I am getting seg faults and the machine just hangs. I work, by the way, on Red Hat Linux, 64 bit R version 10. The simplest problem is just saving matrices. When I do something like

2011 Apr 06

Help in kmeans

Hi All, I was using the following command to perform kmeans on the Iris dataset. Kmeans_model<-kmeans(dataFrame[,c(1,2,3,4)],centers=3) This was giving proper results for me. But in my application we generate the R commands dynamically, and there was a requirement that column names be sent to the R commands instead of column indices. Hence, to incorporate this, I tried using the R
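
One way to pass column names rather than indices is to subset with a character vector. The sketch below assumes the built-in iris data frame stands in for the poster's dataFrame; the post is cut off before showing what was actually tried, so this is not a reconstruction of it.

```r
data(iris)  # assumption: iris stands in for the poster's dataFrame

# Column names, e.g. as they might arrive in a dynamically generated command
cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

set.seed(1)
by_name <- kmeans(iris[, cols], centers = 3)

# Same call using indices, with the same seed, for comparison
set.seed(1)
by_index <- kmeans(iris[, c(1, 2, 3, 4)], centers = 3)
```

Because both calls see the same data and the same random seed, they should produce the same partition.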

2003 Jun 05

kmeans (again)

Regarding a previous question concerning the kmeans function I've tried the same example and I also get a strange result (at least according to what is said in the help of the function kmeans). Apparently, the function is disregarding the initial cluster centers one gives it. According to the help of the function: centers: Either the number of clusters or a set of initial cluster

2013 Mar 13

Empty cluster / segfault using vanilla kmeans with version 2.15.2

Hello, here is a working reproducible example which crashes R using kmeans or gives empty clusters using the nstart option with R 2.15.2. library(cluster) kmeans(ruspini,4) kmeans(ruspini,4,nstart=2) kmeans(ruspini,4,nstart=4) kmeans(ruspini,4,nstart=10) ?kmeans Either we always get empty clusters or, after some further commands, a segfault. Regards, Detlef Groth ------------ [R] Empty

2004 May 11

AW: Probleme with Kmeans...

Sorry, to answer your question I tried: data(faithful) kmeans(faithful[c(1:20),1],10) Error: empty cluster: try a better set of initial centers But when I run this a second time it is OK. It seems that kmeans has trouble picking good starting points because the initial centers are chosen at random. With kmeans(data,k,centers=c(...) the problem can be solved.
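
The suggestion of supplying explicit starting centers can be sketched as follows. The choice of centers here (ten distinct observed values spread across the sorted data) is an assumption made for illustration, not something stated in the post.

```r
data(faithful)
x <- faithful[1:20, 1]            # first 20 eruption durations, as in the post

# Assumption: pick 10 distinct observed values, spread over the sorted data,
# as starting centers, so every center starts with at least one point.
u <- sort(unique(x))
idx <- unique(round(seq(1, length(u), length.out = 10)))
init <- u[idx]

fit <- kmeans(x, centers = init)  # a vector of centers is treated as a 10 x 1 matrix
fit$size                          # number of observations per cluster
```

Since each starting center coincides with a data point, no cluster starts out empty, which avoids relying on a lucky random draw.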

2006 Jul 09

distance in kmeans algorithm?

Hello. Is it possible to choose the distance in the kmeans algorithm? I have m vectors of n components and I want to cluster them using kmeans algorithm but I want to use the Mahalanobis distance or another distance. How can I do it in R? If I use kmeans, I have no option to choose the distance. Thanks in advance, Arnau.
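
For the Mahalanobis case specifically, one possible workaround is to whiten the data first, since Euclidean distance on whitened data equals Mahalanobis distance on the original data. The sketch below assumes a single covariance matrix estimated from the whole data set and uses iris as example data; neither choice comes from the post, and this does not cover arbitrary distances.

```r
X <- as.matrix(iris[, 1:4])   # assumption: example data, m vectors of n components
S <- cov(X)                   # assumption: one global covariance matrix

# Cholesky factor R with S = t(R) %*% R, so solve(S) = solve(R) %*% t(solve(R)).
# Rows of Z = X %*% solve(R) then satisfy:
#   squared Euclidean distance in Z == squared Mahalanobis distance in X.
R <- chol(S)
Z <- X %*% solve(R)

set.seed(42)
fit <- kmeans(Z, centers = 3, nstart = 10)   # Euclidean k-means on whitened data

# Check the equivalence on one pair of observations
d_euclid <- sum((Z[1, ] - Z[2, ])^2)
d_mahal  <- mahalanobis(X[1, ], X[2, ], S)   # returns the squared distance
```

The cluster centers in fit$centers are in whitened coordinates; multiplying them by R maps them back to the original scale.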

2003 Apr 14

kmeans clustering

Hi, I am using kmeans to cluster a dataset. I test this example: > data<-matrix(scan("data100.txt"),100,37,byrow=T) (my dataset is 100 rows and 37 columns--clustering rows) > c1<-kmeans(data,3,20) > c1 $cluster [1] 1 1 1 1 1 1 1 3 3 3 1 3 1 3 3 1 1 1 1 3 1 3 3 1 1 1 3 3 1 1 3 1 1 1 1 3 3 [38] 3 1 1 1 3 1 1 1 1 3 3 3 1 1 1 1 1 1 3 1 3 1 1 3 1 1 1 1 3 1 1 1 1 1 1 3

2006 Apr 07

cclust causes R to crash when using manhattan kmeans

Dear R users, When I run the following code, R crashes: require(cclust) x <- matrix(c(0,0,0,1.5,1,-1), ncol=2, byrow=TRUE) cclust(x, centers=x[2:3,], dist="manhattan", method="kmeans") While this works: cclust(x, centers=x[2:3,], dist="euclidean", method="kmeans") I'm posting this here because I am not sure if it is a bug. I've been searching

2012 Feb 27

kmeans: how to retrieve clusters

Hello, I'd like to classify data with kmeans algorithm. In my case, I should get 2 clusters in output. Here is my data colCandInd colCandMed 1 82 2950.5 2 83 1831.5 3 1192 2899.0 4 1193 2103.5 The first cluster is the two first lines the 2nd cluster is the two last lines Here is the code: x = colCandList$colCandInd y = colCandList$colCandMed m = matrix(c(x, y),

2003 Jun 06

Kmeans again

Dear helpers I'm sorry to insist but I still think there is something wrong with the function kmeans. For instance, let's try the same small example: > dados<-matrix(c(-1,0,2,2.5,7,9,0,3,0,6,1,4),6,2) I will choose observations 3 and 4 for initial centers and just one iteration. The results are > A<-kmeans(dados,dados[c(3,4),],1) > A $cluster [1] 1 1 1 1 2 2 $centers

2005 Jun 14

KMEANS output...

Using R 2.1.0 on Windows 2 questions: 1. Is there a way to parse the output from kmeans within R? 2. If the answer to 1. is convoluted or impossible, how do you save the output from kmeans in a plain text file for further processing outside R? Example: > ktx<-kmeans(x,12, nstart = 200) I would like to parse ktx within R to extract cluster sizes, sum-of-squares values, etc., OR save ktx in
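
Both parts of this question can be sketched briefly: the object returned by kmeans is a list whose components can be read with $, and a summary can be written out with write.table. The data (iris), the number of clusters, and the output file name are assumptions for illustration only.

```r
x <- as.matrix(iris[, 1:4])       # assumption: example data in place of the poster's x

set.seed(1)
ktx <- kmeans(x, 3, nstart = 20)  # assumption: 3 clusters instead of the post's 12

# 1. Extract pieces of the result within R
ktx$size       # cluster sizes
ktx$withinss   # within-cluster sum of squares, one value per cluster
ktx$centers    # matrix of cluster centers

# 2. Save a summary as plain text for processing outside R
summary_tbl <- data.frame(cluster  = seq_along(ktx$size),
                          size     = ktx$size,
                          withinss = ktx$withinss)
out_file <- tempfile(fileext = ".txt")   # hypothetical output path
write.table(summary_tbl, out_file, sep = "\t", row.names = FALSE, quote = FALSE)
```

The per-observation assignments are in ktx$cluster and could be written out the same way.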

2010 May 05

custom metric for dist for use with hclust/kmeans

Hi guys, I've been using the kmeans and hclust functions for some time now and was wondering if I could specify a custom metric when passing my data frame into hclust as a distance matrix. Actually, kmeans doesn't even take a distance matrix; it takes the data frame directly. I was wondering if there's a way or if there's a package that lets you create distance matrices from

2003 Jun 03

kmeans

Dear helpers I was working with kmeans from package mva and found some strange situations. When I run several times the kmeans algorithm with the same dataset I get the same partition. I simulated a little example with 6 observations and run kmeans giving the centers and making just one iteration. I expected that the algorithm just allocated the observations to the nearest center but think this