Hi!

I just started using R recently, and would like to ask for help with automating a job.

I need to run the "kmeans" function on my own 300 data files, and wonder whether it's possible to do this automatically. For example,

> library(mva)
> mydata <- read.table("data1")
> cl <- kmeans(mydata, 5, 20)

and I just need to save the "cl" info (i.e. the center info). Of course, I'll have the data files from "data1" to "data300".

Could someone please give me advice on how to do it, or any other recommendation for this job? I'll run this in either a Unix or a Mac environment, using version 1.2.1.

Thanks in advance.

- Youngser Park

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Wed, 28 Feb 2001, Youngser Park wrote:

> Hi!
>
> I just started using R recently, and would like to ask for help with
> automating a job.
>
> I need to run the "kmeans" function on my own 300 data files, and wonder
> whether it's possible to do this automatically. For example,
>
> > library(mva)
> > mydata <- read.table("data1")
> > cl <- kmeans(mydata, 5, 20)
>
> and I just need to save the "cl" info (i.e. the center info). Of course,
> I'll have the data files from "data1" to "data300".
>
> Could someone please give me advice on how to do it, or any other
> recommendation for this job? I'll run this in either a Unix or a Mac
> environment, using version 1.2.1.

This should be pretty straightforward.

library(mva)
maxit <- 20
nclust <- 5
nvars <- 5
npts <- 100
ndata <- 300
## cluster identity of each point, center of each cluster, size of each
## cluster
result.len <- npts+nclust*nvars+nclust
results <- matrix(ncol=result.len,nrow=ndata)
for (i in 1:ndata) {
  mydata <- read.table(paste("data",i,sep=""))
  results[i,] <- unlist(kmeans(mydata,nclust,maxit))
}

Does that help?

> Thanks in advance.
>
> - Youngser Park

--
318 Carr Hall                                bolker at zoo.ufl.edu
Zoology Department, University of Florida    http://www.zoo.ufl.edu/bolker
Box 118525                                   (ph)  352-392-5697
Gainesville, FL 32611-8525                   (fax) 352-392-3704
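The hard-coded result.len in the loop above has to match the length of unlist(kmeans(...)) exactly. One way to avoid guessing is to measure the length of the first result and size the matrix from it. The following is only a sketch: it assumes, as the loop above does, that every data set has the same number of points and variables, and it assumes a current R release, where kmeans() is in the stats package and library(mva) is not needed. The simulated in-memory data sets and the small sizes are made up for illustration so that the example runs on its own.

```r
## Sketch only: the sizes and simulated data sets below are illustrative assumptions.
set.seed(42)
npts   <- 30   # points per data set (hypothetical)
nvars  <- 2    # variables per data set (hypothetical)
nclust <- 2
maxit  <- 20
ndata  <- 4    # stands in for the 300 data sets in the thread

## simulated stand-ins for the contents of data1, data2, ...
datasets <- lapply(1:ndata, function(i) matrix(rnorm(npts * nvars), ncol = nvars))

## fit the first data set, then size the results matrix from what came back
first <- unlist(kmeans(datasets[[1]], nclust, maxit))
results <- matrix(nrow = ndata, ncol = length(first))
results[1, ] <- first

## the remaining rows have the same length as long as the data sets are the same size
for (i in 2:ndata) {
  results[i, ] <- unlist(kmeans(datasets[[i]], nclust, maxit))
}
```

If the data sets differ in size, the rows will not have a common length and this approach no longer applies.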
[Forwarding this to the help list so that it will be documented there]

On Wed, 28 Feb 2001, Youngser Park wrote:

> on 2/28/01 5:44 PM, ben at zoo.ufl.edu at ben at zoo.ufl.edu wrote:
>
> > library(mva)
> > maxit <- 20
> > nclust <- 5
> > nvars <- 5
> > npts <- 100
> > ndata <- 300
> > ## cluster identity of each point, center of each cluster, size of each
> > ## cluster
> > result.len <- npts+nclust*nvars+nclust
> > results <- matrix(ncol=result.len,nrow=ndata)
> > for (i in 1:ndata) {
> >   mydata <- read.table(paste("data",i,sep=""))
> >   results[i,] <- unlist(kmeans(mydata,nclust,maxit))
> > }
>
> Ben,
>
> I just tried it, and got this message:
>
> > source("run2")
> Error in "[<-"(*tmp*, i, , value = unlist(kmeans(mydata, nclust, maxit))) :
>         number of items to replace is not a multiple of replacement length
>
> I have no idea what this means.

It means that, probably, the matrix had the wrong number of columns. R's default assignment will try to recycle (repeat) the data to fill up the target structure when it is assigning, e.g.:

z <- matrix(ncol=4,nrow=5)
z[1,] <- 1:2  ## uses "1:2" twice to fill the row
z[1,] <- 1:3  ## gives an error because the row length is not a whole
              ## multiple of the length of the assignment data.

The assignments of nclust, nvars, npts, ndata above were arbitrary; you need to fill them in with your own values (nclust = # of clusters; nvars = # of variables/dimensions in your data set; npts = # of points in your data set). I was actually assuming all your data sets were identical in size; if not, the result vectors will all be different lengths, and it would be better to put your results in a list rather than a matrix as I have done above.

However, given that you want to save the results of every analysis to a separate file, you don't need to worry about this. You're right that using paste() is the way to go to save the output to separate files.
What you do will depend a bit on what format you want the output to be in, but you could do something like:

for (i in 1:ndata) {
  mydata <- read.table(paste("data",i,sep=""))
  cl <- unlist(kmeans(mydata,nclust,maxit))
  ## open a zero-padded output file: "out001", "out002", ...
  sink(paste("out",formatC(i,width=3,flag="0"),sep=""))
  print(cl)
  sink()  ## close this file before moving on to the next one
}

> Also, could you please tell me how to save this output into a file in a
> similar way, e.g. "out001", "out002", etc.? I guess I can use the "paste"
> function.

Just one more silly question: are you really going to look at 300 separate output files? Or are you going to read them into something else for analysis? Wouldn't it be better to save the results in one big structure in R so you could run analyses comparing them?

Ben
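As a follow-up to the last point, here is a minimal sketch of keeping only the cluster centers from every run in a single R list and saving that list to one file. It assumes a current R release, where kmeans() is in the stats package and library(mva) is not needed. The temporary directory, the three small simulated input files, and the file name "centers.RData" are made up for illustration and stand in for the real "data1" to "data300".

```r
## Sketch only: simulate a few small input files so the example runs on its own.
## File locations, sizes, and the output name are illustrative assumptions.
set.seed(1)
datadir <- tempdir()
ndata   <- 3     # stands in for the 300 files in the thread
nclust  <- 2
maxit   <- 20
for (i in 1:ndata) {
  fake <- matrix(rnorm(40), ncol = 2)   # 20 points, 2 variables
  write.table(fake, file.path(datadir, paste("data", i, sep = "")))
}

## a list can hold one centers matrix per file, even if the files differ in size
centers <- vector("list", ndata)
for (i in 1:ndata) {
  mydata <- read.table(file.path(datadir, paste("data", i, sep = "")))
  cl <- kmeans(mydata, centers = nclust, iter.max = maxit)
  centers[[i]] <- cl$centers            # keep only the center info
}
names(centers) <- paste("data", 1:ndata, sep = "")

## one file holds everything; load() brings the list back in a later session
save(centers, file = file.path(datadir, "centers.RData"))
```

After this, centers[["data2"]] is the matrix of cluster centers for the second file, with one row per cluster and one column per variable.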