Hello,

I am hoping you can help me with a question concerning kmeans clustering in R. I am working with the following data-set (abbreviated):

      BMW Ford Infiniti Jeep Lexus Chrysler Mercedes Saab Porsche Volvo
 [1,]   6    8        2    8     4        5        4    4       7     7
 [2,]   8    7        4    6     4        1        6    7       8     5
 [3,]   8    2        4    6     3        2        7    4       4     4
 [4,]   7    4        4    6     6        1        6    3       5     5
 [5,]   6    2        4    5     5        1        3    3       6     3
 [6,]   6    7        3    6     5        1        8    4       8     2
 [7,]   1    6        6    7     5        2        6    6       5     6
 [8,]   3    6        6    4     5        1        4    2       1     1
 [9,]   6    7        5    8     4        1        6    6       8     5
[10,]   6    7        5    9     3        1        2    5       1     8

When I try to scale my data and perform kmeans clustering, I get the following errors:

> new <- scale(new)
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
> cl <- kmeans(new, 4)
Error in switch(nmeth, { : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In switch(nmeth, { : NAs introduced by coercion

This is confusing to me since all of the data is numeric and there are no missing values. Is there something I need to do to my data to prepare it for kmeans? I have tried many matrix transformations but nothing has worked so far.

Your help is much appreciated.

Thanks,
jordan

--
Jordan van Rijn
vanrijn9 at fastmail.fm

On 9 May 2008, at 09:12, Jordan van Rijn wrote:

> When I try to scale my data and perform kmeans clustering, I get the
> following errors:
> new <- scale(new)
> Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

Probably the data is stored as factor instead of numeric. Try coercing by as.numeric(new)

hth,
Ingmar

> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Ingmar Visser
Department of Psychology, University of Amsterdam
Roetersstraat 15
1018 WB Amsterdam
The Netherlands
t: +31-20-5256723
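
The coercion suggested above can be sketched as follows. This is only an illustration under assumptions: the thread never shows how `new` was actually stored, so the character matrix below (the first three rows and columns of the posted data) is a hypothetical stand-in, and `new_num` and `f` are names made up for the example. Two caveats are worth keeping in mind: as.numeric() applied to a matrix returns a plain vector without dimensions, and as.numeric() applied to a factor returns the internal level codes rather than the printed values.

```r
# Hypothetical stand-in for 'new': part of the posted data, stored as text.
new <- matrix(c("6", "8", "8",
                "8", "7", "2",
                "2", "4", "4"),
              nrow = 3,
              dimnames = list(NULL, c("BMW", "Ford", "Infiniti")))

is.numeric(new)                    # FALSE, so scale(new) would fail

# Changing the storage mode in place keeps the dimensions and column names,
# whereas as.numeric(new) would return a bare vector of length 9.
new_num <- new
storage.mode(new_num) <- "double"

is.numeric(new_num)                # TRUE
scaled <- scale(new_num)           # now runs without the colMeans error

# If a column is a factor, go through character first; as.numeric() on a
# factor gives the level codes, not the values that are printed.
f <- factor(c("6", "8", "2"))
as.numeric(f)                      # level codes
as.numeric(as.character(f))        # the original values 6, 8, 2
```

If the conversion produces NAs with a "NAs introduced by coercion" warning, at least one entry is not a valid number and the input needs a closer look.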

Unfortunately, your data is *not* numeric. That is what the first error message, " 'x' must be numeric", is telling you, and you should believe it.

It might look numeric, but it isn't, which is why Ingmar mentioned you might have factors instead of numbers. Your challenge is to discover why. The "why" will depend on how you brought the data into R.

Assuming 'new' is a matrix (which it appears to be), here are some ways to find out more about your data object:

  is.numeric(new)
  is.factor(new)
  class(new)
  mode(new)
  str(new)

I'd suggest taking another look at your input data and making very sure there are only numbers in it. If it was a text file you read into R with some function, inspect the text file carefully. Also, check the help pages for the method you used to load the data into R, and see if you can find out what kinds of things cause data to be interpreted as other than numeric.

-Don

--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
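
The checks listed in this reply can be tried on a small example. The sketch below is an assumption-laden illustration, not a diagnosis of the original data: it builds a hypothetical character matrix containing one deliberately mistyped entry ("x4"), runs the suggested inspection functions, and then locates the entries that cannot be converted to numbers. The names `converted` and `bad` are invented for the example.

```r
# Hypothetical stand-in for 'new': a stray "x4" forces the matrix to be text.
new <- matrix(c("6", "8", "8",
                "8", "7", "2",
                "2", "4", "x4"),
              nrow = 3,
              dimnames = list(NULL, c("BMW", "Ford", "Infiniti")))

# The inspection functions suggested in the reply.
is.numeric(new)   # FALSE
is.factor(new)    # FALSE
class(new)
mode(new)         # "character"
str(new)

# Find entries that are present but do not convert to a number.
converted <- suppressWarnings(as.numeric(new))
bad <- is.na(converted) & !is.na(new)
dim(bad) <- dim(new)             # as.numeric() dropped the dimensions

which(bad, arr.ind = TRUE)       # row and column of each offending cell
new[bad]                         # "x4"
```

Once the offending cells are known, they can be traced back to the input file or to the arguments used when reading it.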