Gundala Viswanath
2013-Dec-07 14:28 UTC
[R] How to perform clustering without removing rows where NA is present in R
I have data that contain some NA values. What I want to do is to **perform clustering without removing rows** where an NA is present.

I understand that the `gower` distance measure in `daisy` allows for this. But why doesn't my code below work?

__BEGIN__
# Plot heat map with dendrogram together.
library("gplots")
library("cluster")

# Arbitrarily assigning NA to some elements
mtcars[2,2] <- "NA"
mtcars[6,7] <- "NA"

mydata <- mtcars

hclustfunc <- function(x) hclust(x, method="complete")

# Initially I wanted to use this but it didn't take NA
#distfunc <- function(x) dist(x,method="euclidean")

# Try using the daisy Gower function,
# which is supposed to work with NA values
distfunc <- function(x) daisy(x,metric="gower")

d <- distfunc(mydata)
fit <- hclustfunc(d)

# Perform clustering heatmap
heatmap.2(as.matrix(mydata),dendrogram="row",trace="none",
          margin=c(8,9), hclust=hclustfunc,distfun=distfunc);
__END__

The error message I got is this:

Error in which(is.na) : argument to 'which' is not logical
Calls: distfunc.g -> daisy
In addition: Warning messages:
1: In data.matrix(x) : NAs introduced by coercion
2: In data.matrix(x) : NAs introduced by coercion
3: In daisy(x, metric = "gower") :
  binary variable(s) 8, 9 treated as interval scaled
Execution halted

At the end of the day, I'd like to perform hierarchical clustering with the NA-allowed data.

G.V.
Sarah Goslee
2013-Dec-09 20:49 UTC
[R] How to perform clustering without removing rows where NA is present in R
Though your second question, restating this one, has already been answered, it might be worth taking another look at your code in this one as well. In particular, note that NA and "NA" are NOT the same thing.

data(mtcars)
str(mtcars)

# from your code
mtcars[2,2] <- "NA"
mtcars[6,7] <- "NA"
str(mtcars)

I'm pretty sure that's not what you want.

Thanks for providing a reproducible example: otherwise it would have been impossible to catch this. If you run into unexpected errors, it's always a good plan to start by using str() and similar functions to check whether your data are as you intend.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org
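
The difference between NA and "NA" can be checked directly. The following is a minimal sketch, not part of the original exchange, and assumes the stock mtcars data set and an installed cluster package. The names `broken` and `fixed` are hypothetical, introduced only for illustration. It shows that assigning the string "NA" coerces the whole column to character, whereas assigning a real NA keeps the column numeric, so that daisy() with the Gower metric and hclust() can run on it.

```r
library(cluster)  # provides daisy()

data(mtcars)

# Assigning the string "NA" (hypothetical copy named 'broken')
broken <- mtcars
broken[2, 2] <- "NA"     # a character string, not a missing value
str(broken$cyl)          # the column has been coerced to character
is.na(broken[2, 2])      # FALSE: R does not treat "NA" as missing

# Assigning a real missing value (hypothetical copy named 'fixed')
fixed <- mtcars
fixed[2, 2] <- NA
fixed[6, 7] <- NA
str(fixed$cyl)           # the column is still numeric

# Gower dissimilarity skips missing entries pairwise, so no rows are dropped
d <- daisy(fixed, metric = "gower")
fit <- hclust(d, method = "complete")
```

If the result is then passed to heatmap.2(), note that the argument for the clustering function is spelled hclustfun, not hclust as in the original call.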