Scott Davis
2014-Jul-08 23:21 UTC
Error evaluating partitioning around medoids clustering method R clValid package
I have a data.frame with 300 observations of 36 variables, a mix of numerical and categorical, with many missing values (NA). I am trying to evaluate the partitioning around medoids (PAM) clustering algorithm for a marketing segmentation study. My original dataset has over 130,000 observations, but I took a sample to make the problem easier to reproduce.

My machine: Mac OS X 10.9.3

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

Problem: I get an error when running internal and stability validation with the clValid CRAN package in R.

Code:

# Convert the imported csv to a data.frame
> frame <- as.data.frame(Smallstore1)
> library(cluster)

# Create dissimilarity matrix
# Gower coefficient for finding distance between mixed variables
> daisy1 <- daisy(frame, metric = "gower", type = list(ordratio = c(1:36)))

# k-medoid algorithm with 3 clusters
> kanswers <- pam(daisy1, 3, diss = TRUE)

# Evaluate k-medoid clustering algorithm with 2 to 6 clusters
# Import clValid package
> library(clValid)

# Internal validation
> internval1 <- clValid(daisy1, 2:6, clMethods = "pam", validation = "internal")
# Error in switch(class(obj), matrix = mat <- obj, ExpressionSet = mat <- Biobase::exprs(obj), :
#   EXPR must be a length 1 vector
> summary(internval1)
# Error in summary(internval1) :
#   error in evaluating the argument 'object' in selecting a method for function 'summary': Error: object 'internval1' not found

# Stability validation
> stabval1 <- clValid(daisy1, 2:6, clMethods = "pam", validation = "stability")
# Error in switch(class(obj), matrix = mat <- obj, ExpressionSet = mat <- Biobase::exprs(obj), :
#   EXPR must be a length 1 vector

Data: I converted the data.frame to a dissimilarity matrix using the daisy function and ran partitioning around medoids with 3 clusters. The daisy and pam functions come from the cluster CRAN package. Since the data.frame has mixed variable types, the Gower distance coefficient is used. Here is the head of the first 7 variables; I removed the name part of each email address for privacy reasons.
> head(frame)
  user_id          email   Age Gender Household.Income Marital.Status Presence.of.children
1   12945 @bellycard.com  <NA>   Male             <NA>           <NA>                 <NA>
2   12947 @bellycard.com  <NA>   Male             <NA>           <NA>                 <NA>
3   12990     @gmail.com  <NA>   <NA>             <NA>           <NA>                 <NA>
4   13160     @gmail.com 25-34   Male        100k-125k         Single                   No
5   13195     @gmail.com  <NA>   Male         75k-100k         Single                   No
6   13286     @gmail.com  <NA>   <NA>             <NA>           <NA>                 <NA>

Please let me know if I can provide more information.

--
Scott Davis
Cell: (408)826-9561
Skype ID: Scdavis61
San Jose, CA.