Scott Davis
2014-Jul-08 23:21 UTC
Error evaluating partitioning around medoids clustering method R clValid package
I have a data.frame with 300 observations of 36 variables, a mix of
numerical and categorical columns, some of which contain NA values. I am
trying to evaluate the partitioning around medoids (PAM) clustering
algorithm for a marketing segmentation study. My original dataset has over
130,000 observations, but I took a sample to make the problem easy to
reproduce.
My machine runs Mac OS X 10.9.3:
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)
Problem: Getting an error when doing internal and stability evaluation with
the clValid CRAN package in R.
Code:
#Convert the imported csv to a data.frame
> frame <- as.data.frame(Smallstore1)
> library(cluster)
#Create dissimilarity matrix
#Gower coefficient for finding distance between mixed variables
> daisy1 <- daisy(frame, metric = "gower", type = list(ordratio = c(1:36)))
#k-medoid algorithm with 3 clusters
> kanswers <- pam(daisy1, 3, diss = TRUE)
#Evaluate k-medoid clustering algorithm with 2 to 6 clusters
#Import clValid package
> library(clValid)
#Internal validation
> internval1 <- clValid(daisy1, 2:6, clMethods = "pam", validation = "internal")
#Error in switch(class(obj), matrix = mat <- obj, ExpressionSet = mat <-
#  Biobase::exprs(obj), : EXPR must be a length 1 vector
#Error in summary(internval1) :
#  error in evaluating the argument 'object' in selecting a method for
#  function 'summary': Error: object 'internval1' not found
#Stability validation
> stabval1 <- clValid(daisy1, 2:6, clMethods = "pam", validation = "stability")
#Error in switch(class(obj), matrix = mat <- obj, ExpressionSet = mat <-
#  Biobase::exprs(obj), : EXPR must be a length 1 vector
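One possible reading of the error, offered as a sketch rather than a confirmed diagnosis: the message shows clValid branching on class(obj) inside switch(), which needs a length-1 value, while an object returned by daisy() carries two classes. The code below checks this on a small made-up data frame (toy and its columns are hypothetical, not the Smallstore1 data). As a workaround that stays within the cluster package instead of clValid, it then compares the average silhouette width of pam() for 2 to 6 clusters.

```r
library(cluster)

# Hypothetical toy data with mixed types and some NAs (not the real data)
set.seed(1)
toy <- data.frame(
  age    = factor(sample(c("18-24", "25-34", "35-44", NA), 60, replace = TRUE)),
  gender = factor(sample(c("Male", "Female", NA), 60, replace = TRUE)),
  visits = rpois(60, 5)  # never NA, so every pair of rows shares one observed variable
)

d <- daisy(toy, metric = "gower")

# A daisy() result has two classes, so class(d) is not a length-1 vector
print(class(d))
print(length(class(d)))

# Average silhouette width for k = 2..6, taken directly from each pam() fit
ks <- 2:6
avg_sil <- sapply(ks, function(k) pam(d, k, diss = TRUE)$silinfo$avg.width)
names(avg_sil) <- ks
print(avg_sil)
```

A higher average silhouette width suggests better-separated clusters, so this gives a rough internal comparison across k. It does not reproduce the other clValid measures (connectivity, Dunn index, or the stability measures).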
Data:
I converted the data.frame to a dissimilarity matrix using the daisy
function and ran partitioning around medoids with 3 clusters. The daisy and
pam functions come from the cluster CRAN package in R. Since the data.frame
has mixed variable types, the Gower distance coefficient is used. Here is
the head of the first 7 variables; I removed the local part of each email
address for privacy reasons.
> head(frame)
  user_id          email   Age Gender Household.Income Marital.Status Presence.of.children
1   12945 @bellycard.com  <NA>   Male             <NA>           <NA>                 <NA>
2   12947 @bellycard.com  <NA>   Male             <NA>           <NA>                 <NA>
3   12990     @gmail.com  <NA>   <NA>             <NA>           <NA>                 <NA>
4   13160     @gmail.com 25-34   Male        100k-125k         Single                   No
5   13195     @gmail.com  <NA>   Male         75k-100k         Single                   No
6   13286     @gmail.com  <NA>   <NA>             <NA>           <NA>                 <NA>
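For illustration only, here is a minimal sketch of preparing columns shaped like the ones above before computing the Gower distance. The rows, the visits column, and the factor level orderings are made up, and dropping user_id is an assumption about what should count toward the distance, not something taken from the original analysis.

```r
library(cluster)

# Hypothetical rows shaped like head(frame); all values are made up
toy <- data.frame(
  user_id = c(1, 2, 3, 4, 5, 6),
  Age     = c(NA, "25-34", "35-44", "25-34", NA, "18-24"),
  Gender  = c("Male", "Male", NA, "Female", "Male", "Female"),
  Household.Income = c(NA, "100k-125k", "75k-100k", "75k-100k", "100k-125k", NA),
  visits  = c(3, 10, 4, 8, 2, 7),  # hypothetical numeric column with no NAs
  stringsAsFactors = FALSE
)

# Drop the identifier so it does not contribute to the distance
vars <- toy[, setdiff(names(toy), "user_id")]

# Banded variables become ordered factors; unordered categories stay plain factors
vars$Age <- factor(vars$Age, levels = c("18-24", "25-34", "35-44"), ordered = TRUE)
vars$Household.Income <- factor(vars$Household.Income,
                                levels = c("75k-100k", "100k-125k"), ordered = TRUE)
vars$Gender <- factor(vars$Gender)

# Gower distance on the mixed columns, then PAM on the dissimilarities
d <- daisy(vars, metric = "gower")
fit <- pam(d, k = 2, diss = TRUE)
print(fit$clustering)
```

With column classes set this way, daisy() infers a variable type for each column from its class, so no type argument is needed in this sketch. Rows that share no observed variable would produce NA dissimilarities, which is worth checking on data with many missing values.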
Please let me know if I can provide more information.
--
Scott Davis
Cell: (408)826-9561
Skype ID: Scdavis61
San Jose, CA.