Scott Davis
2014-Jun-21 20:49 UTC
[R] Error creating daisy matrix in R cluster package - Cannot allocate vector of size 66.0 Gb
My goal is to create a dissimilarity matrix with the daisy function from the R cluster package, and then apply k-medoid clustering for customer segmentation. The dataset is a data.frame with 133,153 observations of 35 variables. It contains numerical and categorical columns, blank cells and missing values. Missing values are NA, while a blank cell means nothing is present in that cell of the data.frame.

Here's my OS:

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

I have 35 variables, but here are the first 5:

> head(df)
  user_id   Age Gender Household.Income Marital.Status
1   12945         Male
2   12947         Male
3   12990
4   13160 25-34   Male        100k-125k         Single
5   13195         Male         75k-100k         Single
6   13286

Since the Windows computer has 3 Gb of RAM, I increased the virtual memory to 100 Gb, hoping that would be enough to create the matrix. It didn't work.

I've looked into other R packages for solving the memory problem, but none of them work for this data. I cannot use `bigmemory` with the `biganalytics` package because it only accepts numeric matrices. The `clara` function and the `ff` package also accept only numeric matrices.

Here's the daisy script:

# Load csv file
> Store1 <- read.csv("/Users/name/Client1.csv", header = TRUE)
# Convert csv to data.frame
> df <- as.data.frame(Store1)
# Increase memory allocation in R to 70 GB
> memory.limit(size = 70000)
[1] 70000
# Load cluster package
> library(cluster)
# Create daisy dissimilarity matrix
# Use Gower distance coefficient for mixed variables
# Set type as ratio scaled variable
> daisy1 <- daisy(df, metric = "gower", type = list(ordratio = c(1:35)))
# Error: cannot allocate vector of size 66.0 Gb

How can I fix the error?

--
Scott Davis
San Jose, CA.
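
For reference, the 66.0 Gb in the error message is consistent with the size of a full pairwise dissimilarity object for this many rows. The sketch below is only back-of-the-envelope arithmetic in base R. It assumes the result stores one 8-byte double per unordered pair of observations, n * (n - 1) / 2 values in total, and it ignores any extra working memory that daisy may need. The helper name max_rows_for_budget is hypothetical and introduced only for illustration.

```r
# Number of observations reported in the post
n <- 133153

# One dissimilarity per unordered pair of rows (lower triangle only)
n_pairs <- n * (n - 1) / 2

# Assumption: each dissimilarity is stored as an 8-byte double
bytes <- n_pairs * 8
gib <- bytes / 2^30   # R reports "Gb" in units of 2^30 bytes

cat(sprintf("pairs: %.0f, size: %.1f Gb\n", n_pairs, gib))

# Hypothetical helper: largest number of rows whose pairwise
# dissimilarities fit in a given byte budget.
# Solves n * (n - 1) / 2 <= budget_bytes / bytes_per_value for n.
max_rows_for_budget <- function(budget_bytes, bytes_per_value = 8) {
  pairs <- budget_bytes / bytes_per_value
  floor((1 + sqrt(1 + 8 * pairs)) / 2)
}

# Example: rows that fit if 1 Gb is set aside for the result alone
rows_1gb <- max_rows_for_budget(2^30)
cat("rows that fit in 1 Gb:", rows_1gb, "\n")
```

Under these assumptions the storage grows with the square of the number of rows, so on a machine with 3 Gb of RAM the dissimilarity object for all 133,153 rows cannot be held in physical memory, whatever the value passed to memory.limit.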