Kevin Parent
2017-Aug-23 05:59 UTC
[R] Data too big for a specific library package to handle
I know there are ways around the 'can't allocate a vector of size x
GB' errors, but I'm stumped.
So my raw data has >7 million rows and eight columns. That's not a
problem itself.
Using the confreq package (for configural frequency analysis), I take my data
and run it through the package's dat2fre function. This converts to a class
called 'Pfreq.' (Looks like a data frame to me, but R recognizes it as
different.) It is now smaller, a little less than a million rows, and one column
added. It's one row for every possible permutation, with the new column a
frequency count, though in my case, 99% are 0s.
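
To make the shape of that converted table concrete, here is a rough base-R analogy on a toy data set. It is only a sketch: the toy values are made up, and table()/as.data.frame() are stand-ins I am assuming behave similarly in spirit; this is not how dat2fre is implemented and the result is not a Pfreq object.

```r
# Toy stand-in for the real data: three small factors (made-up values).
toy <- data.frame(
  source   = factor(c("A", "A", "B", "B", "A")),
  factor.1 = factor(c(TRUE, FALSE, TRUE, TRUE, TRUE)),
  factor.2 = factor(c("x", "y", "z", "x", "x"))
)

# table() counts every combination of levels, including unobserved ones;
# as.data.frame() flattens it to one row per combination plus a Freq column.
freq <- as.data.frame(table(toy))

# Number of possible combinations is the product of the level counts.
n_cells <- prod(sapply(toy, nlevels))

# Fraction of combinations that were never observed (frequency 0).
zero_share <- mean(freq$Freq == 0)
```

The row count depends only on the number of levels per column, not on the number of observations, which is why a column with many distinct values inflates the table even when almost every count is 0.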
However, this data is meaningless by itself and I need to run it through the
package's CFA function for the main analysis, but when I do, I invariably get
the 'can't allocate' error. The CFA function only accepts the
Pfreq class as input.
I usually run 64-bit R under Linux but get the error. So I used a Windows
machine at work (forget which version of Windows, but it runs 64-bit R), but I
still get the error.
The problem with most memory allocation workarounds is that what I'm doing
creates a non-standard, library-specific data structure. Most workarounds are
designed for very large vectors, data frames, lists, matrices, etc., not for
very large 'Pfreqs'.
Any help?
The script below will simulate my data set with random data, but it takes
several minutes to run and may eat up your resources until it's finished.
rm(list=ls(all=TRUE))
require(confreq)
set.seed(1066)
# 60,000 random letter strings, repeated 100 times to give 6,000,000 values
observations<-as.factor(rep(replicate(60000,paste(sample(c(LETTERS,letters),sample(15)),collapse='')),times=100))
# observations come from one of two sources
source<-as.factor(c(rep('A',times=3000000),rep('B',times=3000000)))
# eight random TRUE/FALSE factors
factor.1<-as.factor(replicate(6000000,sample(c(TRUE,FALSE),1)))
factor.2<-as.factor(replicate(6000000,sample(c(TRUE,FALSE),1)))
factor.3<-as.factor(replicate(6000000,sample(c(TRUE,FALSE),1)))
factor.4<-as.factor(replicate(6000000,sample(c(TRUE,FALSE),1)))
factor.5<-as.factor(replicate(6000000,sample(c(TRUE,FALSE),1)))
factor.6<-as.factor(replicate(6000000,sample(c(TRUE,FALSE),1)))
factor.7<-as.factor(replicate(6000000,sample(c(TRUE,FALSE),1)))
factor.8<-as.factor(replicate(6000000,sample(c(TRUE,FALSE),1)))
x<-data.frame(observations,source,factor.1, factor.2, factor.3, factor.4,
              factor.5, factor.6, factor.7, factor.8)
x<-dat2fre(x)
# error: cannot allocate vector of size 2.1 Gb
# (the error message for the real data indicates 56 Gb)
analysis<-CFA(x)
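
For a sense of scale, the sizes in those two error messages can be turned into element counts. This is a back-of-the-envelope sketch under two assumptions that the error message itself does not confirm: that the vector being allocated holds doubles (8 bytes each) and that R's 'Gb' means 1024^3 bytes. The helper name doubles_in is made up for illustration.

```r
# Assumption: the vector holds doubles, 8 bytes per element.
bytes_per_double <- 8
# Assumption: "Gb" in the error message means 1024^3 bytes.
bytes_per_gb <- 1024^3

# Hypothetical helper: how many doubles fit in a vector of the given size.
doubles_in <- function(size_gb) size_gb * bytes_per_gb / bytes_per_double

elements_simulated <- doubles_in(2.1)  # simulated data: roughly 2.8e8 elements
elements_real      <- doubles_in(56)   # real data: 7516192768 elements
```

Under those assumptions the real-data request is for a single vector of about 7.5 billion elements, far more than the number of rows in the Pfreq object.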
_____
Kevin Parent, Ph.D
Korea Maritime University
Vice Chairman of Education and Training, Korea Toastmasters
http://grou.ps/koreatoastmasters
Schoolmasters, http://grou.ps/schoolmasters/home
