I have read through this and have been unable to find a *bug* report in it.
Please read the section on BUGS in the R FAQ and explain where the bug here
is -- as far as I can see, it is that a user has managed to write code that
exceeds the memory capacity of his computer.
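
To make the memory point concrete, here is a minimal sketch. It rests on an
assumption that has not been checked against the submitted data: that the
grouped computation works tapply-style, allocating one cell for every
combination of factor levels whether or not that combination occurs. The data
frame `toy` and all of its sizes are invented for illustration; only the
column names are taken from the quoted script.

```r
# Hypothetical stand-in for profiles.csv: invented sizes, real column names.
set.seed(42)
n <- 2000
toy <- data.frame(
  CNumber   = factor(sample(sprintf("C%03d", 1:40), n, replace = TRUE)),
  IP        = factor(sample(sprintf("10.0.0.%d", 1:150), n, replace = TRUE)),
  localdate = factor(sample(sprintf("2003-07-%02d", 1:20), n, replace = TRUE))
)

# tapply() returns an array with one cell per combination of levels, so its
# size is the product of the level counts, not the number of rows.
cells2 <- prod(nlevels(toy$CNumber), nlevels(toy$localdate))
cells3 <- prod(nlevels(toy$CNumber), nlevels(toy$IP), nlevels(toy$localdate))

# Combinations that actually occur in the data can never exceed n.
seen3 <- nrow(unique(toy[c("CNumber", "IP", "localdate")]))

# The three-way tabulation allocates every cell; the empty ones hold NA.
counts3 <- tapply(rep(1, n), list(toy$CNumber, toy$IP, toy$localdate), length)

cat("two-way cells:", cells2, " three-way cells:", cells3,
    " combinations present:", seen3, "\n")
```

Adding a third grouping factor multiplies the cell count by that factor's
number of levels, which would explain why the two-factor call succeeds and the
three-factor call does not. subset() also keeps unused factor levels, so
levels removed by the hourofday and dayofweek filter would still count toward
the product.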
On Tue, 16 Dec 2003 znmeb@aracnet.com wrote:
> Full_Name: Ed Borasky
> Version: 1.8.1
> OS: Windows XP Professional
> Submission from: (NULL) (208.252.96.195)
>
>
> R 1.8.1 seems to be running into a memory allocation problem in the "aggregate"
> function. I have a rather large dataset (14 columns by 223,000 rows -- almost 40
> megabytes) and a script that performs some processing on it. The system is a 768
> MB Pentium 4. Here's the console log:
>
> ---------------------------------------------------------------------------------
> R : Copyright 2003, The R Foundation for Statistical Computing
> Version 1.8.1 (2003-11-21), ISBN 3-900051-00-3
>
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R in publications.
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for a HTML browser interface to help.
> Type 'q()' to quit R.
>
> [Previously saved workspace restored]
>
> > source("script.R",echo=TRUE)
>
> > rm(list = ls())
>
> > cvar <- function(x) sd(x)/mean(x)
>
> > library(sm)
> Library `sm', version 2; Copyright (C) 1997, 2000 A.W.Bowman & A.Azzalini
> type help(sm) for summary information
>
> > callsperhour <- function(x) length(x)/12
>
> > profiles <- subset(read.csv("profiles.csv"), hourofday >=
> 7 & hourofday <= 19 & dayofweek >= 1 & dayofweek <= 5)
>
> > nrow(profiles)
> [1] 100520
>
> > attach(profiles)
>
> > pseudo.hist <- aggregate(duration, list(Delta), length)
>
> > colnames(pseudo.hist) <- c("Delta", "N")
>
> > detach(profiles)
>
> > gc()
>          used (Mb) gc trigger (Mb)
> Ncells  701188 18.8    2683553 71.7
> Vcells 1447712 11.1    8201413 62.6
>
> > memory.profile()
>     NILSXP     SYMSXP    LISTSXP     CLOSXP     ENVSXP    PROMSXP    LANGSXP
>          1       7228     244243       3949        495        773     113819
> SPECIALSXP BUILTINSXP    CHARSXP     LGLSXP                           INTSXP
>        207       1177     283663       4661          0          0         49
>    REALSXP    CPLXSXP     STRSXP     DOTSXP     ANYSXP     VECSXP    EXPRSXP
>      13383          9      24870          0          0       2598          2
>   BCODESXP  EXTPTRSXP WEAKREFSXP
>          0         93          0
>
> > memory.size(max = TRUE)
> [1] 224669696
>
> > memory.size(max = FALSE)
> [1] 81072656
>
> > attach(pseudo.hist)
>
> > pseudo.hist <- pseudo.hist[order(as.numeric(as.character(Delta))),
> ]
>
> > write.table(pseudo.hist, file = "pseudo-hist.csv",
> sep = ",", row.names = FALSE)
>
> > detach(pseudo.hist)
>
> > gc()
>          used (Mb) gc trigger (Mb)
> Ncells  701228 18.8    2146842 57.4
> Vcells 1447740 11.1    5248904 40.1
>
> > memory.profile()
>     NILSXP     SYMSXP    LISTSXP     CLOSXP     ENVSXP    PROMSXP    LANGSXP
>          1       7237     244261       3949        495        773     113819
> SPECIALSXP BUILTINSXP    CHARSXP     LGLSXP                           INTSXP
>        207       1177     283672       4661          0          0         49
>    REALSXP    CPLXSXP     STRSXP     DOTSXP     ANYSXP     VECSXP    EXPRSXP
>      13383          9      24870          0          0       2598          2
>   BCODESXP  EXTPTRSXP WEAKREFSXP
>          0         93          0
>
> > memory.size(max = TRUE)
> [1] 224669696
>
> > memory.size(max = FALSE)
> [1] 81072656
>
> > attach(profiles)
>
> > cphs.site <- aggregate(Timestamp, list(CNumber, localdate),
> callsperhour)
>
> > colnames(cphs.site) <- c("CNumber", "localdate", "CallsPerHour")
>
> > detach(profiles)
>
> > gc()
>          used (Mb) gc trigger (Mb)
> Ncells  701695 18.8    2146842 57.4
> Vcells 1449346 11.1    5248904 40.1
>
> > memory.profile()
>     NILSXP     SYMSXP    LISTSXP     CLOSXP     ENVSXP    PROMSXP    LANGSXP
>          1       7240     244277       3949        495        773     113819
> SPECIALSXP BUILTINSXP    CHARSXP     LGLSXP                           INTSXP
>        207       1177     284109       4661          0          0         51
>    REALSXP    CPLXSXP     STRSXP     DOTSXP     ANYSXP     VECSXP    EXPRSXP
>      13384          9      24877          0          0       2599          2
>   BCODESXP  EXTPTRSXP WEAKREFSXP
>          0         93          0
>
> > memory.size(max = TRUE)
> [1] 224669696
>
> > memory.size(max = FALSE)
> [1] 82444104
>
> > attach(cphs.site)
>
> > cphs.site <- cphs.site[order(CNumber, localdate),
> ]
>
> > write.table(cphs.site, file = "cphs-site.csv", sep = ",",
> row.names = FALSE)
>
> > detach(cphs.site)
>
> > gc()
>          used (Mb) gc trigger (Mb)
> Ncells  701701 18.8    2146842 57.4
> Vcells 1449350 11.1    5248904 40.1
>
> > memory.profile()
>     NILSXP     SYMSXP    LISTSXP     CLOSXP     ENVSXP    PROMSXP    LANGSXP
>          1       7242     244279       3949        495        773     113819
> SPECIALSXP BUILTINSXP    CHARSXP     LGLSXP                           INTSXP
>        207       1177     284111       4661          0          0         51
>    REALSXP    CPLXSXP     STRSXP     DOTSXP     ANYSXP     VECSXP    EXPRSXP
>      13384          9      24877          0          0       2599          2
>   BCODESXP  EXTPTRSXP WEAKREFSXP
>          0         93          0
>
> > memory.size(max = TRUE)
> [1] 224669696
>
> > memory.size(max = FALSE)
> [1] 82444104
>
> > attach(profiles)
>
> > cphs <- aggregate(Timestamp, list(CNumber, IP, localdate),
> callsperhour)
> Error in makeRestartList(...) : evaluation is nested too deeply: infinite
> recursion?
> >
> ------------------------------------------------------------------------------------
> "profiles.csv" is the 40 MB file. Here's the R code that generates the error:
>
> ------------------------------------------------------------------------------------
> # keep a log file
> #sink ("script.log")
>
> # clean house
> rm (list=ls())
>
> # definitions, libraries
> cvar<-function(x) sd(x)/mean(x); # coefficient of variation
> library(sm)
> callsperhour<-function(x) length(x)/12
>
> # load data
> profiles<-subset(read.csv("profiles.csv"),
> #as.character(localdate)<"2003-07-19"
> #&hourofday>=7
> hourofday>=7
> &hourofday<=19
> &dayofweek>=1
> &dayofweek<=5)
> #profiles<-subset(profiles,!(localdate=="2003-07-11"&CNumber=="C132185"))
> nrow(profiles)
>
> # compute pseudo-histogram
> attach(profiles)
> pseudo.hist<-aggregate(duration,list(Delta),length)
> colnames(pseudo.hist)<-c("Delta", "N")
> detach(profiles)
> gc()
> memory.profile()
> memory.size(max=TRUE)
> memory.size(max=FALSE)
>
> attach (pseudo.hist)
> pseudo.hist<-pseudo.hist[order(as.numeric(as.character(Delta))),]
> #print (pseudo.hist)
> write.table (pseudo.hist, file="pseudo-hist.csv", sep=",",
> row.names=FALSE)
> detach(pseudo.hist)
> gc()
> memory.profile()
> memory.size(max=TRUE)
> memory.size(max=FALSE)
>
> # compute calls per hour for each site/date combo
> attach(profiles)
> cphs.site<-aggregate(Timestamp,list(CNumber,localdate),callsperhour)
> colnames(cphs.site)<-c("CNumber","localdate","CallsPerHour")
> detach(profiles)
> gc()
> memory.profile()
> memory.size(max=TRUE)
> memory.size(max=FALSE)
>
> attach(cphs.site)
> cphs.site<-cphs.site[order(CNumber,localdate),]
> #print (cphs.site)
> write.table (cphs.site, file="cphs-site.csv", sep=",",
> row.names=FALSE)
> detach(cphs.site)
> gc()
> memory.profile()
> memory.size(max=TRUE)
> memory.size(max=FALSE)
>
> # compute calls per hour for each site/IP/date combo
> attach(profiles)
> cphs<-aggregate(Timestamp,list(CNumber,IP,localdate),callsperhour)
> colnames(cphs)<-c("CNumber","IP","localdate","CallsPerHour")
> detach(profiles)
> gc()
> memory.profile()
> memory.size(max=TRUE)
> memory.size(max=FALSE)
> ------------------------------------------------------------------------------------
> ... that's as far as it gets; it croaks in the "aggregate". Before I put all the
> "gc()" and other diagnostics in, it was croaking with a different error --
> cannot allocate a 15 MB vector.
>
> If you want, I'll zip up the datafile and see how big it is. I'm assuming this
> is something simple that I did wrong, though. I'm going to try dropping the
> extraneous columns before doing the "aggregate"; that might get the object sizes
> down significantly.
>
> ______________________________________________
> R-devel@stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
>
>
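
On the plan quoted above to drop the extraneous columns first: one possible
shape for that is sketched here, with the caveat that it has not been run
against the real file. The tiny `profiles` data frame and its values are made
up, and grouping on a single pasted key is an illustrative alternative, not
what aggregate() itself does. callsperhour is copied from the quoted script.

```r
# Hypothetical miniature of profiles; all values are invented.
profiles <- data.frame(
  CNumber   = c("C1", "C1", "C1", "C2", "C2"),
  IP        = c("a", "a", "b", "a", "a"),
  localdate = c("2003-07-14", "2003-07-14", "2003-07-14",
                "2003-07-14", "2003-07-15"),
  Timestamp = 1:5,
  duration  = c(3.2, 1.1, 0.7, 2.4, 5.0),
  stringsAsFactors = FALSE
)
callsperhour <- function(x) length(x) / 12   # as in the quoted script

# Keep only the columns the aggregation needs.
slim <- profiles[, c("CNumber", "IP", "localdate", "Timestamp")]

# One key per row: only combinations present in the data become groups,
# so no cell is allocated for combinations that never occur.
key   <- paste(slim$CNumber, slim$IP, slim$localdate, sep = "\t")
first <- !duplicated(key)

# Apply the summary once per observed key, then line the results up with
# the first row seen for each key.
cph  <- tapply(slim$Timestamp, key, callsperhour)
cphs <- data.frame(slim[first, c("CNumber", "IP", "localdate")],
                   CallsPerHour = as.vector(cph[key[first]]))
rownames(cphs) <- NULL
print(cphs)
```

The result has one row per combination that actually appears, in order of
first appearance, so its size is bounded by the number of input rows rather
than by the product of the level counts.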
--
Brian D. Ripley, ripley@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595