> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of S. Few
> Sent: Thursday, September 10, 2009 1:46 PM
> To: r-help at r-project.org
> Subject: [R] R 2.9.2 memory max - object vector size
>
> Me:
>
> Win XP
> 4 gig ram
> R 2.9.2
>
> library(foreign) # to read/write SPSS files
> library(doBy) # for summaryBy
> library(RODBC)
> setwd("C:\\Documents and Settings\\............00909BR")
> gc()
> memory.limit(size=4000)
>
> ## PROBLEM:
>
> I have memory limit problems. R and otherwise. My dataframes for
> merging or subsetting are about 300k to 900k records.
> I've had errors such as vector size too large. gc() was done.....reset
> workspace, etc.
>
> This fails:
>
> y$pickseq<-with(y,ave(as.numeric(as.Date(timestamp)),id,FUN=seq))
If any values in id are singletons, then the call to
seq(timestamp[id=="singleton"]) returns a vector whose length is
timestamp[id=="singleton"] (not the length of that, but the value of that).
as.numeric(as.Date("2009-09-10")) is 14497, so you might have a lot of
14497-long vectors being created (and thrown away, unused except for their
initial value). Using seq_along instead of seq would take care of that
potential problem. E.g.,
> d1 <- data.frame(x=c(2,3,5e9,4,5), id=c("A","B","B","B","A"))
> d2 <- data.frame(x=c(2,3,5e9,4,5), id=c("A","B","C","B","A"))
> # d1$id has no singletons; d2$id does, where d2$x is huge
> with(d1, ave(x, id, FUN=seq))
[1] 1 1 2 3 2
> with(d2, ave(x, id, FUN=seq))
Error in 1L:from : result would be too long a vector
> with(d2, ave(x, id, FUN=seq_along))
[1] 1 1 1 2 2
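Applied to your original call, the fix is a one-word change: FUN=seq becomes
FUN=seq_along. A minimal sketch on a made-up y (only the id and timestamp
column names come from your post; the values are invented):

```r
# Toy stand-in for the real data frame y; the id and timestamp column
# names are from the original post, the values are made up.
y <- data.frame(id = c("A", "B", "B", "A"),
                timestamp = c("2009-09-10", "2009-09-11",
                              "2009-09-12", "2009-09-13"))

# seq_along uses only the length of each group, never the (possibly huge)
# values in it, so no oversized vectors are allocated.
y$pickseq <- with(y, ave(as.numeric(as.Date(timestamp)), id, FUN = seq_along))
y$pickseq   # 1 1 2 2 : row numbers within each id, in original row order
```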
If your intent is to create a vector of within-group sequence numbers,
then there are more efficient ways to do it. E.g., with the following
functions
withinGroupSeq <- function(x) {
    x <- as.factor(x)
    retval <- integer(length(x))
    retval[order(as.integer(x))] <- Sequence(table(x))
    retval
}
# Sequence is like base::sequence but should use less memory
# by avoiding the list that sequence's lapply call makes.
Sequence <- function(nvec) {
    seq_len(sum(nvec)) - rep(cumsum(c(0L, nvec[-length(nvec)])), nvec)
}
you can get the same result as ave(FUN=seq_along) in less time and,
I suspect, less memory:
> withinGroupSeq(d1$id)
[1] 1 1 2 3 2
> withinGroupSeq(d2$id)
[1] 1 1 1 2 2
Base R may have a function for that already.
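For what it's worth, here is a quick self-contained sanity check (on invented
data) that Sequence reproduces base::sequence and that withinGroupSeq agrees
with the ave(FUN=seq_along) result; the definitions are repeated so the
snippet runs on its own:

```r
# Definitions repeated from above so this snippet is self-contained.
Sequence <- function(nvec) {
    seq_len(sum(nvec)) - rep(cumsum(c(0L, nvec[-length(nvec)])), nvec)
}
withinGroupSeq <- function(x) {
    x <- as.factor(x)
    retval <- integer(length(x))
    retval[order(as.integer(x))] <- Sequence(table(x))
    retval
}

# Sequence agrees with base::sequence: both give 1 2 1 2 3 for c(2, 3)
stopifnot(identical(Sequence(c(2L, 3L)), sequence(c(2L, 3L))))

# withinGroupSeq agrees with ave(FUN = seq_along) on random invented ids
set.seed(1)
id <- sample(letters, 1e4, replace = TRUE)
stopifnot(identical(withinGroupSeq(id),
                    ave(seq_along(id), id, FUN = seq_along)))
```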
Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
>
> Any clues?
>
> Is this 2.9.2?
>
> Skipping forward, should I download version R 2.8 or less?
>
> Thanks!
> Steve
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>