Saptarshi,
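Here are some observations. It seems to me that your question is really about
the cost of assignment into long lists.
1) Preallocate the list with vector("list", N), which fills it with NULLs,
instead of as.list(rep(1, 1e6)).
2) If you can, use a numeric vector rather than a list. In the timings below
it is more than 30% faster.
3) If you also drop the manual i <- i+1 counter and use a for loop over
seq_len(N), it is faster still. See below.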
N <- 1e7
h <- vector("list", length = N)
system.time({
  i <- 1
  while (i <= N) {
    h[[i]] <- i
    i <- i + 1
  }
})
# for N=1e6
   user  system elapsed
   5.02    0.01    5.03
# for N=1e7
   user  system elapsed
  83.64    0.30   84.03
h <- vector("numeric", length = N)
system.time({
  i <- 1
  while (i <= N) {
    h[[i]] <- i
    i <- i + 1
  }
})
# for N=1e6
   user  system elapsed
   3.39    0.00    3.39
# for N=1e7
   user  system elapsed
  34.28    0.04   34.40
h <- vector("numeric", length = N)
system.time({
  for (i in seq_len(N))
    h[i] <- i
})
# for N=1e7
   user  system elapsed
  20.30    0.04   20.38
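For anyone who wants to rerun the comparison, here is a minimal self-contained sketch that times the three variants side by side on a smaller N (1e5, an arbitrary choice so it finishes quickly) and checks that all three store the same values. The while loops run to i <= N so every preallocated slot is filled. Absolute timings depend on the machine and R version and are likely to differ from the R 2.8 figures above.

```r
# Sketch: rerun the three variants on a smaller N and check that
# they store the same values. Timings depend on machine and R version.
N <- 1e5  # arbitrary, smaller than 1e7 so the sketch finishes quickly

# 1) preallocated list, while loop with a manual counter
h_list <- vector("list", length = N)
t_list <- system.time({
  i <- 1
  while (i <= N) {
    h_list[[i]] <- i
    i <- i + 1
  }
})

# 2) preallocated numeric vector, same while loop
h_num <- vector("numeric", length = N)
t_num <- system.time({
  i <- 1
  while (i <= N) {
    h_num[[i]] <- i
    i <- i + 1
  }
})

# 3) preallocated numeric vector, for loop over seq_len(N)
h_for <- vector("numeric", length = N)
t_for <- system.time({
  for (i in seq_len(N))
    h_for[i] <- i
})

# All three should hold the values 1..N
stopifnot(identical(unlist(h_list), as.numeric(seq_len(N))),
          identical(h_num, h_for))

# Elapsed seconds per variant
print(c(list_while    = t_list[["elapsed"]],
        numeric_while = t_num[["elapsed"]],
        numeric_for   = t_for[["elapsed"]]))
```

The for-loop variant no longer evaluates the i <= N test and the i <- i + 1 update as separate R expressions on every iteration, which is one plausible reason it comes out ahead.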
Best,
Adrian
From: Saptarshi Guha <saptarshi.guha_at_gmail.com>
Date: Mon, 30 Mar 2009 11:24:55 -0400
Hello R users
I have a question about the time involved in list assignment. Consider the
following code snippet (see below). The first line creates a reader object,
which is the interface to 1 million key-value pairs (serialized R objects)
spanning 50 files (50MB in total). rhsqstart initiates the reading, and I
loop, reading each key-value pair with rhsqnextKVR. If this returns NULL, I
switch to the next file with rhsqnextpath, and if that returns NULL (no files
left), I break out of the loop.
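The reading loop described above can be sketched without the real reader. In this minimal sketch the helpers make_reader(), next_kv() and next_path() are hypothetical stand-ins for rhsqreader(), rhsqnextKVR() and rhsqnextpath(): they serve key-value pairs from a few in-memory "files" and keep their position in an environment so it can be updated in place. The loop goes back to reading (next) after switching files, so a NULL is never stored, and the preallocated list is trimmed to the number of pairs actually read.

```r
# Hypothetical stand-ins for rhsqreader()/rhsqnextKVR()/rhsqnextpath():
# each "file" is just a list of key-value pairs held in memory.
make_reader <- function(files) {
  env <- new.env()
  env$files <- files
  env$current <- 1  # index of the file being read
  env$pos <- 0      # index of the last pair returned from that file
  env
}

# Returns list(K = ..., V = ...) or NULL when the current file is exhausted.
next_kv <- function(rdr) {
  f <- rdr$files[[rdr$current]]
  if (rdr$pos >= length(f)) return(NULL)
  rdr$pos <- rdr$pos + 1
  f[[rdr$pos]]
}

# Advances to the next file; returns NULL when no files are left.
next_path <- function(rdr) {
  if (rdr$current >= length(rdr$files)) return(NULL)
  rdr$current <- rdr$current + 1
  rdr$pos <- 0
  rdr
}

# Three small in-memory "files" holding 7 pairs in total.
files <- list(
  lapply(1:3, function(k) list(K = k, V = k^2)),
  lapply(4:5, function(k) list(K = k, V = k^2)),
  lapply(6:7, function(k) list(K = k, V = k^2))
)

rdr <- make_reader(files)
h <- vector("list", length = 10)  # preallocated, deliberately too large
i <- 1
while (TRUE) {
  value <- next_kv(rdr)
  if (is.null(value)) {
    rdr <- next_path(rdr)
    if (is.null(rdr)) break
    next  # read from the new file instead of storing NULL
  }
  h[[i]] <- value
  i <- i + 1
}
h <- h[seq_len(i - 1)]  # drop the unused preallocated slots
```

Trimming at the end keeps the preallocation benefit even when the exact number of pairs is not known in advance.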
If I comment out line A1, the loop takes 39 seconds on a quad-core Intel
machine with 16GB of RAM running R-2.8.
If I include the assignment A1, it takes ~85 seconds.
I have preallocated the list in line A0, so I assume no resizing is going
on. Why, then, does the time increase so much?
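One way to separate the cost of the assignment from the cost of reading and deserializing is to time the loop with no file I/O at all. In this minimal sketch, run_loop() is a hypothetical helper that builds a small two-element list per iteration (standing in for the K,V pair from rhsqnextKVR) and either discards it or stores it in a preallocated list. system.time() runs a garbage collection before timing by default (gcFirst = TRUE), so both runs start from a freshly collected heap. The numbers are machine- and version-dependent, so no particular result is claimed here.

```r
# Synthetic stand-in for the read loop: no file I/O, just a small
# two-element list per iteration, like the K,V pair from rhsqnextKVR.
n <- 1e5  # arbitrary size so the sketch runs quickly

run_loop <- function(n, store) {
  h <- vector("list", length = n)  # preallocated, as in line A0
  for (i in seq_len(n)) {
    value <- list(K = i, V = numeric(10))
    if (store) h[[i]] <- value     # the equivalent of line A1
  }
  h
}

# gcFirst = TRUE is the default, so each timing starts after a gc()
t_without <- system.time(h0 <- run_loop(n, store = FALSE))
t_with <- system.time(h1 <- run_loop(n, store = TRUE))

print(c(without_A1 = t_without[["elapsed"]], with_A1 = t_with[["elapsed"]]))
```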
Thank you for your time.
Regards
Saptarshi
==code==
rdr <- rhsqreader("~/tmp/pp", local = T, pattern = "^p")
rdr <- rhsqstart(rdr)
i <- 1
h <- as.list(rep(1, 1e6))  ##A0
while (TRUE) {
  value <- rhsqnextKVR(rdr)  ##Returns a list of two elements K,V
  if (is.null(value)) {
    message(rdr$df[rdr$current])
    rdr <- rhsqnextpath(rdr)
    if (is.null(rdr)) break
    next  # read from the new file; h[[i]] <- NULL would delete element i
  }
  h[[i]] <- value  ##A1
  i <- i + 1
}
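A related pitfall in loops of this shape: if value is NULL when the assignment runs (for example right after switching files, without going back to the top of the loop), h[[i]] <- value does not store a NULL. In R, assigning NULL through [[<- removes that element from the list, so the list gets shorter and later entries shift down. The toy lists below are made up purely to illustrate the behaviour, along with the usual way to store an actual NULL placeholder.

```r
# Assigning NULL through [[<- deletes the element instead of storing NULL.
h <- list(1, 2, 3)
h[[2]] <- NULL
length(h)        # 2
h[[2]]           # 3: the old third element has moved to position 2

# To keep a NULL placeholder, assign a list containing NULL with [<-.
g <- list(1, 2, 3)
g[2] <- list(NULL)
length(g)        # 3
is.null(g[[2]])  # TRUE
```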