Brad Thompson wrote:
> In an R program I am working on, I progressively build up several
> vectors of integers. I never know how long the vectors will eventually
> be, so I can't preallocate them with vector(). If I preallocate all of
> the vectors to their maximum size, I will run out of memory. I tried
> using c() or append() to build up the vectors, as in the following (tried
> on R 2.1.0 on Linux and R 2.2.1 on MacOS 10):
>
> li <- vector()   # also tried list() and pairlist()
> for (i in 0:40000) {
>     li <- c(li, i)   # also tried with i and li swapped
>     if (i %% 10000 == 0)
>         system('date')
> }
>
> The problem is that, judging by the times this prints, the loop is
> O(n^2), which matches the straightforward implementation of c() in
> which everything passed to c() is copied.
>
> I tried extending the vector by assigning to length(li) instead of
> using c(), but each extension also runs in O(n) (so the loop is still
> O(n^2)) and appears to copy the elements.
>
> What I am looking for is an array that can be dynamically resized in
> amortized constant or logarithmic time (like Python's list or C++'s
> std::vector). I could build up a data structure inside R (the same way
> std::vector is built on top of C arrays), but I was hoping someone
> might have some advice on a better way to do this.
>
> Does R have a resizeable array type that I'm missing? What do others
> generally do in this case?
>
> Thank you,
>
> Brad
>
Hi, Brad,
You can grow the vector in chunks, extending its length only when it
fills up, so the expensive copy happens only occasionally, as in:
## create a buffer with some initial headroom
li <- vector("numeric", 1000)
t1 <- proc.time()[3]
for (i in 1:999999) {
    ## add another chunk of space if the buffer is full
    if (i > length(li)) length(li) <- length(li) + 1000
    li[i] <- i
    if (i %% 10000 == 0) {
        t2 <- proc.time()[3]
        cat(sprintf("n = %6d; time = %4.2f\n", i, t2 - t1))
        ## flush.console() ## Windows only, I think
        t1 <- t2
    }
}
## trim 'li' back down to the elements actually filled
length(li) <- i
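
One caveat: growing by a fixed chunk still copies the whole buffer
every 1000 insertions, so the total copying remains quadratic, about
O(n^2/1000). To get the amortized-constant appends of std::vector that
Brad asked about, double the length each time instead; the total
copying is then O(n). A minimal sketch along the same lines (same idea,
just a different growth rule):

## start with a small buffer and double it whenever it fills up
li <- vector("numeric", 1000)
for (i in 1:999999) {
    if (i > length(li)) length(li) <- 2 * length(li)
    li[i] <- i
}
## trim to the elements actually filled in
length(li) <- i

Each doubling still copies the vector once, but summed over all
doublings the work is proportional to the final length, i.e. amortized
constant per element.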
HTH,
--sundar