Motivation: during each iteration, my code needs to collect tabular data (and use it only during that iteration), but the number of rows may vary. I thought I would speed it up by preinitializing the matrix that collects the data with zeros, sized to what I know to be the maximum number of rows. I was surprised by what I found...

# set up (not the puzzling part)
x <- matrix(runif(20), nrow=4); y <- matrix(0, nrow=12, ncol=5); foo <- c();

# this is what surprises me... what the?
> system.time(for(i in 1:100000){n<-sample(1:4,1);y[1:n,]<-x[1:n,];})
   user  system elapsed
  1.510   0.000   1.514
> system.time(for(i in 1:100000){n<-sample(1:4,1);foo<-x[1:n,];})
   user  system elapsed
  1.090   0.000   1.085

These results are very repeatable. So, if I'm interpreting them correctly, dynamically allocating 'foo' each time to whatever the current output size is runs faster than writing to a subset of the preallocated 'y'? How is that possible?

And, more generally, I'm sure other people have encountered this type of situation. Am I reinventing the wheel? Is there a best practice for storing temporary loop-specific data?

Thanks.

PS: By the way, though I cannot write to foo[,] because the size is different each time, I tried writing to foo[] and the runtime was worse than either of the above examples.
On Thu, Feb 17, 2011 at 10:02 AM, Alex F. Bokov <ahupxot02 at sneakemail.com> wrote:
> Motivation: during each iteration, my code needs to collect tabular data (and use it only during that iteration), but the number of rows may vary. I thought I would speed it up by preinitializing the matrix that collects the data with zeros, sized to what I know to be the maximum number of rows. I was surprised by what I found...
>
> # set up (not the puzzling part)
> x<-matrix(runif(20),nrow=4); y<-matrix(0,nrow=12,ncol=5); foo<-c();

There is no purpose in initializing foo here. Your assignment in the second version overwrites any assignment done here.

> # this is what surprises me... what the?
>> system.time(for(i in 1:100000){n<-sample(1:4,1);y[1:n,]<-x[1:n,];})
>    user  system elapsed
>   1.510   0.000   1.514

This version performs an extraction from x and an assignment into a submatrix of y. The second version performs only the extraction and an assignment to a name in the evaluation environment, which is a much faster operation.

>> system.time(for(i in 1:100000){n<-sample(1:4,1);foo<-x[1:n,];})
>    user  system elapsed
>   1.090   0.000   1.085
>
> These results are very repeatable. So, if I'm interpreting them correctly, dynamically allocating 'foo' each time to whatever the current output size is runs faster than writing to a subset of the preallocated 'y'? How is that possible?
>
> And, more generally, I'm sure other people have encountered this type of situation. Am I reinventing the wheel? Is there a best practice for storing temporary loop-specific data?
>
> Thanks.
>
> PS: By the way, though I cannot write to foo[,] because the size is different each time, I tried writing to foo[] and the runtime was worse than either of the above examples.
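A small sketch of the distinction above (variable names follow the original post): both routes leave the same values in the first n rows; the difference is only that the subassignment has to copy elements into y, while the plain assignment just binds a name to the temporary.

```r
set.seed(1)
x <- matrix(runif(20), nrow = 4)       # source data, 4 x 5
y <- matrix(0, nrow = 12, ncol = 5)    # preallocated collector

n <- 3
y[1:n, ] <- x[1:n, ]   # extraction, then element-by-element copy into y
foo <- x[1:n, ]        # extraction, then the name 'foo' is bound to the result

# same values either way; only the amount of work differs
stopifnot(identical(foo, y[1:n, ]))
```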
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
On 17/02/2011 11:02 AM, Alex F. Bokov wrote:
> Motivation: during each iteration, my code needs to collect tabular data (and use it only during that iteration), but the number of rows may vary. I thought I would speed it up by preinitializing the matrix that collects the data with zeros, sized to what I know to be the maximum number of rows. I was surprised by what I found...
>
> # set up (not the puzzling part)
> x<-matrix(runif(20),nrow=4); y<-matrix(0,nrow=12,ncol=5); foo<-c();
>
> # this is what surprises me... what the?
> system.time(for(i in 1:100000){n<-sample(1:4,1);y[1:n,]<-x[1:n,];})
>    user  system elapsed
>   1.510   0.000   1.514
>
> system.time(for(i in 1:100000){n<-sample(1:4,1);foo<-x[1:n,];})
>    user  system elapsed
>   1.090   0.000   1.085
>
> These results are very repeatable. So, if I'm interpreting them correctly, dynamically allocating 'foo' each time to whatever the current output size is runs faster than writing to a subset of the preallocated 'y'? How is that possible?

The expression y[1:n,] <- x[1:n,] creates a new temporary variable to hold the result of the expression x[1:n,], then copies its elements into y[1:n,]. The expression foo <- x[1:n,] creates the same temporary, then binds foo to it without doing any copying. Much less work.

> And, more generally, I'm sure other people have encountered this type of situation. Am I reinventing the wheel? Is there a best practice for storing temporary loop-specific data?

Storing the value of an expression in a new variable will always be faster than copying it into part of an existing variable.

Duncan Murdoch

P.S. You might be aware of this, but there's one other thing that might be a surprise to you: x[1:1,] will be a vector, while x[1:n,] will be a matrix for n > 1. Use the "drop=FALSE" argument if you always want a matrix result.

> Thanks.
>
> PS: By the way, though I cannot write to foo[,] because the size is different each time, I tried writing to foo[] and the runtime was worse than either of the above examples.
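The drop behaviour in the P.S. above is easy to trip over in a loop where n can be 1; a quick illustration:

```r
x <- matrix(runif(20), nrow = 4)   # 4 x 5 matrix

n <- 1
is.matrix(x[1:n, ])                # FALSE: a single-row extraction drops to a vector
dim(x[1:n, , drop = FALSE])        # 1 5: drop=FALSE keeps the matrix shape

n <- 3
is.matrix(x[1:n, ])                # TRUE: multi-row extraction stays a matrix
```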