I noticed that apply is VERY SLOW when applied to a "large" dimension, for
example when computing the row sums of a matrix with thousands of rows.

To demonstrate this, I did some benchmarking of different methods of
computing the row sums of an n x 10 matrix with n = 2000, ..., 10000.

The first method (M1) I used is the normal apply command:
  y <- apply(x, 1, sum)
The second method (M2) uses a for-loop for the computations, where the
memory for the resulting vector has been allocated before. That is, for
n = 2000:
  z <- numeric(2000); for (i in 1:2000) z[i] <- sum(x[i,])
The third method (M3) also uses a for-loop, but the resulting vector is
grown element by element, i.e.
  z1 <- NULL; for (i in 1:2000) z1 <- c(z1, sum(x[i,]))

All computations were made on a Pentium II 233 MHz with 256 MB RAM, R
started as R -v 50. The following table shows the minimum, mean, and
maximum CPU time in seconds, as measured by system.time over 10 runs of
every computation, for different values of n.

      n              M1                  M2                  M3
   2000    4.03   4.16   4.34   0.27  0.40  0.47   0.51  0.63  0.71
   4000   12.65  13.40  14.68   0.73  0.81  0.94   1.78  1.86  1.98
   6000   26.51  28.14  29.50   1.19  1.22  1.38   3.79  3.80  3.80
   8000   52.06  63.43  67.61   1.46  1.63  1.69   6.38  6.41  6.58
  10000   84.06  98.17 118.94   1.93  2.01  2.13   9.78  9.79  9.81

That is, computing the row sums of a 10000 x 10 matrix with apply takes
about 100 sec on average, whereas a simple for-loop does the same job in
about 2 sec.

The next table shows, for every method, the increase in mean time relative
to the n = 2000 case, compared to the increase in the number of rows.

  n/2000      M1      M2      M3
       1       1       1       1
       2   3.221   2.025   2.952
       3   6.764   3.050   6.032
       4  15.247   4.075  10.175
       5  23.599   5.025  15.540

You can see that the time for M2 (for-loop with memory allocated at the
beginning) increases linearly with n, as is to be expected, whereas apply
(M1) scales roughly quadratically with the number of rows. M3 (for-loop,
but with new memory allocated on every iteration) is somewhere in between.
So, obviously, R becomes slow when new memory has to be allocated
repeatedly.

I do not see a quick fix for apply, because apply can be used with
arbitrary functions, which might yield results of different lengths for
different input data. But as long as memory allocation is a performance
bottleneck, one should be aware of this problem, use simple for-loops, and
use apply only on small data (small dimensions) or where it is really
necessary.

Andreas

************************************************************************
* Andreas Weingessel                                                   *
************************************************************************
* Institut für Statistik      * Tel: (+43 1) 58801 4541                *
* Technische Universität Wien * Fax: (+43 1) 504 14 98                 *
* Wiedner Hauptstr. 8-10/1071 * Andreas.Weingessel@ci.tuwien.ac.at     *
* A-1040 Wien, Austria        * http://www.ci.tuwien.ac.at/~weingessel *
************************************************************************
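For readers who want to reproduce the comparison, here is a minimal sketch
of the kind of timing harness described above. The random test matrix and
the single timed run per method are illustrative choices of mine, not the
original benchmark script.

  ## Minimal sketch of the benchmark described above; the test matrix is
  ## random and only one run per method is timed here (illustrative only).
  n <- 2000
  x <- matrix(rnorm(n * 10), nrow = n, ncol = 10)

  ## M1: apply over rows
  t1 <- system.time(y <- apply(x, 1, sum))

  ## M2: for-loop with the result vector preallocated
  t2 <- system.time({
    z <- numeric(n)
    for (i in 1:n) z[i] <- sum(x[i, ])
  })

  ## M3: for-loop growing the result vector one element at a time
  t3 <- system.time({
    z1 <- NULL
    for (i in 1:n) z1 <- c(z1, sum(x[i, ]))
  })

  rbind(M1 = t1, M2 = t2, M3 = t3)   # compare CPU times side by side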
Andreas Weingessel <Andreas.Weingessel@ci.tuwien.ac.at> writes:

> I noticed that apply is VERY SLOW when applied to a "large" dimension,
> for example when computing the row sums of a matrix with thousands of
> rows.
>
> To demonstrate this, I did some benchmarking of different methods of
> computing the row sums of an n x 10 matrix with n = 2000, ..., 10000.
>
> The first method (M1) I used is the normal apply command:
>   y <- apply(x, 1, sum)
> The second method (M2) uses a for-loop for the computations, where the
> memory for the resulting vector has been allocated before. That is, for
> n = 2000:
>   z <- numeric(2000); for (i in 1:2000) z[i] <- sum(x[i,])
> The third method (M3) also uses a for-loop, but the resulting vector is
> grown element by element, i.e.
>   z1 <- NULL; for (i in 1:2000) z1 <- c(z1, sum(x[i,]))
>
> All computations were made on a Pentium II 233 MHz with 256 MB RAM, R
> started as R -v 50. The following table shows the minimum, mean, and
> maximum CPU time in seconds, as measured by system.time over 10 runs of
> every computation, for different values of n.
>
>       n              M1                  M2                  M3
>    2000    4.03   4.16   4.34   0.27  0.40  0.47   0.51  0.63  0.71
>    4000   12.65  13.40  14.68   0.73  0.81  0.94   1.78  1.86  1.98
>    6000   26.51  28.14  29.50   1.19  1.22  1.38   3.79  3.80  3.80
>    8000   52.06  63.43  67.61   1.46  1.63  1.69   6.38  6.41  6.58
>   10000   84.06  98.17 118.94   1.93  2.01  2.13   9.78  9.79  9.81
>
> That is, computing the row sums of a 10000 x 10 matrix with apply takes
> about 100 sec on average, whereas a simple for-loop does the same job in
> about 2 sec.

There are at least two other methods that would be interesting to try.
Because you are interested in the row sums, I imagine by far the fastest
method would be
  (M4)  y <- x %*% rep(1, ncol(x))
You may also find that
  (M5)  y <- apply(t(x), 2, sum)
is faster than M1, because of the way R handles arrays.

We should keep in mind when looking at these tables that the maximum time
in the size-10000 case is about 2 minutes. If the computation is worth
doing, it may be worth waiting 2 minutes for the result.

I remember the days of running S version 2 (i.e. the version before
"New S") on a Vax-11/750. This sort of computation could take many, many
hours on the only computer in the department, so relative differences in
speed between methods were a lot more important. One got used to
rephrasing computations in "efficient" ways. Today I think that clarity
is usually more important than efficiency.
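A short sketch of the two suggestions above, checking that they reproduce
the plain apply result; the random test matrix is an illustrative choice,
not data from the thread.

  ## Sketch of the two alternatives suggested above (M4 and M5) on an
  ## illustrative random matrix; both should agree with apply(x, 1, sum).
  n <- 10000
  x <- matrix(rnorm(n * 10), nrow = n, ncol = 10)

  ## M4: row sums as a matrix product with a vector of ones
  y4 <- as.vector(x %*% rep(1, ncol(x)))

  ## M5: transpose first, then sum over columns
  y5 <- apply(t(x), 2, sum)

  all.equal(y4, apply(x, 1, sum))   # TRUE
  all.equal(y5, apply(x, 1, sum))   # TRUE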
I was computing a function f(u,v) on bivariate point process data of about
4500 points. The computation in R took about 96 hours (4 days!) on a
Pentium Pro 200 under Linux with little or no other load. So I think speed
can be an issue, especially when computing things one will eventually
display with image().

My collaborator and I wound up coding it in C, and now the computation
takes 4 minutes (yes, 4 minutes vs 4 days). It could probably have been
sped up in R, but that would have taken the same effort as coding it in C,
and the result would still have been slower.

Bill Simpson
Maybe my point was not quite clear. I did not want to compute the row sums
of one particular matrix; I just wanted to demonstrate that the task
"apply a certain function to every row of a matrix" can take a very long
time if it is done with the function apply, which is of course the
clearest way to express this task.

I first came across this problem when I ran simulations on a data matrix
with 15000 rows: after adding one simple apply call to my program, it
suddenly seemed not to finish at all.

If I just want to make such a computation once, it might not matter
whether I wait a couple of seconds, or 2 minutes, or 5 minutes (as it
takes for 15000 rows). But if I run a simulation in which these
computations are repeated a hundred times, there is a difference between
200 seconds and 200 minutes.

Andreas
Douglas Bates wrote:
>
> Andreas Weingessel <Andreas.Weingessel@ci.tuwien.ac.at> writes:
> >
> > To demonstrate this, I did some benchmarking of different methods of
> > computing the row sums of an n x 10 matrix with n = 2000, ..., 10000.
> >
> > The first method (M1) I used is the normal apply command:
> >   y <- apply(x, 1, sum)
> > The second method (M2) uses a for-loop for the computations, where the
> > memory for the resulting vector has been allocated before. That is,
> > for n = 2000:
> >   z <- numeric(2000); for (i in 1:2000) z[i] <- sum(x[i,])
> > The third method (M3) also uses a for-loop, but the resulting vector
> > is grown element by element, i.e.
> >   z1 <- NULL; for (i in 1:2000) z1 <- c(z1, sum(x[i,]))
> >
> > All computations were made on a Pentium II 233 MHz with 256 MB RAM, R
> > started as R -v 50. The following table shows the minimum, mean, and
> > maximum CPU time in seconds, as measured by system.time over 10 runs
> > of every computation, for different values of n.
> >
> >       n              M1                  M2                  M3
> >    2000    4.03   4.16   4.34   0.27  0.40  0.47   0.51  0.63  0.71
> >    4000   12.65  13.40  14.68   0.73  0.81  0.94   1.78  1.86  1.98
> >    6000   26.51  28.14  29.50   1.19  1.22  1.38   3.79  3.80  3.80
> >    8000   52.06  63.43  67.61   1.46  1.63  1.69   6.38  6.41  6.58
> >   10000   84.06  98.17 118.94   1.93  2.01  2.13   9.78  9.79  9.81
> >
> > That is, computing the row sums of a 10000 x 10 matrix with apply
> > takes about 100 sec on average, whereas a simple for-loop does the
> > same job in about 2 sec.

It looks like M1 and M3 are O(n^2) but M2 is O(n). I see where that comes
from in M3, but it surprises me for M1.

luke

--
Luke Tierney
University of Minnesota             Phone: 612-625-7843
School of Statistics                Fax:   612-624-8868
206 Church Street                   email: luke@stat.umn.edu
Minneapolis, MN 55455 USA           WWW:   http://www.stat.umn.edu
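One quick way to put numbers on that impression is to regress log mean
time on log n. The data values below are copied from the mean columns of
the earlier table; the regression itself is just an illustrative check,
not part of the original posts.

  ## Empirical scaling exponents from the mean times in the table above:
  ## a slope of log(time) on log(n) near 2 means quadratic, near 1 linear.
  n  <- c(2000, 4000, 6000, 8000, 10000)
  m1 <- c(4.16, 13.40, 28.14, 63.43, 98.17)   # apply
  m2 <- c(0.40,  0.81,  1.22,  1.63,  2.01)   # preallocated for-loop
  m3 <- c(0.63,  1.86,  3.80,  6.41,  9.79)   # growing for-loop

  coef(lm(log(m1) ~ log(n)))[2]   # roughly 2: apply looks O(n^2)
  coef(lm(log(m2) ~ log(n)))[2]   # roughly 1: the preallocated loop is O(n)
  coef(lm(log(m3) ~ log(n)))[2]   # between 1 and 2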
Peter Dalgaard BSA <p.dalgaard@biostat.ku.dk> writes (29 May 1998):

> Oh, dear. Looking at apply, we have right in the middle of it:
>
>     for (i in 1:d2)
>         ans[[i]] <- FUN(array(newX[, i], d.call, dn.call), ...)
>
> Assignment to the tail of a very long list... I think I'm beginning to
> understand why Ross keeps talking about generic vectors. Either that,
> or we need to add some low-level list primitives to the language.

Got it in one. To reach ans[[i]] you have to cdr down the whole list.

The subsetting code would be vastly simplified without the special list
code. Ugliest of all is subscript manipulation for lists with dim
attributes. [I estimate 2 months for a fix - not that it will take that
long, but there are some other things higher up the todo list.]

Ross
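A rough illustration of why that assignment pattern hurts. The sketch
below contrasts filling a preallocated list with growing a list one
element at a time; the size n and the toy loop bodies are made up, and
the growing case only approximates the pairlist behaviour (repeated
copying) that Ross describes, since in the R of this thread even the
preallocated case had to walk a pairlist to reach element i.

  ## Illustrative contrast (made-up size n): filling a preallocated list
  ## versus growing a list one element at a time.  In the pairlist-based
  ## lists discussed above, even ans[[i]] <- ... had to cdr down i cells,
  ## which is what made apply's inner loop quadratic in the result length.
  n <- 5000

  system.time({
    ans <- vector("list", n)        # preallocated generic vector
    for (i in 1:n) ans[[i]] <- i
  })

  system.time({
    ans2 <- NULL                    # grown one element at a time,
    for (i in 1:n)                  # copying the accumulated list each step
      ans2 <- c(ans2, list(i))
  })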