Freja.Vamborg@astrazeneca.com
2004-Aug-06 07:38 UTC
[R] speeding up functions for large datasets
Dear R-helpers,

I'm dealing with large datasets, say tables of 60,000 rows by 12 columns or so, and some of the functions are (too) slow, so I'm trying to find ways to speed them up. I've found, for instance, that for-loops are slow in R (both by testing and by searching through the mail archives, etc.). Are there any other well-known things that are slow in R, maybe at the data-representation level, in how the code is written, or in reading in the data? I've also tried incorporating C code, which works well, but I'd also like to find other, maybe more "shortcut" ways.

Thanks in advance,
Freja
On Fri, 6 Aug 2004 Freja.Vamborg at astrazeneca.com wrote:

> Dear R-helpers,
> I'm dealing with large datasets, say tables of 60 000 times 12 or so, and
> some of the functions are (too) slow and I'm therefore trying to find ways
> to speed them up.
> I've found that for instance for-loops are slow in R (both by testing and by
> searching through mail archives etc.)

I don't think that is really true, but it is the case that using row-by-row operations in your situation would be slow *if they are unnecessary*. It is a question of choosing the right algorithmic approach, not whether it is implemented by for-loops or lapply or ....

> Are there any more well known arguments that are slow in R, maybe at data
> representation level, code-writing, reading in the data.
> I've also tried incorporating C-code, which works well, but I'd also like to
> find other, maybe more "shortcut" ways.

`S Programming' (see the R FAQ) has a whole chapter on this sort of thing, with examples. More generally you want to take a `whole object' view and use indexing and other vectorized operations.

Note also that what is slow does change with the version of R and especially how much memory you have installed. The first step is to get enough RAM.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
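To make the `whole object' advice above concrete, here is a minimal sketch in R; the example data frame, the row-sum operation, and the comparison are invented for illustration and are not from the thread:

    ## a data frame roughly the size mentioned in the question
    d <- as.data.frame(matrix(rnorm(60000 * 12), nrow = 60000, ncol = 12))

    ## row-by-row: extract one row at a time inside an explicit loop
    row.sums.loop <- numeric(nrow(d))
    for (i in 1:nrow(d)) row.sums.loop[i] <- sum(unlist(d[i, ]))

    ## whole-object: one vectorized call over the entire table
    row.sums.vec <- rowSums(d)

    all.equal(row.sums.loop, unname(row.sums.vec))   # TRUE

Wrapping either version in system.time() shows the difference; the cost of the loop is mostly in extracting 60,000 one-row data frames, not in the loop construct itself.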
You might want to turn your data into a matrix. You get much, much faster for-loops doing that.

Jean

On Fri, 6 Aug 2004 Freja.Vamborg at astrazeneca.com wrote:
> Dear R-helpers,
> I'm dealing with large datasets, say tables of 60 000 times 12 or so, and
> some of the functions are (too) slow and I'm therefore trying to find ways
> to speed them up. [...]
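A minimal sketch of the data-frame-to-matrix suggestion above; the example data, the loop, and the timing calls are illustrative assumptions, not taken from the thread:

    ## the same numeric table held two ways
    d <- as.data.frame(matrix(runif(60000 * 12), nrow = 60000))
    m <- as.matrix(d)        # all columns numeric, so this stays a numeric matrix

    ## an identical loop run over either object
    f <- function(x) {
        total <- 0
        for (i in 1:nrow(x)) total <- total + x[i, 1]
        total
    }

    system.time(f(d))        # row indexing on a data frame
    system.time(f(m))        # row indexing on a matrix is much cheaper

    ## note: as.matrix() on mixed-type columns gives a character matrix;
    ## data.matrix() is the safer conversion when factors are present

The speed-up comes from the data structure: indexing a matrix is a cheap primitive operation, while indexing a data frame goes through method dispatch on a list of columns each time.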