Hi R-users:

Yesterday I ran an R script for 9 hours and it showed no sign of stopping. When I interrupted it, I found it had completed only 82.5%.

This morning I was about to wait another 11 hours to see what would happen. But wait a minute: I had heard that converting a data.frame to a matrix makes R code faster, so I made that change in my code. Oooh, the new code finished within 30 minutes!!

Are there any other tips for speeding up R programs? Or could someone point me to documents or websites on R code optimization?

#OS: Win XP, CPU: Pentium IV, 3.20 GHz, Memory: 1 GB
#for() loop: 1000*1616*3*41 iterations, 3 data.frames (dim = c(1616,5), c(1616), c(1616) respectively)

Thanks in advance,
Xiaohua

--
Xiaohua Dai, Dr.
Postdoc in elephant-tree ecosystem simulation
Centre for Systems Research, Durban Institute of Technology
P.O.Box 953, Durban 4000, South Africa
Tel: +27-31-2042737(O) Fax: +27-31-2042736(O) Mobile: +27-723682954
Publications: http://www.getcited.org/?MBR=11061629
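To illustrate why the data.frame-to-matrix change helped, here is a minimal sketch with made-up data shaped like one of the tables above (1616 x 5, all numeric). The function name sum_cells and the random data are hypothetical; the point is only that element-by-element indexing inside a loop goes through the `[.data.frame` method on every access for a data.frame, but uses fast primitive indexing for a matrix. Actual timings will depend on your machine and R version.

```r
# Hypothetical data shaped like one of the tables in the post (1616 x 5)
set.seed(1)
n  <- 1616
df <- data.frame(matrix(runif(n * 5), nrow = n, ncol = 5))
m  <- as.matrix(df)  # all columns are numeric, so this is a numeric matrix

# Sums every cell with an explicit double loop. It works for both classes
# because nrow(), ncol() and [i, j] are defined for data.frames and matrices.
sum_cells <- function(d) {
  total <- 0
  for (i in seq_len(nrow(d))) {
    for (j in seq_len(ncol(d))) {
      total <- total + d[i, j]
    }
  }
  total
}

t_df  <- system.time(res_df  <- sum_cells(df))  # `[.data.frame` on each access
t_mat <- system.time(res_mat <- sum_cells(m))   # primitive matrix indexing

cat("data.frame:", t_df[["elapsed"]], "s\n")
cat("matrix:    ", t_mat[["elapsed"]], "s\n")
```

Both versions compute the same sum; only the cost of each `d[i, j]` differs.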
ecoinfo wrote:
> Are there any other tips to speed up R program? Or someone could
> indicate me some documents or websites on R code optimization?

- As you found, indexing operations on matrices are much faster than on data frames.
- Avoid growing objects incrementally: calculate the size you need, then allocate it all at once.
- Vectorize calculations.
- Use Rprof() to identify where your code is spending its time, and concentrate your efforts on that area. Perhaps translate some essential routines into compiled C or Fortran.
- For a smaller improvement that might not suit your application, convert factors to their numeric codes.
- Break long calculations into smaller pieces, so you can write out intermediate values. This doesn't necessarily speed things up, but it lets you stop and restart the calculation. It may also make the job better suited to running on a cluster of computers instead of just one.
- Limit your memory use so you don't end up using a swap file. Do this by keeping only the objects that will be used later and removing the others. (With the size of objects you were working with, this may not be an issue.)

Duncan Murdoch
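The "avoid growing allocations" and "vectorize" tips can be sketched side by side. This is a toy example under the assumption that the per-item work is just squaring a number; the function names grow, prealloc and vectorized are hypothetical. All three return the same result, and on most machines the timings should drop sharply from the first to the last.

```r
# Toy task: square each element of a vector (stand-in for real per-item work)
set.seed(2)
x <- runif(1e4)

# 1. Growing the result: c() allocates a new, longer vector and copies
#    everything on every iteration, so total work grows quadratically
grow <- function(x) {
  out <- c()
  for (i in seq_along(x)) out <- c(out, x[i]^2)
  out
}

# 2. Preallocating: allocate the full-length result once, then fill it in
prealloc <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) out[i] <- x[i]^2
  out
}

# 3. Vectorized: a single call on the whole vector; the loop runs in C
vectorized <- function(x) x^2

timings <- sapply(list(grow = grow, prealloc = prealloc, vectorized = vectorized),
                  function(f) system.time(f(x))[["elapsed"]])
print(timings)

# Sanity check: all three approaches give the same answer
stopifnot(isTRUE(all.equal(grow(x), prealloc(x))),
          isTRUE(all.equal(prealloc(x), vectorized(x))))
```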
Hello,

2005/10/18, ecoinfo <ecoinformatics at gmail.com>:
> Are there any other tips to speed up R program? Or someone could
> indicate me some documents or websites on R code optimization?

RSiteSearch("speed up R code") gives 346 hits, so this problem has been discussed on this list before. Maybe some of those are worth a look?

regards
Thomas
Hi,

I have written a variant of the random forest (rf) package algorithm entirely in R. I know that some of the code in the rf package is written in C or C++. The problem is that my R code takes a very long time to run. For example, building and testing a forest on a data set with 20,000 instances takes a few minutes with the random forest function from the rf package, while 'my' random forest takes around 5 hours. So I wonder whether there are ways to speed up the execution.

I've read in a similar post that using a matrix instead of a data.frame speeds up R code. My data set is read in as a "list"; would converting it to a matrix (using as.matrix) be better?

Thanks in advance,
Martin
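One caveat worth checking before calling as.matrix(): a matrix holds a single type, so if any column is a factor or character (a class label, say), as.matrix() turns the whole data set into a character matrix, which defeats the purpose. The small example below is hypothetical (made-up columns x1, x2, y). data.matrix() instead replaces factors by their integer codes, in line with Duncan's factor tip; and if the data really is a plain list of equal-length numeric vectors, do.call(cbind, ...) binds it into a numeric matrix.

```r
# Hypothetical training data: two numeric predictors and a factor class label
d <- data.frame(x1 = c(1.5, 2.0, 3.2),
                x2 = c(10L, 20L, 30L),
                y  = factor(c("a", "b", "a")))

m_chr <- as.matrix(d)     # a non-numeric column makes the whole matrix character
typeof(m_chr)             # "character"

m_num <- data.matrix(d)   # factors are replaced by their integer codes
typeof(m_num)             # "double"
m_num[, "y"]              # codes 1, 2, 1 (levels "a", "b")

# A plain list of equal-length numeric vectors can be bound column-wise
lst   <- list(x1 = c(1.5, 2.0, 3.2), x2 = c(10, 20, 30))
m_lst <- do.call(cbind, lst)  # 3 x 2 numeric matrix with columns x1, x2
```

If the class label is needed as text later, keep it as a separate vector and convert only the predictors.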
Hi,

I should read the R-help archives more carefully. Following your link, I searched in Google and found another PDF:
http://www.demog.berkeley.edu/~boe/Rstuff/R-fundamentalsLumleyBates/R-fundamentalsLumleyBates.pdf

Thanks for your advice,
Xiaohua

On 10/18/05, Thomas Schönhoff <tschoenhoff at gmail.com> wrote:
> Hi,
>
> well, searching exactly the wording suggested in my post gives me the
> same help as Duncan's! It's actually the tip Douglas Bates gave to
> someone else: use Rprof for that issue....
>
> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/58935.html
>
> 2005/10/18, ecoinfo <ecoinformatics at gmail.com>:
> > RSiteSearch("speed up R code")
> > == searches for pages containing the words (speed, up, R, and code),
> > and surely R is found everywhere.
> > Although some of the archived posts are useful, many of them are not.
> > Furthermore, I need general instructions rather than pieces (e.g.
> > Patrick's book and Duncan's rules).
> >
> > If I search for "speed up R code" as a phrase, I get only one not-very-useful hit.
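Since both Duncan's list and the archived tip point at profiling, here is a rough sketch of how Rprof() and summaryRprof() (both in base R's utils package) fit together. The function slow_step and the data are hypothetical stand-ins for a real workload; the loop simply runs it for about one second so the profiler has enough samples to report.

```r
prof_file <- tempfile(fileext = ".out")

# Hypothetical slow function: cell-by-cell access to a data.frame column
slow_step <- function(d) {
  total <- 0
  for (i in seq_len(nrow(d))) total <- total + d[i, 1]
  total
}

d <- data.frame(a = runif(2000), b = runif(2000))

Rprof(prof_file)                  # start sampling the call stack to a file
t0 <- Sys.time()
while (difftime(Sys.time(), t0, units = "secs") < 1) slow_step(d)  # ~1 s of work
Rprof(NULL)                       # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.self)                # functions ranked by time spent in their own code
head(prof$by.total)               # ranked by time including the functions they call
unlink(prof_file)
```

In a profile like this, most of the time typically shows up under `[.data.frame`, which is exactly the overhead that converting to a matrix removes.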