R may be 'interpreted', but the base functions are written in compiled
C/FORTRAN.
If you are doing a lot of matrix operations (selection, computation, etc.),
a compiler is not going to buy you much, since you are already spending
your time in that optimized compiled code.
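As a toy illustration (my own made-up example, not from your code): the
loop below runs through the interpreter, while the vectorized version
spends its time in compiled code, leaving little for a compiler to win back:

x <- runif(1e6)

# interpreted R loop: every iteration goes through the interpreter
system.time({
    s <- 0
    for (i in seq_along(x)) s <- s + x[i]
})

# vectorized: a single call into compiled code
system.time(sum(x))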
There are routines for profiling R. Do you know where in your code you are
spending most of the time? Here is an example of output from a script of
mine that reads in some data and then generates about 16 plots. It works by
putting some 'print' statements in the code that output the CPU time,
elapsed time, and memory in use; I have added comments to show what is
happening at each step. From this I know which portion of my program needs
to be optimized.
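'my.stats' is just a little timing helper of my own; a minimal sketch of
the idea (the real one prints a few more numbers, and memory.size() is
Windows-only) would be something like this:

my.stats <- local({
    start <- proc.time()  # remember when tracing began
    function(label = "") {
        now <- proc.time()
        cat(label, "- my.stats : <",
            round(now[1] + now[2], 1),    # cumulative CPU, user + system
            round(now[3] - start[3], 1),  # elapsed time since start
            "> :", round(memory.size(), 1), "MB\n")
    }
})

A run of the instrumented script then produces a trace like the following: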
> source('C:/Perf/bin/Trace CPU by PID (POSIX).r')
Read 720726 records
read - my.stats : < 8.1 8.2 > 33.54 2.43 664.04 : 108.2 MB
# this has read in 720,726 lines of data with 9 columns, and it took
# 8.1 seconds of CPU time; the CPU and elapsed times (first and second
# numbers in the <>) are cumulative from the start. This is mostly the
# conversion of character data to reals/integers; you are probably not
# going to get faster unless your data is already in binary (see the
# read/convert sketch after the trace).
time conversion - my.stats : < 8.6 8.6 > 34.01 2.43 664.51 : 110.1 MB
# this 0.5 seconds was the conversion of 720,726 character strings to
# factors, for faster processing in later parts of the program
badTimes - my.stats : < 12.4 12.5 > 37.81 2.48 668.39 : 143.5 MB
# this is converting the character string "mm/dd/yy hh:mm:ss" from
# character to binary (again, see the sketch after the trace). This
# took an additional 3.8 CPU seconds (12.4 - 8.6). For 720,726
# character strings, I would say this is pretty fast; compiled code is
# probably not going to be a lot faster.
done; make approx functions - my.stats : < 13.5 13.6 > 38.84 2.51 669.45 : 112.2 MB
# another 1.1 CPU seconds to make five passes through 720,726 data
# points, cleaning up some more data
start plots - my.stats : < 16.8 17.1 > 42.1 2.53 672.93 : 116.4 MB
# the time to here was another 3.1 CPU seconds to partition the data
# by 'command' (this is computer performance data) into 159 groups and
# sum up the CPU time that the commands in each group used. Again I
# would guess compiled code would not be much faster
plot commands - my.stats : < 17.4 17.8 > 42.79 2.53 673.64 : 117.6 MB
# it took 0.6 seconds to generate an 'area' plot of the top 20
# commands, with all the rest grouped into 'other'
indiv commands - my.stats : < 21.1 21.5 > 46.4 2.56 677.35 : 100.1 MB
# took another 3.7 CPU seconds to create individual plots for the top
# 15 commands as area graphs broken down by PID
done Trace - my.stats : < 23.8 24.3 > 49.15 2.57 680.18 : 124.9 MB
# so we used 23.8 CPU seconds in 24.3 seconds of elapsed time
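Since I referred to them above, here is the flavor of the read and
time-conversion steps. This is a sketch only; the file name, column layout,
and the 'stamp' column name are made up for illustration:

# reading: giving read.table the column classes up front avoids
# having it re-guess the type of every column
DF <- read.table("perfdata.txt", header = TRUE,
                 colClasses = c("character", rep("numeric", 8)))

# time conversion: one vectorized call turns all the
# "mm/dd/yy hh:mm:ss" strings into binary date-times
DF$stamp <- as.POSIXct(strptime(DF$stamp, "%m/%d/%y %H:%M:%S"))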
So once you get something like this, you can see where the time is being
spent. Also, how many data points are we talking about? I would guess that
a lot of your time may be in the interface with the database, so can you
provide some details like this to help us understand where your problem is?
For my scripts, I would guess I would not get that much better performance
out of compiled code, since I am probably spending most of my time in the
'base' (compiled) functions in R; the time is due to the amount of data I
have to process. What is the size of the data that you are processing? It
is details like this that may help you reduce the time. You also have a
startup cost of 1-2 seconds for R itself. Can you just keep a copy of R
running that you send scripts to?
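Beyond print statements, R's built-in sampling profiler can break the time
down by function; something like this (the script name here is made up):

Rprof("myscript.prof")            # start sampling the call stack
source("C:/Perf/bin/myscript.r")  # run the code to be profiled
Rprof(NULL)                       # stop profiling
summaryRprof("myscript.prof")     # report time spent, by function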
So I will leave you with one of my favorite comments, the one I put to
development organizations when reviewing their architecture: "Tell me what
you want to do, not how to do it".
There may be other alternatives to consider once we all understand the
problem.
On 3/24/06, Leo Espindle <espindle@gmail.com> wrote:
> Thanks for the reply.
>
> Basically, it is taking too long to run through a series of R scripts when
> using the DCOM interface (StatConnector) to the R environment from a .Net
> application. Right now, it takes about 30 seconds to finish the routine,
> which includes firing up R using StatConnector, performing a series of
> calculations using .R scripts residing on the hard drive, inserting results
> into a SQL Server Database (using sqlSave), and then generating 70+ graphs
> using the default graphics device and saving those graphs to the hard drive
> (also using .R scripts residing on the hard drive).
>
> Admittedly, it's running on a somewhat older CPU, and not as a service (so
> the graphs actually appear on screen), but we are looking to cut this time
> down to under 5 seconds if at all possible.
>
> I have not spent a lot of time optimizing the code, but I suspect the
> problem lies more in the fact that we have to rely on the interpreted R
> environment, and we have to read .R scripts (existing on the hard drive)
> using StatConnector.
>
> Any suggestions as to how to speed up graphics generation and subsequent
> writing of the graphs, for instance, would be helpful, as that seems to
> take up the bulk of the time.
>
> I have not tried writing C++ programs that call the required R routines,
> unless you are referring to the functionality in StatConnector.
>
> Leo
>
> On 3/24/06, jim holtman <jholtman@gmail.com > wrote:
> >
> > What are your performance issues? What are your performance targets?
> > How far are you off the targets? Have you optimized your R code? How
> > are you calling/using R? What is your interface? Have you tried to
> > write C++ programs that call the required R routines?
> >
> > Please provide some more information.
> >
> >
> > On 3/24/06, Leo Espindle <espindle@gmail.com> wrote:
> >
> > > I am currently working on a project that involves using R and .Net.
> > > We're having performance issues with R, and we're wondering if there
> > > is a way to get around the R interpreter, particularly by compiling R
> > > directly for the .Net CLR? We're wondering if there are any
> > > initiatives to build such a compiler.
> > >
> > > Thanks,
> > > Leo
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390 (Cell)
+1 513 247 0281 (Home)
What is the problem you are trying to solve?