Suresh_FSFM
2009-Feb-08 17:39 UTC
[R] Tip for performance improvement while handling huge data?
Hello All,

For certain calculations, I have to handle a data frame with, say, 10 million rows and multiple columns of different data types. When I try to perform calculations on certain elements in each row, the program just goes into "busy" mode for a really long time.

To avoid this "busy" mode, I split the data frame into subsets of 10,000 rows each, and the calculation then finished within a reasonable time.

Is there any other tip to improve the performance?

Regards,
Suresh
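[A minimal sketch of the chunking approach described above. The column names (a, b) and the per-chunk computation are hypothetical, only there to make the example self-contained:

    # Toy data frame standing in for the real 10-million-row one
    # (only 1e5 rows here so the example runs quickly).
    df <- data.frame(a = runif(1e5), b = runif(1e5))

    # Assign each row to a chunk of 10,000 rows, then process chunk by chunk.
    chunk.size <- 10000
    chunk.id <- ceiling(seq_len(nrow(df)) / chunk.size)
    results <- lapply(split(df, chunk.id), function(chunk) {
        chunk$a + chunk$b    # hypothetical per-chunk computation
    })
    out <- unlist(results, use.names = FALSE)
]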
Philipp Pagel
2009-Feb-08 19:28 UTC
[R] Tip for performance improvement while handling huge data?
> For certain calculations, I have to handle a data frame with, say, 10 million
> rows and multiple columns of different data types.
> When I try to perform calculations on certain elements in each row, the
> program just goes into "busy" mode for a really long time.
> To avoid this "busy" mode, I split the data frame into subsets of 10,000 rows.
> Then the calculation was done very fast, within reasonable time.
>
> Is there any other tip to improve the performance?

Depending on what exactly it is you are doing and what causes the slowdown, there may be a number of useful strategies:

- Buy RAM (lots of it) - it's cheap
- Vectorize whatever you are doing (see the sketch below)
- Don't use all the data you have but draw a random sample of reasonable size
- ...

To be more helpful we'd have to know:

- what are the computations involved?
- how are they implemented at the moment? -> example code
- what is the range of "really long time"?

cu
Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel
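[On the vectorization point, a minimal sketch with an invented computation (the column names and the arithmetic are hypothetical). The row-by-row loop is the kind of code that typically produces the "busy" behaviour; the vectorized version does the same work in one pass over whole columns:

    # Row-by-row loop: slow, because each iteration pays R's
    # interpretation and indexing overhead.
    slow <- function(df) {
        out <- numeric(nrow(df))
        for (i in seq_len(nrow(df))) {
            out[i] <- df$a[i] * df$b[i] + 1
        }
        out
    }

    # Vectorized equivalent: same result, computed on whole columns at once.
    fast <- function(df) df$a * df$b + 1

    df <- data.frame(a = runif(1e6), b = runif(1e6))
    system.time(slow(df))   # typically orders of magnitude slower
    system.time(fast(df))
]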