Hi all,

I am a beginner trying to use R to work with large amounts of oceanographic data, and I find that computations can be VERY slow. In particular, computational speed seems to depend strongly on the number and size of the objects that are loaded when R starts up. The same computations are significantly faster when all but the essential objects are removed. I am running R on a machine with 16 GB of RAM, and our unix system manager assures me that there is memory available to my R process that has not been used.

1. Is the problem associated with how R uses memory? If so, is there some way to increase the amount of memory used by my R process to get better performance?

The computations that are particularly slow involve looping with by(). The data are measurements of vertical profiles of pressure, temperature, and salinity at a number of stations, organized into a data frame p.1 (1925930 rows, 8 columns: id, p, t, s, etc.). There are 1409 unique values of id, and the objective is to get a much smaller data frame with the minimum and maximum pressure for each profile. The slow part is:

h.maxmin <- by(p.1, p.1$id, function(x) {
    data.frame(id   = x$id[1],
               maxp = max(x$p),
               minp = min(x$p))
})

2. Even with unneeded data objects removed, this is very slow. Is there a faster way to get the maximum and minimum values?

platform   sparc-sun-solaris2.9
arch       sparc
os         solaris2.9
system     sparc, solaris2.9
status
major      1
minor      7.0
year       2003
month      04
day        16
language   R

Thank you for your time.

Helen
Douglas Bates
2003-Jul-01 14:31 UTC
[R] Computations slow in spite of large amounts of RAM.
"Huiqin Yang" <Huiqin.Yang at noaa.gov> writes:> Hi all, > > I am a beginner trying to use R to work with large amounts of > oceanographic data, and I find that computations can be VERY slow. In > particular, computational speed seems to depend strongly on the number > and size of the objects that are loaded (when R starts up). The same > computations are significantly faster when all but the essential > objects are removed. I am running R on a machine with 16 GB of RAM, > and our unix system manager assures me that there is memory available > to my R process that has not been used. > > 1. Is the problem associated with how R uses memory? If so, is there > some way to increase the amount of memory used by my R process to get > better performance?You could try setting a large nsize and vsize using mem.limits See the description in ?Memory> The computations that are particularly slow involve looping with > by(). The data are measurements of vertical profiles of pressure, > temperature, and salinity at a number of stations, which are organized > into a dataframe p.1 (1925930 rows, 8 columns: id, p, t, and s, etc.), > and the objective is to get a much smaller dataframe and the unique > values for ID is 1409 with the minimum and maximum pressure for each > profile. The slow part is: > > h.maxmin <- by(p.1,p.1$id,function(x){ > data.frame(id=x$id[1], > maxp=max(x$p), > minp=min(x$p))})I think it would be faster to use h.maxmin <- tapply(p.1$p, p.1$id, range) In the call to by you are subsetting the entire data frame and that probably means taking at least one copy of that frame. If you use tapply on only the relevant columns you will use much less space.> 2. Even with unneeded data objects removed, this is very slow. Is > there a faster way to get the maximum and minimum values?See above. -- Douglas Bates bates at stat.wisc.edu Statistics Department 608/262-2598 University of Wisconsin - Madison http://www.stat.wisc.edu/~bates/
> From: Huiqin Yang [mailto:Huiqin.Yang at noaa.gov]
>
> Hi all,
>
> I am a beginner trying to use R to work with large amounts of oceanographic data, and I find that computations can be VERY slow. In particular, computational speed seems to depend strongly on the number and size of the objects that are loaded when R starts up. The same computations are significantly faster when all but the essential objects are removed. I am running R on a machine with 16 GB of RAM, and our unix system manager assures me that there is memory available to my R process that has not been used.
>
> 1. Is the problem associated with how R uses memory? If so, is there some way to increase the amount of memory used by my R process to get better performance?

Is R compiled as 64-bit? If not, it won't be able to use more than 4GB of RAM (that's my understanding, anyway). R keeps objects in memory, so if you are working with large amounts of data, it's a good habit to keep only the absolutely essential objects in the workspace, and save() and rm() things you don't need for the computation.

> The computations that are particularly slow involve looping with by(). The data are measurements of vertical profiles of pressure, temperature, and salinity at a number of stations, organized into a data frame p.1 (1925930 rows, 8 columns: id, p, t, s, etc.). There are 1409 unique values of id, and the objective is to get a much smaller data frame with the minimum and maximum pressure for each profile. The slow part is:
>
> h.maxmin <- by(p.1, p.1$id, function(x) {
>     data.frame(id   = x$id[1],
>                maxp = max(x$p),
>                minp = min(x$p))
> })
>
> 2. Even with unneeded data objects removed, this is very slow. Is there a faster way to get the maximum and minimum values?

Why do you need to use by(), and why have the function return a data frame containing only one row? Here's an experiment on my 900MHz PIII laptop:

> n <- 1e5
> dat <- data.frame(id = sort(sample(LETTERS, n, replace=TRUE)),
+                   p = rnorm(n))
> system.time(h.maxmin <- by(dat, dat$id, function(x) {
+     data.frame(id=x$id[1], maxp=max(x$p), minp=min(x$p))}))
[1] 2.75 0.01 2.78 NA NA
> system.time(junk <- tapply(dat$p, dat$id, function(x) range(x)))
[1] 0.12 0.01 0.13 NA NA

If you want to coerce the result to a data frame with id as row names and min and max as the two variables, you can do:

junk.dat <- as.data.frame(do.call("rbind", junk))

HTH,
Andy

> platform   sparc-sun-solaris2.9
> arch       sparc
> os         solaris2.9
> system     sparc, solaris2.9
> status
> major      1
> minor      7.0
> year       2003
> month      04
> day        16
> language   R
>
> Thank you for your time.
>
> Helen
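A small follow-on sketch building on the as.data.frame(do.call("rbind", junk)) line above. The explicit column names and the added id column are assumptions chosen to match the maxp/minp layout of the original by() version, not part of the reply itself:

## Continuing from the experiment: junk is the list returned by tapply()
junk.dat <- as.data.frame(do.call("rbind", junk))
names(junk.dat) <- c("minp", "maxp")   # range() returns c(min, max)
junk.dat$id <- rownames(junk.dat)      # the ids were carried as row names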