Hi,

I'm having problems while working with large data sets with R 1.5.1 on Windows 2000. Given an integer matrix of 30 columns and 15000 rows, my function should return a logical matrix of about 5000 rows and 15000 columns.

First of all I tried to run this function on a computer with 256 MB of RAM. I increased the memory limit of R with memory.limit() up to 512 MB, and watched memory and processor usage through the Windows task manager. At first R was using 100% of the processor and memory usage was constantly increasing. When there was no physical memory left, R began using virtual memory. Then the processor usage dropped, but there was intensive work with the hard drive, which of course slowed down the calculations. Still, the memory used by R kept changing, which I took as a sign that R was calculating. But after a while the task manager showed that R was using a constant amount of memory. Rgui was not responding, so I assumed that R had crashed.

So I tried to run the calculations on another Win2k box with 1024 MB of RAM and the same R version 1.5.1. This time virtual memory was not used, yet R still froze. The memory usage grew to about 450 MB and then R stopped. Memory usage was not changing and Rgui did not respond, yet the processor was used 100%. The task manager showed that peak memory usage was about 760 MB.

On smaller data sets there were no problems: memory usage was constantly increasing and the processor was used 100%. My function does not use fancy functions. Basically it just sums, finds minima and maxima, uses subsetting of a matrix, and calculates a correlation matrix between 10 columns of a given matrix.

So I would like to ask: can R perform such calculations at all, where a lot of memory must be used? And if R can do such calculations, what are the specific problems, topics or tips one should know before letting R do these calculations?

Thanks in advance for any help, and special thanks for reading such a long letter.

Vaidotas Zemlys
Hi,

> You'll need to post the code for anyone to be able to help. There are
> many ways to do the same thing, some hugely more efficient than
> others.

Ok, here it is:

# main function
rptree <- function(X, y, depth=3, count=10) {
  X <- as.matrix(X)
  y <- as.vector(y)
  m <- dim(X)[1]
  n <- dim(X)[2]
  if(!identical(m, length(y))) {
    stop("Nesutampa dimensijos")   # "Dimensions do not match"
  }
  cnames <- colnames(X)
  tree <- list()
  snames <- list()
  # split the root node
  node <- node.div(X, y, count=count, sep="", col.names=cnames)
  l <- node$l
  paths <- node$paths
  rownames(l) <- node$snames -> rownames(paths)
  tree[[1]] <- list(subsets=l, paths=paths)
  snames[[1]] <- node$snames
  if(depth > 1) {
    for(i in 2:depth) {
      nl <- dim(tree[[i-1]]$subsets)[1]
      off <- 0
      l <- logical(); paths <- numeric(); l.snames <- character()
      for(j in 1:nl) {
        # split each subset of the previous level
        subs <- tree[[i-1]]$subsets[j,]
        node <- node.div(X[subs,], y[subs], count=count,
                         name=snames[[i-1]][j], col.names=cnames)
        if(node$bonferoni > 0) {
          nnl <- dim(node$l)[1]
          subss <- matrix(rep(subs, nnl), nrow=nnl, byrow=TRUE)
          subss[subss] <- node$l
          l <- rbind(l, subss)
          paths <- rbind(paths,
                         cbind(matrix(rep(tree[[i-1]]$paths[j,], nnl),
                                      nrow=nnl, byrow=TRUE),
                               node$paths))
          l.snames[off + 1:nnl] <- node$snames
          off <- off + nnl
        }
      }
      rownames(l) <- l.snames -> rownames(paths)
      tree[[i]] <- list(subsets=l, paths=paths)
      snames[[i]] <- l.snames
    }
  }
  names(tree) <- paste("lv", 1:depth, sep="")
  tree$X <- X
  tree$y <- y
  attributes(tree)$class <- "rptree"
  tree
}

# function node.div used in main function rptree
node.div <- function(X, y, count=10, name="subset", sep=".", col.names=NULL) {
  m <- dim(X)[1]
  n <- dim(X)[2]
  SZZ <- sum(y^2)
  SZ <- sum(y)
  t <- rep(0, n)
  for(i in 1:n) {
    # two-sample t statistic for splitting on column i (zero vs non-zero)
    n1 <- length(X[X[,i]==0, i])
    if((n1 > 10) && (n1 < (m-10))) {
      if(min(c(n1, m-n1)) == n1) {
        SX <- sum(y[X[,i]==0])
        SXX <- sum(y[X[,i]==0]^2)
        n2 <- m - n1
      } else {
        SX <- sum(y[X[,i]>0])
        SXX <- sum(y[X[,i]>0]^2)
        n2 <- n1
        n1 <- m - n1
      }
      SY <- SZ - SX; SYY <- SZZ - SXX
      SSX <- SXX - (1/n1)*(SX)^2
      SSY <- SYY - (1/n2)*(SY)^2
      v <- (SSX + SSY)/(m-2)
      stderr <- sqrt(v*(1/n1 + 1/n2))
      t[i] <- abs(SX/n1 - SY/n2)/stderr
    }
  }
  # t <- t[t>0]
  bonf <- length(t[t>0])
  ind <- rep(0, count)
  if(bonf > 1) {
    # keep up to `count' significant splits that are not too highly correlated
    st <- sort(t, decreasing=TRUE, index.return=TRUE)
    j <- 1
    jj <- 1
    ind[1] <- st$ix[1]
    q.value <- qt(0.975, m-2)
    while((j < count) && (st$x[jj+1] > q.value) && (j < n)) {
      max.cor <- max(abs(cor(X[, c(ind, st$ix[jj+1])])[j+1, 1:j]))
      if(max.cor < 0.9) {
        j <- j + 1
        jj <- jj + 1
        ind[j] <- st$ix[jj]
      } else {
        jj <- jj + 1
      }
    }
  } else {
    if(bonf == 1) {
      ind[1] <- (1:n)[t>0]
    }
  }
  ind <- ind[ind>0]
  ni <- length(ind)
  if(ni > 0) {
    # both sides of each chosen split, one row per resulting subset
    l <- X[, ind] > 0
    l <- t(matrix(c(l, !l), nrow=m))
    paths <- cbind(rep(ind, 2), rep(c(1,0), each=ni))
    if(identical(col.names, NULL)) {
      snames <- paste(name, paste(rep(ind, 2), rep(c(1,0), each=ni), sep="@"), sep=sep)
    } else {
      snames <- paste(name, paste(rep(col.names[ind], 2), rep(c(1,0), each=ni), sep="@"), sep=sep)
    }
    list(l=l, paths=paths, snames=snames, bonferoni=bonf)
  } else {
    list(bonferoni=bonf)
  }
}
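A minimal, untested sketch of how rptree() can be called on made-up toy data (the sizes, column names and values below are illustrative only, not the real data set):

# purely illustrative 0/1 predictors and a random response
set.seed(1)
X <- matrix(rbinom(200*30, 1, 0.5), nrow=200, ncol=30)
colnames(X) <- paste("v", 1:30, sep="")
y <- rnorm(200)

tr <- rptree(X, y, depth=2, count=3)
names(tr)              # should be "lv1" "lv2" "X" "y"
dim(tr$lv1$subsets)    # one row per subset (node) kept at the first level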
On Fri, 18 Oct 2002, Vaidotas Zemlys wrote:

> I'm having problems while working with large data sets with R 1.5.1 in
> windows 2000. Given a integer matrix size of 30 columns and 15000 rows
> my function should return a boolean matrix size of about 5000 rows and
> 15000 columns.

That's 75 million items of 4 bytes each, hence almost 300Mb for that one object.

> First of all I tried to run this function on computer with 256 MB of
> RAM. I increased memory limit of R with memory.limit() up to 512 MB. I
> was inspecting memory and processor usage through Windows task manager.
> At first R was using 100% of processor and memory usage was constantly
> increasing. When there were no physical memory left, R began using
> virtual memory. Then the processor usage dropped, but there was
> intensive work with hard drive. Of course that slowed down
> calculations. Yet the memory used by R always changed, and that was I
> think the sign, that R was calculating. But after a while the task
> manager showed that R uses constant size of memory. The Rgui was not
> responding, so I assumed that R crashed.

Don't think so. More likely Windows is having problems managing the memory requirements. You are trying to access an object too big to fit into RAM, and that is going to cause severe strain.

> So I tried to run the calculations on another win2k box with 1024 MB of
> RAM with the same R version 1.5.1. This time virtual memory was not
> used, yet still R froze. The memory usage grew to about 450 MB and then
> R stopped. Memory usage was not changing, Rgui did not respond, yet
> processor was used 100%. Task manager showed that peak memory usage was
> about 760 MB.

Again, there is likely a problem with Windows allocating a contiguous chunk of 300Mb of memory. Try this sort of thing only after a fresh reboot.

> On smaller data sets there were no problems, memory usage was constantly
> increasing and processor was used 100%. My function does not use fancy
> functions. Basically it just sums, finds minimum and maximum, uses
> subsetting of a matrix, and calculates correlation matrix between 10
> columns of a given matrix.
>
> So I would like to ask can R at all perform such calculations where a
> lot of memory must be used? And if R can do such calculations, what are
> specific problems, topics or tips which should be known before letting R
> to do these calculations?

R can. The question is `can Windows'? If possible use a Unix-based OS.

You have not told us your problem, so you have not demonstrated that `a lot of memory must be used'. Hard to help when we don't know what you are attempting, but few problems cannot be done in pieces.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel: +44 1865 272861 (self)
1 South Parks Road,                    +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax: +44 1865 272595
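(As a quick check of that figure, one line of R, assuming 4 bytes per logical element as stated above:)

5000 * 15000 * 4 / 2^20    # about 286 Mb for the result matrix alone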
Hi,

> Dr. Zemlys: Have you tried the --max-mem-size option on the R command
> line? Here is an excerpt from the FAQ file:
>
> 2.6 There seems to be a limit on the memory it uses!
>
> Indeed there is. It is set by the command-line flag --max-mem-size (see
> "How do I install R for Windows?") and defaults to the smaller of the
> amount of physical RAM in the machine and 1Gb. It can be set to any
> amount over 10M. (R will not run in less.) Be aware though that Windows
> has (in most versions) a maximum amount of user virtual memory of 2Gb,
> and parts of this can be reserved by processes but not used. Because of
> the way the memory manager works, it is possible that there will be free
> memory but R will not be able to make use of it. Use ?Memory and
> ?memory.size for information about memory usage. The limit can be raised
> by calling memory.limit within a running R session. We have found that
> starting R with too large a value of --max-mem-size may fail: the limit
> seemed to be about 1.7Gb on Windows 2000 Professional. R can be compiled
> to use a different memory manager which might be better at using large
> amounts of memory, but is substantially slower (making R several times
> slower on some tasks).

When I tried to run my function on the computer with 1 GB of RAM, I set memory.limit(1024), yet R froze before it had reached that limit. The Windows task manager showed that R was using about 450 MB of RAM at the time it froze. When I tried to run the calculations without adjusting memory.limit, R exited from the function with an error message saying that I should adjust the memory limit, because it could not allocate a vector of some size. So I think the problem is not with the memory limits.

Vaidotas Zemlys
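(For reference, the Windows-only memory queries mentioned in the FAQ excerpt can be used like this; the calls below are illustrative and no output from the actual session is reproduced:)

memory.size()             # Mb currently in use by R
memory.size(max=TRUE)     # maximum Mb obtained from the OS so far
memory.limit()            # current limit in Mb
memory.limit(size=1024)   # raise the limit to 1 Gb, as was done above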
On 18 Oct 2002, at 14:03, ripley at stats.ox.ac.uk wrote:

> On Fri, 18 Oct 2002, Vaidotas Zemlys wrote:
>
> > I'm having problems while working with large data sets with R 1.5.1
> > in windows 2000. Given a integer matrix size of 30 columns and
> > 15000 rows my function should return a boolean matrix size of about
> > 5000 rows and 15000 columns.

snip

> Don't think so. More likely that Windows is having problems managing
> the memory requirements. You are trying to access an object too big
> to fit into RAM, and that is going to cause severe strain.

snip

> Again, there is likely a problem with Windows allocating a contiguous
> chunk of 300Mb of memory. Try this sort of thing only after a fresh
> reboot.

snip

> R can. The question is `can Windows'? If possible use a Unix-based
> OS.

Windows leaves a LOT of junk lying around in RAM (recently used DLLs etc.), and even after a reboot there is still some reclaimable RAM (from processes used in startup). I use a program called MemTurbo (http://www.memturbo.com/) which will do a RAM scrub (releasing unused RAM when free-RAM limits are reached) and a RAM defrag (so if contiguous RAM is needed then this might help). It has been around a while, and I have used it since an earlier version. I don't believe it is perfect, but it does seem to do a good job of the arcane area of Windows memory management. Maybe that will help to get you "clean RAM".

The other area that might be worth attention is the number of processes that are currently running; you can perhaps kill some of these. And, of course, defragging your hard disk(s) and perhaps managing the swap file yourself are old favourites, not to mention cleaning the registry. None of these should in theory have anything to do with memory management (by R), but in practice there seem to be some complex "interactions" in the OS, between the OS and the registry and RAM and concurrent threads.

I have also found that a "lightly loaded" Windows machine (one with very few programs installed) is much more likely to be stable than one with many programs installed, and I have a glimmering of an idea that there is some critical size of the registry beyond which something starts to thrash (perhaps if the registry size is greater than available physical RAM). Of course none of this registry business SHOULD affect R, but then again, Windoze is a black box, so who knows what goes on with program loading, thread interaction etc.

To cut a long story short, it might just possibly help if you try to keep your RAM and your disks and swapfiles and registry as clean as possible.

fwiw
Hi,

>> I'm having problems while working with large data sets with R 1.5.1 in
>> windows 2000. Given a integer matrix size of 30 columns and 15000 rows
>> my function should return a boolean matrix size of about 5000 rows and
>> 15000 columns.
>
> That's 75 million items of 4 bytes each, hence almost 300Mb for that one
> object.

Does that mean that R reserves 4 bytes for a logical object of length 1? In general, how much memory does R allocate for the different data types? I searched a bit, but I didn't find anything useful on this subject. I would like to know, if possible, how much memory R needs to store, for example, a real matrix of 50 rows and 100 columns together with its column and row names, or where to find information on this subject.

I thought that R needed only 1 byte for storing a logical value, and because of that I underestimated the amount of memory R would need to use.

> You have not told us your problem, so has not demonstrated that
> `a lot of memory must be used'. Hard to help when we don't know what
> you are attempting, but few problems cannot be done in pieces

I did not describe my problem because I thought it was more or less irrelevant to the memory usage problems I was experiencing. My intention was to ask how R manages memory, and whether there is something special about that management which everyone should know but I don't. I'm sorry if my letter was a bit unclear; English is not my native language.

As for my problem, I'm trying to find out how well recursive partitioning can separate a "pure" subset. In recursive partitioning (and all tree methods) the tree is grown using the splits that separate a node into two subsets best. Thus the given set is divided into subsets minimizing, broadly speaking, some statistic which depends on all subsets. My goal is to single out one "pure" subset; I don't care about the other subsets, so clearly I do not want to minimize a statistic which depends on all subsets. So I try to grow trees using not only the splits that are best, but the splits that are nearly best as well. To be exact, I use the 10 best splits for every node. So if I split the root node twice I get 1000 trees.

I have to save information about the terminal nodes, that is, which objects belong to them. As these objects are elements of a given vector y, for each terminal node I save a logical vector the length of that vector, where TRUE in position i means that element y[i] is present in that terminal node.

To sum up, I have an initial matrix X with dim(X)[1]==m and dim(X)[2]==n, and a vector y with length(y)==m, and I split y upon the columns of X. For each terminal node I save a logical vector t, length(t)==length(y)==m, where t[i]==TRUE means that y[i] belongs to terminal node t. With 1000 trees I can have at most 4000 terminal nodes, so I need to store 4000*m logical items. As you can understand from my previous letters, I encountered problems when m is about 15000.

I'm growing these trees purely for exploratory reasons; it may be that my mathematical and statistical assumptions are totally wrong, and that's why I did not give many details about my problem earlier.

Thanks for all your answers.

Vaidotas Zemlys

PS R rulezzz!!! :)
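(A rough size check for the scheme just described, plus one possible way to shrink it; this is an untested idea, not something proposed in the thread: store only the member indices of each terminal node with which() instead of a full-length logical vector.)

4000 * 15000 * 4 / 2^20           # about 229 Mb as one big logical matrix

# a single terminal node with, say, 40 members out of m = 15000
t <- rep(FALSE, 15000); t[sample(15000, 40)] <- TRUE
idx <- which(t)                   # keep only the integer indices of the members
object.size(t)                    # about 60 Kb as a logical vector
object.size(idx)                  # a few hundred bytes as indices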
> From: Vaidotas Zemlys [mailto:mpiktas at delfi.lt]
>
> >> I'm having problems while working with large data sets with R 1.5.1 in
> >> windows 2000. Given a integer matrix size of 30 columns and 15000 rows
> >> my function should return a boolean matrix size of about 5000 rows and
> >> 15000 columns.
>
> > That's 75 million items of 4 bytes each, hence almost 300Mb for that
> > one object.
>
> Does that mean that R reserves 4 bytes for logical object with length 1?
> On the whole how much memory R allocates for different data types? I
> searched a bit, but I didn't find anything useful on this subject. I
> would like to know if it is possible how much memory R needs for storing
> for example real matrix size of 50 rows and 100 columns together with
> column and row names. Or where to find information on such subject.
>
> I thought that R needs only 1 byte for storing logical byte, and because
> of that I underestimated the size of memory R would need to use.

You can use object.size() to get some idea of how R allocates memory:

> object.size(logical(1000))/1000
[1] 4.028
> object.size(integer(1000))/1000
[1] 4.028
> object.size(double(1000))/1000
[1] 8.028

So it seems that R allocates logicals as if they were integers. My guess is that this makes it easier to coerce logicals to integers for things like sum(is.na(x))?

Andy
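The 50 x 100 real matrix with row and column names asked about earlier can be measured the same way (the exact figure will vary slightly by platform and R version, so no output is shown):

m <- matrix(rnorm(50*100), 50, 100,
            dimnames=list(paste("r", 1:50, sep=""), paste("c", 1:100, sep="")))
object.size(m)/1024    # roughly 8 bytes per element, plus a little for the dimnames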
Hi,

Brian D. Ripley wrote:

>> I did not tell my problem, because I thought that it was more or less
>> irrelevant to the memory usage problems I was experiencing. My intention
>> was to ask about how R manages memory and is there something special
>> about that management everyone should know, but I don't know. I'm sorry
>> if my letter was a bit unclear, English is not my native language.
>
> But you did claim `must be used'. That really is rarely the case, and
> the skill in programming R (or S) is to use memory within the resources
> available.

Yes, I did write 'must be used', yet I did not want to claim anything; I did not mean to put it so strongly. I should have used 'is used', or something else. As I said, English is not my native language :)

I tried to run the same calculations with an initial matrix of 7500 rows and 70 columns on a Linux machine, as you suggested: Debian Woody, R version 1.5.1, with 256 MB RAM and 256 MB swap. Everything worked fine, unlike on the Windows 2000 Professional machine with 256 MB of RAM. It seems that Win2k is really missing something in its memory management, or the R Windows build is somewhat different from the Linux build?

Thanks for everybody's answers.

Vaidotas Zemlys
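(A general way to watch memory from inside an R session on either platform, not something suggested in the thread: gc() reports the space in use after a garbage collection, and gcinfo(TRUE) prints a line at every collection.)

gcinfo(TRUE)    # report at every garbage collection
gc()            # force a collection and show current Ncells/Vcells usage
gcinfo(FALSE)   # switch the reports off again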