Hi,

I'm having problems while working with large data sets with R 1.5.1 on Windows 2000. Given an integer matrix of 30 columns and 15000 rows, my function should return a logical matrix of about 5000 rows and 15000 columns.

First of all I tried to run this function on a computer with 256 MB of RAM. I increased the memory limit of R with memory.limit() up to 512 MB, and watched memory and processor usage through the Windows task manager. At first R was using 100% of the processor and memory usage was constantly increasing. When there was no physical memory left, R began using virtual memory. Then the processor usage dropped, but there was intensive work with the hard drive, which of course slowed down the calculations. Still, the memory used by R kept changing, which I took as a sign that R was calculating. But after a while the task manager showed that R was using a constant amount of memory. Rgui was not responding, so I assumed that R had crashed.

So I tried to run the calculations on another Win2k box with 1024 MB of RAM and the same R version 1.5.1. This time virtual memory was not used, yet R still froze. The memory usage grew to about 450 MB and then R stopped. Memory usage was not changing and Rgui did not respond, yet the processor was used 100%. The task manager showed that peak memory usage was about 760 MB.

On smaller data sets there were no problems: memory usage was constantly increasing and the processor was used 100%. My function does not use fancy functions. Basically it just sums, finds minima and maxima, uses subsetting of a matrix, and calculates a correlation matrix between 10 columns of a given matrix.

So I would like to ask: can R perform such calculations at all, where a lot of memory must be used? And if R can do such calculations, what are the specific problems, topics or tips one should know before letting R do these calculations?

Thanks in advance for any help, and special thanks for reading such a long letter.

Vaidotas Zemlys
Hi,

> You'll need to post the code for anyone to be able to help. There are
> many ways to do the same thing, some hugely more efficient than
> others.

Ok, here it is:

# main function
rptree <- function(X, y, depth=3, count=10) {
  X <- as.matrix(X)
  y <- as.vector(y)
  m <- dim(X)[1]
  n <- dim(X)[2]
  if(!identical(m, length(y))) {
    stop("Nesutampa dimensijos")   # "Dimensions do not match"
  }
  cnames <- colnames(X)
  tree <- list()
  snames <- list()
  # split the root node
  node <- node.div(X, y, count=count, sep="", col.names=cnames)
  l <- node$l
  paths <- node$paths
  rownames(l) <- node$snames -> rownames(paths)
  tree[[1]] <- list(subsets=l, paths=paths)
  snames[[1]] <- node$snames
  if(depth > 1) {
    for(i in 2:depth) {
      nl <- dim(tree[[i-1]]$subsets)[1]
      off <- 0
      l <- logical(); paths <- numeric(); l.snames <- character()
      for(j in 1:nl) {
        # split each subset of the previous level
        subs <- tree[[i-1]]$subsets[j,]
        node <- node.div(X[subs,], y[subs], count=count,
                         name=snames[[i-1]][j], col.names=cnames)
        if(node$bonferoni > 0) {
          nnl <- dim(node$l)[1]
          subss <- matrix(rep(subs, nnl), nrow=nnl, byrow=TRUE)
          subss[subss] <- node$l
          l <- rbind(l, subss)
          paths <- rbind(paths,
                         cbind(matrix(rep(tree[[i-1]]$paths[j,], nnl),
                                      nrow=nnl, byrow=TRUE),
                               node$paths))
          l.snames[off + 1:nnl] <- node$snames
          off <- off + nnl
        }
      }
      rownames(l) <- l.snames -> rownames(paths)
      tree[[i]] <- list(subsets=l, paths=paths)
      snames[[i]] <- l.snames
    }
  }
  names(tree) <- paste("lv", 1:depth, sep="")
  tree$X <- X
  tree$y <- y
  attributes(tree)$class <- "rptree"
  tree
}

# function node.div used in main function rptree
node.div <- function(X, y, count=10, name="subset", sep=".", col.names=NULL) {
  m <- dim(X)[1]
  n <- dim(X)[2]
  SZZ <- sum(y^2)
  SZ <- sum(y)
  t <- rep(0, n)
  for(i in 1:n) {
    # two-sample t statistic for splitting on column i (zero vs non-zero)
    n1 <- length(X[X[,i]==0, i])
    if((n1 > 10) && (n1 < (m-10))) {
      if(min(c(n1, m-n1)) == n1) {
        SX <- sum(y[X[,i]==0])
        SXX <- sum(y[X[,i]==0]^2)
        n2 <- m - n1
      } else {
        SX <- sum(y[X[,i]>0])
        SXX <- sum(y[X[,i]>0]^2)
        n2 <- n1
        n1 <- m - n1
      }
      SY <- SZ - SX; SYY <- SZZ - SXX
      SSX <- SXX - (1/n1)*(SX)^2
      SSY <- SYY - (1/n2)*(SY)^2
      v <- (SSX + SSY)/(m-2)
      stderr <- sqrt(v*(1/n1 + 1/n2))
      t[i] <- abs(SX/n1 - SY/n2)/stderr
    }
  }
  # t <- t[t>0]
  bonf <- length(t[t>0])
  ind <- rep(0, count)
  if(bonf > 1) {
    # keep up to `count' significant splits that are not too highly correlated
    st <- sort(t, decreasing=TRUE, index.return=TRUE)
    j <- 1
    jj <- 1
    ind[1] <- st$ix[1]
    q.value <- qt(0.975, m-2)
    while((j < count) && (st$x[jj+1] > q.value) && (j < n)) {
      max.cor <- max(abs(cor(X[, c(ind, st$ix[jj+1])])[j+1, 1:j]))
      if(max.cor < 0.9) {
        j <- j + 1
        jj <- jj + 1
        ind[j] <- st$ix[jj]
      } else {
        jj <- jj + 1
      }
    }
  } else {
    if(bonf == 1) {
      ind[1] <- (1:n)[t>0]
    }
  }
  ind <- ind[ind>0]
  ni <- length(ind)
  if(ni > 0) {
    # both sides of each chosen split, one row per resulting subset
    l <- X[, ind] > 0
    l <- t(matrix(c(l, !l), nrow=m))
    paths <- cbind(rep(ind, 2), rep(c(1,0), each=ni))
    if(identical(col.names, NULL)) {
      snames <- paste(name, paste(rep(ind, 2), rep(c(1,0), each=ni), sep="@"), sep=sep)
    } else {
      snames <- paste(name, paste(rep(col.names[ind], 2), rep(c(1,0), each=ni), sep="@"), sep=sep)
    }
    list(l=l, paths=paths, snames=snames, bonferoni=bonf)
  } else {
    list(bonferoni=bonf)
  }
}
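A minimal, untested sketch of how rptree() can be called on made-up toy data (the sizes, column names and values below are illustrative only, not the real data set):

# purely illustrative 0/1 predictors and a random response
set.seed(1)
X <- matrix(rbinom(200*30, 1, 0.5), nrow=200, ncol=30)
colnames(X) <- paste("v", 1:30, sep="")
y <- rnorm(200)

tr <- rptree(X, y, depth=2, count=3)
names(tr)              # should be "lv1" "lv2" "X" "y"
dim(tr$lv1$subsets)    # one row per subset (node) kept at the first level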
On Fri, 18 Oct 2002, Vaidotas Zemlys wrote:

> I'm having problems while working with large data sets with R 1.5.1 in
> windows 2000. Given a integer matrix size of 30 columns and 15000 rows
> my function should return a boolean matrix size of about 5000 rows and
> 15000 columns.

That's 75 million items of 4 bytes each, hence almost 300Mb for that one object.

> First of all I tried to run this function on computer with 256 MB of
> RAM. I increased memory limit of R with memory.limit() up to 512 MB. I
> was inspecting memory and processor usage through Windows task manager.
> At first R was using 100% of processor and memory usage was constantly
> increasing. When there were no physical memory left, R began using
> virtual memory. Then the processor usage dropped, but there was
> intensive work with hard drive. Of course that slowed down
> calculations. Yet the memory used by R always changed, and that was I
> think the sign, that R was calculating. But after a while the task
> manager showed that R uses constant size of memory. The Rgui was not
> responding, so I assumed that R crashed.

Don't think so. More likely Windows is having problems managing the memory requirements. You are trying to access an object too big to fit into RAM, and that is going to cause severe strain.

> So I tried to run the calculations on another win2k box with 1024 MB of
> RAM with the same R version 1.5.1. This time virtual memory was not
> used, yet still R froze. The memory usage grew to about 450 MB and then
> R stopped. Memory usage was not changing, Rgui did not respond, yet
> processor was used 100%. Task manager showed that peak memory usage was
> about 760 MB.

Again, there is likely a problem with Windows allocating a contiguous chunk of 300Mb of memory. Try this sort of thing only after a fresh reboot.

> On smaller data sets there were no problems, memory usage was constantly
> increasing and processor was used 100%. My function does not use fancy
> functions. Basically it just sums, finds minimum and maximum, uses
> subsetting of a matrix, and calculates correlation matrix between 10
> columns of a given matrix.
>
> So I would like to ask can R at all perform such calculations where a
> lot of memory must be used? And if R can do such calculations, what are
> specific problems, topics or tips which should be known before letting R
> to do these calculations?

R can. The question is `can Windows'? If possible use a Unix-based OS.

You have not told us your problem, so you have not demonstrated that `a lot of memory must be used'. Hard to help when we don't know what you are attempting, but few problems cannot be done in pieces.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel: +44 1865 272861 (self)
1 South Parks Road,                    +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax: +44 1865 272595
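(As a quick check of that figure, one line of R, assuming 4 bytes per logical element as stated above:)

5000 * 15000 * 4 / 2^20    # about 286 Mb for the result matrix alone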
Hi,

> Dr. Zemlys: Have you tried the --max-mem-size option on the R command
> line? Here is an excerpt from the FAQ file:
>
> 2.6 There seems to be a limit on the memory it uses!
>
> Indeed there is. It is set by the command-line flag --max-mem-size (see
> "How do I install R for Windows?") and defaults to the smaller of the
> amount of physical RAM in the machine and 1Gb. It can be set to any
> amount over 10M. (R will not run in less.) Be aware though that Windows
> has (in most versions) a maximum amount of user virtual memory of 2Gb,
> and parts of this can be reserved by processes but not used. Because of
> the way the memory manager works, it is possible that there will be free
> memory but R will not be able to make use of it. Use ?Memory and
> ?memory.size for information about memory usage. The limit can be raised
> by calling memory.limit within a running R session. We have found that
> starting R with too large a value of --max-mem-size may fail: the limit
> seemed to be about 1.7Gb on Windows 2000 Professional. R can be compiled
> to use a different memory manager which might be better at using large
> amounts of memory, but is substantially slower (making R several times
> slower on some tasks).

When I tried to run my function on the computer with 1 GB of RAM, I set memory.limit(1024), yet R froze before it had reached that limit. The Windows task manager showed that R was using about 450 MB of RAM at the time it froze. When I tried to run the calculations without adjusting memory.limit, R exited from the function with an error message saying that I should adjust the memory limit, because it could not allocate a vector of some size. So I think the problem is not with the memory limits.

Vaidotas Zemlys
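(For reference, the Windows-only memory queries mentioned in the FAQ excerpt can be used like this; the calls below are illustrative and no output from the actual session is reproduced:)

memory.size()             # Mb currently in use by R
memory.size(max=TRUE)     # maximum Mb obtained from the OS so far
memory.limit()            # current limit in Mb
memory.limit(size=1024)   # raise the limit to 1 Gb, as was done above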
On 18 Oct 2002, at 14:03, ripley at stats.ox.ac.uk wrote:

> On Fri, 18 Oct 2002, Vaidotas Zemlys wrote:
>
> > I'm having problems while working with large data sets with R 1.5.1
> > in windows 2000. Given a integer matrix size of 30 columns and
> > 15000 rows my function should return a boolean matrix size of about
> > 5000 rows and 15000 columns.

snip

> Don't think so. More likely that Windows is having problems managing
> the memory requirements. You are trying to access an object too big
> to fit into RAM, and that is going to cause severe strain.

snip

> Again, there is likely a problem with Windows allocating a contiguous
> chunk of 300Mb of memory. Try this sort of thing only after a fresh
> reboot.

snip

> R can. The question is `can Windows'? If possible use a Unix-based
> OS.

Windows leaves a LOT of junk lying around in RAM (recently used DLLs etc.), and even after a reboot there is still some reclaimable RAM (from processes used in startup). I use a program called MemTurbo (http://www.memturbo.com/) which will do a RAM scrub (releasing unused RAM when free-RAM limits are reached) and a RAM defrag (so if contiguous RAM is needed then this might help). It has been around a while, and I have used it since an earlier version. I don't believe it is perfect, but it does seem to do a good job of the arcane area of Windows memory management. Maybe that will help to get you "clean RAM".

The other area that might be worth attention is the number of processes that are currently running; you can perhaps kill some of these. And, of course, defragging your hard disk(s) and perhaps managing the swap file yourself are old favourites, not to mention cleaning the registry. None of these should in theory have anything to do with memory management (by R), but in practice there seem to be some complex "interactions" in the OS, between the OS and the registry and RAM and concurrent threads.

I have also found that a "lightly loaded" Windows machine (one with very few programs installed) is much more likely to be stable than one with many programs installed, and I have a glimmering of an idea that there is some critical size of the registry beyond which something starts to thrash (perhaps if the registry size is greater than available physical RAM). Of course none of this registry business SHOULD affect R, but then again, Windoze is a black box, so who knows what goes on with program loading, thread interaction etc.

To cut a long story short, it might just possibly help if you try to keep your RAM and your disks and swapfiles and registry as clean as possible.

fwiw
Hi,

>> I'm having problems while working with large data sets with R 1.5.1 in
>> windows 2000. Given a integer matrix size of 30 columns and 15000 rows
>> my function should return a boolean matrix size of about 5000 rows and
>> 15000 columns.
>
> That's 75 million items of 4 bytes each, hence almost 300Mb for that one
> object.

Does that mean that R reserves 4 bytes for a logical object of length 1? In general, how much memory does R allocate for the different data types? I searched a bit, but I didn't find anything useful on this subject. I would like to know, if possible, how much memory R needs to store, for example, a real matrix of 50 rows and 100 columns together with its column and row names, or where to find information on this subject.

I thought that R needed only 1 byte for storing a logical value, and because of that I underestimated the amount of memory R would need to use.

> You have not told us your problem, so has not demonstrated that
> `a lot of memory must be used'. Hard to help when we don't know what
> you are attempting, but few problems cannot be done in pieces

I did not describe my problem because I thought it was more or less irrelevant to the memory usage problems I was experiencing. My intention was to ask how R manages memory, and whether there is something special about that management which everyone should know but I don't. I'm sorry if my letter was a bit unclear; English is not my native language.

As for my problem, I'm trying to find out how well recursive partitioning can separate a "pure" subset. In recursive partitioning (and all tree methods) the tree is grown using the splits that separate a node into two subsets best. Thus the given set is divided into subsets minimizing, broadly speaking, some statistic which depends on all subsets. My goal is to single out one "pure" subset; I don't care about the other subsets, so clearly I do not want to minimize a statistic which depends on all subsets. So I try to grow trees using not only the splits that are best, but the splits that are nearly best as well. To be exact, I use the 10 best splits for every node. So if I split the root node twice I get 1000 trees.

I have to save information about the terminal nodes, that is, which objects belong to them. As these objects are elements of a given vector y, for each terminal node I save a logical vector the length of that vector, where TRUE in position i means that element y[i] is present in that terminal node.

To sum up, I have an initial matrix X with dim(X)[1]==m and dim(X)[2]==n, and a vector y with length(y)==m, and I split y upon the columns of X. For each terminal node I save a logical vector t, length(t)==length(y)==m, where t[i]==TRUE means that y[i] belongs to terminal node t. With 1000 trees I can have at most 4000 terminal nodes, so I need to store 4000*m logical items. As you can understand from my previous letters, I encountered problems when m is about 15000.

I'm growing these trees purely for exploratory reasons; it may be that my mathematical and statistical assumptions are totally wrong, and that's why I did not give many details about my problem earlier.

Thanks for all your answers.

Vaidotas Zemlys

PS R rulezzz!!! :)
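(A rough size check for the scheme just described, plus one possible way to shrink it; this is an untested idea, not something proposed in the thread: store only the member indices of each terminal node with which() instead of a full-length logical vector.)

4000 * 15000 * 4 / 2^20           # about 229 Mb as one big logical matrix

# a single terminal node with, say, 40 members out of m = 15000
t <- rep(FALSE, 15000); t[sample(15000, 40)] <- TRUE
idx <- which(t)                   # keep only the integer indices of the members
object.size(t)                    # about 60 Kb as a logical vector
object.size(idx)                  # a few hundred bytes as indices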
> From: Vaidotas Zemlys [mailto:mpiktas at delfi.lt]
>
> >> I'm having problems while working with large data sets with R 1.5.1 in
> >> windows 2000. Given a integer matrix size of 30 columns and 15000 rows
> >> my function should return a boolean matrix size of about 5000 rows and
> >> 15000 columns.
>
> > That's 75 million items of 4 bytes each, hence almost 300Mb for that
> > one object.
>
> Does that mean that R reserves 4 bytes for logical object with length 1?
> On the whole how much memory R allocates for different data types? I
> searched a bit, but I didn't find anything useful on this subject. I
> would like to know if it is possible how much memory R needs for storing
> for example real matrix size of 50 rows and 100 columns together with
> column and row names. Or where to find information on such subject.
>
> I thought that R needs only 1 byte for storing logical byte, and because
> of that I underestimated the size of memory R would need to use.

You can use object.size() to get some idea of how R allocates memory:

> object.size(logical(1000))/1000
[1] 4.028
> object.size(integer(1000))/1000
[1] 4.028
> object.size(double(1000))/1000
[1] 8.028

So it seems that R allocates logicals as if they were integers. My guess is that this makes it easier to coerce logicals to integers for things like sum(is.na(x))?

Andy
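The 50 x 100 real matrix with row and column names asked about earlier can be measured the same way (the exact figure will vary slightly by platform and R version, so no output is shown):

m <- matrix(rnorm(50*100), 50, 100,
            dimnames=list(paste("r", 1:50, sep=""), paste("c", 1:100, sep="")))
object.size(m)/1024    # roughly 8 bytes per element, plus a little for the dimnames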
Hi,

Brian D. Ripley wrote:

>> I did not tell my problem, because I thought that it was more or less
>> irrelevant to the memory usage problems I was experiencing. My intention
>> was to ask about how R manages memory and is there something special
>> about that management everyone should know, but I don't know. I'm sorry
>> if my letter was a bit unclear, English is not my native language.
>
> But you did claim `must be used'. That really is rarely the case, and
> the skill in programming R (or S) is to use memory within the resources
> available.

Yes, I did write 'must be used', yet I did not want to claim anything; I did not mean to put it so strongly. I should have used 'is used', or something else. As I said, English is not my native language :)

I tried to run the same calculations with an initial matrix of 7500 rows and 70 columns on a Linux machine, as you suggested: Debian Woody, R version 1.5.1, with 256 MB RAM and 256 MB swap. Everything worked fine, unlike on the Windows 2000 Professional machine with 256 MB of RAM. It seems that Win2k is really missing something in its memory management, or the R Windows build is somewhat different from the Linux build?

Thanks for everybody's answers.

Vaidotas Zemlys
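(A general way to watch memory from inside an R session on either platform, not something suggested in the thread: gc() reports the space in use after a garbage collection, and gcinfo(TRUE) prints a line at every collection.)

gcinfo(TRUE)    # report at every garbage collection
gc()            # force a collection and show current Ncells/Vcells usage
gcinfo(FALSE)   # switch the reports off again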