Hi all,

I am using R 1.5.0 under Unix, and I have a couple of questions.

1. My program is running out of memory. I am writing a program to grow a list of trees using rpart() on a subset of a large dataset (5807 x 693), with a different response for every tree. I saw that after each tree was constructed, 116 MB of data was added to the Vcells. I have no idea what this data is: my dataset is 30 MB and each tree is 1.6 MB. Could someone tell me how to monitor what data is being stored in the Vcells?

2. This concerns the same program. When growing a tree I used the expression

    fit <- rpart(formula = x[[34]] ~ ., data = x)

This does not give an error, but it does give an obviously wrong answer. When I rearranged the data.frame x so that the response variable is in the first column and all the other variables are in the remaining columns, and then used

    fit <- rpart(x)

it worked perfectly, i.e. it gave the correct tree. Could someone tell me what to do if I want the 34th column of the data.frame to be the response variable but don't want to use the column names in the formula for growing the tree?

Thanks in advance.
-Saket.

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
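On question 2, a likely cause is that the `.` in `x[[34]] ~ .` expands to every column of `data`, including column 34 itself, because the left-hand side is not a column name. The response then also appears as a predictor. A minimal sketch of one way around this, building the formula from the column's position, follows. The helper name `rpart_by_index` and the use of the built-in `mtcars` data are assumptions made for illustration; they are not from the original post.

```r
library(rpart)

# Hypothetical helper: fit a tree with column `i` of `data` as the response
# and all remaining columns as predictors, without typing any column name.
rpart_by_index <- function(data, i, ...) {
  # Builds the formula `<name of column i> ~ .`; as.name() copes with
  # column names that are not syntactically valid.
  f <- reformulate(".", response = as.name(names(data)[i]))
  rpart(f, data = data, ...)
}

# Column 4 of mtcars is "hp"; it is used here as the response.
fit <- rpart_by_index(mtcars, 4)

# For comparison: with an indexed left-hand side the dot still expands to
# all columns, so "hp" remains among the predictors.
bad_terms <- attr(terms(mtcars[[4]] ~ ., data = mtcars), "term.labels")
```

Because the formula names the response column, the dot excludes it from the right-hand side, which is the behaviour the rearranged data frame gave.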
Hi all,

My sincere apologies to all those who could not understand my previous question and so could not answer it. I am not a statistician, nor have I worked with R for long, so please excuse my naive language. I hope I can explain my question better this time.

I have a data.frame named 'temp'. The following is the series of commands I ran after obtaining this data.frame:

> x <- rpart(temp)
> attributes(x)
$names
 [1] "frame"     "where"     "call"      "terms"     "cptable"   "splits"
 [7] "method"    "parms"     "control"   "functions" "y"         "ordered"

$class
[1] "rpart"

> x$functions
$summary
function (yval, dev, wt, ylevel, digits)
{
    paste(" mean=", formatg(yval, digits), ", MSE=", formatg(dev/wt,
        digits), sep = "")
}
<environment: 4494214>

$text
function (yval, dev, wt, ylevel, digits, n, use.n)
{
    if (use.n) {
        paste(formatg(yval, digits), "\nn=", n, sep = "")
    }
    else {
        paste(formatg(yval, digits))
    }
}
<environment: 4494214>

> gc()
           used  (Mb) gc trigger  (Mb)
Ncells   330122   8.9    1162530  31.1
Vcells 46072722 351.6   64233246 490.1
> x$functions <- NULL
> gc()
           used  (Mb) gc trigger  (Mb)
Ncells   326469   8.8    1162530  31.1
Vcells 34321042 261.9   64233246 490.1

When the "functions" attribute of x was set to NULL, the storage in the Vcells dropped from 351.6 Mb to 261.9 Mb, as can be seen from the two gc() calls above.

I imagine that the rpart object 'x' stores a pointer named 'functions' to a large amount of data in the Vcells, and that this data was garbage collected when the pointer 'functions' was set to NULL. However, I am not sure that I am right on this count.

My question is: is there a way, through the options to rpart or otherwise, to never create the pointer 'functions' when fitting the rpart model in the first place, instead of having to delete it later in order to save memory?

Thanks in advance,
Saket.
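One plausible reading of the session above, not confirmed anywhere in this thread, is that the functions themselves are tiny but each carries a reference to the environment it was created in (the `<environment: 4494214>` line), and everything still stored in that environment stays alive with it. The sketch below reproduces that effect in base R only. The function `fit_like` is a hypothetical stand-in for a fitting routine and is not rpart's code.

```r
# Hypothetical stand-in for a fitting routine: it allocates a large working
# vector and returns a list containing a small function defined inside it.
fit_like <- function(n) {
  big <- numeric(n)   # large temporary, local to this call
  list(value = length(big),
       functions = list(summary = function(v) paste("mean=", v)))
}

fit <- fit_like(1e6)

# The closure's environment is the call frame of fit_like(), so `big`
# is still reachable through the returned object.
has_big <- exists("big",
                  envir = environment(fit$functions$summary),
                  inherits = FALSE)

# Dropping the closure removes the last reference to that frame, so the
# next garbage collection can reclaim `big`.
fit$functions <- NULL
invisible(gc())
```

If this is what happens in the rpart fit, setting `x$functions <- NULL` after fitting is a reasonable workaround, with the caveat that any method relying on those functions may no longer work on the stripped object.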
Hi all,

I am using R 1.5.0 under Unix.

I have two data.frames, x and yx. Both have the same number of rows. I wrote a function abc to do the following:

    abc <- function(x, zyx)
    {
        x[1] <- zyx          # zyx is a column of yx
        rpart(x)
    }

Then I wrote another function fgh:

    fgh <- function()
    {
        yx <- ...............   # yx created
        x <- ................   # x created
        # x is passed by name so that each column of yx is bound to zyx
        lapply(yx, abc, x = x)
    }

When I ran the function fgh at the prompt, the program started running out of memory and the process running R was killed. The reason for the memory running out is that with every call to abc in the implicit loop of lapply, memory equivalent to the size of x is allocated in the variable storage (Vcells). The trouble is that this memory is not freed when lapply runs abc for the next element of yx, i.e. in the next iteration.

Since x takes up 25 MB and yx has 126 columns, a lot of memory keeps being allocated without being freed, even when it is no longer required.

I wonder if there are any options I should set so that memory allocated within a function is freed when the function exits. I am not sure how to do that, or how to free the memory in any other way.

Could someone tell me how I can avoid this memory explosion?

Thanks in advance,
-Saket.
I guess this is because you are returning a whole rpart fit, and so really are using the memory. Try just returning what you need from the rpart object.

On Sun, 1 Sep 2002, Saket Joshi wrote:

> Hi all,
>
> I am using R1.5.0 under unix.
>
> I have 2 data.frames x and yx. both have the same number of rows. I wrote
> a function abc to do the following:
>
> abc <- function(x, zyx)
> {
> x[1] <- zyx # zyx is a column of yx
> rpart(x)
> }
>
> Then I wrote another function fgh:
>
> fgh <- function()
> {
> yx <- ............... # yx created
> x <- ................ # x created
> lapply(yx, abc, x)
> }
>
> when I ran function fgh at the prompt, the program started running out of
> memory and the process running R gets killed. The reason for
> the memory running out is that with every call to the abc function in the
> implicit loop lapply, the memory equivalent to that of x is allocated in the
> variable storage (Vcells). The trouble is that the memory allocated is not
> freed when the function abc is run by lapply for the next element of yx
> i.e. in the next loop.
>
> Since x takes up 25 MB of space and since yx has 126 columns, a lot of
> memory space keeps getting allocated without being freed even when it is no
> longer required.
>
> I wonder if there are any options that I should set so that the memory
> allocated within a function is freed when the function is exited. But I am
> not sure how to do that or else how to free the memory by any other way.
>
> Could someone tell me how I can avoid this memory explosion?

I think your diagnosis is an incorrect guess.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
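The advice to return only what is needed from each fit could be sketched roughly as follows. The helper name `fit_small`, the choice of components to keep, and the use of the built-in `mtcars` data are all assumptions made for illustration; which components are actually needed depends on what is done with the trees afterwards.

```r
library(rpart)

# Hypothetical helper: fit one tree for a given response vector and return
# only selected components of the fit rather than the whole rpart object.
fit_small <- function(response, predictors,
                      keep = c("frame", "splits", "cptable")) {
  dat <- data.frame(.response = response, predictors)
  fit <- rpart(.response ~ ., data = dat)
  # Return a plain list; components absent from this fit are skipped.
  unclass(fit)[intersect(keep, names(fit))]
}

# Each column of `responses` is used in turn as the response variable.
responses  <- mtcars[c("mpg", "hp")]
predictors <- mtcars[c("wt", "disp", "cyl")]
trees <- lapply(responses, fit_small, predictors = predictors)
```

The result is a named list with one small entry per response column. The trade-off is that these entries are plain lists, so rpart methods such as predict() cannot be applied to them directly.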