Dear List:

I am running into a memory issue that I haven't noticed before. I am running a simulation with the code below. I have increased R's memory limit to 712MB and have a total of 1GB on my machine.

What appears to be happening is this: I run a simulation in which I create 1,000 datasets with a sample size of 100, then run each dataset through gls() and obtain some estimates. This works fine. But when I check how much memory is being used in Windows, I see that it does not go down once the analysis is complete, so I must quit R before I can perform another analysis.

For example, before starting the first simulation the Windows task manager tells me I am using 200MB of memory. After running the first simulation it may go up to 500MB. I then try to run another simulation with a larger sample size, but I quickly run out of memory, because usage starts at 500MB and increases from there until the simulation halts.

So it appears that R does not release memory after intensive analyses but accumulates it. Is this correct? If so, could it be due to inefficient code? Or is this an issue specific to Windows? I didn't see this in the FAQ section on memory or in my searches on the web, and I'm not sure how to work more efficiently here.

Thanks
Harold
R 2.0
Windows XP

# Housekeeping
library(MASS)
library(nlme)
mu <- c(100, 150, 200, 250)
Sigma <- matrix(c(400,  80,  80,  80,
                   80, 400,  80,  80,
                   80,  80, 400,  80,
                   80,  80,  80, 400), 4, 4)
mu2 <- c(0, 0, 0)
Sigma2 <- diag(16, 3)
sample.size <- 100
N <- 1000                      # number of datasets

# Take a draw from the VL distribution
vl.error <- mvrnorm(n = N, mu2, Sigma2)

# Step 1: Create data
Data <- lapply(seq(N), function(x)
    as.data.frame(cbind(1:10, mvrnorm(n = sample.size, mu, Sigma))))

# Step 2: Add vertical linking error
for (i in seq(along = Data)) {
    Data[[i]]$V6 <- Data[[i]]$V2
    Data[[i]]$V7 <- Data[[i]]$V3 + vl.error[i, 1]
    Data[[i]]$V8 <- Data[[i]]$V4 + vl.error[i, 2]
    Data[[i]]$V9 <- Data[[i]]$V5 + vl.error[i, 3]
}

# Step 3: Restructure for longitudinal analysis
long <- lapply(Data, function(x)
    reshape(x, idvar = "Data[[i]]$V1",
            varying = list(names(x)[2:5], names(x)[6:9]),
            v.names = c("score.1", "score.2"), direction = "long"))

# Step 4: Run GLS
glsrun1 <- lapply(long, function(x)
    gls(score.1 ~ I(time - 1), data = x,
        correlation = corAR1(form = ~ 1 | V1), method = "ML"))

glsrun2 <- lapply(long, function(x)
    gls(score.2 ~ I(time - 1), data = x,
        correlation = corAR1(form = ~ 1 | V1), method = "ML"))

# Step 5: Extract intercepts and slopes
int1 <- lapply(glsrun1, function(x) x$coefficient[1])
slo1 <- lapply(glsrun1, function(x) x$coefficient[2])
int2 <- lapply(glsrun2, function(x) x$coefficient[1])
slo2 <- lapply(glsrun2, function(x) x$coefficient[2])

# Step 6: Compute SD of intercepts and slopes
int.sd1 <- sapply(glsrun1, function(x) x$coefficient[1])
slo.sd1 <- sapply(glsrun1, function(x) x$coefficient[2])
int.sd2 <- sapply(glsrun2, function(x) x$coefficient[1])
slo.sd2 <- sapply(glsrun2, function(x) x$coefficient[2])

cat("Original Standard Errors", "\n",
    "Intercept", "\t", sd(int.sd1), "\n",
    "Slope", "\t", "\t", sd(slo.sd1), "\n")
cat("Modified Standard Errors", "\n",
    "Intercept", "\t", sd(int.sd2), "\n",
    "Slope", "\t", "\t", sd(slo.sd2), "\n")
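As a rough way of seeing where that memory goes from inside R, one could run something like the following after the simulation above has finished (a sketch based on the object names in the code above, not part of the original post):

sizes <- c(Data    = object.size(Data),
           long    = object.size(long),
           glsrun1 = object.size(glsrun1),
           glsrun2 = object.size(glsrun2))
round(sizes / 1024^2, 1)   # approximate size of each stored list, in MB
gc()                       # R's own accounting, versus what the task manager shows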
On Sat, 8 Jan 2005 16:38:31 -0500, "Doran, Harold" <HDoran at air.org> wrote:

> Dear List:
>
> I am running into a memory issue that I haven't noticed before. I am
> running a simulation with the code below. I have increased R's memory
> limit to 712MB and have a total of 1GB on my machine.
>
> What appears to be happening is this: I run a simulation in which I
> create 1,000 datasets with a sample size of 100, then run each dataset
> through gls() and obtain some estimates. This works fine. But when I
> check how much memory is being used in Windows, I see that it does not
> go down once the analysis is complete, so I must quit R before I can
> perform another analysis.

If you ask Windows how much memory is being used, you'll likely get an incorrect answer. R may not release memory back to the OS, but it may still be available for re-use within R. Call gc() to see how much memory R thinks is in use.

> For example, before starting the first simulation the Windows task
> manager tells me I am using 200MB of memory. After running the first
> simulation it may go up to 500MB. I then try to run another simulation
> with a larger sample size, but I quickly run out of memory, because
> usage starts at 500MB and increases from there until the simulation
> halts.

The difficulty you're running into may be memory fragmentation. When you run with a larger sample size, R will try to allocate larger chunks than it did originally. If the "holes" created when the original simulation is deleted are too small, R will need to ask Windows for new memory to store things in.

You could try deleting everything in your workspace before running the second simulation; this should reduce the fragmentation. Or you could run the big simulation first, so that the smaller one fits in the holes it leaves behind.

Duncan Murdoch
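A minimal sketch along those lines, to run between the two simulations; only gc() is mentioned in the reply above, the other calls are added here as an illustration (memory.size() and memory.limit() are Windows-only):

gc()               # how much memory R itself thinks is in use after simulation 1
rm(list = ls())    # delete everything in the workspace before the next run
gc()               # collect the garbage; the freed space is available for re-use within R
memory.size()      # Windows-only: MB currently allocated to R
memory.limit()     # Windows-only: the current memory limit in MB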
One hint: R rarely releases memory to the OS, especially under Windows, so do not expect the usage reported by Windows to go down.

One possibility is that you are storing lots of results and not removing them. You don't need to store all the gls fits, just the parts you need. You can use gc(), memory.profile() and object.size() to see where memory is being used.

On Sat, 8 Jan 2005, Doran, Harold wrote:

> So it appears that R does not release memory after intensive analyses
> but accumulates it. Is this correct? If so, could it be due to
> inefficient code? Or is this an issue specific to Windows?
>
> [rest of the original message and code snipped]
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
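To illustrate the "keep just the parts you need" suggestion, here is one possible rewrite of Steps 4-6 of the original code. It is a sketch rather than a tested drop-in: the gls() calls are copied from the original post, only the coefficients are kept, and the names coefs1/coefs2 are new.

# Steps 4-6 combined: fit each gls, keep only its two coefficients, and let
# the full fit object be discarded as soon as coef() has been taken from it.
coefs1 <- t(sapply(long, function(x)
    coef(gls(score.1 ~ I(time - 1), data = x,
             correlation = corAR1(form = ~ 1 | V1), method = "ML"))))
coefs2 <- t(sapply(long, function(x)
    coef(gls(score.2 ~ I(time - 1), data = x,
             correlation = corAR1(form = ~ 1 | V1), method = "ML"))))

# Each result is an N x 2 matrix: column 1 holds intercepts, column 2 slopes.
cat("Original Standard Errors", "\n",
    "Intercept", "\t", sd(coefs1[, 1]), "\n",
    "Slope", "\t", "\t", sd(coefs1[, 2]), "\n")
cat("Modified Standard Errors", "\n",
    "Intercept", "\t", sd(coefs2[, 1]), "\n",
    "Slope", "\t", "\t", sd(coefs2[, 2]), "\n")

# The wide datasets are no longer needed once 'long' exists, so they can go too.
rm(Data)
gc()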