Anthony Dick
2011-Feb-24 19:23 UTC
[R] parallel bootstrap linear model on multicore mac (re-post)
Hello all,

I am re-posting my previous question with simpler, more transparent, commented code. I have been ramming my head against this problem, and I wondered if anyone could lend a hand.

I want to parallelize a bootstrap of a linear mixed model on my 8-core Mac. Below is the process that I want to run in parallel (namely, the boot.out <- boot(dat.res, boot.fun, R = nboot) command). This is an extension to lmer of the bootstrapping linear models example in Venables and Ripley. Please excuse my rather terrible programming skills. I am always open to suggestions. Below the example I describe what methods I have tried.

library(boot)
library(lme4)

dat <- read.table("http://www2.fiu.edu/~adick/downloads/toy2.dat", header = TRUE)
nboot <- 1000      # number of bootstraps
attach(dat)
x <- dat[,2]       # IV number 1
y <- dat[,4]       # DV
z <- dat[,3]       # IV number 2
subj <- dat[,1]    # random factor

boot.fun <- function(data, i) {   # function to resample residuals
  d <- data
  d$y <- d$fitted + d$res[i]      # populate new y values based on resampled residuals
  as.numeric(coef(update(fit, data = d))[1][[1]][1, c(1:4)])   # update the linear model and output the coefficients
}

fit <- lmer(y ~ x*z + (1|(subj)))   # the linear model
dat.res <- data.frame(y, x, z, subj, res = resid(fit), fitted = fitted(fit))   # add residuals and fitted values to dat
boot.out <- boot(dat.res, boot.fun, R = nboot)   # run the bootstrap using the boot.fun
boot.out

Methods attempted:

Using the multicore package, I tried

boot.out <- collect(parallel(boot(dat.res, boot.fun, R = nboot)))

This returned a correct result, but did not speed things up. I am not sure why.

I also tried snowfall and snow. While I can create a cluster and run simple processes (e.g., the examples provided in the literature), I can't get the bootstrap to run. For example, using snow:

cl <- makeCluster(8)
clusterSetupRNG(cl)
clusterEvalQ(cl, library(boot))
clusterEvalQ(cl, library(lme4))
boot.out <- clusterCall(cl, boot(dat.res, boot.fun, R = nboot))
stopCluster()

returns the following error:

Error in checkForRemoteErrors(lapply(cl, recvResult)) :
  8 nodes produced errors; first error: could not find function "fun"

I am stuck and at the limit of my programming knowledge, so I am punting to the R-help list. I need to run this process thousands of times, which is the reason to make it parallel. Any suggestions are much appreciated.

Anthony

--
Anthony Steven Dick, Ph.D.
Assistant Professor
Department of Psychology
Florida International University
Modesto A. Maidique Campus DM 296B
11200 S.W. 8th Street
Miami, FL 33199
Phone: 305-348-4202
Lab Phone: 305-348-9057 or 305-348-9055 (I am usually here)
Fax: 305-348-3879
Email: adick at fiu.edu
Webpage: http://www.fiu.edu/~adick
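A possible reading of the two attempts above, offered as a sketch rather than a definitive answer: parallel(boot(...)) from multicore hands the whole bootstrap to a single forked process, so no speed-up would be expected, and clusterCall() expects a function as its second argument, whereas boot(...) is evaluated locally first and its result is passed in place of that function. The example below splits the replicates across workers by passing a function plus its arguments. It rests on several assumptions that are not in the original post: it uses the parallel package bundled with later R versions (its makeCluster and clusterCall follow the snow interface), simulated data in place of toy2.dat, lm in place of lmer so that it runs without lme4, and the illustrative names toy, n.workers, pieces and boot.t.

```r
library(parallel)  # assumption: R >= 2.14, where 'parallel' ships with R
library(boot)

set.seed(1)
# Simulated stand-in for the toy data (assumption: NOT the original file)
n <- 60
toy <- data.frame(x = rnorm(n), z = rnorm(n))
toy$y <- 1 + 2 * toy$x - toy$z + rnorm(n)

# lm stands in for the lmer model so the sketch has no lme4 dependency
fit <- lm(y ~ x * z, data = toy)
dat.res <- data.frame(toy, res = resid(fit), fitted = fitted(fit))

# Residual bootstrap statistic: rebuild y from resampled residuals, refit.
# It refits from the formula, so it needs no objects from the master session.
boot.fun <- function(data, i) {
  d <- data
  d$y <- d$fitted + d$res[i]
  coef(lm(y ~ x * z, data = d))
}

n.workers <- 2    # hypothetical worker count; raise to match the machine
nboot <- 200      # total replicates, split evenly across workers

cl <- makeCluster(n.workers)
clusterSetRNGStream(cl, 123)        # independent random streams per worker
clusterEvalQ(cl, library(boot))     # load boot on every worker

# clusterCall() receives a FUNCTION; data, statistic and replicate count
# travel to the workers as its arguments.
pieces <- clusterCall(
  cl,
  function(data, statistic, R) boot(data, statistic, R = R),
  dat.res, boot.fun, nboot / n.workers
)
stopCluster(cl)

# Stack the replicate matrices returned by the workers
boot.t <- do.call(rbind, lapply(pieces, function(b) b$t))
dim(boot.t)
```

Each worker returns an ordinary boot object, so boot.t holds one row per replicate and one column per coefficient. For the mixed model, the workers would also need lme4 loaded and access to the fitted model or its formula. Later versions of boot also accept parallel and ncpus arguments directly, which may make the manual split unnecessary.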
Anthony Dick
2011-Mar-02 22:38 UTC
[R] parallel bootstrap linear model on multicore mac (re-post)