Bruce Foster
2005-May-19 22:20 UTC
[R] R 2.1.0 RH Linux Built from Source Segmentation Fault
Background: I administer a cluster of RedHat EWS 3U4 Linux workstations
at a university.

I built R 2.1.0 from source:

./configure \
    --prefix=/sscc/opt/R-2.1.0 \
    --with-blas=no \
    2>&1 \
    | tee NUInstall.configure

R is now configured for i686-pc-linux-gnu

  Source directory:          .
  Installation directory:    /sscc/opt/R-2.1.0

  C compiler:                gcc -g -O2
  C++ compiler:              g++ -g -O2
  Fortran compiler:          g77 -g -O2

  Interfaces supported:      X11, tcltk
  External libraries:        readline
  Additional capabilities:   PNG, JPEG, iconv, MBCS, NLS
  Options enabled:           R profiling

  Recommended packages:      yes

configure: WARNING: you cannot build info or html versions of the R manuals

The machines are AMD Athlon MP 2400+ with 2 GB RAM, dual CPUs, and
lots of free disk space.

I've got a user running Monte Carlo codes that fail with segmentation
faults on a frequent basis. The jobs run for a long time (up to a day)
before failure.

If a failed job is rerun, chances are high that it will run to completion.

I'm at a loss about approaching this problem. R (as it is here)
doesn't seem to give much of a hint as to where things are when it
crashes.

I'm looking for some guidance to diagnose this problem so we can focus
on a solution.

Thanks!

Here's the annotated output of a failed job. The source file
bayes_book_r_functions.R comes from Peter Rossi:

http://gsbwww.uchicago.edu/fac/peter.rossi/research/bayes%20book/R%20functions/

The second source file is included below.

R : Copyright 2005, The R Foundation for Statistical Computing
Version 2.1.0 (2005-04-18), ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for a HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

> # program name: mnl.R
> # my hierarchical bayes logit model using random walk algorithm
> #
> # nphy= number of physicians in the sample
> # nvar = no. of variables in X
> # nalt = no. of alternatives
> # nobs = no. of observations
> # yrows (nphy x 1 matrix) contains no. of observations by each physician
> # X (nobs*nalt x nvar) contains xvalues for each of j alt on each of n occasions
> # Y (nobs x 1) contains chosen alternative
> #
>
> source("bayes_book_r_functions.R")
> nalt=5
>
> X=read.table("x300_new.txt",header=TRUE)
> X=as.matrix(X)
> X=cbind(X[,1:4],X[,10:11])
> nvar=ncol(X)
> Y=read.table("y300_new.txt",header=TRUE)
> Y=as.matrix(Y)
> yrows=read.table("yrows300_new.txt",header=TRUE)
> nphy=nrow(yrows)
> nobs=sum(yrows)/nalt
> if (nrow(X)!=nobs*nalt){print("data dimensions wrong")}
>
> betastore=NULL
> delta=diag(nvar)
> z=read.table("z300.txt",header=TRUE)
> z=as.matrix(z)
> k=ncol(z)
> A1=.1*diag(k)
> nu1=3+nalt
> V=diag(nvar)*nu1
>
> rowx1=rep(0,nphy+1)
> rowx2=rep(0,nphy)
> rowy1=rep(0,nphy+1)
> rowy2=rep(0,nphy)
> rowx1[1]=1
> rowy1[1]=1
> for (i in 1:nphy){
+ # for each physician i draw the relevant X and Y obs using yrows
+ #
+ rowx2[i]=rowx1[i]-1+yrows[i,1]
+ rowy2[i]=rowy1[i]-1+yrows[i,1]/nalt
+ rowx1[i+1]=rowx2[i]+1
+ rowy1[i+1]=rowy2[i]+1
+ }
>
> R=100000
> keep=10
> thetas=matrix(rep(0,(R/keep+1)*nvar*k),ncol=nvar*k)
> theta=matrix(rep(0,nvar*k),nrow=k)
> thetabar=theta
> source("rmnlRwMetrop1.R")
> scale=.5
> beta=matrix(rep(0,nphy*nvar),byrow=TRUE,ncol=nvar)
> accept=rep(0,nphy)
> accepts=rep(0,R/keep)
> parameters=list(R=1,keep=1,s=scale)
> for (j in 1:R)
+ {
+ itime=proc.time()[3]
+ for (i in 1:nphy){
+ # for each physician i draw a beta[i] using MH algorithm
+ #
+ Data=list(m=nalt,X=X[rowx1[i]:rowx2[i],],y=Y[rowy1[i]:rowy2[i],])
+ bbar=t(theta)%*%z[i,]
+ prior=list(A=delta,betabar=bbar)
+ a=rmnlRwMetrop1(Data,prior,Mcmc=parameters,beta[i,])
+ beta[i,]=a$betadraw
+ accept[i]=a$acceptr
+ }
+ mregout=rmultireg(beta,z,thetabar,A1,nu1,V)
+ delta=mregout$Sigma
+ theta=mregout$B
+ if (j%%keep==0){
+ thetas[j/keep,]=as.vector(theta)
+ betastore[j/keep]=list(beta)
+ accepts[j/keep]=mean(accept)
+ ftime=proc.time()[3]
+ cat('Time taken by iteration: ',j, ' = ',round((ftime-itime)/60,2),'\n')
+ }
+ if (j%%10000 == 0)
+ {cat('Mean theta at iteration: ',j,' = ',apply(thetas[1:j/keep,],2,mean),'\n')
+ cat('Mean sd of theta at iteration: ',j,' = ',apply(thetas[1:j/keep,],2,sd),'\n')
+ }
+ }
Time taken by iteration: 10 = 0.01
Time taken by iteration: 20 = 0.01
Time taken by iteration: 30 = 0.01
Time taken by iteration: 40 = 0.01
Time taken by iteration: 50 = 0.01

... many lines deleted

Time taken by iteration: 92800 = 0.01
Time taken by iteration: 92810 = 0.01
Time taken by iteration: 92820 = 0.01
Time taken by iteration: 92830 = 0.01
Time taken by iteration: 92840 = 0.01

That's it!

The PBS output is:

==============Original message text==============
From: "seldon.it.northwestern.edu" <root at seldon.it.northwestern.edu>
Date: Wed, 18 May 2005 4:47:19 pm CDT
Subject: PBS JOB 1534.seldon

PBS Job Id: 1534.seldon
Job Name: mnl300_z
Execution terminated
Exit_status=139
resources_used.cpupercent=98
resources_used.cput=00:25:02
resources_used.mem=233536kb
resources_used.ncpus=1
resources_used.vmem=250252kb
resources_used.walltime=00:25:13
===========End of original message text==========

The second source file contains:

rmnlRwMetrop1=function(Data,Prior,Mcmc,beta0)
{
#
# purpose:
#   draw from posterior for MNL using Independence Metropolis
#
# Arguments:
#   Data - list of m,X,y
#     m is number of alternatives
#     X is nobs*m x nvar matrix
#     y is nobs vector of values from 1 to m
#   Prior - list of A, betabar
#     A is nvar x nvar prior preci matrix
#     betabar is nvar x 1 prior mean
#   Mcmc
#     R is number of draws
#     keep is thinning parameter
#     s is scaling parameter for random walk
#   beta0 is initial beta
#
# Output:
#   list of betadraws
#
# Model: Pr(y=j) = exp(x_j'beta)/sum(exp(x_k'beta)
#
# Prior: beta ~ N(betabar,A^-1)
#
# check arguments
#
X=Data$X
y=Data$y
m=Data$m
nvar=ncol(X)
nobs=length(y)
# check for Prior
#
if(missing(Prior))
   { betabar=c(rep(0,nvar)); A=diag(rep(.01,nvar))}
else
   {
   if(is.null(Prior$betabar)) {betabar=c(rep(0,nvar))}
      else {betabar=Prior$betabar}
   if(is.null(Prior$A)) {A=matrix(rep(.01,nvar*nvar),ncol=nvar)}
      else {A=Prior$A}
   }
R=Mcmc$R
keep=Mcmc$keep
s=Mcmc$s
# Check beta0 argument
#
if (missing(beta0)) {beta0=c(rep(0,nvar))}
#
betadraw=matrix(double(floor(R/keep)*nvar),ncol=nvar)
#
# compute required quantities for indep candidates
#
oldbeta=beta0
mhess=diag(nvar)
candcov=chol2inv(chol(mhess))
root=s*chol(candcov)
priorcov=chol2inv(chol(A))
rootp=chol(priorcov)
rootpi=backsolve(rootp,diag(nvar))
#
# start main iteration loop
#
oldlpost=llmnl(y,X,beta0)+lmvn(beta0,betabar,rootpi)
naccept=0
for (rep in 1:R)
{
   betac=oldbeta+t(root)%*%rnorm(nvar)
   clpost=llmnl(y,X,betac)+lmvn(betac,betabar,rootpi)
   ldiff=clpost-oldlpost
   alpha=min(1,exp(ldiff))
   if(alpha < 1) {unif=runif(1)} else {unif=0}
   if (unif <= alpha)
      { beta0=betac
        oldlpost=clpost
        naccept=naccept+1}
   oldbeta=beta0
   if(rep%%keep == 0) {mkeep=rep/keep; betadraw[mkeep,]=beta0}
}
list(betadraw=betadraw,acceptr=naccept/R)
}
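
[Editorial note] The Exit_status=139 in the PBS report above can be read
with the usual shell convention that a status above 128 means "128 plus
the number of the signal that killed the process". The sketch below
assumes PBS passes the status through unchanged, which may not hold for
every PBS version; decode_status is a made-up helper name, not part of
PBS or R.

```shell
# decode_status: hypothetical helper that interprets an exit status
# using the POSIX shell convention (status > 128 => 128 + signal number).
decode_status() {
    ds_status=$1
    if [ "$ds_status" -gt 128 ]; then
        ds_sig=$((ds_status - 128))
        # kill -l N prints the name of signal number N
        echo "killed by signal $ds_sig ($(kill -l "$ds_sig"))"
    else
        echo "exited with status $ds_status"
    fi
}

# The status reported for job 1534.seldon:
decode_status 139
```

Under that convention 139 - 128 = 11, which on Linux is SIGSEGV, so the
PBS report agrees with the segmentation fault described in the post.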
Peter Dalgaard
2005-May-20 07:04 UTC
[R] R 2.1.0 RH Linux Built from Source Segmentation Fault
Bruce Foster <bef at northwestern.edu> writes:

...
> The machines are AMD Athlon MP 2400+ with 2 GB RAM, dual CPUs, and
> lots of free disk space.

Any per-user/per-process limits? Resource usage looks suspiciously
close to 256M. If your install is allowing overcommitment of memory,
the OS can kill processes at unpredictable times.

> I've got a user running Monte Carlo codes that fail with segmentation
> faults on a frequent basis. The jobs run for a long time (up to a day)
> before failure.
>
> If a failed job is rerun, chances are high that it will run to completion.
>
> I'm at a loss about approaching this problem. R (as it is here)
> doesn't seem to give much of a hint as to where things are when it
> crashes.
>
> I'm looking for some guidance to diagnose this problem so we can focus
> on a solution.

(A) Use set.seed(...) to get a fixed sequence of random numbers. If it
still fails unpredictably, my bet is that it is a resource problem.

(B) Once you have a case that fails predictably, run it under a
debugger and try to backtrack to the point of failure. There are
various debugging tricks that you can use, but just get there first
and show us a stack backtrace at the failure point (bt command in
gdb).

For more detailed guidance you should probably move the discussion to
the r-devel list.

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
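
[Editorial note] The limits check and steps (A) and (B) above might be
sketched roughly as follows. This is only an illustration under stated
assumptions: a POSIX shell on Linux, gdb installed, and mnl.R (the
program name from the header comment in the original post) in the
current directory. The helper show_limits, the file name seeded_mnl.R
and the seed value are invented for the example. Limits inside a PBS job
can differ from those of a login shell, so the check is best run from
within a job.

```shell
# show_limits: hypothetical helper printing the per-process limits
# that could cap a long-running job ("unlimited" means no cap).
show_limits() {
    echo "virtual memory (kB): $(ulimit -v)"
    echo "data segment (kB):   $(ulimit -d)"
    echo "stack size (kB):     $(ulimit -s)"
    # Linux-specific: 0 = heuristic overcommit, 1 = always, 2 = strict
    if [ -r /proc/sys/vm/overcommit_memory ]; then
        echo "overcommit_memory:   $(cat /proc/sys/vm/overcommit_memory)"
    fi
}
show_limits

# (A) Wrapper script that fixes the random number sequence before
# running the job, so that a failing run can be repeated exactly.
cat > seeded_mnl.R <<'EOF'
set.seed(12345)   # arbitrary seed, chosen only for illustration
source("mnl.R")
EOF

# (B) The debugger session is interactive, so it is shown as comments:
#   R -d gdb
#   (gdb) run --no-save < seeded_mnl.R
#   ... wait for "Program received signal SIGSEGV" ...
#   (gdb) bt
```

If the seeded run crashes at the same iteration every time, the bt
output is what to post; if it still fails at varying points, that would
point back toward a resource problem, as suggested in (A).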