thr3ads.net - R help - [R] R 2.1.0 RH Linux Built from Source Segmentation Fault [May 2005]

If this information is useful, please help other people find it:
Share via:

Bruce Foster

2005-May-19 22:20 UTC

[R] R 2.1.0 RH Linux Built from Source Segmentation Fault

Background:

I administer a cluster of RedHat EWS 3U4 Linux workstations at a university.

I built R 2.1.0 from source:

./configure \
         --prefix=/sscc/opt/R-2.1.0 \
         --with-blas=no \
         2>&1 \
         | tee NUInstall.configure


R is now configured for i686-pc-linux-gnu

   Source directory:          .
   Installation directory:    /sscc/opt/R-2.1.0

   C compiler:                gcc  -g -O2
   C++ compiler:              g++  -g -O2
   Fortran compiler:          g77  -g -O2

   Interfaces supported:      X11, tcltk
   External libraries:        readline
   Additional capabilities:   PNG, JPEG, iconv, MBCS, NLS
   Options enabled:           R profiling

   Recommended packages:      yes

configure: WARNING: you cannot build info or html versions of the R manuals

The machines are AMD Athlon MP 2400+ with 2 GB RAM, dual CPUs, and 
lots of free disk space.


I've got a user running Monte Carlo codes that fail with segmentation 
faults on a frequent basis. The jobs run for a long time (up to a 
day) before failure.

If a failed job is rerun, chances are high that it will run to completion.

I'm at a loss about approaching this problem. R (as it is here) 
doesn't seem to give much of a hint as to where things are when it 
crashes.

I'm looking for some guidance to diagnose this problem so we can 
focus on a solution.

Thanks!



Here's the annotated output of a failed job. The source file 
bayes_book_r_functions.R comes from Peter Rossi:

gsbwww.uchicago.edu/fac/peter.rossi/research/bayes book/R functions

The second source file is included below.


R : Copyright 2005, The R Foundation for Statistical Computing
Version 2.1.0  (2005-04-18), ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

   Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for a HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]
>  # program name: mnl.R
>  # my hierarchical bayes logit model using random walk algorithm
>  #
>  # nphy= number of physicians in the sample
>  # nvar = no. of variables in X
>  # nalt = no. of alternatives
>  # nobs = no. of observations
>  # yrows (nphy x 1 matrix) contains no. of observations by each physician
>  # X (nobs*nalt x nvar) contains xvalues for each of j alt on each 
>of n occasions
>  # Y (nobs x 1) contains chosen alternative
>  #
>
>  source("bayes_book_r_functions.R")
>  nalt=5
>
>  X=read.table("x300_new.txt",header=TRUE)
>  X=as.matrix(X)
>  X=cbind(X[,1:4],X[,10:11])
>  nvar=ncol(X)
>  Y=read.table("y300_new.txt",header=TRUE)
>  Y=as.matrix(Y)
>  yrows=read.table("yrows300_new.txt",header=TRUE)
>  nphy=nrow(yrows)
>  nobs=sum(yrows)/nalt
>  if (nrow(X)!=nobs*nalt){print("data dimensions wrong")}
>
>  betastore=NULL
>  delta=diag(nvar)
>  z=read.table("z300.txt",header=TRUE)
>  z=as.matrix(z)
>  k=ncol(z)
>  A1=.1*diag(k)
>  nu1=3+nalt
>  V=diag(nvar)*nu1
>
>  rowx1=rep(0,nphy+1)
>  rowx2=rep(0,nphy)
>  rowy1=rep(0,nphy+1)
>  rowy2=rep(0,nphy)
>  rowx1[1]=1
>  rowy1[1]=1
>  for (i in 1:nphy){+ # for each physician i draw the relevant X and Y obs using yrows
+ #
+ rowx2[i]=rowx1[i]-1+yrows[i,1]
+ rowy2[i]=rowy1[i]-1+yrows[i,1]/nalt
+ rowx1[i+1]=rowx2[i]+1
+ rowy1[i+1]=rowy2[i]+1
+ }>
>  R=100000
>  keep=10
>  thetas=matrix(rep(0,(R/keep+1)*nvar*k),ncol=nvar*k)
>  theta=matrix(rep(0,nvar*k),nrow=k)
>  thetabar=theta
>  source("rmnlRwMetrop1.R")
>  scale=.5
>  beta=matrix(rep(0,nphy*nvar),byrow=TRUE,ncol=nvar)
>  accept=rep(0,nphy)
>  accepts=rep(0,R/keep)
>  parameters=list(R=1,keep=1,s=scale)
>  for (j in 1:R)+ {
+ itime=proc.time()[3]
+ for (i in 1:nphy){
+       # for each physician i draw a beta[i] using MH algorithm
+       #
+       Data=list(m=nalt,X=X[rowx1[i]:rowx2[i],],y=Y[rowy1[i]:rowy2[i],])
+       bbar=t(theta)%*%z[i,]
+       prior=list(A=delta,betabar=bbar)
+       a=rmnlRwMetrop1(Data,prior,Mcmc=parameters,beta[i,])
+       beta[i,]=a$betadraw
+       accept[i]=a$acceptr
+       }
+ mregout=rmultireg(beta,z,thetabar,A1,nu1,V)
+ delta=mregout$Sigma
+ theta=mregout$B
+ if (j%%keep==0){
+ thetas[j/keep,]=as.vector(theta)
+ betastore[j/keep]=list(beta)
+ accepts[j/keep]=mean(accept)
+ ftime=proc.time()[3]
+ cat('Time taken by iteration: ',j, ' =
',round((ftime-itime)/60,2),'\n')
+ }
+ if (j%%10000 == 0)
+ {cat('Mean theta at iteration: ',j,' = 
',apply(thetas[1:j/keep,],2,mean),'\n')
+ cat('Mean sd of theta at iteration: ',j,' = 
',apply(thetas[1:j/keep,],2,sd),'\n')
+ }
+ }
Time taken by iteration:  10  =  0.01
Time taken by iteration:  20  =  0.01
Time taken by iteration:  30  =  0.01
Time taken by iteration:  40  =  0.01
Time taken by iteration:  50  =  0.01

   ... many lines deleted

Time taken by iteration:  92800  =  0.01
Time taken by iteration:  92810  =  0.01
Time taken by iteration:  92820  =  0.01
Time taken by iteration:  92830  =  0.01
Time taken by iteration:  92840  =  0.01


That's it!

The PBS output is:

==============Original message text==============    From:
"seldon.it.northwestern.edu" <root at
seldon.it.northwestern.edu>
    Date: Wed, 18 May 2005 4:47:19 pm CDT
    Subject: PBS JOB 1534.seldon

PBS Job Id: 1534.seldon
Job Name:   mnl300_z
Execution terminated
Exit_status=139
resources_used.cpupercent=98
resources_used.cput=00:25:02
resources_used.mem=233536kb
resources_used.ncpus=1
resources_used.vmem=250252kb
resources_used.walltime=00:25:13

===========End of original message text==========
The second source file contains:

rmnlRwMetrop1=function(Data,Prior,Mcmc,beta0)
{
#
# purpose:
#   draw from posterior for MNL using Independence Metropolis
#
# Arguments:
#   Data - list of m,X,y
#     m is number of alternatives
#     X is nobs*m x nvar matrix
#     y is nobs vector of values from 1 to m
#   Prior - list of A, betabar
#     A is nvar x nvar prior preci matrix
#     betabar is nvar x 1 prior mean
#   Mcmc
#     R is number of draws
#     keep is thinning parameter
#     s is scaling parameter for random walk
#   beta0 is initial beta
#
# Output:
#   list of betadraws
#
# Model:   Pr(y=j) = exp(x_j'beta)/sum(exp(x_k'beta)
#
# Prior:   beta ~ N(betabar,A^-1)
#
# check arguments
#
X=Data$X
y=Data$y
m=Data$m
nvar=ncol(X)
nobs=length(y)
# check for Prior
#
if(missing(Prior))
    { betabar=c(rep(0,nvar)); A=diag(rep(.01,nvar))}
else
    {
     if(is.null(Prior$betabar)) {betabar=c(rep(0,nvar))}
        else {betabar=Prior$betabar}
     if(is.null(Prior$A)) {A=matrix(rep(.01,nvar*nvar),ncol=nvar)}
        else {A=Prior$A}
    }
R=Mcmc$R
keep=Mcmc$keep
s=Mcmc$s
# Check beta0 argument
#
if (missing(beta0)) {beta0=c(rep(0,nvar))}
#
betadraw=matrix(double(floor(R/keep)*nvar),ncol=nvar)
#
# compute required quantities for indep candidates
#
oldbeta=beta0
mhess=diag(nvar)
candcov=chol2inv(chol(mhess))
root=s*chol(candcov)
priorcov=chol2inv(chol(A))
rootp=chol(priorcov)
rootpi=backsolve(rootp,diag(nvar))

#
#       start main iteration loop
#
oldlpost=llmnl(y,X,beta0)+lmvn(beta0,betabar,rootpi)
naccept=0

for (rep in 1:R)
{
    betac=oldbeta+t(root)%*%rnorm(nvar)
    clpost=llmnl(y,X,betac)+lmvn(betac,betabar,rootpi)
    ldiff=clpost-oldlpost
    alpha=min(1,exp(ldiff))
    if(alpha < 1) {unif=runif(1)} else {unif=0}
    if (unif <= alpha)
       { beta0=betac
         oldlpost=clpost
         naccept=naccept+1}
    oldbeta=beta0

   if(rep%%keep == 0)
     {mkeep=rep/keep; betadraw[mkeep,]=beta0}
}
list(betadraw=betadraw,acceptr=naccept/R)
}

Peter Dalgaard

2005-May-20 07:04 UTC

head link

[R] R 2.1.0 RH Linux Built from Source Segmentation Fault

Bruce Foster <bef at northwestern.edu> writes:

... > The machines are AMD Athlon MP 2400+ with 2 GB RAM, dual CPUs, and
> lots of free disk space.
Any per-user/per-process limits? Resource usage look suspiciously
close to 256M. If your install is allowing overcommitment of memory,
the OS can kill processes at unpredictable times.
> I've got a user running Monte Carlo codes that fail with segmentation
> faults on a frequent basis. The jobs run for a long time (up to a day)
> before failure.
> 
> If a failed job is rerun, chances are high that it will run to completion.
> 
> I'm at a loss about approaching this problem. R (as it is here)
> doesn't seem to give much of a hint as to where things are when it
> crashes.
> 
> I'm looking for some guidance to diagnose this problem so we can focus
> on a solution.
(A) Use set.seed(...) to get a fixed sequence of random numbers. If it
still fails unpredictably, my bet is that it is a resource problem.

(B) Once you have a case that fails predictably, run it under a
debugger and try to backtrack to the point of failure. There are
various debugging tricks that you can use, but just get there first
and show us a stack backtrace at the failure point (bt command in
gdb).

For more detailed guidance you should probably move the discussion to
the r-devel list.

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907

Seemingly Similar Threads

Search for more possibly parallel threads

R help - May 2005 - R 2.1.0 RH Linux Built from Source Segmentation Fault

[R] R 2.1.0 RH Linux Built from Source Segmentation Fault

[R] R 2.1.0 RH Linux Built from Source Segmentation Fault

Seemingly Similar Threads