Dear Everybody:
I'm doing my usual "how does that work in R" thing with some Stata
projects.  I find a gross gap between the Stata and R in Cox PH models, 
and I hope you can give me some pointers about what goes wrong.  I'm 
getting signals from R/Survival that the model just can't be estimated, 
but Stata spits out numbers just fine.
I wonder if I should specify initial values for coxph?
I got a dataset from a student who uses Stata and try to replicate in R. 
I will share data to you in case you want to see for yourself.  Let me 
know if you want text or Stata data file.
In R, I try this:
 > cox2 <- coxph(Surv(yrs2,ratify)~ accession+ haz.wst+ haz.in +haz.out+ 
wefgov+ rle+ rqe + pol.free +tai.2001 + ny.gdp.pcap.pp.cd + eio, 
data=dat3, control=coxph.control(iter.max=1000),singular.ok=T)
Warning message:
Ran out of iterations and did not converge in: fitter(X, Y, strats, 
offset, init, control, weights = weights,
So I wrote out the file exatly as it was in R into Stata dataset
 > write.dta(dat3,"cleanBasel.dta")
Warning message:
Abbreviating variable names in: write.dta(dat3, "cleanBasel.dta")
Here's the Stata output:
. use
"/home/pauljohn/ps/ps909/AdvancedRegression/duration_2/cleanBasel.dta"
(Written by R.              )
. stset yrs2, failure (ratify)
      failure event:  ratify != 0 & ratify < .
obs. time interval:  (0, yrs2]
  exit on or before:  failure
----------------------------------------------------------------------------
 > --
        21  total obs.
         0  exclusions
----------------------------------------------------------------------------
 > --
        21  obs. remaining, representing
        21  failures in single record/single failure data
        78  total analysis time at risk, at risk from t =         0
                              earliest observed entry t =         0
. stcox accessin haz_wst  haz_in  haz_out  wefgov  rle  rqe  pol_free 
tai_2001 ny_gd  eio, robust
 >  nohr
          failure _d:  ratify
    analysis time _t:  yrs2
Iteration 0:   log pseudo-likelihood = -49.054959
Iteration 1:   log pseudo-likelihood = -45.021682
Iteration 2:   log pseudo-likelihood = -44.525187
Iteration 3:   log pseudo-likelihood = -44.521588
Iteration 4:   log pseudo-likelihood = -44.521586
Refining estimates:
Iteration 0:   log pseudo-likelihood = -44.521586
Cox regression -- Breslow method for ties
No. of subjects       =           21               Number of obs   = 
     21
No. of failures       =           21
Time at risk          =           78
                                                    Wald chi2(11)   = 
   81.64
Log pseudo-likelihood =   -44.521586               Prob > chi2     = 
0.0000
------------------------------------------------------------------------------
              |               Robust
           _t |      Coef.   Std. Err.      z    P>|z|
-------------+----------------------------------------------------------------
     accessin |  -1.114101   .6343663    -1.76   0.079
      haz_wst |   2.32e-08   1.08e-07     0.22   0.829
       haz_in |   3.78e-06   2.46e-06     1.54   0.124
      haz_out |  -3.80e-07   3.76e-07    -1.01   0.312
       wefgov |   2.139127   .9136992     2.34   0.019
          rle |   1.827482   1.500878     1.22   0.223
          rqe |  -3.126696   1.332069    -2.35   0.019
     pol_free |  -.4498276    .291764    -1.54   0.123
     tai_2001 |  -2.895922   2.577401    -1.12   0.261
     ny_gd___ |  -.0003223   .0002194    -1.47   0.142
          eio |  -.0577773   .0726064    -0.80   0.426
------------------------------------------------------------------------------
.
                                  last observed exit t =         7
----------------------------------
Paul Johnson
Dept. of Political Science
University of Kansas
Thomas Lumley
2004-May-11  14:10 UTC
[R] Explaining Survival difference between Stata and R
On Mon, 10 May 2004, Paul Johnson wrote:> Dear Everybody: > > I'm doing my usual "how does that work in R" thing with some Stata > projects. I find a gross gap between the Stata and R in Cox PH models, > and I hope you can give me some pointers about what goes wrong. I'm > getting signals from R/Survival that the model just can't be estimated, > but Stata spits out numbers just fine. > > I wonder if I should specify initial values for coxph?It's worth a try. The other question is whether Stata has in fact converged -- if the range of rqe is not small then its coefficient may actually be infinite.> I got a dataset from a student who uses Stata and try to replicate in R. > I will share data to you in case you want to see for yourself. Let me > know if you want text or Stata data file.I'd like to look at the data. We should be able to get coxph to converge when there is a finite mle -- the log partial likelihood is concave. -thomas> In R, I try this: > > > cox2 <- coxph(Surv(yrs2,ratify)~ accession+ haz.wst+ haz.in +haz.out+ > wefgov+ rle+ rqe + pol.free +tai.2001 + ny.gdp.pcap.pp.cd + eio, > data=dat3, control=coxph.control(iter.max=1000),singular.ok=T) > Warning message: > Ran out of iterations and did not converge in: fitter(X, Y, strats, > offset, init, control, weights = weights, > > So I wrote out the file exatly as it was in R into Stata dataset > > > write.dta(dat3,"cleanBasel.dta") > Warning message: > Abbreviating variable names in: write.dta(dat3, "cleanBasel.dta") > > > Here's the Stata output: > > . use "/home/pauljohn/ps/ps909/AdvancedRegression/duration_2/cleanBasel.dta" > (Written by R. ) > > . stset yrs2, failure (ratify) > > failure event: ratify != 0 & ratify < . > obs. time interval: (0, yrs2] > exit on or before: failure > > ---------------------------------------------------------------------------- > > -- > 21 total obs. > 0 exclusions > ---------------------------------------------------------------------------- > > -- > 21 obs. remaining, representing > 21 failures in single record/single failure data > 78 total analysis time at risk, at risk from t = 0 > earliest observed entry t = 0 > > . stcox accessin haz_wst haz_in haz_out wefgov rle rqe pol_free > tai_2001 ny_gd eio, robust > > nohr > > failure _d: ratify > analysis time _t: yrs2 > > Iteration 0: log pseudo-likelihood = -49.054959 > Iteration 1: log pseudo-likelihood = -45.021682 > Iteration 2: log pseudo-likelihood = -44.525187 > Iteration 3: log pseudo-likelihood = -44.521588 > Iteration 4: log pseudo-likelihood = -44.521586 > Refining estimates: > Iteration 0: log pseudo-likelihood = -44.521586 > > Cox regression -- Breslow method for ties > > No. of subjects = 21 Number of obs > 21 > No. of failures = 21 > Time at risk = 78 > Wald chi2(11) > 81.64 > Log pseudo-likelihood = -44.521586 Prob > chi2 > 0.0000 > > ------------------------------------------------------------------------------ > | Robust > _t | Coef. Std. Err. z P>|z| > -------------+---------------------------------------------------------------- > accessin | -1.114101 .6343663 -1.76 0.079 > haz_wst | 2.32e-08 1.08e-07 0.22 0.829 > haz_in | 3.78e-06 2.46e-06 1.54 0.124 > haz_out | -3.80e-07 3.76e-07 -1.01 0.312 > wefgov | 2.139127 .9136992 2.34 0.019 > rle | 1.827482 1.500878 1.22 0.223 > rqe | -3.126696 1.332069 -2.35 0.019 > pol_free | -.4498276 .291764 -1.54 0.123 > tai_2001 | -2.895922 2.577401 -1.12 0.261 > ny_gd___ | -.0003223 .0002194 -1.47 0.142 > eio | -.0577773 .0726064 -0.80 0.426 > ------------------------------------------------------------------------------ > > . > last observed exit t = 7 > > > ---------------------------------- > > > > Paul Johnson > Dept. of Political Science > University of Kansas > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html >Thomas Lumley Assoc. Professor, Biostatistics tlumley at u.washington.edu University of Washington, Seattle