Terry Therneau
2012-Aug-09 13:00 UTC
[R] basehaz() in package survival and warnings with coxph
I've never seen this, and have no idea how to reproduce it. For resolution you are going to have to give me a working example of the failure.

Also, per the posting guide, what is your sessionInfo()?

Terry Therneau

On 08/09/2012 04:11 AM, r-help-request at r-project.org wrote:
> I have a couple of questions with regards to fitting a coxph model to a data
> set in R:
>
> I have a very large dataset and wanted to get the baseline hazard using the
> basehaz() function in the package 'survival'.
> If I use all the covariates then the output from basehaz(fit), where fit is
> a model fit using coxph(), gives 507 unique values for the time and the
> corresponding cumulative hazard function. However if I use a subset of the
> variables, basehaz() gives 611 values for the time and cumulative hazard.
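A self-contained example of the kind requested above could be built on a data set that ships with the package. The following is only a minimal sketch, not the poster's analysis: it assumes the lung data from survival as a stand-in, and the covariate choices are purely illustrative. It fits one model with many covariates and one with few, then compares how many observations each fit used and how many time points basehaz() returns for each.

```r
library(survival)

# Stand-in data: 'lung' ships with survival and several covariates contain NAs.
# (Assumption: the poster's data are not available, so this only mimics the
# situation of a large versus a small covariate set.)
fit_many <- coxph(Surv(time, status) ~ age + sex + ph.ecog + ph.karno +
                    pat.karno + meal.cal + wt.loss, data = lung)
fit_few  <- coxph(Surv(time, status) ~ age + sex, data = lung)

haz_many <- basehaz(fit_many)
haz_few  <- basehaz(fit_few)

# Observations actually used by each fit. Rows with an NA in any model
# variable are dropped by the default na.action (na.omit).
c(many = fit_many$n, few = fit_few$n)

# Number of time points in the cumulative baseline hazard of each fit.
c(many = nrow(haz_many), few = nrow(haz_few))

# Session details worth including in a report to the list.
sessionInfo()
```

Posting a short script like this, together with its output, lets others run exactly the same code and compare results.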
Terry Therneau
2012-Aug-10 13:36 UTC
[R] basehaz() in package survival and warnings with coxph
Since fit3.1 and fit2 are based on different data sets, why would I expect the same number of events? Also, when you have a large number of variables, are observations being deleted due to missing values?

And to echo David W's comments -- it is hard for me to imagine a data set where this many variables can be looked at simultaneously and yield a meaningful result.

Terry Therneau

On 08/09/2012 07:52 PM, Nasib Ahmed wrote:
> My sessionInfo is as follows:
>
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] splines   stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
>  [1] mi_0.09-16       arm_1.5-05       foreign_0.8-50   abind_1.4-0
>  [5] R2WinBUGS_2.1-18 coda_0.15-2      lme4_0.999999-0  Matrix_1.0-6
>  [9] lattice_0.20-6   car_2.0-12       nnet_7.3-4       MASS_7.3-20
> [13] MuMIn_1.7.11     survival_2.36-14
>
> loaded via a namespace (and not attached):
> [1] grid_2.15.1   nlme_3.1-104  stats4_2.15.1
>
> It will be difficult to reproduce an example here as the data set I am
> using is very large.
> I can give you an example:
>
> fit3.1 <- coxph(formula = y ~ sex + ns(ageyrs, df = 2) + AdmissionSource +
>     X1 + X2 + X3 + X5 + X6 + X7 + X11 + X12 + X13 + X14 + X15 +
>     X16 + X17 + X18 + X19 + X20 + X22 + X24 + X25 + X26 + X27 +
>     X28 + X29 + X32 + X33 + X35 + X38 + X39 + X40 + X41 + X42 +
>     X43 + X44 + X47 + X49 + X53 + X54 + X55 + X58 + X59 + X62 +
>     X68 + X69 + X78 + X80 + X81 + X84 + X85 + X86 + X93 + X95 +
>     X98 + X100 + X101 + X102 + X105 + X107 + X108 + X109 + X110 +
>     X112 + X113 + X114 + X115 + X116 + X117 + X121 + X122 + X125 +
>     X127 + X128 + X129 + X131 + X132 + X133 + X134 + X138 + X140 +
>     X143 + X145 + X146 + X148 + X150 + X151 + X153 + X157 + X158 +
>     X159 + X164 + X197 + X200 + X202 + X203 + X204 + X205 + X211 +
>     X214 + X217 + X224 + X228 + X233 + X237 + X244 + X249 + X254 +
>     X258 + X259 + X260 + CharlsonIndex + ethnic + day + season +
>     ln, data = dat2)
>
> haz <- basehaz(fit3.1)  # gives 507 unique time points in haz$time
>
> fit2 <- coxph(y ~ ns(ageyrs, df = 2) + day + ln + sex + AdmissionSource +
>     season + CharlsonIndex, data = dat1)
>
> haz <- basehaz(fit2)  # gives 611 unique time points in haz$time
>
> I get the following warnings() with fit3.1:
> Warning message:
> In fitter(X, Y, strats, offset, init, control, weights = weights, :
>   Loglik converged before variable ; beta may be infinite.
>
> Also, the coefficients of the variables that the warning occurs for are
> very high. The Wald test suggests dropping these terms whereas the
> LRT suggests keeping them. What should I do in terms of model selection?
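The warning quoted in the message above is the kind coxph() gives when the partial likelihood keeps increasing as a coefficient grows without bound, for example when a binary covariate defines a group with no events. The coxph() help page notes that in this situation the Wald statistic is not valid, while the likelihood ratio and score tests remain valid, which would be consistent with the disagreement between the two tests described by the poster. A minimal sketch with made-up toy data (the data frame and its values are assumptions for illustration, not the poster's data; the exact warning text varies between survival versions):

```r
library(survival)

# Toy data (hypothetical): every subject with x = 1 has an event and every
# subject with x = 0 is censored, so the MLE of the coefficient is infinite.
toy <- data.frame(
  time   = 1:8,
  status = c(1, 1, 1, 1, 0, 0, 0, 0),
  x      = c(1, 1, 1, 1, 0, 0, 0, 0)
)

# Collect warnings instead of letting them print, so they can be inspected.
msgs <- character(0)
fit <- withCallingHandlers(
  coxph(Surv(time, status) ~ x, data = toy),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
msgs

s <- summary(fit)
s$coefficients  # large coefficient with a very large standard error

# Wald test p-value for x versus the likelihood ratio test p-value.
wald_p <- s$coefficients["x", "Pr(>|z|)"]
lrt_p  <- unname(s$logtest["pvalue"])
c(wald = wald_p, lrt = lrt_p)
```

In a model with many covariates, tabulating each flagged covariate against the event indicator is one way to see whether some level has no events at all.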