Denis.Aydin at unibas.ch
2010-Dec-09 12:14 UTC
[R] Calculating odds ratios from logistic GAM model
Dear R-helpers,

I have a question related to logistic GAM models. Consider the following example:

# Load package
library(mgcv)

# Simulation of dataset
n <- 1000
set.seed(0)
age <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol <- rnorm(n, 200, 25)
sex <- factor(sample(c('female','male'), n, TRUE))

L <- 0.4*(sex=='male') + 0.045*(age-50) +
  (log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male'))
y <- ifelse(runif(n) < plogis(L), 1, 0)

I now want to fit a logistic GAM model and model age as a cubic spline:

fit <- gam(y ~ blood.pressure + sex + cholesterol + s(age, bs="cr"),
           family=binomial)

Now my question: In a normal logistic regression, the odds ratio (OR) is simply the exponentiated coefficient exp(beta). How is it possible to calculate the odds ratio for age (in this example) based on the spline? For example, the odds ratio based on the spline between the ages of, say, 20 and 30? Or even better: How can I plot the odds ratios against age in a continuous form?

Many thanks for your help.

Best,
Denis Aydin
On Dec 9, 2010, at 7:14 AM, Denis.Aydin at unibas.ch wrote:

> Dear R-helpers
> I have a question related to logistic GAM models. Consider the
> following example:
> # Load package
> library(mgcv)
>
> # Simulation of dataset
> n <- 1000
> set.seed(0)
> age <- rnorm(n, 50, 10)
> blood.pressure <- rnorm(n, 120, 15)
> cholesterol <- rnorm(n, 200, 25)
> sex <- factor(sample(c('female','male'), n,TRUE))
>
> L <- 0.4*(sex=='male') + 0.045*(age-50) + (log(cholesterol -
> 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male'))
> y <- ifelse(runif(n) < plogis(L), 1, 0)
>
> I now want to fit a logistic GAM model and model age as a cubic
> spline:
>
> fit <- gam(y ~ blood.pressure + sex + cholesterol + s(age, bs="cr")
> ,family=binomial)

I'm wondering if there might be a problem with my understanding of the appropriate terminology. Why would such a model be called logistic? There is no parametric relationship between some reference set and the rest of the prediction space.

And I'm also wondering why one would even _want_ an odds ratio. Odds ratios were always an approximation to what one really wanted, namely either a proportion or a rate ratio. We students were asked to readjust our brains to conform to the deliverables from the rather twisted (I suppose "transformed" would be more accurate) mechanics of "logistic" regression, and we dutifully did so with varying degrees of success. But now it seems it should be perfectly acceptable to leave that cognitive tunnel behind and use the available methods that are capable of generating perfectly sensible output using "predict" methods.

(This does lead to the answer I had originally started to write: just pick a reference category and use predict with type="response". And if you understand what odds are, and many people are incapable of giving a mathematically correct definition, then it's pretty straightforward.)

> Now my question: In a normal logistic regression, the odds ratio (OR)
> simply is the exponentiated coefficient exp(beta).
> How is it possible to calculate the odds ratio for age (in this
> example) based on the spline? For example the odds ratio based on the
> spline between the age of, say, 20-30?
> Or even better: How can I plot the odds ratios against age in a
> continuous form?

And my counter-question: why would we want to? Why are you ignoring the predict(model, type="response") facilities?

> Many thanks for your help.
>
> Best,
> Denis Aydin

--
David Winsemius, MD
West Hartford, CT
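
The suggestion in the reply (pick a reference and use predict) could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not something posted in the thread: the reference age of 50, the age grid, and the fixed covariate profile (female, blood pressure 120, cholesterol 200) are illustrative choices, and the data frame name dat is hypothetical. Because the fitted model is additive on the logit scale, the odds ratio for age relative to the reference is the exponentiated difference of linear predictors, and the fixed covariates cancel out of that difference.

```r
library(mgcv)

# Same simulation as in the original post
n <- 1000
set.seed(0)
age <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol <- rnorm(n, 200, 25)
sex <- factor(sample(c('female', 'male'), n, TRUE))
L <- 0.4*(sex == 'male') + 0.045*(age - 50) +
  (log(cholesterol - 10) - 5.2)*(-2*(sex == 'female') + 2*(sex == 'male'))
y <- ifelse(runif(n) < plogis(L), 1, 0)
dat <- data.frame(y, age, blood.pressure, cholesterol, sex)  # hypothetical name

fit <- gam(y ~ blood.pressure + sex + cholesterol + s(age, bs = "cr"),
           family = binomial, data = dat)

# Assumed prediction grid: only age varies, other covariates held fixed
age.grid <- seq(20, 80, by = 1)
ref.age  <- 50                      # assumed reference age
newd <- data.frame(age = age.grid, blood.pressure = 120, cholesterol = 200,
                   sex = factor("female", levels = levels(dat$sex)))
is.ref <- age.grid == ref.age

# Predicted probabilities, as suggested in the reply
p <- as.numeric(predict(fit, newdata = newd, type = "response"))

# Odds ratio versus the reference age: difference on the link (logit) scale
eta <- as.numeric(predict(fit, newdata = newd, type = "link"))
or  <- exp(eta - eta[is.ref])

# Odds ratio for age 30 versus age 20 (the comparison asked about)
or.30.vs.20 <- exp(eta[age.grid == 30] - eta[age.grid == 20])

# Approximate pointwise 95% intervals from the linear predictor matrix
Xp  <- predict(fit, newdata = newd, type = "lpmatrix")
D   <- sweep(Xp, 2, Xp[is.ref, ])   # each row minus the reference row
log.or <- as.numeric(D %*% coef(fit))
se     <- sqrt(rowSums((D %*% vcov(fit)) * D))
lower  <- exp(log.or - 1.96 * se)
upper  <- exp(log.or + 1.96 * se)

# Odds ratio against age on a log axis, with the interval as dashed lines
plot(age.grid, or, type = "l", log = "y", ylim = range(lower, upper),
     xlab = "age", ylab = paste("odds ratio vs age", ref.age))
lines(age.grid, lower, lty = 2)
lines(age.grid, upper, lty = 2)
abline(h = 1, col = "grey")
```

The intervals here are conditional on the estimated smoothing parameter, and values near the edges of the age grid lie where the simulated data are sparse, so they should be read with caution. Plotting p against age.grid instead gives the probability-scale view the reply argues for.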