j@de@shod@@ m@iii@g oii googiem@ii@com
2022-Jul-19 15:07 UTC
[R] mgcv: relative risk from GAM with distributed lag
Dear list members, Does anyone know how to obtain a relative risk/ risk ratio from a GAM with a distributed lag model implemented in mgcv? I have a GAM predicting daily deaths from time series data consisting of daily temperature, humidity and rainfall. The GAM includes a distributed lag model because deaths may occur over several days following a high heat day. What I'd like to do is compute (and plot) the relative risk (accumulated across all lags) for a given temperature vs the temperature at which the risk is lowest, with corresponding confidence intervals. I am aware of the predict.gam function but am not sure if and how it should be used in this case. (Additionally, I'd also like to plot the relative risk for different lags separately). I apologise if this seems trivial to some. (Actually, I hope it is, because that might mean I get a solution!) I've been looking for examples on how to do this, but found nothing so far. Suggestions would be very much appreciated! Below is a reproducible example with the GAM: library(mgcv) set.seed(3) # make reproducible example simdat <- gamSim(1,400) # simulate data g <- exp(simdat$f/5) simdat$y <- rnbinom(g,size=3,mu=g) # negative binomial response var simdat$time <- 1:400 # create time series names(simdat) <- c("deaths", "temp", "humidity", "rain", "x3", "f", "f0", "f1", "f2", "f3", "time") # lag function based on Simon Wood (book 2017, p.349 and gamair package documentation p.54 # https://cran.rstudio.com/web/packages/gamair/gamair.pdf) lagard <- function(x,n.lag=7) { n <- length(x); X <- matrix(NA,n,n.lag) for (i in 1:n.lag) X[i:n,i] <- x[i:n-i+1] X } # set up lag, temp, rain and humidity as 7-column matrices # to create lagged variables - based on Simon Wood's example dat <- list(lag=matrix(0:6,nrow(simdat),7,byrow=TRUE), deaths=simdat$deaths, time = simdat$time) dat$temp <- lagard(simdat$temp) dat$rain <- lagard(simdat$rain) dat$humidity <- lagard(simdat$humidity) mod <- gam(deaths~s(time, k=70) + te(temp, lag, k=c(12, 4)) + te(humidity, lag, k=c(12, 4)) + te(rain, lag, k=c(12, 4)), data = dat, family = nb, method = 'REML', select = TRUE) summary(mod) plot(mod, scheme = 1) plot(mod, scheme = 2) Thanks for any suggestions you may have, Jade
j@de@shod@@ m@iii@g oii googiem@ii@com
2022-Jul-21 14:19 UTC
[R] mgcv: relative risk from GAM with distributed lag
Hello everyone (incl. Simon Wood?), I'm not sure that my original question (see below, including reproducible example) was as clear as it could have been. To clarify, what I would to like to get is: 1) a perspective plot of temperature x lag x relative risk. I know how to use plot.gam and vis.gam but don't know how to get plots on the relative risk scale as opposed to "response" or "link". 2) a plot of relative risk (accumulated across all lags) vs temperature, given a reference temperature. An example of such a plot can be found in figure 2 (bottom) of this paper by Gasparrini et al: https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.3940 I've seen Simon Wood's response to a related issue here: https://stat.ethz.ch/pipermail/r-help/2012-May/314387.html However, I'm not sure how to apply this to time series data with distributed lag, to get the above mentioned figures. Would be really grateful for suggestions! Jade On Tue, 19 Jul 2022 at 16:07, jade.shodan at googlemail.com <jade.shodan at googlemail.com> wrote:> > Dear list members, > > Does anyone know how to obtain a relative risk/ risk ratio from a GAM > with a distributed lag model implemented in mgcv? I have a GAM > predicting daily deaths from time series data consisting of daily > temperature, humidity and rainfall. The GAM includes a distributed lag > model because deaths may occur over several days following a high heat > day. > > What I'd like to do is compute (and plot) the relative risk > (accumulated across all lags) for a given temperature vs the > temperature at which the risk is lowest, with corresponding confidence > intervals. I am aware of the predict.gam function but am not sure if > and how it should be used in this case. (Additionally, I'd also like > to plot the relative risk for different lags separately). > > I apologise if this seems trivial to some. (Actually, I hope it is, > because that might mean I get a solution!) I've been looking for > examples on how to do this, but found nothing so far. Suggestions > would be very much appreciated! > > Below is a reproducible example with the GAM: > > library(mgcv) > set.seed(3) # make reproducible example > simdat <- gamSim(1,400) # simulate data > g <- exp(simdat$f/5) > simdat$y <- rnbinom(g,size=3,mu=g) # negative binomial response var > simdat$time <- 1:400 # create time series > names(simdat) <- c("deaths", "temp", "humidity", "rain", "x3", "f", > "f0", "f1", "f2", "f3", "time") > > # lag function based on Simon Wood (book 2017, p.349 and gamair > package documentation p.54 > # https://cran.rstudio.com/web/packages/gamair/gamair.pdf) > lagard <- function(x,n.lag=7) { > n <- length(x); X <- matrix(NA,n,n.lag) > for (i in 1:n.lag) X[i:n,i] <- x[i:n-i+1] > X > } > > # set up lag, temp, rain and humidity as 7-column matrices > # to create lagged variables - based on Simon Wood's example > dat <- list(lag=matrix(0:6,nrow(simdat),7,byrow=TRUE), > deaths=simdat$deaths, time = simdat$time) > dat$temp <- lagard(simdat$temp) > dat$rain <- lagard(simdat$rain) > dat$humidity <- lagard(simdat$humidity) > > mod <- gam(deaths~s(time, k=70) + te(temp, lag, k=c(12, 4)) + > te(humidity, lag, k=c(12, 4)) + te(rain, lag, k=c(12, 4)), data = dat, > family = nb, method = 'REML', select = TRUE) > > summary(mod) > plot(mod, scheme = 1) > plot(mod, scheme = 2) > > Thanks for any suggestions you may have, > > Jade