j@de@shod@@ m@iii@g oii googiem@ii@com
2022-Jul-02 18:20 UTC
[R] mgcv: estimate concurvity vs worst concurvity in GAMs
Dear list members,

I was wondering if someone could explain, in conceptual terms, how to interpret *estimate* concurvity in a GAM fitted with mgcv, and how it differs from *worst* concurvity (both as obtained through mgcv's concurvity function).

I understand that concurvity is the non-parametric analogue of collinearity in GAMs: it represents the extent to which a smooth term can be approximated by one or more of the other smooth terms in the model. It seems to be common practice to base one's course of action on the worst concurvity measure (as advised, e.g., in Noam Ross's course on GAMs). However, the mgcv help page for concurvity states that worst concurvity is a "fairly pessimistic measure, as it looks at the worst case irrespective of data", whereas estimate concurvity "does not suffer from the pessimism or potential for over-optimism of the previous two measures, but is less easy to understand".

Worst concurvity is extremely high in my GAMs, whereas estimate concurvity is much lower (see below), so I am unsure whether I need to deal with the concurvity at all. I should stress that the aim of my model is to understand the relationships between variables rather than to maximise predictive performance.

For those interested, the concurvity values below are for a GAM with the number of daily deaths as the response and three smooth predictors: a smooth of time, a smooth of a heat variable (wbgt_mean) and a smooth of precipitation. wbgt_mean is the variable of interest, and precipitation is a potential confounder. Heat and precipitation are modelled with a distributed lag of 6 days, set up as 7-column matrices (lags 0 to 6) as described in Simon Wood's book on GAMs (2017, p. 352).
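For readers unfamiliar with this set-up, here is a minimal base-R sketch of the kind of lag matrices involved, assuming 7 columns for lags 0 to 6. The helper names lag_matrix() and lag_index() are hypothetical (they are not from mgcv or from the book). When the covariates of a smooth are matrices, mgcv evaluates the smooth at every column and sums across columns, which is what turns te(wbgt_mean, lag) into a distributed-lag term.

```r
# Hypothetical helper: n x (max_lag + 1) matrix whose column (l + 1) holds
# the series lagged by l days; rows without enough history are left NA.
lag_matrix <- function(x, max_lag = 6) {
  n <- length(x)
  out <- matrix(NA_real_, nrow = n, ncol = max_lag + 1)
  for (l in 0:max_lag) {
    if (l < n) out[(l + 1):n, l + 1] <- x[1:(n - l)]
  }
  out
}

# Hypothetical helper: matching matrix of lag indices (0, 1, ..., max_lag),
# identical in every row, to pair with the lagged covariate inside te().
lag_index <- function(n, max_lag = 6) {
  matrix(0:max_lag, nrow = n, ncol = max_lag + 1, byrow = TRUE)
}

# Toy usage with a made-up daily series
heat <- c(20, 22, 25, 24, 21, 19, 23, 26, 27, 24)
H <- lag_matrix(heat, max_lag = 6)
L <- lag_index(length(heat), max_lag = 6)
dim(H)   # 10 7
H[8, ]   # day 8 followed by the six preceding days
```

Rows with incomplete lag history come out as NA in this sketch; how such rows were handled in the actual model is not stated above.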
The model is as follows:

    c1b <- gam(deaths_ip ~ s(time, k = 200) +
                 te(wbgt_mean, lag, k = c(12, 4)) +
                 te(precip_daily_total, lag, k = c(12, 4)),
               data = dat, family = nb, method = 'REML', select = TRUE)

Let's ignore the issue of time decomposition for the moment, since none of the many approaches I have tried reduced concurvity much. My models, and their interpretation, will look very different depending on whether or not I have to deal with concurvity.

As a potential solution for high concurvity, I developed alternative models that use a detrended measure of heat as a predictor (the residuals from a GAM with heat as the response and time as the predictor), and likewise for precipitation. This does reduce concurvity substantially, but it severely limits the practical application and interpretation of the results, so I would rather avoid this route if I can. (Modelling with an autoregressive term, as helpfully suggested in response to a previous post, did not reduce concurvity either.)

Below is the output from the concurvity function with argument full = TRUE. Explanations of estimate vs worst concurvity will be very gratefully received!
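The detrending step described above can be sketched as follows on simulated data. The seasonal heat series, the basis size k = 20 and the object names are assumptions for illustration only, not the actual data or settings; the same step would be repeated for precipitation.

```r
library(mgcv)

set.seed(42)
n <- 730                                   # two years of made-up daily data
dat_sim <- data.frame(time = 1:n)
dat_sim$heat <- 25 + 5 * sin(2 * pi * dat_sim$time / 365) + rnorm(n)

# Regress heat on a smooth of time (k = 20 is an arbitrary choice here)
trend_fit <- gam(heat ~ s(time, k = 20), data = dat_sim, method = "REML")

# The response-scale residuals are the "detrended" heat series
heat_detr <- residuals(trend_fit, type = "response")

# Most of the variance carried by the time trend has been removed
c(raw = var(dat_sim$heat), detrended = var(heat_detr))
```

Because the residual series measures departures from the smooth time trend rather than heat itself, any effect estimated from it refers to heat relative to that trend, which is one way to see the interpretational cost mentioned above.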

                 para    s(time) te(wbgt_mean,lag) te(precip_daily_total,lag)
    worst    0.957257 0.96533049         0.9811214                  0.9749704
    observed 0.957257 0.03825656         0.7652984                  0.8568042
    estimate 0.957257 0.04334243         0.4197013                  0.5975567

And with argument full = FALSE:

    $worst
                                       para      s(time) te(wbgt_mean,lag) te(precip_daily_total,lag)
    para                       1.000000e+00 6.033833e-17        0.04891485                  0.6677235
    s(time)                    6.033871e-17 1.000000e+00        0.96109743                  0.7784443
    te(wbgt_mean,lag)          4.006085e-02 9.521445e-01        1.00000000                  0.6941748
    te(precip_daily_total,lag) 6.677235e-01 7.784443e-01        0.70026895                  1.0000000

    $observed
                                       para      s(time) te(wbgt_mean,lag) te(precip_daily_total,lag)
    para                       1.000000e+00 2.203611e-27      5.898608e-33                  0.5790756
    s(time)                    6.033871e-17 1.000000e+00      7.485690e-01                  0.2652631
    te(wbgt_mean,lag)          4.006085e-02 1.928856e-02      1.000000e+00                  0.1902533
    te(precip_daily_total,lag) 6.677235e-01 1.914191e-02      3.076595e-01                  1.0000000

    $estimate
                                       para      s(time) te(wbgt_mean,lag) te(precip_daily_total,lag)
    para                       1.000000e+00 4.023819e-23      6.199709e-33                  0.2582767
    s(time)                    6.033871e-17 1.000000e+00      4.031365e-01                  0.3198069
    te(wbgt_mean,lag)          4.006085e-02 2.519282e-02      1.000000e+00                  0.2498455
    te(precip_daily_total,lag) 6.677235e-01 1.644668e-02      1.119781e-01                  1.0000000
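For reference, here is a minimal base-R sketch of three squared-norm ratios that, as far as can be judged from the concurvity help page, correspond to the three measures. This is an assumption, not mgcv's own code, and concurvity_sketch() is a hypothetical name. For a term with model matrix X_term and the space spanned by all other terms (X_other, including the intercept), each ratio is the share of the term's squared norm that lies inside the other terms' space. worst maximises that share over every possible coefficient vector, observed evaluates it at the fitted coefficients, and estimate pools it over all columns of X_term.

```r
# Hypothetical helper, written from a reading of ?concurvity (not mgcv code).
# X_term: model matrix of one term (full column rank assumed)
# X_other: model matrix of all other terms; beta: coefficients of the term
concurvity_sketch <- function(X_term, X_other, beta) {
  Q <- qr.Q(qr(X_other))            # orthonormal basis for the other terms
  B <- crossprod(Q, X_term)         # coordinates of X_term's projection
  PX <- Q %*% B                     # projection of X_term onto that space

  # estimate: pooled share of X_term's squared norm inside the other space
  estimate <- sum(PX^2) / sum(X_term^2)

  # observed: the same share for the fitted function X_term %*% beta only
  observed <- sum((PX %*% beta)^2) / sum((X_term %*% beta)^2)

  # worst: maximum share over all coefficient vectors, i.e. the largest
  # squared singular value of B %*% solve(R), where R'R = X_term'X_term
  R <- chol(crossprod(X_term))
  worst <- svd(forwardsolve(t(R), t(B)))$d[1]^2

  c(worst = worst, observed = observed, estimate = estimate)
}

# Toy check: "other terms" are an intercept plus a straight line
x <- seq(0, 1, length.out = 200)
X_other <- cbind(1, x)
X_inside <- cbind(2 * x, 1 + x)     # lies entirely in span(X_other)
X_mixed <- cbind(x^2, sin(6 * x))   # partly inside, partly outside
concurvity_sketch(X_inside, X_other, beta = c(1, 1))   # all three are 1 (up to rounding)
concurvity_sketch(X_mixed, X_other, beta = c(1, -1))
```

In this sketch worst can never be smaller than observed or estimate: it is a maximum over all coefficient vectors, while observed evaluates a single vector and estimate is a weighted average of per-column shares. That ordering matches the full = TRUE output above, although mgcv's exact computations may differ in details.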