thr3ads.net - R help - [R] Meta-analysis of prevalence at the country level with mgcv/gamm4 [Apr 2014]

If this information is useful, please help other people find it:
Share via:

Julien Riou

2014-Apr-08 08:31 UTC

[R] Meta-analysis of prevalence at the country level with mgcv/gamm4

Dear R community,

I'm working on a meta-analysis of prevalence data. The aim is to get
estimates of prevalence at the country level. The main issue is that the
disease is highly correlated with age, and the sample ages of included
studies are highly heterogeneous. Only median age is available for most
studies, so I can't use SMR-like tricks. I figured I could use
meta-regression to solve this, including age as a fixed-effect and
introducing study-level and country-level random-effects.

The idea (that I took from Fowkes et
al<http://www.thelancet.com/journals/lancet/article/PIIS0140-6736%2813%2961249-0/abstract>)
was to use this model to make country-specific predictions of prevalence
for each 5-year age group from 15 to 60 (using the median age of the
group), and to apply these predictions to the actual population size of
each of those groups in the selected country, in order to obtain total
infected population and to calculate age-adjusted prevalence in the 15-60
population from that.

I tried several ways to do this using R with packages meta and mgcv. I got
some satisfying results, but I'm not that confident with my results and
would appreciate some feedback.

First is some simulated data, then the description of my different
approaches:

data<-data.frame(id_study=c("UK1","UK2","UK3","FRA1","FRA2","BEL1","GER1","GER2","GER3"),

country=c("UK","UK","UK","FRANCE","FRANCE","BELGIUM","GERMANY","GERMANY","GERMANY"),
                 n_events=c(91,49,18,10,50,6,9,10,22),
                 n_total=c(3041,580,252,480,887,256,400,206,300),
                 study_median_age=c(25,50,58,30,42,26,27,28,36))

*Standard random-effect meta-analysis* with package meta.

I used metaprop() to get a first estimate of the prevalence in each country
without taking age into account, and to obtain weights. As expected,
heterogeneity was very high, so I used weights from the random-effects
model.

 meta <-
metaprop(event=n_events,n=n_total,byvar=country,sm="PLOGIT",method.tau="REML",data=data)
 summary(meta)
 data$weight<-meta$w.random

I used meta to get a first estimate of the prevalence without taking age
into account, and to obtain weights. As expected, heterogeneity was very
high, so I used weights from the random-effects model.

*Generalized additive model* to include age with package mgcv.

The gam() model parameters (k and sp) were chosen using BIC and GCV number
(not shown here).

 model <- gam( cbind(n_events,n_total-n_events) ~
s(study_median_age,bs="cr",k=4,sp=2) + s(country,bs="re"),
weights=weight, data=data, family="binomial"(link=logit),
method="REML")
 plot(model,pages=1,residuals=T, all.terms=T, shade=T)

Predictions for each age group were obtained from this model as explained
earlier. CI were obtained directly using predict.gam(), that uses the
Bayesian posterior covariance matrix of the parameters. For exemple
considering UK:

 newdat<-data.frame(country="UK",study_median_age=seq(17,57,5))
 link<-predict(model,newdat,type="link",se.fit=T)$fit
 linkse<-predict(model,newdat,type="link",se.fit=T)$se
 newdat$prev<-model$family$linkinv(link)
 newdat$CIinf<-model$family$linkinv(link-1.96*linkse)
 newdat$CIsup<-model$family$linkinv(link+1.96*linkse)
 plot(newdat$prev~newdat$study_median_age, type="l",ylim=c(0,.12))
 lines(newdat$CIinf~newdat$study_median_age, lty=2)
 lines(newdat$CIsup~newdat$study_median_age, lty=2)

The results were satisfying, representing the augmentation of the
prevalence with advanced age, with coherent confidence intervals. I
obtained a total prevalence for the country using the country population
structure (not shown, I hope it is clear enough).

However, I figured I needed to include study-level random-effects since
there was a high heterogeneity (even though I did not calculate
heterogeneity after the meta-regression).

*Introducing study-level random-effect* with package gamm4.

Since mgcv models can't handle that much random-effect parameters, I had to
switch to gamm4.

 model2 <- gamm4(cbind(n_events,n_total-n_events) ~
s(study_median_age,bs="cr",k=4) + s(country,bs="re"),
random=~(1|id_study), data=data, weights=weight,
family="binomial"(link=logit))
 plot(model2$gam,pages=1,residuals=T, all.terms=T, shade=T)

 link<-predict(model2$gam,newdat,type="link",se.fit=T)$fit
 linkse<-predict(model2$gam,newdat,type="link",se.fit=T)$se
 newdat$prev2<-model$family$linkinv(link)
 newdat$CIinf2<-model$family$linkinv(link-1.96*linkse)
 newdat$CIsup2<-model$family$linkinv(link+1.96*linkse)
 plot(newdat$prev2~newdat$study_median_age,
type="l",col="red",ylim=c(0,0.11))
 lines(newdat$CIinf2~newdat$study_median_age, lty=2,col="red")
 lines(newdat$CIsup2~newdat$study_median_age, lty=2,col="red")
 lines(newdat$prev~newdat$study_median_age, type="l",ylim=c(0,.12))
 lines(newdat$CIinf~newdat$study_median_age, lty=2)
 lines(newdat$CIsup~newdat$study_median_age, lty=2)

Since the study-level random effect was in the mer part of the fit, I
didn't have to handle it.

As you can see, I obtain rather different results, with a much smoother
relation between age and prevalence, and quite different confidence
intervals. It is even more different in the full-data analysis, where the
CI are much wider in the model including study-level RE, to the point it is
sometimes almost uninformative (prevalence between 0 and 15%, but if it is
the way it is...). Moreover, the study-level RE model seems to be more
stable when outliers are excluded.

*So, my questions are:*

   - Did I properly extract the weights from the metaprop() function and
   used them further?
   - Did I properly built my gam() and gamm4() models? I read a lot about
   this, but I'm not used to this king of models.
   - Which of these models should I use?

I would really appreciate some help, since neither my teachers nor my
colleagues could. It was a really harsh to conduct the systematic review,
and very frustrating to struggle with the analysis... Thank you in advance!

Julien

	[[alternative HTML version deleted]]

Michael Dewey

2014-Apr-09 17:39 UTC

head link

[R] Meta-analysis of prevalence at the country level with mgcv/gamm4

At 09:31 08/04/2014, Julien Riou wrote:>Dear R community,
Comments in line below
>I'm working on a meta-analysis of prevalence data. The aim is to get
>estimates of prevalence at the country level. The main issue is that the
>disease is highly correlated with age, and the sample ages of included
>studies are highly heterogeneous. Only median age is available for most
>studies, so I can't use SMR-like tricks. I figured I could use
>meta-regression to solve this, including age as a fixed-effect and
>introducing study-level and country-level random-effects.
>
>The idea (that I took from Fowkes et
>al<http://www.thelancet.com/journals/lancet/article/PIIS0140-6736%2813%2961249-0/abstract>)
>was to use this model to make country-specific predictions of prevalence
>for each 5-year age group from 15 to 60 (using the median age of the
>group), and to apply these predictions to the actual population size of
>each of those groups in the selected country, in order to obtain total
>infected population and to calculate age-adjusted prevalence in the 15-60
>population from that.
>
>I tried several ways to do this using R with packages meta and mgcv. I got
>some satisfying results, but I'm not that confident with my results and
>would appreciate some feedback.
>
>First is some simulated data, then the description of my different
>approaches:
>
>data<-data.frame(id_study=c("UK1","UK2","UK3","FRA1","FRA2","BEL1","GER1","GER2","GER3"),
>
>country=c("UK","UK","UK","FRANCE","FRANCE","BELGIUM","GERMANY","GERMANY","GERMANY"),
>                  n_events=c(91,49,18,10,50,6,9,10,22),
>                  n_total=c(3041,580,252,480,887,256,400,206,300),
>                  study_median_age=c(25,50,58,30,42,26,27,28,36))
So the first UK study with a median age of 25 is going to be used to 
estimate prevalence over a range of ages? You are going to have to 
make some very strong assumptions here which I personally would not 
want to make.

Is there any possibility that in the real dataset you can fit your 
model to those studies which do provide age-specific prevalences and 
then use that to impute?

You do not say when these studies were published but I would ask the 
authors of the primary studies if they can make the information 
available to you. You may have already done that of course. I referee 
quite a few papers on systematic reviews and my impression is that 
some authors are amenable to doing the work for you. You mileage may 
vary of course.
>*Standard random-effect meta-analysis* with package meta.
>
>I used metaprop() to get a first estimate of the prevalence in each country
>without taking age into account, and to obtain weights. As expected,
>heterogeneity was very high, so I used weights from the random-effects
>model.
Which will be nearly equal and so hardly worth using in my opinion 
but again your mileage may vary.
>  meta <- 
>
metaprop(event=n_events,n=n_total,byvar=country,sm="PLOGIT",method.tau="REML",data=data)
>  summary(meta)
>  data$weight<-meta$w.random
>
>I used meta to get a first estimate of the prevalence without taking age
>into account, and to obtain weights. As expected, heterogeneity was very
>high, so I used weights from the random-effects model.
>
>*Generalized additive model* to include age with package mgcv.
>
>The gam() model parameters (k and sp) were chosen using BIC and GCV number
>(not shown here).
>
>  model <- gam( cbind(n_events,n_total-n_events) ~
>s(study_median_age,bs="cr",k=4,sp=2) +
s(country,bs="re"),
>weights=weight, data=data, family="binomial"(link=logit),
>method="REML")
>  plot(model,pages=1,residuals=T, all.terms=T, shade=T)
>
>Predictions for each age group were obtained from this model as explained
>earlier. CI were obtained directly using predict.gam(), that uses the
>Bayesian posterior covariance matrix of the parameters. For exemple
>considering UK:
>
> 
newdat<-data.frame(country="UK",study_median_age=seq(17,57,5))
>  link<-predict(model,newdat,type="link",se.fit=T)$fit
>  linkse<-predict(model,newdat,type="link",se.fit=T)$se
>  newdat$prev<-model$family$linkinv(link)
>  newdat$CIinf<-model$family$linkinv(link-1.96*linkse)
>  newdat$CIsup<-model$family$linkinv(link+1.96*linkse)
>  plot(newdat$prev~newdat$study_median_age,
type="l",ylim=c(0,.12))
>  lines(newdat$CIinf~newdat$study_median_age, lty=2)
>  lines(newdat$CIsup~newdat$study_median_age, lty=2)
>
>The results were satisfying, representing the augmentation of the
>prevalence with advanced age, with coherent confidence intervals. I
>obtained a total prevalence for the country using the country population
>structure (not shown, I hope it is clear enough).
>
>However, I figured I needed to include study-level random-effects since
>there was a high heterogeneity (even though I did not calculate
>heterogeneity after the meta-regression).
>
>*Introducing study-level random-effect* with package gamm4.
>
>Since mgcv models can't handle that much random-effect parameters, I had
to
>switch to gamm4.
>
>  model2 <- gamm4(cbind(n_events,n_total-n_events) ~
>s(study_median_age,bs="cr",k=4) + s(country,bs="re"),
>random=~(1|id_study), data=data, weights=weight,
>family="binomial"(link=logit))
>  plot(model2$gam,pages=1,residuals=T, all.terms=T, shade=T)
>
>  link<-predict(model2$gam,newdat,type="link",se.fit=T)$fit
>  linkse<-predict(model2$gam,newdat,type="link",se.fit=T)$se
>  newdat$prev2<-model$family$linkinv(link)
>  newdat$CIinf2<-model$family$linkinv(link-1.96*linkse)
>  newdat$CIsup2<-model$family$linkinv(link+1.96*linkse)
>  plot(newdat$prev2~newdat$study_median_age, 
> type="l",col="red",ylim=c(0,0.11))
>  lines(newdat$CIinf2~newdat$study_median_age, lty=2,col="red")
>  lines(newdat$CIsup2~newdat$study_median_age, lty=2,col="red")
>  lines(newdat$prev~newdat$study_median_age,
type="l",ylim=c(0,.12))
>  lines(newdat$CIinf~newdat$study_median_age, lty=2)
>  lines(newdat$CIsup~newdat$study_median_age, lty=2)
>
>Since the study-level random effect was in the mer part of the fit, I
>didn't have to handle it.
>
>As you can see, I obtain rather different results, with a much smoother
>relation between age and prevalence, and quite different confidence
>intervals. It is even more different in the full-data analysis, where the
>CI are much wider in the model including study-level RE, to the point it is
>sometimes almost uninformative (prevalence between 0 and 15%, but if it is
>the way it is...). Moreover, the study-level RE model seems to be more
>stable when outliers are excluded.
>
>*So, my questions are:*
>
>    - Did I properly extract the weights from the metaprop() function and
>    used them further?
>    - Did I properly built my gam() and gamm4() models? I read a lot about
>    this, but I'm not used to this king of models.
>    - Which of these models should I use?
>
>I would really appreciate some help, since neither my teachers nor my
>colleagues could. It was a really harsh to conduct the systematic review,
>and very frustrating to struggle with the analysis... Thank you in advance!
I am afraid that is the way with systematic reviews, you can only 
synthesise what you find, not what you would like to have found. 
Anyone who has done a review will sympathise with you, not that that 
is any consolation.

>Julien
>
>         [[alternative HTML version deleted]]
Michael Dewey
info at aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html

R help - Apr 2014 - Meta-analysis of prevalence at the country level with mgcv/gamm4

[R] Meta-analysis of prevalence at the country level with mgcv/gamm4

[R] Meta-analysis of prevalence at the country level with mgcv/gamm4