Emmanuel Charpentier
2001-Dec-05 12:47 UTC
[R] (Meta-analysis) How to build|fake a [n]lm[e] object ?
Dear all,

I recently had to review the current literature about a medical treatment with two possible variants (let's call them A and B). I collected all available prospective randomized trials of this treatment: I got four trials for the A variant and three for the B variant, all comparing one variant to a "suitably chosen" placebo.

Two classes of variables are of interest here:

a) the net effect of the treatment, which is assessed by some (set of) numerical values, with distributions not too far from normal;
b) the side effects of the treatment, assessed by the number of occurrences of (a set of) undesirable events.

The papers report:

a) for the numerical variables: sample size, mean and SD (or SE, from which the SD can be recomputed) of each group, plus some test statistic (usually Student's t);
b) for events: the sample size and number of events in each group, plus some test statistic (usually chi-square, sometimes incorrectly used: the continuity correction is often forgotten, and Fisher's exact test is almost unheard of ...).

It made medical sense to consider the "variant" factor ancillary to the treatment factor (that is, to *postulate* that the difference in treatment effects between variants is much smaller than the treatment effect itself); therefore, it is not a big problem to exclude it from the analysis.

So I used the rmeta package to assess the treatment effects. The results, as far as I can tell, are not unreasonable. However, I have two problems with this approach:

A) Assessing the "variant" effect : how ?
========================================

My main problem is that I can't formally assess the (quite possibly null) effect of the "variant" factor (i.e. check, at least a posteriori, that the "variant" effect is indeed much smaller than the treatment effect).
In other words, if I had had the trials' raw data, I would have used, for numerical variables, something along the lines of:

    meta.lme<-lme(Variable~Treatment*Variant/Trial, data=xxx, random=~1|Trial)

for a "random trial effect" (à la DerSimonian), and

    meta.lm<-lm(Variable~Treatment*Variant/Trial, data=xxx)

for a "fixed trial effect" model, "treatment" and "variant" being of course fixed effects of interest, and the Treatment*Variant interaction being the term of interest for checking the homogeneity of the treatment effect between variants. (In my case, the trials are somewhat heterogeneous, because they do not have the same inclusion criteria, so the "random effect" model makes more sense.)

However, I do *not* have the raw data. Of course, I can trivially rebuild the "sum-of-data" and "sum-of-squares" in each "cell" of the potential "experimental plan". But I'm not able to analyse this. I looked in old books (some dating back to the '50s, when computers were not readily available for biostatistics) and saw that all algorithms used back then assumed a *balanced* experimental plan. Some approximations were used (such as using the harmonic mean of the sample sizes to compute the expectations of the "between-rows", "between-columns", "between-cells" and "within-cells" variances under the null hypothesis), but those approximations can only be used for *mild* imbalance.

In my case, this won't do: per-group sample size varies between 10 and 244, and there is always some imbalance between treatment groups (mainly due to stratification effects). That's *not* "mild" ...

I tried to follow Winer's explanation of what he calls "least-squares estimation" (that's what all modern ANOVA software, including lm and friends, does) to see if I could build an algorithm from it ... and got lost (I'm pretty bad at linear algebra).
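The "sum-of-data" and "sum-of-squares" reconstruction mentioned above is plain arithmetic. A minimal sketch in R, with invented numbers and a hypothetical helper name (cell_sums), assuming each paper reports either the SD or the SE of the mean for each group:

```r
# Rebuild per-cell sums from published summary statistics.
# cell_sums() is a hypothetical helper; all numbers below are invented.
cell_sums <- function(n, mean, sd = NULL, se = NULL) {
  if (is.null(sd)) sd <- se * sqrt(n)      # SD recovered from the SE of the mean
  data.frame(
    n      = n,
    mean   = mean,
    sd     = sd,
    sum_y  = n * mean,                     # "sum-of-data"
    sum_y2 = (n - 1) * sd^2 + n * mean^2   # uncorrected "sum-of-squares"
  )
}

cells <- cell_sums(n = c(25, 16), mean = c(12.4, 10.0), se = c(0.6, 0.5))

# Pooled within-cell variance: the error term of a fixed-effects ANOVA
within_ss  <- sum((cells$n - 1) * cells$sd^2)
within_df  <- sum(cells$n - 1)
pooled_var <- within_ss / within_df
```

This only recovers the cell-level quantities; it does not by itself solve the unbalanced analysis.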
However, it appears that an lm object contains just the kind of data one can extract from a pile of papers: one can build such an object with each group of each paper as a line, with a "residual" computed from the published SD, a "value" computed from the published mean and a "weight" computed from the sample size. Given that drop, anova and related functions do not have to re-fit the model to assess effects, one could then analyse this artificially reconstructed lm object.

Hence my questions:

a) Am I totally wrong?
b) If not, how would you build such an object?
c) What cautions should be used in interpreting the results?
d) Would this approach work with an lme object? With a (suitably built) nlme object (in order to assess the "variant" effect on event data)?
e) Would such an approach allow assessing treatment effects for trials with more than 2 groups (e.g. placebo vs. drug vs. surgery)?

B) Alternatives to the odds ratio for event data ?
==================================================

The usual way to assess effects for categorical variables is to compute the log(odds ratio) for each study and to pool them using inverse variance as weights (that's what meta.DSL and meta.MH do, respectively for the random and fixed effect models).

However, in some trials, some events have a frequency of zero in both groups, or in only one of them. In the first case, one can neglect the trial for the assessment of the treatment effect, on the basis that it is not informative. In the second case, however, the data cannot be used (because the OR is either zero or infinite, with infinite asymptotic variance). The treatment assessment by OR pooling dismisses these trials (see the meta.DSL source, for example; this is also the case in other meta-analysis packages, such as Cochrane's RevMan).

But the asymmetry (some events in one group and none in the other) is indeed information, and I do not feel at ease with discarding it.
The best I can think of is the ordinary test of independence (Fisher's test, in this case) on a contingency table "summing" the individual trials' contingency tables. This analysis confirms the results of the meta-analysis. But it does not account for the trials' heterogeneity, which is a large part of the point of a meta-analysis.

Someone suggested that I add a "small" quantity (say 1, or 0.5, as in Yates's correction for continuity) to the event counts in these groups, and see whether including these studies would change the results, but I'm "instinctively" not satisfied with this approach.

In my case, the meta-analysis exhibits an excess of some undesirable events in one of the treatment groups, while this excess does not reach the sacrosanct "statistical significance threshold" in any of the papers I analysed (physicians are sometimes bloody p-value worshippers ...). Therefore, I'd like to be damn sure to *correctly* use *all* available information.

Any suggestions or pointers to the literature?

Sincerely yours,

Emmanuel Charpentier

--
Emmanuel Charpentier                     Tel : +33-01 40 27 35 98
Secrétariat scientifique du CEDIT        Fax : +33-01 40 27 55 65
Direction de la Politique Médicale
Assistance Publique - Hôpitaux de Paris
3, Avenue Victoria
F-75004 Paris
France

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
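To make the "add a small quantity" suggestion in the message above concrete, here is a rough sketch of fixed-effect inverse-variance pooling of log odds ratios, with and without a 0.5 correction. The counts are invented, pool_log_or is a hypothetical helper (not the rmeta implementation), and the correction is added to all four cells of any trial that has a zero cell, which is one common convention assumed here:

```r
# Hypothetical event counts (invented for illustration); one row per trial.
trials <- data.frame(
  trial  = c("T1", "T2", "T3"),
  ev_trt = c(4, 6, 9),       # events, treated group
  n_trt  = c(20, 50, 100),   # sample size, treated group
  ev_ctl = c(0, 3, 5),       # events, placebo group
  n_ctl  = c(20, 50, 100)    # sample size, placebo group
)

# Fixed-effect inverse-variance pooling of log odds ratios.
# `add` goes into all four cells of a trial with at least one zero cell;
# with add = 0, trials whose log OR or variance is not finite are dropped.
pool_log_or <- function(d, add = 0) {
  a  <- d$ev_trt; b  <- d$n_trt - d$ev_trt
  c0 <- d$ev_ctl; d0 <- d$n_ctl - d$ev_ctl
  k  <- ifelse(a == 0 | b == 0 | c0 == 0 | d0 == 0, add, 0)
  a <- a + k; b <- b + k; c0 <- c0 + k; d0 <- d0 + k
  log_or <- log((a * d0) / (b * c0))
  v      <- 1 / a + 1 / b + 1 / c0 + 1 / d0   # asymptotic variance of log OR
  keep   <- is.finite(log_or) & is.finite(v)
  w      <- 1 / v[keep]
  c(log_or = sum(w * log_or[keep]) / sum(w),
    se     = sqrt(1 / sum(w)),
    used   = sum(keep))
}

res_dropped   <- pool_log_or(trials, add = 0)
res_corrected <- pool_log_or(trials, add = 0.5)
rbind(dropped = res_dropped, corrected = res_corrected)
```

Comparing the two rows shows how much the pooled estimate depends on the zero-cell trial and on the arbitrary constant; it says nothing about which answer is right.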
Thomas Lumley
2001-Dec-05 17:11 UTC
[R] (Meta-analysis) How to build|fake a [n]lm[e] object ?
On Wed, 5 Dec 2001, Emmanuel Charpentier wrote:

> B) Alternatives to the odds ratio for event data ?
> ==================================================
>
> The usual way to assess effects for categorical variables is to compute the
> log(odds ratio) for each study and to pool them using inverse variance as
> weights (that's what meta.DSL and meta.MH do, respectively for the random
> and fixed effect models).
>
> However, in some trials, some events have a frequency of zero in both
> groups, or in only one of them. In the first case, one can neglect the trial
> for the assessment of the treatment effect, on the basis that it is not
> informative. In the second case, however, the data cannot be used (because
> the OR is either zero or infinite, with infinite asymptotic variance). The
> treatment assessment by OR pooling dismisses these trials (see the meta.DSL
> source, for example; this is also the case in other meta-analysis packages,
> such as Cochrane's RevMan).

meta.MH doesn't have this problem -- it's quite happy with zero cells.

> But the asymmetry (some events in one group and none in the other) is indeed
> information, and I do not feel at ease with discarding it. The best I can
> think of is the ordinary test of independence (Fisher's test, in this case)
> on a contingency table "summing" the individual trials' contingency tables.
> This analysis confirms the results of the meta-analysis. But it does not
> account for the trials' heterogeneity, which is a large part of the point of
> a meta-analysis.

Either meta.MH or conditional logistic regression (clogit in the survival package) would fix this.

> Someone suggested that I add a "small" quantity (say 1, or 0.5, as in
> Yates's correction for continuity) to the event counts in these groups, and
> see whether including these studies would change the results, but I'm
> "instinctively" not satisfied with this approach.
If you want a fixed effect of treatment there's no problem (and I personally don't like meta-analyses where a random-effects model makes a difference).

If you need a random-effects model that doesn't object to zero cells, then lme() and variants aren't going to work, and you need a real generalized linear mixed model with random intercept and random treatment effect. Logistic mixed models are a hard problem. Jim Lindsey's 'repeated' package may handle this, though.

A little simulation would tell you what the properties of the `continuity correction' approach are.

	-thomas
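As a rough illustration of the zero-cell point, the sketch below uses mantelhaen.test() from base R's stats package rather than meta.MH, on invented counts. The per-trial odds ratio is infinite for the trial with an empty cell, while the Mantel-Haenszel common odds ratio stays finite because it sums numerators and denominators across trials before dividing:

```r
# Hypothetical event counts, invented purely for illustration.
# Dimensions: treatment group x outcome x trial (filled column by column).
tab <- array(
  c(4, 0, 16, 20,    # T1: 4/20 events on active, 0/20 on placebo
    6, 3, 44, 47,    # T2: 6/50 vs 3/50
    9, 5, 91, 95),   # T3: 9/100 vs 5/100
  dim = c(2, 2, 3),
  dimnames = list(group   = c("active", "placebo"),
                  outcome = c("event", "no event"),
                  trial   = c("T1", "T2", "T3"))
)

# Per-trial odds ratios: the zero cell in T1 makes its OR infinite
or_by_trial <- apply(tab, 3, function(m) (m[1, 1] * m[2, 2]) / (m[1, 2] * m[2, 1]))

# Mantel-Haenszel test and common odds ratio, stratified by trial
mh <- mantelhaen.test(tab)
mh$estimate   # finite, even though T1 alone has an infinite OR
mh$conf.int
```

This is a fixed-effect analysis only; it does not address between-trial heterogeneity.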
friendly@hotspur.psych.yorku.ca
2001-Dec-05 17:38 UTC
[R] (Meta-analysis) How to build|fake a [n]lm[e] object ?
Emmanuel-

Perhaps I can help with one thing:

! However, I do *not* have the raw data. Of course, I can trivially rebuild the
! "sum-of-data" and "sum-of-squares" in each "cell" of the potential
! "experimental plan". But I'm not able to analyse this. I looked in old books
! (some dating back from the '50s, when computers were not readily available for
! biostatistics) and saw that all algorithms used back then supposed a *balanced*

There is a simple solution to the problem of going from summary statistics to an lm() analysis which gives equivalent results, described by Larson, and implemented by me as a SAS macro, stat2dat. The freq= variable would become the weights= in lm().

/*=
name:     STAT2DAT
title:    Transform a summary data set to pseudo-observations
Doc:      http://www.math.yorku.ca/SCS/sasmac/stat2dat.html
Version:  1.1
Revised:  2 Apr 1999

=Description:

 Take a dataset containing summary statistics (N, mean, std dev) for a
 between groups design and produce a dataset from which PROC GLM can be
 run to produce equivalent results.

=Usage:

   %stat2dat(data=inputdataset, out=outputdataset, ...,
             depvar=Y, freq=freq)

 The input dataset contains one observation for each group. Supply the
 names of variables containing the N, MEAN, and standard deviation (STD)
 for each group (see argument list below). The mean square error (MSE)
 for a reported ANOVA can be supplied instead of individual STD values.
 The sample size per cell can be supplied as a constant rather than a
 dataset variable if all groups are of the same size.

 The output dataset can then be used with PROC GLM or PROC ANOVA
 (balanced designs). It contains all variables from the input dataset
 plus a constructed dependent variable ('Y' by default) and a
 constructed frequency variable ('freq' by default).

   proc glm data=outputdataset;
      class classvars;
      freq freq;
      model Y = modelterms;

 Based on: David Larson, Analysis of Variance With Just Summary
 Statistics as Input, The American Statistician, May 1992,
 Vol. 46(2), 151-152.
 (David Larson: dalef at uno.edu)

 Michael Friendly <friendly at yorku.ca>
 Psychology Department, York University
 Toronto, ONT M3J 1P3 CANADA
=*/
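For R users, the pseudo-observation idea can be sketched roughly as follows. This is not a port of stat2dat: the data are invented, pseudo_obs is a hypothetical helper, and the construction shown (n - 1 copies of one value plus one balancing value) is simply one set of values that reproduces each group's n, mean and SD exactly. One caveat: lm() does not treat weights as frequencies when computing residual degrees of freedom, so this sketch expands the rows instead of passing the counts as weights.

```r
# Hypothetical summary table (invented): one row per group per trial.
summ <- data.frame(
  trial     = factor(c("T1", "T1", "T2", "T2")),
  treatment = factor(c("placebo", "active", "placebo", "active"),
                     levels = c("placebo", "active")),
  n    = c(10, 12, 244, 230),
  mean = c(5.1, 6.0, 4.8, 5.5),
  sd   = c(1.2, 1.4, 1.1, 1.3)
)

# n pseudo-observations whose sample mean and SD equal the published ones:
# (n - 1) values at mean + sd/sqrt(n), one value at mean - (n - 1)*sd/sqrt(n).
pseudo_obs <- function(n, mean, sd) {
  c(rep(mean + sd / sqrt(n), n - 1), mean - (n - 1) * sd / sqrt(n))
}

# Expand every group into its pseudo-observations
rows <- lapply(seq_len(nrow(summ)), function(i) {
  data.frame(trial     = summ$trial[i],
             treatment = summ$treatment[i],
             y         = pseudo_obs(summ$n[i], summ$mean[i], summ$sd[i]))
})
pseudo <- do.call(rbind, rows)

# Fixed-effects analysis on the reconstructed data
fit <- lm(y ~ treatment + trial, data = pseudo)
anova(fit)
```

Sums of squares, F tests and coefficient standard errors from such a fit should match what the raw data would give for fixed-effects models; residual plots and anything else that depends on the shape of the within-group distribution are meaningless.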