James Ewing
2011-Feb-03 21:17 UTC
[R] Generalized Estimating Equations and a Reviewer Question I can’t answer
I am using the R statistical package in a pretty straightforward manner, just interested in getting summary statistics and reliable estimates of means, variances, confidence intervals, p-values using standard tests, etc. I've been using the generalized estimating equation package because I often deal with repeatedly sampled pairs of data from the same source, and my reading (and my statistician) tells me that GEE is an appropriate method for that type of data set. I’m using the geeglm program from the geepack download. My data is correlated data from two estimates of vascular permeability in a rat model of cerebral tumor. One estimate, the MRI estimate, can be conducted non-invasively and repeatedly. The other estimate, the QAR estimate, can be done just once since it involves sacrificing the animal. The MRI estimate needs to be characterized as to its correlation with the QAR estimate, and that is what the paper I have in review does. The question is whether, and how strongly, the two estimates are correlated. The estimates are presented in the form of image maps of vascular permeability – call them estimate QAR (the gold standard) and estimate MRI (the proposed surrogate measure). I align the two image maps of the parametric estimates as best I can, draw a region-of interest around the cerebral tumor, and get anywhere from 200 to 400 pairs of estimates per region of interest in one animal. I have 15 studies. The data looks like this: ID QAR MRI EJ4 0.001341 0.001941 EJ4 0.00102 0.001605 EJ4 0.000673 0.001767 EJ4 0.0005 0.000143 EJ4 0.000393 -0.000962 EJ4 0.002681 0.00117 EJ4 0.002529 0.001405 EJ4 0.002097 0.001295 EJ4 0.001448 0.001463 EJ4 0.001101 0.000877 EJ4 0.000834 0.000719 EJ4 0.000567 0.001902 . . . EJ5 0.001084 0.00153 EJ5 0.001437 0.001332 EJ5 0.003087 0.001737 EJ5 0.003479 0.001634 EJ5 0.003541 0.002122 EJ5 0.003355 0.002099 EJ5 0.003149 0.00211 EJ5 0.002881 0.001591 EJ5 0.001469 0.001559 EJ5 0.001191 0.002259 EJ5 0.000518 0.001589 EJ5 0.000689 0.001348 . . . etcetera. I used this command to generate an estimate of the correlation coefficient: ll_gee_2 <- geeglm(MRI~QAR, id = ID, family = gaussian, corstr ="exch") This yielded a very significant but moderate correlation between the two methods, in general agreement with the results of a linear regression done on the means of the values of the two estimates in their ROIs. Now, the reviewer had this query: “Statistical evaluation: How many parameters were used for the GEE evaluation? Was only the bias changed for each tumor. The correlation of a single tumor should be presented. How much did the bias change?” In second review, after we explained that we didn’t think there were biases in the analysis, this came back: “The exact fitting approach using the GEE model is difficult to understand. The authors should clearly state, that they use a different bias for each individual measurement using the GEE model.” I can’t figure out what the reviewer is talking about. According to my reading of the Liang and Zeger original paper on the Generalized Estimating Equations, there aren’t any biases (as I understand bias) generated in this method. I’m guessing that the reviewer is having trouble with language, and is really referring to the weights assigned to the variances of each of the 15 clusters of data. And, after this lengthy preamble – here’s my question: How do I find the weights assigned to each of the 15 clusters in the GEE analysis? If this is in summary(), which field would it be? Indeed, how do I tell what fields are contained in summary()? Clearly, there are fields that aren't reported in the simple invocation of summary(ll_gee_2). The listserver comments reveal that summary() produces a list containing a data frame, but I'm still not clear on how to identify which element in the data frame would be the weights I'm after. I did use str(ll_gee_2), and got a listing that (apparently) showed the weights to be all 1's, as far as I can determine. Sorry about the length of this query, and about my ignorance of R functions. Also, I'm aware that I'm actually asking the experts to address two questions: 1) How to print out fields produced by gee that aren't routinely reported in summary(), and 2) How to deal with a reviewer who is asking a badly-worded question that he apparently thinks is important. I hope problem 2 evokes a sympathetic response from those who have had similar problems. I’ve googled “R summary() fields,” and variations of that query, but have yet to find a way to list out the fields in the summary() function. - Jim Ewing [[alternative HTML version deleted]]