I am struggling to generate p-values for comparisons of levels (post-hoc
tests) in a GLM with a negative binomial distribution.
I am trying to compare cell counts on different days, as grown on different
media (e.g. types of cryogel), so I have two explanatory variables (Day and
Cryogel), which are both factors, and an over-dispersed count variable
(number of cells) as the response. I know that both variables are
significant, and that there is a significant interaction between them.
However, I seem unable to generate multiple comparisons between the days and
cryogels.
So my model is
Model1 <- glm.nb(Cells ~ Cryogel + Day + Day:Cryogel)
The output gives me comparisons between levels of the factors relative to a
reference level (Day 0 and Cryogel 1) as follows:
Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)       1.2040     0.2743   4.389 1.14e-05 ***
Day14             3.3226     0.3440   9.658  < 2e-16 ***
Day28             3.3546     0.3440   9.752  < 2e-16 ***
Day7              3.3638     0.3440   9.779  < 2e-16 ***
Cryogel2          0.7097     0.3655   1.942  0.05215 .
Cryogel3          0.7259     0.3651   1.988  0.04677 *
Cryogel4          1.4191     0.3539   4.010 6.07e-05 ***
Day14:Cryogel2   -0.7910     0.4689  -1.687  0.09162 .
Day28:Cryogel2   -0.5272     0.4685  -1.125  0.26053
Day7:Cryogel2    -1.1794     0.4694  -2.512  0.01199 *
Day14:Cryogel3   -1.0833     0.4691  -2.309  0.02092 *
Day28:Cryogel3    0.1735     0.4733   0.367  0.71395
Day7:Cryogel3    -1.0907     0.4690  -2.326  0.02003 *
Day14:Cryogel4   -1.2834     0.4655  -2.757  0.00583 **
Day28:Cryogel4   -0.6300     0.4591  -1.372  0.16997
Day7:Cryogel4    -1.3436     0.4596  -2.923  0.00347 **
HOWEVER, I want ALL the comparisons, e.g. Cryogel 2 versus 4, 3 versus 2,
etc., on each of the days. I realise that such multiple comparisons need to
be approached with care to avoid Type I error; however, it is easy to do
this in other programmes (e.g. SPSS, Genstat) and I'm frustrated that it
appears to be difficult in R. I have tried the glht() function (multcomp
package) but it gives me the same results. I assume that there is some way
of entering the data differently so as to tell R to use a different
reference level each time and re-run the analysis for each level, but I
don't know what this is.
Please help!
Many thanks for your input
Bryony
?relevel
Also, you might want to fit the models as follows:

Model1 <- glm.nb(Cells ~ Cryogel*Day, data = myData)
myData2 <- within(myData, Cryogel <- relevel(Cryogel, ref = "2"))
Model2 <- update(Model1, data = myData2)
&c
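To save doing that by hand for every level, a loop along these lines (a sketch, assuming the data frame is called myData) refits the model once per reference level:

fits <- lapply(levels(myData$Cryogel), function(lev) {
  d <- within(myData, Cryogel <- relevel(Cryogel, ref = lev))
  update(Model1, data = d)           # same model, new reference level
})
names(fits) <- levels(myData$Cryogel)
lapply(fits, function(f) coef(summary(f)))   # one coefficient table per reference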
You should always specify the data set when you fit a model, if at all
possible. I would recommend you NEVER use attach() to put it on the search
path (under all but the most exceptional circumstances).
You could fit your model as
Model0 <- glm.nb(Cells ~ interaction(Cryogel, Day) - 1, data = myData)
This will give you the subclass means as the regression coefficients. You can
then use vcov(Model0) to get the variance matrix and compare any two you like
using directly calculated t-statistics. This is pretty straightforward as well.
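For instance (a sketch; the coefficient names below are illustrative, so check names(coef(Model0)) for the actual labels, which follow the levels of interaction(Cryogel, Day)):

b <- coef(Model0)    # subclass means on the log scale
V <- vcov(Model0)    # their variance matrix

## e.g. Cryogel 2 vs Cryogel 4 on Day 7 (illustrative names)
i <- "interaction(Cryogel, Day)2.7"
j <- "interaction(Cryogel, Day)4.7"
est <- b[i] - b[j]
se  <- sqrt(V[i, i] + V[j, j] - 2 * V[i, j])
z   <- est/se
2 * pnorm(-abs(z))   # unadjusted two-sided p-value; adjust for multiplicity as needed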
Bill Venables.
PS: I should have followed the example below with one using with(), for
something that would often be done with attach(). Consider:
with(polyData, {
  plot(x, y, pch = ".")               # scatterplot of the simulated data
  o <- order(x)
  lines(x[o], eta[o], col = "red")    # overlay the underlying curve
})
I use this kind of dodge a lot, too, but now you can mostly use data= arguments
on the individual functions.
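For example, the plot itself can use the formula method's data argument directly; lines() has no data argument, so with() still earns its keep for the overlay:

plot(y ~ x, data = polyData, pch = ".")   # formula method takes data= directly
with(polyData, {
  o <- order(x)
  lines(x[o], eta[o], col = "red")        # lines() has no data= argument
})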
Bill Venables.
-----Original Message-----
From: Venables, Bill (CMIS, Dutton Park)
Sent: Wednesday, 18 May 2011 9:07 AM
To: 'Bert Gunter'; 'Peter Ehlers'
Cc: 'R list'
Subject: RE: [R] Post-hoc tests in MASS using glm.nb
Amen to all of that, Bert. Nicely put. The Google style guide (not perfect,
but a thoughtful contribution on these kinds of issues) has avoiding
attach() as its very first line. See
http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html
I would add, though, that not enough people yet seem to be aware of
within(), a companion of with() in a way, but used for modifying data frames
or other kinds of list objects. It should be seen as a more flexible
replacement for transform() (well, almost).
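For a simple one-step modification the two are interchangeable, e.g. with the built-in airquality data:

aq1 <- transform(airquality, logOzone = log(Ozone))
aq2 <- within(airquality, logOzone <- log(Ozone))
all.equal(aq1, aq2)   # TRUE: same result, different idiom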
The difference between with() and within() is as follows:

with(data, expr, ...)

allows you to evaluate 'expr' with 'data' providing the primary source for
variables, and returns *the evaluated expression* as the result. By
contrast,

within(data, expr, ...)

again uses 'data' as the primary source for variables when evaluating
'expr', but now 'expr' is used to modify the variables in 'data', and the
function returns *the modified data set* as the result.
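In miniature:

d <- data.frame(x = 1:3)
with(d, x + 1)          # returns the evaluated expression: 2 3 4
within(d, y <- x + 1)   # returns the data frame d with a new column y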
I use this a lot, especially in the data preparation phase of a project,
which is usually the longest, trickiest, most important, but least discussed
aspect of any data analysis.
Here is a simple example using within() for something you cannot do in one step
with transform():
polyData <- within(data.frame(x = runif(500)), {
  x2 <- x^2
  x3 <- x*x2
  b <- runif(4)                      # random cubic coefficients
  eta <- cbind(1, x, x2, x3) %*% b   # the underlying curve
  y <- eta + rnorm(x, sd = 0.5)      # response = curve + noise
  rm(b)                              # removed, so b never becomes a column
})
Check:

> str(polyData)
'data.frame':   500 obs. of  5 variables:
 $ x  : num  0.5185 0.185 0.5566 0.2467 0.0178 ...
 $ y  : num [1:500, 1] 1.343 0.888 0.583 0.187 0.855 ...
 $ eta: num [1:500, 1] 1.258 0.788 1.331 0.856 0.63 ...
 $ x3 : num  1.39e-01 6.33e-03 1.72e-01 1.50e-02 5.60e-06 ...
 $ x2 : num  0.268811 0.034224 0.309802 0.060844 0.000315 ...
Bill Venables.
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Bert Gunter
Sent: Wednesday, 18 May 2011 12:08 AM
To: Peter Ehlers
Cc: R list
Subject: Re: [R] Post-hoc tests in MASS using glm.nb
Folks:
> Only if the user hasn't yet been introduced to the with() function,
> which is linked to on the ?attach page.
>
> Note also this sentence from the ?attach page:
> ?".... attach can lead to confusion."
>
> I can't remember the last time I needed attach().
>
> Peter Ehlers
Yes. But perhaps it might be useful to flesh this out with a bit of
commentary. To this end, I invite others to correct or clarify the
following.
The potential "confusion" comes from requiring R to search for the
data. There is a rigorous process by which this is done, of course,
but it requires that the runtime environment be consistent with that
process, and the programmer who wrote the code may not have control
over that environment. The usual example is that one has an object
named,say, "a" in the formula and in the attached data and another
"a" also in the global environment. Then the wrong "a" would
be found.
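A small demonstration with throwaway objects:

d <- data.frame(a = 1:3)
a <- 100     # an unrelated 'a' in the global environment

attach(d)    # R does warn here that 'a' is masked by .GlobalEnv
mean(a)      # 100: the global 'a' is found, not d$a
detach(d)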
The same thing can happen if another data set gets attached in a
position before the one of interest. (Like Peter, I haven't used
attach() in so long that I don't know whether any warning messages are
issued in such cases).
Using the "data = " argument when available or the with() function
when not avoids this potential confusion and tightly couples the data
to be analyzed with the analysis.
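Concretely (a hypothetical fit):

fit <- lm(y ~ x, data = myData)   # a data= argument where one exists
r   <- with(myData, cor(x, y))    # with() where there is none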
I hope this clarifies the previous posters' comments.
Cheers,
Bert
--
"Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions."
-- Maimonides (1135-1204)
Bert Gunter
Genentech Nonclinical Biostatistics