Gordon K Smyth
2011-Mar-16  22:29 UTC
[Rd] Standardized Pearson residuals (and score tests)
Hi Peter and others, If it helps, I wrote a small function glm.scoretest() for the statmod package on CRAN to compute score tests from glm fits. The score test for adding a covariate, or any set of covariates, can be extracted very neatly from the standard glm output, although you probably already know that. Regards Gordon --------------------------------------------- Professor Gordon K Smyth, NHMRC Senior Research Fellow, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. smyth at wehi.edu.au http://www.wehi.edu.au http://www.statsci.org/smyth> Date: Tue, 15 Mar 2011 12:17:46 +0100 > From: peter dalgaard <pdalgd at gmail.com> > To: Brett Presnell <presnell at stat.ufl.edu> > Cc: r-devel at r-project.org > Subject: Re: [Rd] Standardized Pearson residuals > > > On Mar 15, 2011, at 04:40 , Brett Presnell wrote: > >>>> Background: I'm currently teaching an undergrad/grad-service course >>>> from Agresti's "Introduction to Categorical Data Analysis (2nd edn)" >>>> and deviance residuals are not used in the text. For now I'll just >>>> provide the students with a simple function to use, but I prefer to >>>> use R's native capabilities whenever possible. >>> >>> Incidentally, chisq.test will have a stdres component in 2.13.0 for >>> much the same reason. >> >> Thank you. That's one more thing I won't have to provide code for >> anymore. Coincidentally, Agresti mentioned this to me a week or two >> ago as something that he felt was missing, so that's at least two >> people who will be happy to see this added. >> > > And of course, I was teaching a course based on Agresti & Franklin: > "Statistics, The Art and Science of Learning from Data", when I realized > that R was missing standardized residuals. > > >> It would also be nice for teaching purposes if glm or summary.glm had a >> "pearsonchisq" component and a corresponding extractor function, but I >> can imagine that there might be arguments against it that haven't >> occured to me. Plus, I doubt that anyone wants to touch glm unless >> it's to repair a bug. If I'm wrong about all that though, ... >> > Hmm, how would that work? If there was one, I'd worry that people would > start subtracting them which is usually not the right thing to do. I do > miss having a test on the residual deviance occasionally (even though it > is only sometimes meaningful), having to fit a saturated model > explicitly can be a bit silly. E.g. in this case (homogeneity of birth > rates): > >> anova(glm(births~month,poisson,data=bb), test="Chisq") > ... > Df Deviance Resid. Df Resid. Dev P(>|Chi|) > NULL 11 225.98 > month 11 225.98 0 0.00 < 2.2e-16 *** >> anova(glm(births~1,poisson,data=bb), test="Chisq") > ... > Df Deviance Resid. Df Resid. Dev P(>|Chi|) > NULL 11 225.98 > > Notice that the latter version gives me the correct deviance but no > p-value. > > > A better support for generic score tests could be desirable too. I > suspect that this would actually be the Pearson Chi-square in the > interesting cases. > > -- > Peter Dalgaard > Center for Statistics, Copenhagen Business School > Solbjerg Plads 3, 2000 Frederiksberg, Denmark > Phone: (+45)38153501 > Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
peter dalgaard
2011-Mar-17  12:58 UTC
[Rd] Standardized Pearson residuals (and score tests)
On Mar 16, 2011, at 23:29 , Gordon K Smyth wrote:> Hi Peter and others, > > If it helps, I wrote a small function glm.scoretest() for the statmod package on CRAN to compute score tests from glm fits. The score test for adding a covariate, or any set of covariates, can be extracted very neatly from the standard glm output, although you probably already know that.Thanks Gordon, I'll have a look. It's the kind of think where you _strongly suspect_ that a neat solution exists, but where you can't just write it down immediately. Looks like your code needs some elaboration to handle factor terms and more general model reductions, though. -pd> > Regards > Gordon > > --------------------------------------------- > Professor Gordon K Smyth, > NHMRC Senior Research Fellow, > Bioinformatics Division, > Walter and Eliza Hall Institute of Medical Research, > 1G Royal Parade, Parkville, Vic 3052, Australia. > smyth at wehi.edu.au > http://www.wehi.edu.au > http://www.statsci.org/smyth > >> Date: Tue, 15 Mar 2011 12:17:46 +0100 >> From: peter dalgaard <pdalgd at gmail.com> >> To: Brett Presnell <presnell at stat.ufl.edu> >> Cc: r-devel at r-project.org >> Subject: Re: [Rd] Standardized Pearson residuals >> >> >> On Mar 15, 2011, at 04:40 , Brett Presnell wrote: >> >>>>> Background: I'm currently teaching an undergrad/grad-service course from Agresti's "Introduction to Categorical Data Analysis (2nd edn)" and deviance residuals are not used in the text. For now I'll just provide the students with a simple function to use, but I prefer to use R's native capabilities whenever possible. >>>> >>>> Incidentally, chisq.test will have a stdres component in 2.13.0 for much the same reason. >>> >>> Thank you. That's one more thing I won't have to provide code for anymore. Coincidentally, Agresti mentioned this to me a week or two ago as something that he felt was missing, so that's at least two people who will be happy to see this added. >>> >> >> And of course, I was teaching a course based on Agresti & Franklin: "Statistics, The Art and Science of Learning from Data", when I realized that R was missing standardized residuals. >> >> >>> It would also be nice for teaching purposes if glm or summary.glm had a "pearsonchisq" component and a corresponding extractor function, but I can imagine that there might be arguments against it that haven't occured to me. Plus, I doubt that anyone wants to touch glm unless it's to repair a bug. If I'm wrong about all that though, ... >>> >> Hmm, how would that work? If there was one, I'd worry that people would start subtracting them which is usually not the right thing to do. I do miss having a test on the residual deviance occasionally (even though it is only sometimes meaningful), having to fit a saturated model explicitly can be a bit silly. E.g. in this case (homogeneity of birth rates): >> >>> anova(glm(births~month,poisson,data=bb), test="Chisq") >> ... >> Df Deviance Resid. Df Resid. Dev P(>|Chi|) >> NULL 11 225.98 >> month 11 225.98 0 0.00 < 2.2e-16 *** >>> anova(glm(births~1,poisson,data=bb), test="Chisq") >> ... >> Df Deviance Resid. Df Resid. Dev P(>|Chi|) >> NULL 11 225.98 >> >> Notice that the latter version gives me the correct deviance but no p-value. >> >> >> A better support for generic score tests could be desirable too. I suspect that this would actually be the Pearson Chi-square in the interesting cases. >> >> -- >> Peter Dalgaard >> Center for Statistics, Copenhagen Business School >> Solbjerg Plads 3, 2000 Frederiksberg, Denmark >> Phone: (+45)38153501 >> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com > > ______________________________________________________________________ > The information in this email is confidential and intended solely for the addressee. > You must not disclose, forward, print or use it without the permission of the sender. > ______________________________________________________________________-- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com