I have used the Insurance data from the R library and I have 2 questions. I use the following:

> library(MASS)
> data(Insurance)
> m1 = glm(Claims ~ District + Group + Age + offset(log(Holders)), data = Insurance, family = poisson)
> summary(m1)

Call:
glm(formula = Claims ~ District + Group + Age + offset(log(Holders)),
    family = poisson, data = Insurance)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.46558  -0.50802  -0.03198   0.55555   1.94026

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.810508   0.032972 -54.910  < 2e-16 ***
District2    0.025868   0.043016   0.601 0.547597
District3    0.038524   0.050512   0.763 0.445657
District4    0.234205   0.061673   3.798 0.000146 ***
Group.L      0.429708   0.049459   8.688  < 2e-16 ***
Group.Q      0.004632   0.041988   0.110 0.912150
Group.C     -0.029294   0.033069  -0.886 0.375696
Age.L       -0.394432   0.049404  -7.984 1.42e-15 ***
Age.Q       -0.000355   0.048918  -0.007 0.994210
Age.C       -0.016737   0.048478  -0.345 0.729910
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 236.26  on 63  degrees of freedom
Residual deviance:  51.42  on 54  degrees of freedom
AIC: 388.74

(1) In the result above, what are Group.L, Group.Q, Group.C, Age.L, Age.Q and Age.C?

(2) When I copy the Insurance data into csv format (as shown in the attachment) and run the same procedure, the result is different from the one above. Why?
In the Insurance dataset both Age and Group are ordered factors, so the default encoding for them is orthogonal polynomials (assuming that the user has not changed the default). In your output, .L indicates that the line is for the "Linear" piece of the encoding, i.e. the linear contrast on the groups, .Q is for the "Quadratic" piece/contrast and .C is for the "Cubic" one. If you don't understand what is meant by linear/quadratic/cubic, then do some background reading on orthogonal polynomials.

If you read the data in yourself from a .csv file, then Age and Group will not be ordered factors unless you specifically convert them. The default encoding will therefore be something other than orthogonal polynomials, and the individual coefficients will be different (though the overall effect will be the same).

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of choonhong ang
> Sent: Monday, February 23, 2009 10:05 AM
> To: r-help at r-project.org
> Subject: [R] Insurance data in library(MASS)
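A minimal sketch of the two points above, assuming the data are written out with write.csv() and read back with read.csv(); the temporary file and the object names ins, m_raw and m2 are made up for illustration and are not from the original post. It shows the polynomial contrast matrix behind the .L/.Q/.C labels, and that the original coefficients come back once District is made a factor and Group and Age are made ordered factors with the original level order.

```r
library(MASS)
data(Insurance)

# Group and Age are ordered factors; District is an ordinary factor
is.ordered(Insurance$Group)
is.ordered(Insurance$Age)
is.ordered(Insurance$District)

# Columns .L, .Q, .C of the orthogonal polynomial contrasts for 4 levels
contr.poly(4)

m1 <- glm(Claims ~ District + Group + Age + offset(log(Holders)),
          data = Insurance, family = poisson)

# Round-trip through a CSV file (hypothetical temporary file)
tmp <- tempfile(fileext = ".csv")
write.csv(Insurance, tmp, row.names = FALSE)
ins <- read.csv(tmp, stringsAsFactors = TRUE)

# District may come back as a number, so make it a factor again
ins$District <- factor(ins$District)

# Group and Age are now plain (unordered) factors: different coefficients,
# but the same fitted model
m_raw <- glm(Claims ~ District + Group + Age + offset(log(Holders)),
             data = ins, family = poisson)
all.equal(deviance(m_raw), deviance(m1))

# Restore the ordered factors, using the level order of the original data
ins$Group <- factor(ins$Group, levels = levels(Insurance$Group), ordered = TRUE)
ins$Age   <- factor(ins$Age,   levels = levels(Insurance$Age),   ordered = TRUE)

m2 <- glm(Claims ~ District + Group + Age + offset(log(Holders)),
          data = ins, family = poisson)
all.equal(coef(m1), coef(m2))
```

Passing the levels explicitly matters here: without them, factor() would sort the labels alphabetically, and the linear/quadratic/cubic contrasts would then be taken over the wrong ordering.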
You are asking about support software for a book, and the book contains the answers .... And it should be given due credit.

On Mon, 23 Feb 2009, choonhong ang wrote:

> (1) In the result above, what is Group.L, Group.Q, Group.C, Age.L, Age.Q,
> Age.C ?

See the book, ca. p.146.

> (2) When I copy the Insurance data in csv format (as shown in the
> attachement) and run the same procedure the result shown is different from
> above result, why ?

Who knows? You did not deign to tell us what you did with the CSV file, nor the results you got. Most likely you did not get the factor levels and classes the same as the help file describes. Hint: Group and Age are ordered factors.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Hi,

In the result shown, District 1 is used as the base category. How can I change this to make District 4 the base category?
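One possible way to do this, sketched here under the assumption that District is still the unordered factor shipped with the data, is relevel() from base R; the names ins4 and m4 are made up for illustration. The fit itself is unchanged: only the reference level moves, so the District coefficients become differences from District 4.

```r
library(MASS)
data(Insurance)

# Original fit: District 1 is the reference level
m1 <- glm(Claims ~ District + Group + Age + offset(log(Holders)),
          data = Insurance, family = poisson)

# Work on a copy and make "4" the first (reference) level of District
ins4 <- Insurance
ins4$District <- relevel(ins4$District, ref = "4")
levels(ins4$District)

m4 <- glm(Claims ~ District + Group + Age + offset(log(Holders)),
          data = ins4, family = poisson)

# Coefficients are now District1, District2, District3 relative to District 4
coef(m4)

# Same model, different parameterization: District1 here is minus the old District4
all.equal(unname(coef(m4)["District1"]), -unname(coef(m1)["District4"]))
all.equal(deviance(m4), deviance(m1))
```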