Hi all I have another GLIM question. I have been using R as well as Genstat (version 6) in order to fit GLIM models to the data (displayed below). The same models are fitted but the answers supplied by the two packages are not the same. Why? Can anyone help? A discription of the data and the type of model/s fitted can be found below. Regards Allan The problem is taken from Bennet (1978) (I dont have any more of the reference.) In this example we wish to model the probability of a car insurance policyholder claiming insurance on his/her car given that we know certain information about him/her. The explanatory variables used in this analysis are: the age of the policyholder (age), the year of registration (reg) and a measure of the policyholders claim history called the no claim discount (ncd). Define p(i,j,k) as the probability of a policyholder in level i of age, level j of reg and level k of ncd making a claim (for i= 1 (age 17-22), 2 (23-26), 3 (27-65), 4 (66-80) ; j=1 (registration after 1964), 2 (63-64), 3 (60-62), 4 (earlier than 1960) and ncd= 1 (1-1 claims), 2 (2-3), 3 (more than three).). Similarly define r(i,j,k) as the proportion of policyholders in level i of age, level j of reg and level k of ncd making a claim. We can thus model r(i,j,k) by means of a binomial distribution with parameters p(i,j,k) and N(i,j,k) where N(i,j,k) is the total number of policyholders that falls into group i, j, k such that log(p(i,j,k)/(1-p(i,j,k))) = some function of age, reg and ncd . The baseline chosen is age (17-22), reg (65) and ncd (0-1). age reg ncd claims.r exp.n 1 (17-22) (65-) (0-1) 475 1800 2 (17-22) (65-) (2-3) 150 700 3 (17-22) (65-) (4+) 35 200 4 (17-22) (63-64) (0-1) 680 2650 5 (17-22) (63-64) (2-3) 215 1000 6 (17-22) (63-64) (4+) 55 250 7 (17-22) (60-62) (0-1) 710 2950 8 (17-22) (60-62) (2-3) 220 1100 9 (17-22) (60-62) (4+) 60 250 10 (17-22) (-59) (0-1) 230 1050 11 (17-22) (-59) (2-3) 75 450 12 (17-22) (-59) (4+) 25 250 13 (23-26) (65-) (0-1) 240 1300 14 (23-26) (65-) (2-3) 140 900 15 (23-26) (65-) (4+) 150 900 16 (23-26) (63-64) (0-1) 310 1500 17 (23-26) (63-64) (2-3) 185 1050 18 (23-26) (63-64) (4+) 170 1050 19 (23-26) (60-62) (0-1) 240 1405 20 (23-26) (60-62) (2-3) 160 1000 21 (23-26) (60-62) (4+) 130 1050 22 (23-26) (-59) (0-1) 80 600 23 (23-26) (-59) (2-3) 60 500 24 (23-26) (-59) (4+) 70 550 25 (27-65) (65-) (0-1) 1650 10300 26 (27-65) (65-) (2-3) 1450 9900 27 (27-65) (65-) (4+) 3400 28900 28 (27-65) (63-64) (0-1) 1550 9900 29 (27-65) (63-64) (2-3) 1450 10000 30 (27-65) (63-64) (4+) 3200 27700 31 (27-65) (60-62) (0-1) 1250 9300 32 (27-65) (60-62) (2-3) 1250 9200 33 (27-65) (60-62) (4+) 2500 25600 34 (27-65) (-59) (0-1) 500 4700 35 (27-65) (-59) (2-3) 550 5300 36 (27-65) (-59) (4+) 1400 18100 37 (66-80) (65-) (0-1) 55 275 38 (66-80) (65-) (2-3) 40 250 39 (66-80) (65-) (4+) 180 1400 40 (66-80) (63-64) (0-1) 35 225 41 (66-80) (63-64) (2-3) 30 225 42 (66-80) (63-64) (4+) 155 1450 43 (66-80) (60-62) (0-1) 25 200 44 (66-80) (60-62) (2-3) 40 300 45 (66-80) (60-62) (4+) 130 1500 46 (66-80) (-59) (0-1) 25 175 47 (66-80) (-59) (2-3) 30 300 48 (66-80) (-59) (4+) 180 2400 claims.r =the number of claims made in a particular group. exp.n=the total number of policyholders in a particular group. EXAMPLE As an example if we use age as an explanatory variable and fit the glm model we get the following results: cars<-read.table("c:/a.dat",header=T) attach(cars) y<-cbind(claims.r,exp.n) cars.age<-glm(y~age,family=binomial) summary(cars.age) Call: glm(formula = y ~ age, family = binomial) Deviance Residuals: Min 1Q Median 3Q Max -16.62952 -1.84073 -0.04282 2.07028 10.72502 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -1.46265 0.02050 -71.34 <2e-16 *** age(23-26) -0.34576 0.03197 -10.82 <2e-16 *** age(27-65) -0.66345 0.02182 -30.41 <2e-16 *** age(66-80) -0.77863 0.04020 -19.37 <2e-16 *** --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 1837.4 on 47 degrees of freedom Residual deviance: 883.8 on 44 degrees of freedom AIC: 1228.4 Number of Fisher Scoring iterations: 4 The Genstat output is displayed below. (well only some of it!!!) ***** Regression Analysis ***** Response variate: claims_r Binomial totals: exp_n Distribution: Binomial Link function: Logit Fitted terms: Constant, age *** Summary of analysis *** mean deviance approx d.f. deviance deviance ratio chi pr Regression 3 1298. 432.70 432.70 <.001 Residual 44 1136. 25.83 Total 47 2434. 51.80 * MESSAGE: ratios are based on dispersion parameter with value 1 Dispersion parameter is fixed at 1.00 *** Estimates of parameters *** antilog of estimate s.e. t(*) t pr. estimate Constant -1.1992 0.0211 -56.90 <.001 0.3014 age (23-26) -0.4302 0.0326 -13.20 <.001 0.6504 age (27-65) -0.7999 0.0224 -35.75 <.001 0.4494 age (66-80) -0.9297 0.0407 -22.86 <.001 0.3947 * MESSAGE: s.e.s are based on dispersion parameter with value 1 Parameters for factors are differences compared with the reference level: Factor Reference level age (17-22)
allan clark <allan at stats.uct.ac.za> writes:> Hi all > > I have another GLIM question. > > I have been using R as well as Genstat (version 6) in order to fit > GLIM models to the data (displayed below). > > The same models are fitted but the answers supplied by the two > packages are not the same. > > Why? Can anyone help?It's not the same model! ....> cars<-read.table("c:/a.dat",header=T) > attach(cars) > y<-cbind(claims.r,exp.n)********************* from help(glm) For 'binomial' models the response can also be specified as a 'factor' (when the first level denotes failure and all others success) or as a two-column matrix with the columns giving the numbers of successes and failures.> cars.age<-glm(y~age,family=binomial) > summary(cars.age)....> ***** Regression Analysis ***** > > Response variate: claims_r > Binomial totals: exp_n******> Distribution: Binomial > Link function: Logit > Fitted terms: Constant, ageI.e. try y <- cbind(claims.r, exp.n - claims.r) -- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
In R you use cbind(successes, failures), not cbind(successes, total) as you appear to have done. And GLIM is a program (with I for interactive): these are GLMs, not GLIMs in th ecommonly accepted terminology (but not that of SAS). On Tue, 2 Dec 2003, allan clark wrote:> > Hi all > > I have another GLIM question. > > I have been using R as well as Genstat (version 6) in order to fit > GLIM models to the data (displayed below). > > The same models are fitted but the answers supplied by the two > packages are not the same. > > Why? Can anyone help? > > A discription of the data and the type of model/s fitted can be found > below. > > Regards > Allan > > > The problem is taken from Bennet (1978) (I dont have any more of the > reference.) > > In this example we wish to model the probability of a car insurance > policyholder claiming insurance on his/her car given that we know > certain information about him/her. The explanatory variables used in > this analysis are: the age of the policyholder (age), the year of > registration (reg) and a measure of the policyholders claim history > called the no claim discount (ncd). > > Define p(i,j,k) as the probability of a policyholder in level i of > age, level j of reg and level k of ncd making a claim (for i= 1 (age> 17-22), 2 (23-26), 3 (27-65), 4 (66-80) ; j=1 (registration after > 1964), 2 (63-64), 3 (60-62), 4 (earlier than 1960) and ncd= 1 (1-1 > claims), 2 (2-3), 3 (more than three).). > > Similarly define r(i,j,k) as the proportion of policyholders in level > i of age, level j of reg and level k of ncd making a claim. > > We can thus model r(i,j,k) by means of a binomial distribution with > parameters p(i,j,k) and N(i,j,k) where N(i,j,k) is the total number > of policyholders that falls into group i, j, k such that > log(p(i,j,k)/(1-p(i,j,k))) = some function of age, reg and ncd . > > The baseline chosen is age (17-22), reg (65) and ncd (0-1). > > > age reg ncd claims.r exp.n > 1 (17-22) (65-) (0-1) 475 1800 > 2 (17-22) (65-) (2-3) 150 700 > 3 (17-22) (65-) (4+) 35 200 > 4 (17-22) (63-64) (0-1) 680 2650 > 5 (17-22) (63-64) (2-3) 215 1000 > 6 (17-22) (63-64) (4+) 55 250 > 7 (17-22) (60-62) (0-1) 710 2950 > 8 (17-22) (60-62) (2-3) 220 1100 > 9 (17-22) (60-62) (4+) 60 250 > 10 (17-22) (-59) (0-1) 230 1050 > 11 (17-22) (-59) (2-3) 75 450 > 12 (17-22) (-59) (4+) 25 250 > 13 (23-26) (65-) (0-1) 240 1300 > 14 (23-26) (65-) (2-3) 140 900 > 15 (23-26) (65-) (4+) 150 900 > 16 (23-26) (63-64) (0-1) 310 1500 > 17 (23-26) (63-64) (2-3) 185 1050 > 18 (23-26) (63-64) (4+) 170 1050 > 19 (23-26) (60-62) (0-1) 240 1405 > 20 (23-26) (60-62) (2-3) 160 1000 > 21 (23-26) (60-62) (4+) 130 1050 > 22 (23-26) (-59) (0-1) 80 600 > 23 (23-26) (-59) (2-3) 60 500 > 24 (23-26) (-59) (4+) 70 550 > 25 (27-65) (65-) (0-1) 1650 10300 > 26 (27-65) (65-) (2-3) 1450 9900 > 27 (27-65) (65-) (4+) 3400 28900 > 28 (27-65) (63-64) (0-1) 1550 9900 > 29 (27-65) (63-64) (2-3) 1450 10000 > 30 (27-65) (63-64) (4+) 3200 27700 > 31 (27-65) (60-62) (0-1) 1250 9300 > 32 (27-65) (60-62) (2-3) 1250 9200 > 33 (27-65) (60-62) (4+) 2500 25600 > 34 (27-65) (-59) (0-1) 500 4700 > 35 (27-65) (-59) (2-3) 550 5300 > 36 (27-65) (-59) (4+) 1400 18100 > 37 (66-80) (65-) (0-1) 55 275 > 38 (66-80) (65-) (2-3) 40 250 > 39 (66-80) (65-) (4+) 180 1400 > 40 (66-80) (63-64) (0-1) 35 225 > 41 (66-80) (63-64) (2-3) 30 225 > 42 (66-80) (63-64) (4+) 155 1450 > 43 (66-80) (60-62) (0-1) 25 200 > 44 (66-80) (60-62) (2-3) 40 300 > 45 (66-80) (60-62) (4+) 130 1500 > 46 (66-80) (-59) (0-1) 25 175 > 47 (66-80) (-59) (2-3) 30 300 > 48 (66-80) (-59) (4+) 180 2400 > > claims.r =the number of claims made in a particular group. > exp.n=the total number of policyholders in a particular group. > > EXAMPLE > > As an example if we use age as an explanatory variable and fit the glm > model we get the following results: > > cars<-read.table("c:/a.dat",header=T) > attach(cars) > y<-cbind(claims.r,exp.n) > cars.age<-glm(y~age,family=binomial) > summary(cars.age) > > Call: > glm(formula = y ~ age, family = binomial) > > Deviance Residuals: > Min 1Q Median 3Q Max > -16.62952 -1.84073 -0.04282 2.07028 10.72502 > > Coefficients: > Estimate Std. Error z value Pr(>|z|) > (Intercept) -1.46265 0.02050 -71.34 <2e-16 *** > age(23-26) -0.34576 0.03197 -10.82 <2e-16 *** > age(27-65) -0.66345 0.02182 -30.41 <2e-16 *** > age(66-80) -0.77863 0.04020 -19.37 <2e-16 *** > --- > Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 > > (Dispersion parameter for binomial family taken to be 1) > > Null deviance: 1837.4 on 47 degrees of freedom > Residual deviance: 883.8 on 44 degrees of freedom > AIC: 1228.4 > > Number of Fisher Scoring iterations: 4 > > The Genstat output is displayed below. (well only some of it!!!) > > > ***** Regression Analysis ***** > > Response variate: claims_r > Binomial totals: exp_n > Distribution: Binomial > Link function: Logit > Fitted terms: Constant, age > > > *** Summary of analysis *** > > mean deviance approx > d.f. deviance deviance ratio chi pr > Regression 3 1298. 432.70 432.70 <.001 > Residual 44 1136. 25.83 > Total 47 2434. 51.80 > * MESSAGE: ratios are based on dispersion parameter with value 1 > > Dispersion parameter is fixed at 1.00 > > > *** Estimates of parameters *** > antilog of > estimate s.e. t(*) t pr. estimate > Constant -1.1992 0.0211 -56.90 <.001 0.3014 > age (23-26) -0.4302 0.0326 -13.20 <.001 0.6504 > age (27-65) -0.7999 0.0224 -35.75 <.001 0.4494 > age (66-80) -0.9297 0.0407 -22.86 <.001 0.3947 > * MESSAGE: s.e.s are based on dispersion parameter with value 1 > > Parameters for factors are differences compared with the reference > level: > Factor Reference level > age (17-22) > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272860 (secr) Oxford OX1 3TG, UK Fax: +44 1865 272595