Andrew Hill
2003-Sep-11 21:16 UTC
[R] discrepancy between R and Splus lm.influence() functions for family=Gamma(link=identity)
Hello, I am looking for an explanation and/or fix for a discrepancy in the behaviour of the R lm.influence() function [ version R 1.5.0 (2002-04-29) ] and the same function in Splus [ Splus version 5.1 release 1, running on SGI IRIX 6.2]. The discrepancy is of concern because I am migrating some Splus scripts to R and need to ensure consistency of results. Specifically, when I fit a glm() model to a test dataset using the family = Gamma(link=identity), and then call lm.influence on the fitted glm object, the resulting lm.influence()$coefficients and lm.influence()$sigma values are different between R and Splus versions. The lm.influence()$hat vector does agree between the two programs. Also, the glm() function does return the same model coefficient in both R and Splus. In contrast, if I use the default glm family=Gaussian(link=identity), all output of lm.influence() for both R and Splus does agree fully for my dataset. I have read the R help function for lm.influence() and I understand that R returns the difference between the model coefficients and the drop-one coefficients, while Splus returns the drop-one coefficients. But this does not account for the discrepancy that I see in the lm.influence$coefficients, nor the difference in lm.influence$sigma, at least to my understanding. Pasted below is output, first from R, and second from Splus, which illustates the issue. Discrepancies between the R and Splus $sigma values look like ~ 2-6%. Hopefully I have not overlooked an obvious statistical explanation for the difference. Thanks, Andrew. ----------------------------------------> version_ platform mips-sgi-irix6.5 arch mips os irix6.5 system mips, irix6.5 status major 1 minor 5.0 year 2002 month 04 day 29 language R> test3sj030110 sj030111 sj030112 control.ppm AFFX-DapX-M_at 216 236 227 13.3 AFFX-DapX-5_at 97 114 108 6.7 AFFX-CreX-5_at 1107 1021 1661 166.7 AFFX-BioB-5_at 595 511 772 83.3 AFFX-BioDn-3_at 3505 3059 4646 666.7 AFFX-BioB-3_at 2691 2310 3651 500.0 AFFX-CreX-3_at 189 198 262 20.0 AFFX-BioC-5_at 97 82 134 10.0 AFFX-BioC-3_at 286 236 381 33.3 AFFX-DapX-3_at 40 35 47 3.3 AFFX-BioB-M_at 2383 2082 3302 333.3> lm.influence( glm( sj030110~control.ppm-1, data=test3, family=Gamma(link=identity) ) )$hat [1] 0.0909091 0.0909091 0.0909091 0.0909091 0.0909091 0.0909091 0.0909091 [8] 0.0909091 0.0909091 0.0909091 0.0909091 $coefficients [,1] [1,] 0.57229388 [2,] 0.44528170 [3,] -0.29493540 [4,] -0.23350209 [5,] -0.48265734 [6,] -0.46441051 [7,] 0.01626405 [8,] 0.04076060 [9,] -0.07161078 [10,] 0.25888472 [11,] -0.23268389 $sigma [1] 0.3374653 0.3635937 0.3846727 0.3906073 0.3567701 0.3601862 0.4003299 [8] 0.4000826 0.3994681 0.3883333 0.3906765 -------------------------------------------------------- S-PLUS : Copyright (c) 1988, 1999 MathSoft, Inc. S : Copyright Lucent Technologies, Inc. Version 5.1 Release 1 for Silicon Graphics Irix, IRIX 6.2 : 1999> test3sj030110 sj030111 sj030112 control.ppm AFFX-DapX-M_at 216 236 227 13.3 AFFX-DapX-5_at 97 114 108 6.7 AFFX-CreX-5_at 1107 1021 1661 166.7 AFFX-BioB-5_at 595 511 772 83.3 AFFX-BioDn-3_at 3505 3059 4646 666.7 AFFX-BioB-3_at 2691 2310 3651 500.0 AFFX-CreX-3_at 189 198 262 20.0 AFFX-BioC-5_at 97 82 134 10.0 AFFX-BioC-3_at 286 236 381 33.3 AFFX-DapX-3_at 40 35 47 3.3 AFFX-BioB-M_at 2383 2082 3302 333.3> lm.influence( glm( sj030110~control.ppm-1, data=test3, family=Gamma(link=identity) ) )$coefficients: control.ppm AFFX-DapX-M_at 8.590989 AFFX-DapX-5_at 8.767288 AFFX-CreX-5_at 9.550982 AFFX-BioB-5_at 9.500764 AFFX-BioDn-3_at 9.689326 AFFX-BioB-3_at 9.676850 AFFX-CreX-3_at 9.270050 AFFX-BioC-5_at 9.245050 AFFX-BioC-3_at 9.356191 AFFX-DapX-3_at 9.002928 AFFX-BioB-M_at 9.500078 $sigma: AFFX-DapX-M_at AFFX-DapX-5_at AFFX-CreX-5_at AFFX-BioB-5_at AFFX-BioDn-3_at 0.317972 0.3625648 0.3996574 0.4038994 0.382934 AFFX-BioB-3_at AFFX-CreX-3_at AFFX-BioC-5_at AFFX-BioC-3_at AFFX-DapX-3_at 0.3847614 0.411836 0.4115877 0.4110434 0.3978158 AFFX-BioB-M_at 0.4039509 $hat: [1] 0.09090909 0.09090909 0.09090909 0.09090909 0.09090909 0.09090909 [7] 0.09090909 0.09090909 0.09090909 0.09090909 0.09090909 [[alternative HTML version deleted]]
Martin Maechler
2003-Sep-12 07:16 UTC
[R] discrepancy between R & Splus lm.influence() for family=Gamma
>>>>> "Andrew" == Andrew Hill <AHill at wyeth.com> >>>>> on Thu, 11 Sep 2003 17:16:30 -0400 writes:Andrew> Hello, I am looking for an explanation and/or fix Andrew> for a discrepancy in the behaviour of the R Andrew> lm.influence() function [ version R 1.5.0 Andrew> (2002-04-29) ] and the same function in Splus [ Andrew> Splus version 5.1 release 1, running on SGI IRIX Andrew> 6.2]. The discrepancy is of concern because I am Andrew> migrating some Splus scripts to R and need to ensure Andrew> consistency of results. Before reading on: Do you really mean R 1.5.0? If yes, you should definitely upgrade to R 1.7.1 ! There were considerable improvements (for R 1.7.0) for these functions, mostly thanks to John Fox, and the recommended way in R is to use influence() which is a generic function that has both an "lm" and (important for you!) a "glm" method. I.e., you use influence(mylmfit, ...) and the method influence.glm(mylmfit, ...) will be called. This should do the correct calculations for all kind of glm models. Regards, Martin Maechler <maechler at stat.math.ethz.ch> http://stat.ethz.ch/~maechler/ Seminar fuer Statistik, ETH-Zentrum LEO C16 Leonhardstr. 27 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-1-632-3408 fax: ...-1228 <>< Andrew> Specifically, when I fit a glm() model to a test Andrew> dataset using the family = Gamma(link=identity), and Andrew> then call lm.influence on the fitted glm object, the Andrew> resulting lm.influence()$coefficients and Andrew> lm.influence()$sigma values are different between R Andrew> and Splus versions. The lm.influence()$hat vector Andrew> does agree between the two programs. Also, the Andrew> glm() function does return the same model Andrew> coefficient in both R and Splus. Andrew> In contrast, if I use the default glm Andrew> family=Gaussian(link=identity), all output of Andrew> lm.influence() for both R and Splus does agree fully Andrew> for my dataset. Andrew> I have read the R help function for lm.influence() Andrew> and I understand that R returns the difference Andrew> between the model coefficients and the drop-one Andrew> coefficients, while Splus returns the drop-one Andrew> coefficients. But this does not account for the Andrew> discrepancy that I see in the Andrew> lm.influence$coefficients, nor the difference in Andrew> lm.influence$sigma, at least to my understanding. Andrew> Pasted below is output, first from R, and second Andrew> from Splus, which illustates the issue. Andrew> Discrepancies between the R and Splus $sigma values Andrew> look like ~ 2-6%. Hopefully I have not overlooked Andrew> an obvious statistical explanation for the Andrew> difference. Andrew> Thanks, Andrew.