Lam, Tzeng Yih
2008-Dec-11 05:42 UTC
[R] Validity of GLM using Gaussian family with sqrt link
Dear all, I have the following dataset: each row corresponds to count of forest floor small mammal captured in a plot and vegetation characteristics measured at that plot> sotrplot cnt herbc herbht 1 1A1 0 37.08 53.54 2 1A3 1 36.27 26.67 3 1A5 0 32.50 30.62 4 1A7 0 56.54 45.63 5 1B2 0 41.66 38.13 6 1B4 0 32.08 37.79 7 1B6 0 33.71 30.62 ... I am interested in comparing fit of different specification of Generalized Linear Models (although there are some issues with using AIC or BIC for comparison, but this is the question that I like to post here). Here are two of the several models that I am interested in: (1) Poission log-linear model> pois<-glm(cnt~herbc+herbht,family=poisson,data=sotr) > summary(pois)Call: glm(formula = cnt ~ herbc + herbht, family = poisson, data = sotr) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -1.341254 0.089969 -14.908 <2e-16 *** herbc -0.007303 0.003469 -2.105 0.0353 * herbht 0.024064 0.002659 9.051 <2e-16 *** --- Null deviance: 1699.0 on 1180 degrees of freedom Residual deviance: 1569.8 on 1178 degrees of freedom AIC: 2311.4 (2) Gaussian with sqrt link model> gaus.sqrt<-glm(cnt~herbc+herbht,family=gaussian(link="sqrt"),data=sotr,start=c(0.1,-0.004,0.01)) > summary(gaus.sqrt)Call: glm(formula = cnt ~ herbc + herbht, family = gaussian(link = "sqrt"), data = sotr, start = c(0.1, -0.004, 0.01)) Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.462211 0.043475 10.632 < 2e-16 *** herbc -0.003315 0.001661 -1.996 0.0461 * herbht 0.010241 0.001291 7.935 4.86e-15 *** --- Null deviance: 1144.6 on 1180 degrees of freedom Residual deviance: 1062.9 on 1178 degrees of freedom AIC: 3235.0> logLik(gaus.sqrt)'log Lik.' -1613.524 (df=4)>From the glm() help file that I read, family=gaussian() accepts the links "identity", "log" and "inverse". There is no mentioning of gaussian() accepting "sqrt" link. Although "sqrt" link is available for family=poisson()A. Therefore, is the code in (2) actually computing Maximum Likelihood Estimates (MLE) of the coefficients using Gaussian family with "sqrt" link or is it computing MLE of something else? B. If the code in (2) is computing the MLE with gaussian(link="sqrt"), then will the maximized value of log-likelihood function using logLik() be valid (other than the issue that the dispersion parameter is counted as a parameter in aic() within glm())? Thank you in advance and I appreciate it very much for any advices that are offered. Best regards, TzengYih Lam TzengYih Lam, PhD Student College of Forestry Oregon State University [[alternative HTML version deleted]]
Prof Brian Ripley
2008-Dec-11 07:45 UTC
[R] Validity of GLM using Gaussian family with sqrt link
a) There is a difference between link=sqrt and link="sqrt". link: a specification for the model link function. This can be a name/expression, a literal character string, a length-one character vector or an object of class '"link-glm"' (such as generated by 'make.link') provided it is not specified _via_ one of the standard names given next. link-sqrt is a name and not accepted. link="sqrt" is a literal character string, and is. b) Your first model is a model for integer observations, the second for continuous observations. As such, the log-likleihoods are computed with respect to different reference measures and are not comparable. In less technical terms, in model 1 you compute the likelihood from probabilities and in model 2 from probability densities, and the latter depend on the units of measurement. On Wed, 10 Dec 2008, Lam, Tzeng Yih wrote:> Dear all, > > I have the following dataset: each row corresponds to count of forest floor small mammal captured in a plot and vegetation characteristics measured at that plot > >> sotr > plot cnt herbc herbht > 1 1A1 0 37.08 53.54 > 2 1A3 1 36.27 26.67 > 3 1A5 0 32.50 30.62 > 4 1A7 0 56.54 45.63 > 5 1B2 0 41.66 38.13 > 6 1B4 0 32.08 37.79 > 7 1B6 0 33.71 30.62 > ... > > I am interested in comparing fit of different specification of > Generalized Linear Models (although there are some issues with using AIC > or BIC for comparison, but this is the question that I like to post > here). Here are two of the several models that I am interested in: > > (1) Poission log-linear model >> pois<-glm(cnt~herbc+herbht,family=poisson,data=sotr) >> summary(pois) > Call: > glm(formula = cnt ~ herbc + herbht, family = poisson, data = sotr) > > Coefficients: > Estimate Std. Error z value Pr(>|z|) > (Intercept) -1.341254 0.089969 -14.908 <2e-16 *** > herbc -0.007303 0.003469 -2.105 0.0353 * > herbht 0.024064 0.002659 9.051 <2e-16 *** > --- > Null deviance: 1699.0 on 1180 degrees of freedom > Residual deviance: 1569.8 on 1178 degrees of freedom > AIC: 2311.4 > > > (2) Gaussian with sqrt link model >> gaus.sqrt<-glm(cnt~herbc+herbht,family=gaussian(link="sqrt"),data=sotr,start=c(0.1,-0.004,0.01)) >> summary(gaus.sqrt) > Call: > glm(formula = cnt ~ herbc + herbht, family = gaussian(link = "sqrt"), > data = sotr, start = c(0.1, -0.004, 0.01)) > > Coefficients: > Estimate Std. Error t value Pr(>|t|) > (Intercept) 0.462211 0.043475 10.632 < 2e-16 *** > herbc -0.003315 0.001661 -1.996 0.0461 * > herbht 0.010241 0.001291 7.935 4.86e-15 *** > --- > Null deviance: 1144.6 on 1180 degrees of freedom > Residual deviance: 1062.9 on 1178 degrees of freedom > AIC: 3235.0 > >> logLik(gaus.sqrt) > 'log Lik.' -1613.524 (df=4) > >> From the glm() help file that I read, family=gaussian() accepts the links "identity", "log" and "inverse". There is no mentioning of gaussian() accepting "sqrt" link. Although "sqrt" link is available for family=poisson() > > A. Therefore, is the code in (2) actually computing Maximum Likelihood > Estimates (MLE) of the coefficients using Gaussian family with "sqrt" > link or is it computing MLE of something else? > > B. If the code in (2) is computing the MLE with gaussian(link="sqrt"), > then will the maximized value of log-likelihood function using logLik() > be valid (other than the issue that the dispersion parameter is counted > as a parameter in aic() within glm())? > > Thank you in advance and I appreciate it very much for any advices that are offered. > > Best regards, > TzengYih Lam > > > TzengYih Lam, PhD Student > College of Forestry > Oregon State University > > > > > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Prof Brian Ripley
2008-Dec-11 17:04 UTC
[R] Validity of GLM using Gaussian family with sqrt link
Please do read the help page: fortune("WTFM") applies. On Thu, 11 Dec 2008, Gerard M. Keogh wrote:> Hi all, > > Just on this question : > > can I assume any R internal defined function can be used to describe the > link (e.g. = "arctan") so long as its increasing and monotone? > How might abs work for example - (except at 0)?No.> And/or finally, can I define any old function in R called "myfun" and use > link="myfun" provided myfun is a sort of "nice" function?No.>From the help page:link: a specification for the model link function. This can be a name/expression, a literal character string, a length-one character vector or an object of class '"link-glm"' (such as generated by 'make.link') provided it is not specified _via_ one of the standard names given next. You have to specify a *model link function*, as in> gaussian(link="arctan")Error in switch(link, logit = { : ?arctan? link not recognised Only those known by name (to make.link) or an object of the specified class can be used.> Gerard > > > > > "Lam, Tzeng Yih" > <Tzengyih.Lam at ore > gonstate.edu> To > Sent by: "Prof Brian Ripley" > r-help-bounces at r- <ripley at stats.ox.ac.uk> > project.org cc > r-help at r-project.org > Subject > 11/12/2008 15:20 Re: [R] Validity of GLM using > Gaussian family with sqrt link > > > > > > > > > > > Dear Prof. Ripley, > > Thank you for your quick response. > > (A) >> link-sqrt is a name and not accepted. link="sqrt" is a literal character > string, and is. > > I am not entirely sure whether I understand that statement but this is what > I found out. If I specify family=gaussian(link=sqrt), the glm() fails to > run because it is not a default link (so, I understand this part). > Following Venables and Ripley (2002): > >> > summary(glm(cnt~herbc+herbht,data=sotr,family=gaussian(link="sqrt"),start=c(0.1,-0.004,0.01))) > > Call: > glm(formula = cnt ~ herbc + herbht, family = gaussian(link = "sqrt"), > data = sotr, start = c(0.1, -0.004, 0.01)) > > Coefficients: > Estimate Std. Error t value Pr(>|t|) > (Intercept) 0.462211 0.043475 10.632 < 2e-16 *** > herbc -0.003315 0.001661 -1.996 0.0461 * > herbht 0.010241 0.001291 7.935 4.86e-15 *** > > AIC: 3235.0 > >> > summary(glm(cnt~herbc+herbht,data=sotr,family=quasi(link=power(0.5),variance=constant),start=c(0.1,-0.004,0.01))) > > Call: > glm(formula = cnt ~ herbc + herbht, family = quasi(link = power(0.5)), > data = sotr, start = c(0.1, -0.004, 0.01)) > > Coefficients: > Estimate Std. Error t value Pr(>|t|) > (Intercept) 0.462211 0.043475 10.632 < 2e-16 *** > herbc -0.003315 0.001661 -1.996 0.0461 * > herbht 0.010241 0.001291 7.935 4.86e-15 *** > > AIC: NA > > Notice that the parameter estimates and corresponding standard errors are > identical. So, my interpretation is that family=gaussian(link="sqrt") is > identical as specify family=quasi(link=power(0.5)) in glm(). The exception > is that AIC (and thus maximized log-likelihood values) can be computed for > family=gaussian(link="sqrt"). > > The questions are: > (A.1) Is this interpretation correct? > (A.2) If (A.1) is true, does family=gaussian(link="sqrt") implies that I am > doing a Generalized Linear Model with normal distribution and the link > function is: sqrt(mu) = b0+b1(herbc)+b2(herbht)? > > (B) >> In less technical terms, in model 1 you compute the likelihood from > probabilities >> and in model 2 from probability densities, and the latter depend on the >> units of measurement. > Yes, you are correct and I understand it now. Although not as common these > days, some small mammal studies still use sqrt transformation of count as > response variable and carry out a linear model fitting with predictors (via > least squares). So, the exercise that I got into is to compare performances > of linear model with sqrt transformation of count and GLM with Poisson. > However, knowing that we can't compare logLik or AIC based on different > measures of responses. So, I thought that comparison under GLM framework > might be an approach closer to the intention. > > Thank again for your quick respond and advices. I appreciate it very much. > > Best regards, > TzengYih Lam > > ----------------------------------- > Ph.D. student > College of Forestry > Oregon State University > > > -----Original Message----- > From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk] > Sent: Wed 12/10/2008 11:45 PM > To: Lam, Tzeng Yih > Cc: r-help at r-project.org > Subject: Re: [R] Validity of GLM using Gaussian family with sqrt link > > a) There is a difference between link=sqrt and link="sqrt". > > link: a specification for the model link function. This can be a > name/expression, a literal character string, a length-one > character vector or an object of class '"link-glm"' (such as > generated by 'make.link') provided it is not specified _via_ > one of the standard names given next. > > link-sqrt is a name and not accepted. link="sqrt" is a literal character > string, and is. > > b) Your first model is a model for integer observations, the second for > continuous observations. As such, the log-likleihoods are computed with > respect to different reference measures and are not comparable. In less > technical terms, in model 1 you compute the likelihood from probabilities > and in model 2 from probability densities, and the latter depend on the > units of measurement. > > > On Wed, 10 Dec 2008, Lam, Tzeng Yih wrote: > >> Dear all, >> >> I have the following dataset: each row corresponds to count of forest > floor small mammal captured in a plot and vegetation characteristics > measured at that plot >> >>> sotr >> plot cnt herbc herbht >> 1 1A1 0 37.08 53.54 >> 2 1A3 1 36.27 26.67 >> 3 1A5 0 32.50 30.62 >> 4 1A7 0 56.54 45.63 >> 5 1B2 0 41.66 38.13 >> 6 1B4 0 32.08 37.79 >> 7 1B6 0 33.71 30.62 >> ... >> >> I am interested in comparing fit of different specification of >> Generalized Linear Models (although there are some issues with using AIC >> or BIC for comparison, but this is the question that I like to post >> here). Here are two of the several models that I am interested in: >> >> (1) Poission log-linear model >>> pois<-glm(cnt~herbc+herbht,family=poisson,data=sotr) >>> summary(pois) >> Call: >> glm(formula = cnt ~ herbc + herbht, family = poisson, data = sotr) >> >> Coefficients: >> Estimate Std. Error z value Pr(>|z|) >> (Intercept) -1.341254 0.089969 -14.908 <2e-16 *** >> herbc -0.007303 0.003469 -2.105 0.0353 * >> herbht 0.024064 0.002659 9.051 <2e-16 *** >> --- >> Null deviance: 1699.0 on 1180 degrees of freedom >> Residual deviance: 1569.8 on 1178 degrees of freedom >> AIC: 2311.4 >> >> >> (2) Gaussian with sqrt link model >>> > gaus.sqrt<-glm(cnt~herbc+herbht,family=gaussian(link="sqrt"),data=sotr,start=c(0.1,-0.004,0.01)) > >>> summary(gaus.sqrt) >> Call: >> glm(formula = cnt ~ herbc + herbht, family = gaussian(link = "sqrt"), >> data = sotr, start = c(0.1, -0.004, 0.01)) >> >> Coefficients: >> Estimate Std. Error t value Pr(>|t|) >> (Intercept) 0.462211 0.043475 10.632 < 2e-16 *** >> herbc -0.003315 0.001661 -1.996 0.0461 * >> herbht 0.010241 0.001291 7.935 4.86e-15 *** >> --- >> Null deviance: 1144.6 on 1180 degrees of freedom >> Residual deviance: 1062.9 on 1178 degrees of freedom >> AIC: 3235.0 >> >>> logLik(gaus.sqrt) >> 'log Lik.' -1613.524 (df=4) >> >>> From the glm() help file that I read, family=gaussian() accepts the > links "identity", "log" and "inverse". There is no mentioning of gaussian() > accepting "sqrt" link. Although "sqrt" link is available for > family=poisson() >> >> A. Therefore, is the code in (2) actually computing Maximum Likelihood >> Estimates (MLE) of the coefficients using Gaussian family with "sqrt" >> link or is it computing MLE of something else? >> >> B. If the code in (2) is computing the MLE with gaussian(link="sqrt"), >> then will the maximized value of log-likelihood function using logLik() >> be valid (other than the issue that the dispersion parameter is counted >> as a parameter in aic() within glm())? >> >> Thank you in advance and I appreciate it very much for any advices that > are offered. >> >> Best regards, >> TzengYih Lam >> >> >> TzengYih Lam, PhD Student >> College of Forestry >> Oregon State University >> >> >> >> >> >> >> >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > -- > Brian D. Ripley, ripley at stats.ox.ac.uk > Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ > University of Oxford, Tel: +44 1865 272861 (self) > 1 South Parks Road, +44 1865 272866 (PA) > Oxford OX1 3TG, UK Fax: +44 1865 272595 > > > > > > > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > > > ********************************************************************************** > The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from any computer. It is the policy of the Department of Justice, Equality and Law Reform and the Agencies and Offices using its IT services to disallow the sending of offensive material. > Should you consider that the material contained in this message is offensive you should contact the sender immediately and also mailminder[at]justice.ie. > > Is le haghaidh an duine n? an eintitis ar a bhfuil s? d?rithe, agus le haghaidh an duine n? an eintitis sin amh?in, a bhearta?tear an fhaisn?is a tarchuireadh agus f?adfaidh s? go bhfuil ?bhar faoi r?n agus/n? faoi phribhl?id inti. Toirmisctear aon athbhreithni?, atarchur n? leathadh a dh?anamh ar an bhfaisn?is seo, aon ?s?id eile a bhaint aisti n? aon ghn?omh a dh?anamh ar a hiontaoibh, ag daoine n? ag eintitis seachas an faighteoir beartaithe. M? fuair t? ? seo tr? dhearmad, t?igh i dteagmh?il leis an seolt?ir, le do thoil, agus scrios an t-?bhar as aon r?omhaire. Is ? beartas na Roinne Dl? agus Cirt, Comhionannais agus Athch?irithe Dl?, agus na nOif?g? agus na nGn?omhaireachta? a ?s?ideann seirbh?s? TF na Roinne, seoladh ?bhair chol?il a dh?chead?. > M?s rud ? go measann t? gur ?bhar col?il at? san ?bhar at? sa teachtaireacht seo is ceart duit dul i dteagmh?il leis an seolt?ir l?ithreach agus le mailminder[ag]justice.ie chomh maith. > *********************************************************************************** > > >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595