Dear all,

I'm using a binomial distribution with a logit link function to fit a GAM. I have two questions about it.

First, I am not sure I have chosen the most appropriate distribution. I don't have presence/absence data (0/1); I have a rate whose values vary between 0 and 1. This means the response variable is continuous, although bounded to a limited interval. Should I use binomial?

Second, in the numerical output I get negative values of the UBRE score. I would like to know whether one should use the lowest absolute value or the lowest real (signed) value to select the best model.

Thank you in advance for your help.

Marina
Hi,

I am not sure a binomial distribution is the best choice for a continuous bounded variable. A beta distribution would be more appropriate, although I don't know how to define one for the gam() function. On the other hand, the beta distribution is closely linked to the gamma distribution, so maybe you can use that to define a beta family for the gam() function.

Some info about the beta distribution: http://www.stat.purdue.edu/~jrnolan/portfolio/the_big_ten/beta.pdf

Also, I am not sure how you fitted a GAM with the binomial family without having your response data converted to 0 and 1. Didn't you get a warning like this?

    Warning messages:
    1: In eval(expr, envir, enclos) ... : non-integer #successes in a binomial glm!

Maybe you can contact the author of the mgcv package. I am curious to see his response.

Sorry I cannot help much more,

Monica

In reply to:
Message: 96
Date: Thu, 21 Aug 2008 00:53:52 +0200
From: Marina Laborde
Subject: [R] GAM-binomial logit link
> Dear all,
>
> I'm using a binomial distribution with a logit link function to fit a GAM
> model. I have 2 questions about it.
> First i am not sure if i've chosen the most adequate distribution. I don't
> have presence/absence data (0/1) but I do have a rate which values vary
> between 0 and 1. This means the response variable is continuous even if
> within a limited interval. Should i use binomial?

I guess it is safer to use the option family = quasibinomial since, with a continuous [0,1] response, the empirical (conditional) variance of y can differ significantly from the corresponding theoretical binomial variance. You can find more detail in Papke and Wooldridge (1996), 'Journal of Applied Econometrics', vol. 11, pp. 619-632.

> Secondly, in the numerical output i get negative values of UBRE score. I
> would like to know if one should consider the lowest absolute value or the
> lowest real value to select the best model.

Hmmm... On the basis of the UBRE formula within gam{mgcv}, UBRE scores should be nonnegative. Please inspect the values of the individual elements inside the formula to discover possible problems.

> Thank you in advance for your help.
> Marina

Fabrizio Cipollini
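The quasibinomial suggestion above can be sketched with mgcv roughly as follows. This is only an illustration on simulated data: the names `x`, `y`, `d` and `fit_q`, the beta-distributed noise, and the sample size are assumptions made for the example and have nothing to do with Marina's actual data.

```r
library(mgcv)  # recommended package distributed with R

set.seed(1)
n  <- 200
x  <- runif(n)
mu <- plogis(-1 + 2 * sin(2 * pi * x))   # smooth true mean, strictly inside (0, 1)

# Hypothetical continuous rate in (0, 1); the beta noise is an assumption
y <- rbeta(n, shape1 = 10 * mu, shape2 = 10 * (1 - mu))
d <- data.frame(x = x, y = y)

# Same logit link as a binomial fit, but the dispersion is estimated from the data
fit_q <- gam(y ~ s(x), family = quasibinomial(link = "logit"), data = d)

print(fit_q$scale.estimated)  # TRUE when the scale parameter was estimated
print(fit_q$sig2)             # the estimated scale (dispersion) parameter
```

Because the scale parameter is treated as unknown here, gam() with its default method = "GCV.Cp" selects smoothness by GCV rather than UBRE, so the two kinds of score should not be compared across a binomial fit and a quasibinomial fit.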
No, I didn't get that warning ("In eval(expr, envir, enclos) ... : non-integer #successes in a binomial glm!") because I used the continuous bounded variable weighted by the effort. That is, my formula was something like:

    gam(SER_CD ~ s(DEP)+s(SST)+s(CLA)+s(SSH)+s(WST), weights=EFFORT, data=CD01, family=binomial(link="logit"))

Do you still think it is preferable to use a beta / gamma / quasi-poisson distribution?

As for the negative value of the UBRE score, I still can't understand it: I have neither missing nor negative values in the variables. Any other suggestion? What kind of issue might cause a negative UBRE?

Thanks once again!

In reply to:
Message: 92
Date: Fri, 22 Aug 2008 09:48:02 +0200 (CEST)
From: "Fabrizio Cipollini" <cipollini@ds.unifi.it>
Subject: Re: [R] GAM-binomial logit link
To: r-help@r-project.org
Message-ID: <1661.87.5.105.223.1219391282.squirrel@ds.unifi.it>

Message: 54
Date: Thu, 21 Aug 2008 20:09:08 +0000
From: Monica Pisica <pisicandru@hotmail.com>
Subject: Re: [R] GAM-binomial logit link
To: <r-help@r-project.org>
Message-ID: <BAY104-W27C9A6D9A85AE284C31366C36B0@phx.gbl>
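To make the role of the weights argument concrete: with R's binomial family, a response given as a proportion is combined with the prior weights, which are read as the number of trials, and the "non-integer #successes" warning is raised only when proportion times weight is not close to a whole number. The sketch below illustrates this on simulated data. It assumes that the effort variable is a trial count, which the thread does not confirm for EFFORT, and all names (`effort`, `succ`, `rate`, `fit_w`, `fit_c`) are hypothetical.

```r
library(mgcv)

set.seed(2)
n      <- 150
x      <- runif(n)
effort <- sample(5:30, n, replace = TRUE)   # hypothetical number of trials per row
succ   <- rbinom(n, size = effort, prob = plogis(-1 + 2 * x))
d <- data.frame(x = x, effort = effort, succ = succ, rate = succ / effort)

# (a) proportion as the response, number of trials as prior weights
fit_w <- gam(rate ~ s(x), weights = effort,
             family = binomial(link = "logit"), data = d)

# (b) the same data written as a two-column (successes, failures) response
fit_c <- gam(cbind(succ, effort - succ) ~ s(x),
             family = binomial(link = "logit"), data = d)

# Largest difference between the fitted probabilities of the two fits
print(max(abs(fitted(fit_w) - fitted(fit_c))))
```

If the weights are not trial counts (for example, hours of sampling), the binomial variance assumption no longer has this interpretation, which is one reason the quasibinomial option was suggested earlier in the thread.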
Hi Marina,

Unfortunately I don't know the answer. I still think the best option is to contact the author of the package. He might still be on holiday, since in the UK school starts sometime in September, but he is certainly the best person to answer your question. I think it will help if you also send him your gam equation.

Just for fun, try to model your data with something other than binomial and see if your UBRE value becomes positive. What happens if you don't weight your data by the effort? What about modeling with a GLM? Do you still get a negative UBRE? Do you have lots of data points with the same value?

Did you do a literature search to see how others have modeled continuous bounded data? I am asking because you may see what type of distribution they used and why, and maybe you can find your answer there.

I still think you should model your data with a different distribution, but I don't have clear arguments why, only that the binomial refers to the probability of a certain event occurring and has nothing to do with continuous variables. The answer of your model does not have any meaning for your data.

Sorry I cannot be of more help,

Monica

In reply to:
Date: Mon, 25 Aug 2008 02:08:20 -0700
From: jumpa79@yahoo.com
Subject: Re: [R] GAM-binomial logit link
To: r-help@r-project.org
CC: pisicandru@hotmail.com; cipollini@ds.unifi.it
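On the sign of the UBRE score, one way to inspect the elements of the formula is to recompute it by hand. The gam help page in mgcv gives the score as D/n + 2 s tr(A)/n - s, where D is the deviance, n the number of observations, tr(A) the effective degrees of freedom and s the scale parameter (1 for the binomial family). Under that formula the score drops below zero whenever D/n + 2 tr(A)/n is less than 1. The sketch below uses simulated 0/1 data; the data, the names `fit_a`, `fit_b` and the helper `ubre_by_hand` are assumptions made for illustration only.

```r
library(mgcv)

set.seed(3)
n <- 500
x <- runif(n)
y <- rbinom(n, size = 1, prob = plogis(-2.5 + sin(2 * pi * x)))  # hypothetical 0/1 data
d <- data.frame(x = x, y = y)

fit_a <- gam(y ~ s(x), family = binomial, data = d)  # smooth in x
fit_b <- gam(y ~ x,    family = binomial, data = d)  # linear in x

# UBRE by hand, following the formula in ?gam with scale s = 1:
#   D / n + 2 * s * tr(A) / n - s
ubre_by_hand <- function(fit) {
  n_obs <- length(fit$y)
  deviance(fit) / n_obs + 2 * sum(fit$edf) / n_obs - 1
}

scores <- c(smooth = ubre_by_hand(fit_a), linear = ubre_by_hand(fit_b))
print(scores)
print(fit_a$gcv.ubre)  # score reported by gam() for the smooth model

# gam() minimises this criterion, so candidate models are compared on the
# signed value: the lowest (most negative) score is preferred
print(names(which.min(scores)))
```

In this simulated case the hand-computed score for the smooth model should agree closely with the reported one and is expected to be below zero, which suggests that a negative UBRE need not indicate a problem with the data. Scores are only comparable between models fitted to the same response with the same family and weights.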