From: Marina Laborde
Date: Thu, 21 Aug 2008 00:53:52 +0200
Subject: [R] GAM-binomial logit link

Dear all,

I'm using a binomial distribution with a logit link function to fit a GAM. I have two questions about it.

First, I am not sure I have chosen the most adequate distribution. I don't have presence/absence data (0/1); I have a rate whose values vary between 0 and 1. This means the response variable is continuous, even if within a limited interval. Should I use binomial?

Second, in the numerical output I get negative values of the UBRE score. I would like to know whether one should consider the lowest absolute value or the lowest real value to select the best model.

Thank you in advance for your help.

Marina

From: Monica Pisica <pisicandru@hotmail.com>
Date: Thu, 21 Aug 2008 20:09:08 +0000
Subject: Re: [R] GAM-binomial logit link

Hi,

I am not sure a binomial distribution is the best choice for a continuous bounded variable. A beta distribution would be more appropriate, although I don't know how to define one for the gam() function. On the other hand, the beta distribution is closely linked to the gamma distribution, so maybe you can use that to define a beta family for gam().

Some info about the beta distribution:
http://www.stat.purdue.edu/~jrnolan/portfolio/the_big_ten/beta.pdf

Also, I am not sure how you fitted a GAM with the binomial family without converting your response to 0 and 1. Didn't you get a warning like this?

Warning messages:
1: In eval(expr, envir, enclos) ... : non-integer #successes in a binomial glm!

Maybe you can contact the author of the mgcv package. I am curious to see his response.

Sorry I cannot help much more,
Monica
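
For readers wondering what a beta-distributed response could look like in gam(): more recent releases of mgcv include a betar() family, which was not mentioned in this thread. The sketch below assumes such a version is installed and runs on simulated data; x, y, mu and phi are hypothetical names and have nothing to do with Marina's data set.

```r
library(mgcv)

set.seed(2)
n   <- 300
x   <- runif(n)                               # hypothetical covariate
mu  <- plogis(-0.5 + 1.5 * sin(2 * pi * x))   # true mean, strictly inside (0, 1)
phi <- 20                                     # hypothetical precision parameter
y   <- rbeta(n, mu * phi, (1 - mu) * phi)     # continuous response in (0, 1)

# Beta regression GAM with a logit link; smoothness selected by REML
fit_beta <- gam(y ~ s(x), family = betar(link = "logit"), method = "REML")

summary(fit_beta)
```

A rate that contains exact 0s or 1s would need some care before being passed to this family; ?betar describes how such values are handled.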

From: "Fabrizio Cipollini" <cipollini@ds.unifi.it>
Date: Fri, 22 Aug 2008 09:48:02 +0200 (CEST)
Subject: Re: [R] GAM-binomial logit link

> First i am not sure if i've chosen the most adequate distribution. I don't
> have presence/absence data (0/1) but I do have a rate which values vary
> between 0 and 1. This means the response variable is continuous even if
> within a limited interval. Should i use binomial?

I guess it is safer to use family = quasibinomial since, with a continuous [0,1] response, the empirical (conditional) variance of y can differ significantly from the corresponding theoretical binomial variance. You can find references in Papke and Wooldridge (1996), Journal of Applied Econometrics, vol. 11, pp. 619-632.

> Secondly, in the numerical output i get negative values of UBRE score. I
> would like to know if one should consider the lowest absolute value or the
> lowest real value to select the best model.

Hmm... On the basis of the UBRE formula within gam {mgcv}, UBRE scores should be nonnegative. Please inspect the values of the individual elements of the formula to discover possible problems.

Fabrizio Cipollini
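
The quasibinomial suggestion above can be sketched as follows. This is only an illustration on simulated data, assuming the rate is a count of events divided by an effort that acts as a number of trials; rate, effort and x are hypothetical names, not Marina's variables.

```r
library(mgcv)

set.seed(1)
n      <- 200
x      <- runif(n)                               # hypothetical covariate
effort <- sample(5:50, n, replace = TRUE)        # hypothetical effort (trials)
p      <- plogis(-1 + 2 * sin(2 * pi * x))       # true event probability
rate   <- rbinom(n, effort, p) / effort          # observed rate in [0, 1]

# Same model under both families; only the variance assumption differs
fit_b  <- gam(rate ~ s(x), weights = effort, family = binomial(link = "logit"))
fit_qb <- gam(rate ~ s(x), weights = effort, family = quasibinomial(link = "logit"))

fit_qb$sig2      # estimated scale (dispersion); values far from 1 argue against plain binomial
fit_b$gcv.ubre   # smoothing selection score of the binomial fit
fit_qb$gcv.ubre  # smoothing selection score of the quasibinomial fit
```

With the default smoothing parameter selection method, the binomial fit treats the scale as known and the quasibinomial fit estimates it, so the two scores are not the same criterion and should not be compared with each other.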

From: jumpa79@yahoo.com
Date: Mon, 25 Aug 2008 02:08:20 -0700
Subject: Re: [R] GAM-binomial logit link

No, I didn't get that warning ("In eval(expr, envir, enclos) ... : non-integer #successes in a binomial glm!") because I used the continuous bounded variable weighted by the effort.

That is, my call was something like:

gam(SER_CD ~ s(DEP) + s(SST) + s(CLA) + s(SSH) + s(WST),
    weights = EFFORT, data = CD01,
    family = binomial(link = "logit"))

Do you still think it is preferable to use a beta / gamma / quasi-poisson distribution?

As for the negative UBRE score, I still can't understand it: I have neither missing nor negative values in the variables.

Any other suggestions? What kind of issue might be causing a negative UBRE?

Thanks once again!

From: Monica Pisica <pisicandru@hotmail.com>
Subject: Re: [R] GAM-binomial logit link

Hi Marina,

Unfortunately I don't know the answer. I still think it is best to contact the author of the package. He might still be on holiday, since in the UK school sometimes starts in September, but he is certainly the best person to answer your question. I think it will help if you also send him your gam call.

Just for fun, try to model your data with something other than binomial and see whether your UBRE value becomes positive. What happens if you don't weight your data by the effort? What about modeling with a GLM? Do you still get a negative UBRE? Do you have lots of data points with the same value? Did you do a literature search to see how others have modeled continuous bounded data? I ask because you may see what type of distribution they used and why, and maybe find your answer there.

I still think you should model your data with a different distribution, but I don't have clear arguments why, only that the binomial refers to the probability of a certain event occurring and has nothing to do with continuous variables. The output of your model does not have any meaning for your data.

Sorry I cannot be of more help,
Monica
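
On the sign of the UBRE score discussed in this thread, a small arithmetic sketch may help. It assumes the criterion has the commonly quoted form deviance/n + 2*scale*edf/n - scale; that form is an assumption here, not something stated in the thread or checked against the mgcv source. Under it, nothing forces the score to be nonnegative: with scale = 1, any fit whose deviance per observation is well below 1 gives a negative value. The helper ubre_score and all numbers are hypothetical.

```r
# Hypothetical helper, NOT an mgcv function: UBRE under the assumed form
# deviance/n + 2 * scale * edf / n - scale
ubre_score <- function(deviance, n, edf, scale = 1) {
  deviance / n + 2 * scale * edf / n - scale
}

# Made-up numbers for illustration only
score_a <- ubre_score(deviance = 150, n = 200, edf = 5)   # about -0.2
score_b <- ubre_score(deviance = 200, n = 200, edf = 5)   # about  0.05

# The criterion is minimised, so models are ranked on the signed value:
# A (about -0.2) is preferred to B (about 0.05), although |B| is smaller
best <- which.min(c(A = score_a, B = score_b))
names(best)
```

If that form holds for the fitted models, the comparison Marina asks about would use the lowest real value, not the lowest absolute value.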