thr3ads.net - R help - [R] glm or transformation of the response? [Jan 2012]

emily

2012-Jan-07 16:54 UTC

[R] glm or transformation of the response?

Hi Dr. Snow, 

 

I am a graduate student working on analyzing data for my thesis and came
across your post on  an R forum:

 

The default link function for the glm poisson family is a log link, which
means that it is fitting the model:
 
log(mu) ~ b0 + b1 * x
 
But the data that you generate is based on a linear link.  Therefore your
glm analysis does not match with how the data was generated (and therefore
should not necessarily be the best fit).  Either analyze using glm and a
linear link, or generate the data based on a log link (e.g. rpois(40,
exp(seq(1,3, length.out=40))) ).
 
Hope this helps,
 
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org <https://stat.ethz.ch/mailman/listinfo/r-help> 
801.408.8111

 

I am not using R at the moment (working in SPSS, have to love the GUI) but
my question is quite related:

I am running a generalized linear model on data highly skewed to the right
with a bunch of zeroes, so I decided to use the Tweedie distribution. In the
model I ran both untransformed data (with link=log) as well as log(x+1)
transformed data (with link=identity). The latter model had a much smaller
(more negative) AICc value than the untransformed data with link=log. 

Is it valid to run the GLM with log(x+1) transformed data if link=identity?
Or am I violating some kind of assumption about the model?

I really appreciate any advice or thoughts! It seems as if my go-to
statistician has taken a loooong break and any help would be greatly valued!

 

-Emily Bellush

QTGR@IUP.EDU

Indiana University of Pennsylvania



Joshua Wiley

2012-Jan-07 22:44 UTC

Hi Emily,

This is the R-help forum---it is for R questions, not basic
statistics.  You should check out http://stats.stackexchange.com/ for
those types of questions.  glm(log(y) ~ x, poisson(link = "identity"))
is not the same as glm(y ~ x, poisson(link = "log")), so I am not
surprised you are getting different results.  An identity link and
data transformations do not inherently violate assumptions.  Depending
on why you have a 'bunch of zeroes', I might consider a zero-inflated
model or censored regression.
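
A minimal sketch of that difference, assuming simulated Poisson counts in
the style of Greg Snow's example. The seed, the variable names, and the use
of lm() on log(y + 1) in place of a Tweedie fit are illustrative assumptions,
not Emily's actual setup:

```r
# Simulated counts whose mean is log-linear in x (assumed data, for illustration)
set.seed(1)
x <- seq(1, 3, length.out = 40)
y <- rpois(40, exp(x))

# Model A: log link, i.e. log(E[y]) = b0 + b1 * x
fit_link <- glm(y ~ x, family = poisson(link = "log"))

# Model B: transformed response, i.e. E[log(y + 1)] = b0 + b1 * x
fit_trans <- lm(log(y + 1) ~ x)

# The coefficients describe different quantities and need not agree
print(coef(fit_link))
print(coef(fit_trans))
```

Because the two fits are likelihoods for different responses (y versus
log(y + 1)), their AIC or AICc values are not directly comparable either.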

For more in-depth discussion, I would suggest heading over to Stack
Exchange and providing more details about your data and model.

Cheers,

Josh


-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

Rolf Turner

2012-Jan-07 23:08 UTC


On 08/01/12 05:54, emily wrote:
> Hi Dr. Snow,

This is the r-help mailing list, not Greg Snow's private email.  If
you just want to email Dr. Snow, then email *him* (his address was
given in the post to which you replied).

<SNIP>
> I am not using R at the moment (working in SPSS, have to love the GUI)

I can only feel pity for you.

> but my question is quite related:
>
> I am running a generalized linear model on data highly skewed to the right
> with a bunch of zeroes, so I decided to use the Tweedie distribution. In the
> model I ran both untransformed data (with link=log) as well as log(x+1)
> transformed data (with link=identity). The latter model had a much smaller
> (more negative) AICc value than the untransformed data with link=log.
>
> Is it valid to run the GLM with log(x+1) transformed data if link=identity?
> Or am I violating some kind of assumption about the model?

You are simply fitting two very different models.

     (1) Tweedie distribution, log link:

         E(Y) = exp(beta_0 + beta_1 * x),  Y has a Tweedie distribution

     (2) Log transformation, identity link:

         V = log(Y + 1)

         E(V) = beta_0 + beta_1 * x,   V has a ??? (Tweedie???) distribution.

         E(Y) = E(exp(V)) - 1

         You know E(V) but you don't know E(exp(V)) --- and cannot readily
         calculate it from E(V).  So this second model may not be of much
         use to you --- depending of course on what use you are actually
         trying to make of it.
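
A small simulation may make the back-transformation problem concrete. This
is only a sketch under assumed data: the Poisson(2) response, the seed, and
the variable names are hypothetical stand-ins. Since V = log(Y + 1), we have
Y = exp(V) - 1, and by Jensen's inequality exp(E(V)) is smaller than
E(exp(V)) whenever Y is not constant:

```r
# Hypothetical zero-heavy response (assumed for illustration only)
set.seed(42)
y <- rpois(1e5, lambda = 2)
v <- log(y + 1)

mean_y      <- mean(y)              # E(Y), the quantity of interest
naive_back  <- exp(mean(v)) - 1     # back-transforming E(V): biased low
proper_back <- mean(exp(v)) - 1     # needs E(exp(V)), not just E(V)

print(c(mean_y = mean_y, naive_back = naive_back, proper_back = proper_back))
```

The naive back-transform underestimates the mean of Y, which is why knowing
E(V) alone does not give E(Y).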

If Y has a Tweedie distribution (I've only heard of these; don't know
anything about them; I believe they can be complicated) then it seems to
me unlikely that log(Y+1) will also have one.  You need to decide if you
know something about the distribution of Y or if you know something about
the distribution of log(Y+1).

To quote from the signature file of someone who posts to this list,
``What problem are you trying to solve?''

<SNIP>

     cheers,

         Rolf Turner
