Hi Dr. Snow,

I am a graduate student working on analyzing data for my thesis and came across your post on an R forum:

> The default link function for the glm poisson family is a log link, which means that it is fitting the model:
>
> log(mu) ~ b0 + b1 * x
>
> But the data that you generate is based on a linear link. Therefore your glm analysis does not match with how the data was generated (and therefore should not necessarily be the best fit). Either analyze using glm and a linear link, or generate the data based on a log link (e.g. rpois(40, exp(seq(1,3, length.out=40))) ).
>
> Hope this helps,
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111

I am not using R at the moment (working in SPSS, have to love the GUI) but my question is quite related:

I am running a generalized linear model on data highly skewed to the right with a bunch of zeroes, so I decided to use the Tweedie distribution. In the model I ran both untransformed data (with link=log) as well as log(x+1) transformed data (with link=identity). The latter model had a much smaller (more negative) AICc value than the untransformed data with link=log.

Is it valid to run the GLM with log(x+1) transformed data if link=identity? Or am I violating some kind of assumption about the model?

I really appreciate any advice or thoughts! It seems as if my go-to statistician has taken a loooong break and any help would be greatly valued!

-Emily Bellush
QTGR@IUP.EDU
Indiana University of Pennsylvania
Hi Emily,

This is the R-help forum; it is for R questions, not basic statistics. You should check out http://stats.stackexchange.com/ for those types of questions.

glm(log(y) ~ x, poisson(link = "identity")) is not the same as glm(y ~ x, poisson(link = "log")), so I am not surprised you are getting different results. An identity link and data transformations do not inherently violate assumptions. Depending on why you have a 'bunch of zeroes', I might consider a zero-inflated model or censored regression.

For more in-depth discussion, I would suggest heading over to Stack Exchange and providing more details about your data and model.

Cheers,

Josh

On Sat, Jan 7, 2012 at 8:54 AM, emily <ebell545 at gmail.com> wrote:
> [...]

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
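To make the difference between a log link and a log-transformed response concrete, here is a minimal R sketch on simulated data. Everything in it is an assumption for illustration: the simulated counts, the coefficient values 0.5 and 1.0, and the object names. It also uses a Poisson/Gaussian pair from base R, not the Tweedie fits that were run in SPSS.

```r
# Simulated counts generated under a log link (illustrative values only).
set.seed(1)
x <- seq(0, 2, length.out = 200)
y <- rpois(length(x), lambda = exp(0.5 + 1.0 * x))

# Model A: untransformed response, log link.
# This models log(E[y]) = b0 + b1 * x.
fit_log_link <- glm(y ~ x, family = poisson(link = "log"))

# Model B: log(y + 1)-transformed response, identity link.
# This models E[log(y + 1)] = b0 + b1 * x, a different quantity.
fit_log_resp <- glm(log(y + 1) ~ x, family = gaussian(link = "identity"))

# The coefficients estimate different things, so they need not agree.
print(coef(fit_log_link))
print(coef(fit_log_resp))

# AIC for A is a likelihood of y; AIC for B is a likelihood of log(y + 1).
aic_values <- c(A = AIC(fit_log_link), B = AIC(fit_log_resp))
print(aic_values)
```

The two AIC values come from likelihoods of different response variables (y versus log(y + 1)), so a smaller value for the transformed model does not by itself indicate a better fit to the original data.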
On 08/01/12 05:54, emily wrote:
> Hi Dr. Snow,

This is the r-help mailing list, not Greg Snow's private email. If you just want to email Dr. Snow, then email *him* (his address was given in the post to which you replied).

<SNIP>

> I am not using R at the moment (working in SPSS, have to love the GUI)

I can only feel pity for you.

> but my question is quite related:
>
> I am running a generalized linear model on data highly skewed to the right
> with a bunch of zeroes, so I decided to use the Tweedie distribution. In the
> model I ran both untransformed data (with link=log) as well as log(x+1)
> transformed data (with link=identity). The latter model had a much smaller
> (more negative) AICc value than the untransformed data with link=log.
>
> Is it valid to run the GLM with log(x+1) transformed data if link=identity?
> Or am I violating some kind of assumption about the model?

You are simply fitting two very different models.

(1) Tweedie distribution, log link:

    E(Y) = exp(beta_0 + beta_1 * x), where Y has a Tweedie distribution.

(2) Log transformation, identity link:

    V = log(Y + 1)
    E(V) = beta_0 + beta_1 * x, where V has a ??? (Tweedie???) distribution.
    E(Y) = E(exp(V)) - 1

You know E(V) but you don't know E(exp(V)), and you cannot readily calculate it from E(V). So this second model may not be of much use to you, depending of course on what use you are actually trying to make of it.

If Y has a Tweedie distribution (I've only heard of these; don't know anything about them; I believe they can be complicated) then it seems to me unlikely that log(Y+1) will also have one.

You need to decide if you know something about the distribution of Y or if you know something about the distribution of log(Y+1).

To quote from the signature file of someone who posts to this list, ``What problem are you trying to solve?''

<SNIP>

cheers,

Rolf Turner
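The point that E(exp(V)) cannot be read off from E(V) can be checked numerically. The R sketch below uses made-up, right-skewed data with exact zeros; the gamma parameters and the 20% share of zeros are assumptions for illustration, not the data from the thesis. Because exp() is convex, Jensen's inequality gives exp(E(V)) <= E(exp(V)), so back-transforming the mean of V understates the mean of Y.

```r
# Hypothetical right-skewed, non-negative response with some exact zeros.
set.seed(42)
y <- rgamma(10000, shape = 0.8, rate = 0.5)
y[runif(length(y)) < 0.2] <- 0

# The transformed response from model (2).
v <- log(y + 1)

# The quantity of interest: the mean of Y, which equals mean(exp(V)) - 1.
mean_y <- mean(y)

# The naive back-transform of the mean of V.
naive_back_transform <- exp(mean(v)) - 1

print(c(mean_y = mean_y, naive_back_transform = naive_back_transform))
```

The size of the gap depends on how variable V is, so a correction needs some assumption about the distribution of V, which is the difficulty described above.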