Tim Marcella
2014-Mar-15 17:14 UTC
[R] Coding for segmented regression within a hurdle model
Hi,

I am using a two-part hurdle model to account for zero inflation and overdispersion in my count data. I would like to account for a segmented (breakpoint) relationship in the binomial logistic part of the hurdle model and pass these results on to the count model (negative binomial).

Using the segmented package I have determined that my data support one breakpoint at 3.85. The slope up to this point is significant and affects the presence of zeros in a linear fashion. The slope above 3.85 is non-significant and is estimated not to help predict the presence of zeros in the data (threshold effect). Here are the results from this model:

    Estimated Break-Point(s):
       Est. St.Err
     3.853  1.372

    t value for the gap-variable(s) V:  0

    Meaningful coefficients of the linear terms:
                   Estimate Std. Error z value Pr(>|z|)
    (Intercept)     -0.2750     0.3556  -0.774   0.4392
    approach_km     -0.4520     0.2184  -2.069   0.0385 *
    sea2             0.3627     0.2280   1.591   0.1117
    U1.approach_km   0.4543     0.2188   2.076       NA

U1.approach_km is the estimated change in slope at the breakpoint. The estimated slope for the second section is the sum of this value and the approach_km value (-0.4520 + 0.4543 = 0.0023).

I think that I have found a way to "manually" code this into the hurdle model as follows:

    hurdle.fit <- hurdle(tot_f ~ x1 + x2 + x3 | approach_km +
                         I(pmax(approach_km-3.849,0)) + sea )

When I look at the estimated coefficients from the "manual" code, it gives essentially the same values. However, the standard errors are estimated lower.

    Zero hurdle model coefficients (binomial with logit link):
                                    Estimate Std. Error z value Pr(>|z|)
    (Intercept)                     -0.27441    0.29347  -0.935    0.350
    approach_km                     -0.45261    0.09993  -4.529 5.92e-06 ***
    I(pmax(approach_km - 3.849, 0))  0.45486    0.10723   4.242 2.22e-05 ***
    sea2                             0.36271    0.22803   1.591    0.112

Question # 1: Does the hurdle equation use the standard errors from the zero model when building the count predictions? If not, then I guess I would not have to worry about this and can just report the original standard errors and associated p values from the segmented object in the publication.

Question # 2: If the count model does use the standard errors, how can I reformulate this equation to reproduce the original standard errors?

Thanks,
Tim
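For reference, the "manual" hinge coding described above can be sketched with base R alone. This is only an illustration on simulated data: the variable names (x, psi, hinge), the sample size, and the true coefficients are assumptions, not the poster's data. It shows that the slope to the right of the breakpoint is the sum of the two coefficients, and how its standard error follows from the coefficient covariance matrix. Note that these standard errors treat the breakpoint as known and fixed, which is one plausible reason they come out smaller than those reported by segmented, where the breakpoint is itself estimated.

```r
# Simulated data; all names and values are illustrative assumptions.
set.seed(1)
n   <- 500
x   <- runif(n, 0, 10)   # stand-in for approach_km
psi <- 3.85              # breakpoint treated as known and fixed

# Hinge term: 0 to the left of psi, (x - psi) to the right of it.
hinge <- pmax(x - psi, 0)

# True model: slope -0.45 up to psi, flat afterwards.
eta <- -0.3 - 0.45 * x + 0.45 * hinge
y   <- rbinom(n, size = 1, prob = plogis(eta))

# Logistic regression with the hinge term coded by hand.
fit <- glm(y ~ x + I(pmax(x - psi, 0)), family = binomial)
b   <- coef(fit)

slope_left  <- unname(b[2])         # slope for x <= psi
slope_right <- unname(b[2] + b[3])  # slope for x > psi: sum of both coefficients

# Standard error of the right-hand slope, conditional on psi being fixed.
V        <- vcov(fit)
se_right <- sqrt(V[2, 2] + V[3, 3] + 2 * V[2, 3])

print(c(left = slope_left, right = slope_right, se_right = se_right))
```

With the estimates posted above, the same arithmetic gives -0.4520 + 0.4543 = 0.0023 for the second segment.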
Achim Zeileis
2014-Mar-15 18:35 UTC
[R] Coding for segmented regression within a hurdle model
On Sat, 15 Mar 2014, Tim Marcella wrote:

> [...]
>
> Question # 1: Does the hurdle equation use the standard errors from the
> zero model when building the count predictions?

No, both parts of the model can be estimated completely independently.

Best,
Z

> If no then I guess I would
> not have to worry about this and can just report the original std.errors
> and associated p values from the segmented object in the pub.
> Question # 2: If the count model uses the std.errors, how can I reformulate
> this equation to generate the original std.errors.
>
> Thanks, Tim
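The independence of the two parts can be checked empirically. The sketch below is only an illustration under stated assumptions: the data are simulated, and the names (d, km, x1, psi, theta) and coefficient values are hypothetical. It fits the zero part on its own as a binomial GLM for y > 0 and, if the pscl package happens to be installed, prints those coefficients next to the zero-part coefficients from hurdle(); the two columns should agree up to optimizer tolerance.

```r
# Simulated hurdle data; all names and values are illustrative assumptions.
set.seed(42)
n     <- 400
x1    <- rnorm(n)          # hypothetical count-model covariate
km    <- runif(n, 0, 10)   # stand-in for approach_km
psi   <- 3.85              # breakpoint treated as fixed
theta <- 2                 # negative binomial dispersion

# Zero part: probability of a positive count, with a hinge at psi.
p_pos  <- plogis(0.8 - 0.45 * km + 0.45 * pmax(km - psi, 0))
is_pos <- rbinom(n, 1, p_pos)

# Count part: zero-truncated negative binomial via the inverse CDF.
mu        <- exp(0.5 + 0.4 * x1)
u         <- runif(n, dnbinom(0, size = theta, mu = mu), 1)
pos_count <- qnbinom(u, size = theta, mu = mu)

d <- data.frame(y = is_pos * pos_count, x1 = x1, km = km)

# Zero part fitted separately as a plain logistic regression.
zero_glm <- glm(as.integer(y > 0) ~ km + I(pmax(km - 3.85, 0)),
                data = d, family = binomial)

# Optional comparison with the joint hurdle fit (requires pscl).
if (requireNamespace("pscl", quietly = TRUE)) {
  h <- pscl::hurdle(y ~ x1 | km + I(pmax(km - 3.85, 0)),
                    data = d, dist = "negbin")
  print(cbind(glm = coef(zero_glm), hurdle = coef(h, model = "zero")))
}
```

Because the zero part does not depend on the count part, standard errors for the breakpoint terms can be obtained from whichever binary-response fit is considered appropriate without affecting the count-model estimates.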