Hello all,

I have a data set where the response variable is the percent cover of a specific plant (represented in cover classes 0, 1, 2, 3, 4, 5, or 6). This data set has a lot of zeros (plots where the plant was not present). I am trying to model cover class of the plant as a function of both total nitrogen and shrub cover.

After quite a bit of research I have come across a conditional approach to modeling data with a lot of zeros (Fletcher et al. 2005, Welsh et al. 1996). In this approach you model the presence/absence data using a logistic regression and then model the presence-only data using ordinary (least squares) regression.

I have successfully come up with both a logistic model and an OLS model with good fits. I am running into trouble combining the two (as outlined in the third step of the Fletcher et al. 2005 paper).

Does anyone have any experience or any advice on doing this? How does one come up with an overall model for the data using this approach?

Thanks for your help!
Kirsten

--
View this message in context: http://r.789695.n4.nabble.com/Conditional-model-in-R-tp4651188.html
Sent from the R help mailing list archive at Nabble.com.
Sounds like a finite mixture model. I haven't read your references, but an overall model for such an approach could be

f(Y=0; pi, kappa) = 1 - pi + pi*f(Y=0|Z=1; kappa)

where Z indicates whether the event occurs, pi = Pr(Z=1) is the probability of the event, Y is the value observed when the event occurs, and f is the probability density of Y with parameters kappa. For y > 0 the corresponding term is f(Y=y; pi, kappa) = pi*f(Y=y|Z=1; kappa).

You could try 'fmr' in Jim Lindsey's gnlm package (available at http://www.commanster.eu/rcode.html), which fits generalized nonlinear regression models with two- or three-point mixtures using maximum likelihood.

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Kirsten Martin
Sent: Wednesday, November 28, 2012 1:33 PM
To: r-help at r-project.org
Subject: [R] Conditional model in R
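As a minimal sketch of the mixture above, assume purely for illustration that f(Y=y|Z=1; kappa) is a Poisson density with mean lambda (so kappa = lambda). The probability mass function can then be written in base R as follows. `dmix` is a hypothetical helper name, not a function from gnlm or any other package.

```r
# Two-point mixture from the formula above, ASSUMING (for illustration only)
# that f(y | Z = 1; kappa) is a Poisson density with mean lambda.
# dmix() is a hypothetical helper, not part of gnlm or base R.
dmix <- function(y, pi, lambda) {
  stopifnot(pi >= 0, pi <= 1, lambda > 0)
  # Y > 0 can only arise when the event occurs (Z = 1)
  dens <- pi * dpois(y, lambda)
  # Y = 0 arises either when Z = 0, or when Z = 1 and the count is zero
  dens[y == 0] <- 1 - pi + pi * dpois(0, lambda)
  dens
}

# Example: event probability 0.4, mean count 2.5 when the event occurs
p <- dmix(0:6, pi = 0.4, lambda = 2.5)
round(p, 3)
```

With a Poisson f this is a zero-inflated Poisson. In a fitted regression such as the one fmr estimates, pi and kappa would depend on covariates such as nitrogen and shrub cover rather than being fixed numbers as here.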
Kirsten,

The overall model is the combination of both models. If you call the parameter estimates from the logistic regression betas and the parameter estimates from the linear regression alphas, you could write the predictive equation something like this (ignoring error terms):

cover = (alpha0 + alpha1*nitr + alpha2*shrub) / {1 + exp[-(beta0 + beta1*nitr + beta2*shrub)]}

That's not really an R question, though, so perhaps what you really want to know is how to calculate predicted values? If so, you could do something like this. I am assuming that your data are in a data frame called "df", with variables "cover", "nitr", and "shrub". Note that the code fits the abundance part on the log scale, so its predictions are back-transformed with exp() before being multiplied by the probability of presence.

# fit a logistic regression to the presence/absence data
df$present <- df$cover > 0
fitL <- glm(present ~ nitr + shrub, family = "binomial", data = df)
# fit a regression to the abundance data, when present (log scale)
fitD <- lm(log(cover) ~ nitr + shrub, data = df[df$present, ])
# calculate predicted values from the "combined" model:
# Pr(present) * back-transformed predicted cover when present
pcomb <- predict(fitL, type = "response") * exp(predict(fitD, newdata = df))

Jean

Kirsten Martin <kmmartin@knights.ucf.edu> wrote on 11/28/2012 01:32:43 PM:
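To check the combined prediction end to end, here is a self-contained sketch on simulated data. The data-generating values are invented for illustration and say nothing about Kirsten's plots. The column names follow Jean's assumed data frame. Cover is simulated as a positive continuous value rather than as cover classes, an assumption made to keep the example short.

```r
set.seed(1)
n <- 200
# Simulated data for illustration only (hypothetical coefficients)
df <- data.frame(nitr = runif(n, 0, 2), shrub = runif(n, 0, 100))
eta <- -1 + 1.5 * df$nitr - 0.02 * df$shrub
is_present <- rbinom(n, 1, plogis(eta)) == 1
df$cover <- ifelse(is_present,
                   exp(0.5 + 0.3 * df$nitr + 0.005 * df$shrub + rnorm(n, sd = 0.3)),
                   0)

# Step 1: presence/absence by logistic regression
df$present <- df$cover > 0
fitL <- glm(present ~ nitr + shrub, family = binomial, data = df)

# Step 2: log(cover) by least squares, using only plots where the plant occurs
fitD <- lm(log(cover) ~ nitr + shrub, data = df[df$present, ])

# Step 3: combine, Pr(present) times back-transformed conditional prediction
p_present  <- predict(fitL, newdata = df, type = "response")
cond_cover <- exp(predict(fitD, newdata = df))
df$pcomb   <- p_present * cond_cover
head(df)
```

Using predict(..., type = "response") rather than fitL$fitted makes it easy to pass a newdata argument later, for example a grid of nitrogen and shrub values.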
On Nov 28, 2012, at 11:32 AM, Kirsten Martin wrote:

> After quite a bit of research I have come across a conditional approach to
> modeling data with a lot of zeros (Fletcher et al. 2005, Welsh et al. 1996).
> In this approach you model the presence/absence data using a logistic
> regression and then model the presence only data using ordinary (least
> squares) regression.

Just because you have a lot of zeroes does not mean that, for instance, a Poisson model would not fit well. You are dealing with count data, and you really ought to at least attempt to model it using an appropriate distribution. Achim Zeileis has written a very nice tutorial on using R for count data; a Google search with his name and 'count data' will likely return it as the first hit.

> Does anyone have any experience or any advice on doing this? How does one
> come up with an overall model for the data using this approach?

You might search on "hurdle models":

    > require(sos)
    > findFn("hurdle model")
    found 43 matches; retrieving 3 pages
    2 3
    Downloaded 23 links in 8 packages.

--
David Winsemius, MD
Alameda, CA, USA
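The "hurdle model" pointer can be made concrete with a short base-R sketch on simulated data, treating the cover classes as counts. This is an assumption: a Poisson count is unbounded, whereas cover classes stop at 6. Part one is a logistic model for whether the count clears zero. Part two is a zero-truncated Poisson for the positive counts, fitted by maximum likelihood with optim(). The overall mean combines them as Pr(Y > 0) * E[Y | Y > 0]. This has the same structure as the conditional approach in the question, with a count distribution in place of OLS. For real analyses, the hurdle() function in the pscl package fits this kind of model (including negative binomial variants) directly. The helper names below (rztpois, nll_ztp) are hypothetical and only show the structure.

```r
set.seed(42)
n <- 300
# Simulated data for illustration only (hypothetical coefficients)
x <- runif(n)
pos <- rbinom(n, 1, plogis(-0.5 + 1.5 * x)) == 1

# Draw zero-truncated Poisson counts by rejection (hypothetical helper)
rztpois <- function(k, lambda) {
  out <- integer(k)
  for (i in seq_len(k)) {
    repeat { v <- rpois(1, lambda[i]); if (v > 0) break }
    out[i] <- v
  }
  out
}
y <- integer(n)
y[pos] <- rztpois(sum(pos), exp(0.3 + 0.8 * x[pos]))

# Part 1: the binary hurdle, Pr(Y > 0), by logistic regression
fit_zero <- glm(I(y > 0) ~ x, family = binomial)

# Part 2: zero-truncated Poisson negative log-likelihood for positive counts:
# log f(y) - log(1 - f(0)) with f Poisson, so f(0) = exp(-lambda)
nll_ztp <- function(beta, y, X) {
  lambda <- exp(drop(X %*% beta))
  -sum(dpois(y, lambda, log = TRUE) - log1p(-exp(-lambda)))
}
Xp <- cbind(1, x[y > 0])
fit_count <- optim(c(0, 0), nll_ztp, y = y[y > 0], X = Xp, method = "BFGS")

# Combined mean: E[Y] = Pr(Y > 0) * E[Y | Y > 0], where for a
# zero-truncated Poisson E[Y | Y > 0] = lambda / (1 - exp(-lambda))
new_x <- c(0.1, 0.5, 0.9)
p_pos <- predict(fit_zero, newdata = data.frame(x = new_x), type = "response")
lambda_new <- exp(fit_count$par[1] + fit_count$par[2] * new_x)
mean_y <- p_pos * lambda_new / (1 - exp(-lambda_new))
mean_y
```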