Tim Marcella
2014-Mar-12 17:55 UTC
[R] Calculating RMSE in R from hurdle regression object
Hi,

My data is characterized by many zeros (82%) and overdispersion. I have chosen to model it with hurdle regression (pscl package) with a negative binomial distribution for the count data. In an effort to validate the model I would like to calculate the RMSE of the predicted vs. the observed values. From my reading I understand that this is calculated on the raw residuals generated from the model output. This is the formula I used:

    H1.RMSE <- sqrt(mean(H1$residuals^2))  # where H1 is my fitted hurdle model

I get 46.7 as the RMSE. This seems high to me based on the model results. Assuming my formula and my understanding of RMSE are correct (and please correct me if I am wrong), I question whether this is an appropriate validation for this particular model structure. The hurdle model correctly predicts all of my zeros, yet the predictions I get from the fitted model are all values greater than zero. From my reading I understand that the predictions from the fitted hurdle model are means generated for the particular covariate environment based on the model coefficients. If this is truly the case, it does not make sense to compare these means to the observations: it will generate large residuals (only 18% of the observations contain counts greater than 0, while the predicted counts all exceed 0). It seems like comparing apples to oranges. Other correlative tests (Pearson's r, Spearman's rho) would seem to compare the mean predicted value for a particular covariate to the observed values, which again are heavily dominated by zeros.

Any tips on how best to validate hurdle models in R?

Thanks
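Tim's formula, made concrete: since his data and fitted model H1 are not available, the sketch below uses the bioChemists example data shipped with pscl as a stand-in. The stored `$residuals` of a hurdle fit are the raw (response) residuals, so the formula in the question is equivalent to using `residuals(H1, type = "response")`:

```r
## Stand-in sketch: bioChemists data from pscl, not Tim's own data
library(pscl)
data("bioChemists", package = "pscl")

## Hurdle model with a negative binomial count component, as in the question
H1 <- hurdle(art ~ ., data = bioChemists, dist = "negbin")

## RMSE on the raw residuals, i.e. observed count minus predicted mean
H1.RMSE <- sqrt(mean(residuals(H1, type = "response")^2))
H1.RMSE
```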
David March Morla
2014-Mar-12 18:51 UTC
[R] Calculating RMSE in R from hurdle regression object
Dear Tim,

I think that in this paper you will find a suite of different metrics to evaluate your hurdle model:

Potts, Joanne M., and Jane Elith. "Comparing species abundance models." Ecological Modelling 199.2 (2006): 153-163.

Best regards,

David March
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
David March Morlà
Spatial Ecologist
Email: david at imedea.uib-csic.es

IMEDEA Instituto Mediterraneo de Estudios Avanzados (UIB-CSIC)
C/ Miquel Marqués 21, 07190 Esporles, Balearic Islands, Spain
www.imedea.uib.es

SOCIB Balearic Islands Coastal Observing and Forecasting System
Strategic Issues and Applications for Society (SIAS Division)
Parc Bit, Naorte, Bloc A 2ºp. pta. 3, 07121 Palma de Mallorca, Spain
Tel: +34 971 43 97 64
www.socib.es

SOCIAL MEDIA
Google Scholar: http://scholar.google.es/citations?user=xABsDpAAAAAJ
Research Gate: https://www.researchgate.net/profile/David_March3/
LinkedIn: http://www.linkedin.com/in/dmarch
Achim Zeileis
2014-Mar-12 21:51 UTC
[R] Calculating RMSE in R from hurdle regression object
On Wed, 12 Mar 2014, Tim Marcella wrote:

> My data is characterized by many zeros (82%) and overdispersion. I have
> chosen to model with hurdle regression (pscl package) with a negative
> binomial distribution for the count data. In an effort to validate the
> model I would like to calculate the RMSE of the predicted vs. the observed
> values. From my reading I understand that this is calculated on the raw
> residuals generated from the model output.

In count regressions (and other GLM-type models) the raw residuals are not necessarily a good measure because the observations are always heteroscedastic. Low predicted counts also have low variances while higher counts have high variances.

> This is the formula I used:
>
>     H1.RMSE <- sqrt(mean(H1$residuals^2))  # where H1 is my fitted hurdle model
>
> I get 46.7 as the RMSE. This seems high to me based on the model
> results. Assuming my formula and my understanding of RMSE are correct
> (and please correct me if I am wrong), I question whether this is an
> appropriate validation for this particular model structure. The hurdle
> model correctly predicts all of my zeros. The predictions I get from the
> fitted model are all values greater than zero. From my readings I
> understand that the predictions from the fitted hurdle model are means
> generated for the particular covariate environment based on the model
> coefficients.

Yes.

> If this is truly the case it does not make sense to compare these means
> to the observations. This will generate large residuals (only 18% of the
> observations contain counts greater than 0, while the predicted counts
> all exceed 0). It seems like comparing apples to oranges.

Well, it compares the predicted means to the observations. It's not apples and oranges, but they're also not exactly the same thing.
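The heteroscedasticity point can be checked directly: `residuals()` for hurdle objects also offers `type = "pearson"`, which divides each raw residual by the model-implied standard deviation so that high-mean, high-variance observations no longer dominate the summary. A sketch, again on the bioChemists data as a stand-in for the original (unavailable) model:

```r
library(pscl)
data("bioChemists", package = "pscl")
H1 <- hurdle(art ~ ., data = bioChemists, dist = "negbin")

## Raw residuals: observed count minus predicted mean
rmse_raw <- sqrt(mean(residuals(H1, type = "response")^2))

## Pearson residuals: raw residuals scaled by the model-implied
## standard deviation, putting all observations on a comparable footing
rmse_pearson <- sqrt(mean(residuals(H1, type = "pearson")^2))

c(raw = rmse_raw, pearson = rmse_pearson)
```

If the model's variance specification is roughly right, the Pearson version should be of order 1 rather than of the order of the counts themselves.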
Looking at this thread where a similar question was asked might help:
https://stat.ethz.ch/pipermail/r-help/2011-June/279765.html

> Other correlative tests (Pearson's r, Spearman's rho) would seem to be
> comparing the mean predicted value for a particular covariate to the
> observed, which again is heavily dominated by zeros.
>
> Any tips on how best to validate hurdle models in R?
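One concrete way to compare like with like, in the spirit of the linked thread: instead of comparing predicted means to zero-heavy observations, compare the model's predicted probability of a zero with the observed share of zeros, using `predict(..., type = "prob")`. A sketch on the bioChemists stand-in data (not the original model):

```r
library(pscl)
data("bioChemists", package = "pscl")
H1 <- hurdle(art ~ ., data = bioChemists, dist = "negbin")

## type = "prob" returns a matrix of P(y = k) for k = 0, 1, 2, ...;
## column 1 is the predicted probability of a zero for each observation
p0 <- predict(H1, type = "prob")[, 1]

## Expected vs. observed share of zeros; for a hurdle model with the
## default binomial-logit zero hurdle these should nearly coincide
c(expected = mean(p0), observed = mean(bioChemists$art == 0))
```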