ahimsa campos arceiz
2005-Dec-26 10:28 UTC
[R] evaluation methods for logistic regression with proportion data
Dear list-members,
I have carried out a logistic regression analysis of the spatial distribution
of an ecological phenomenon (wildlife-caused crop damage). I divided the
region into 5x5 km grids, and in each grid I conducted a number of
questionnaires to assess the presence of crop damage at particular houses.
As a result, my dependent variable is not simple presence/absence data
but a proportion of positive responses (n of positive responses / n of
questionnaires) per grid.
I used glm to fit a suitable model (the variable and model selection
process went fine).
The problem comes once I have the final model. Since my observed data
have no true positives or negatives (only proportions of positives), I cannot
calculate specificity or sensitivity, and therefore (I think) I cannot use
statistics like kappa or the area under the ROC curve (AUC) to evaluate the
performance of my model.
Can anybody suggest a suitable method to evaluate the performance of this
kind of "proportion data" model that can be implemented in R? Does anybody
know of an alternative as elegant as the ROC AUC for this case?
Besides the internal evaluation, I am planning to use bootstrap resampling
in order to produce "pseudo-independent" data to evaluate the performance
of the model.
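
The bootstrap idea can be sketched roughly as follows. This is only an illustration in Python (standard library only), not code from my analysis: grid cells are resampled with replacement, and the cells left out of each resample serve as the "pseudo-independent" evaluation set. The function name bootstrap_splits and the choice of 200 resamples are assumptions made for the example; in R the same resampling would normally be done with sample(n, replace = TRUE).

```python
import random


def bootstrap_splits(n_cells, n_boot, seed=0):
    """Yield (in_bag, out_of_bag) index lists for n_boot bootstrap resamples.

    Hypothetical helper: in_bag indexes the grid cells used to refit the
    model, out_of_bag indexes the cells held back for evaluation.
    """
    rng = random.Random(seed)
    for _ in range(n_boot):
        # Draw n_cells indices with replacement (one bootstrap resample).
        in_bag = [rng.randrange(n_cells) for _ in range(n_cells)]
        drawn = set(in_bag)
        # Cells never drawn form the out-of-bag evaluation set.
        out_of_bag = [i for i in range(n_cells) if i not in drawn]
        yield in_bag, out_of_bag


# 13 grid cells, as in the example data; 200 resamples is an arbitrary choice.
splits = list(bootstrap_splits(n_cells=13, n_boot=200, seed=1))
```

For each split, the model would be refitted on the in-bag cells and its predictions compared with the observed proportions of the out-of-bag cells.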
An example of what my data look like:
observed predicted
0.200 0.4079725
0.556 0.5987730
0.500 0.9140571
0.857 0.8878290
0.875 0.7845368
1.000 0.8575587
1.000 0.9406087
0.778 0.5861066
0.600 0.6204616
1.000 0.8585725
0.000 0.2949169
0.100 0.1291246
0.444 0.7627612
observed = proportion of positive responses to crop damage questionnaires
in a 25 km2 grid
predicted = values produced by the final glm(binomial) model on the same
dataset as used to develop the model
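
To make the kind of comparison concrete, here is a minimal sketch (Python, standard library only) that summarises the agreement between the two columns above with a root mean squared error, a mean absolute error and a Pearson correlation. These are generic agreement measures, not a replacement for the ROC AUC, and they are unweighted: the number of questionnaires per grid is not listed here, so every grid counts equally, which is an assumption of the example. The function names are illustrative.

```python
import math

# Observed proportions and fitted values, copied from the table above.
observed = [0.200, 0.556, 0.500, 0.857, 0.875, 1.000, 1.000,
            0.778, 0.600, 1.000, 0.000, 0.100, 0.444]
predicted = [0.4079725, 0.5987730, 0.9140571, 0.8878290, 0.7845368,
             0.8575587, 0.9406087, 0.5861066, 0.6204616, 0.8585725,
             0.2949169, 0.1291246, 0.7627612]


def rmse(obs, pred):
    """Root mean squared difference between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def pearson_r(obs, pred):
    """Pearson correlation between observed and predicted values."""
    mo = sum(obs) / len(obs)
    mp = sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)


model_rmse = rmse(observed, predicted)
model_mae = mae(observed, predicted)
model_r = pearson_r(observed, predicted)
print(f"RMSE={model_rmse:.3f}  MAE={model_mae:.3f}  r={model_r:.3f}")
```

Because these values come from the same data used to fit the model, they are likely to be optimistic.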
Thanks a lot in advance for any suggestion
Ahimsa
Ahimsa Campos Arceiz
The University Museum,
The University of Tokyo
Hongo 7-3-1, Bunkyo-ku,
Tokyo 113-0033
phone +81-(0)3-5841-2824
cell +81-(0)80-5402-7702
