ahimsa campos arceiz
2005-Dec-26 10:28 UTC
[R] evaluation methods for logistic regression with proportion data
Dear list-members,

I have made a logistic regression analysis of the spatial distribution of an ecological phenomenon (wildlife-caused crop damage). I divided the region into 5x5 km grids, and in each grid I conducted a number of questionnaires to assess the presence of crop damage in particular houses. As a result, my dependent variable is not simple presence/absence data but a proportion of positive responses (n of positive responses / n of questionnaires) per grid.

I used glm to fit a suitable model (the variable and model selection process is fine). The problem comes once I have the final model. Since my observed data have no real positives or negatives (only a proportion of positives), I cannot calculate specificity or sensitivity, and therefore (I think) I cannot use statistics like kappa or the area under the ROC curve (AUC) to evaluate the performance of my model.

Can anybody suggest a suitable method to evaluate the performance of this kind of "proportion data" model that can be implemented in R? Does anybody know of an alternative as elegant as the AUC for this case?

Besides the internal evaluation, I am planning to use bootstrap resampling to produce "pseudo-independent" data with which to evaluate the performance of the model.

An example of what my data look like:

observed   predicted
0.200      0.4079725
0.556      0.5987730
0.500      0.9140571
0.857      0.8878290
0.875      0.7845368
1.000      0.8575587
1.000      0.9406087
0.778      0.5861066
0.600      0.6204616
1.000      0.8585725
0.000      0.2949169
0.100      0.1291246
0.444      0.7627612

observed = proportion of positive responses to crop damage questionnaires in a 25 km^2 grid
predicted = values produced by the final glm(binomial) model on the same dataset as used to develop the model

Thanks a lot in advance for any suggestions.

Ahimsa

Ahimsa Campos Arceiz
The University Museum, The University of Tokyo
Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033
phone +81-(0)3-5841-2824
cell +81-(0)80-5402-7702
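
For illustration only, here is a minimal base-R sketch of a few threshold-free summaries that can be computed directly from the observed and predicted proportions listed above: root mean squared error, mean absolute error, the Spearman rank correlation, and a simple calibration regression of observed on predicted. This is one possible set of measures, not a recommendation from the original post. The variable names are hypothetical, and every grid is weighted equally because the number of questionnaires per grid is not given. Since the predictions come from the same data used to fit the model, the values will tend to be optimistic.

```r
# Observed and predicted proportions copied from the example in the post.
observed  <- c(0.200, 0.556, 0.500, 0.857, 0.875, 1.000, 1.000,
               0.778, 0.600, 1.000, 0.000, 0.100, 0.444)
predicted <- c(0.4079725, 0.5987730, 0.9140571, 0.8878290, 0.7845368,
               0.8575587, 0.9406087, 0.5861066, 0.6204616, 0.8585725,
               0.2949169, 0.1291246, 0.7627612)

# Root mean squared error and mean absolute error on the proportion scale.
# Assumption: all grids count equally (no questionnaire counts available).
rmse <- sqrt(mean((observed - predicted)^2))
mae  <- mean(abs(observed - predicted))

# Spearman rank correlation: does the model rank grids in the right order?
# Tied observed values receive average ranks.
rho <- cor(observed, predicted, method = "spearman")

# Calibration check: regress observed on predicted. An intercept near 0 and
# a slope near 1 would suggest predictions are on the right scale.
calib <- coef(lm(observed ~ predicted))

print(c(rmse = rmse, mae = mae, spearman = rho))
print(calib)
```

If the number of questionnaires per grid were available, the same quantities could be weighted by it, and the bootstrap resampling mentioned in the post could be applied to any of these measures in place of the in-sample values.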