Enrico Giorgi
2010-Jan-07 13:10 UTC
[R] building a neural network on an unbalanced data set in R
Hello everybody,

I work in a production plant as an operations analyst. I have been using R for two years, starting with my final dissertation project at college.

We have the following problem in our plant. At the end of the production process, each joint (that is what we produce) must pass a final electrical test. The result can be 0 or 1. We think the result may depend on some raw-material parameters, so we first built a logistic regression model to make forecasts. Now I would like to try a neural network as well, with a logistic output unit. However, the fitted values on the training set turn out to be all 1s.

I think the cause is the following. The training data set is made up of 14 0s and 54 1s, so it is quite unbalanced. By default, the net classifies as 1 every observation whose predicted probability of success is greater than 0.5. I think it would be enough to raise the cut-off probability to 0.79, the fraction of 1s in the entire data set (54/68), so that a joint is classified as 1 only if its probability is larger than 0.79.

The problem is that I cannot find out how to set this threshold with the R function "nnet". Do you have any ideas?

Thank you very much.
Kind regards,
Enrico Giorgi
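
To make the question concrete, here is a minimal sketch of the thresholding described above, run on made-up data rather than the plant data. The predictors x1 and x2, the network size, and the decay value are assumptions chosen only for illustration. It does not set a threshold inside nnet itself; instead it asks predict() for the raw fitted probabilities and applies the cut-off by hand.

```r
library(nnet)

set.seed(1)

# Synthetic stand-in for the real data: 14 failures (0) and 54 passes (1).
result <- c(rep(0, 14), rep(1, 54))
n <- length(result)

# Two hypothetical raw-material parameters, loosely related to the outcome.
x1 <- rnorm(n, mean = result)
x2 <- rnorm(n, mean = 0.5 * result)
dat <- data.frame(result = factor(result), x1 = x1, x2 = x2)

# With a two-level factor response, nnet fits a single logistic output unit.
fit <- nnet(result ~ x1 + x2, data = dat, size = 2,
            decay = 0.01, maxit = 500, trace = FALSE)

# type = "raw" returns the fitted probability of the second level ("1")
# as a one-column matrix; [, 1] turns it into a plain vector.
p <- predict(fit, newdata = dat, type = "raw")[, 1]

# Apply the custom cut-off by hand instead of the default 0.5.
cutoff <- 54 / 68
pred <- ifelse(p > cutoff, 1, 0)

# Confusion table of observed against predicted classes.
print(table(observed = dat$result, predicted = pred))
```

Any other cut-off can be tried by changing the value of cutoff; whether the class proportion 54/68 is a good choice would still have to be checked on data not used for fitting.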