nathaniel Grey
2007-May-06 12:02 UTC
[R] Neural Nets (nnet) - evaluating success rate of predictions
Hello R-Users,

I have been using nnet (by Ripley) to train a neural net on a test dataset, and I have obtained predictions for a validation dataset using:

PP <- predict(nnetobject, validationdata)

Using PP I can find the -2 log likelihood for the validation dataset. However, what I really want to know is how well my neural net is doing at classifying my binary output variable. I am new to R and I can't figure out how to assess the success rate of the predictions.

Any help and examples would be much appreciated.

Best wishes,

Nathaniel Grey
Research Associate
Wolfson Research Associate
University of Durham
hadley wickham
2007-May-07 09:22 UTC
[R] Neural Nets (nnet) - evaluating success rate of predictions
On 5/6/07, nathaniel Grey <nathaniel.grey at yahoo.co.uk> wrote:
> Hello R-Users,
>
> I have been using nnet (by Ripley) to train a neural net on a test dataset, and I have obtained predictions for a validation dataset using:
>
> PP <- predict(nnetobject, validationdata)
>
> Using PP I can find the -2 log likelihood for the validation dataset.
>
> However, what I really want to know is how well my neural net is doing at classifying my binary output variable. I am new to R and I can't figure out how to assess the success rate of the predictions.

table(PP, binaryvariable) should get you started.

Also, if you're using nnet with random starts, I strongly suggest taking the best out of several hundred (or maybe a thousand) trials - it makes a big difference!

Hadley
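A minimal sketch along these lines, assuming the object names from the original question, a 0/1 outcome stored in a column here called validationdata$y (an illustrative name), and that predict() returns fitted probabilities, which are cut at 0.5 before tabulation:

## predicted probabilities for the validation data
PP <- predict(nnetobject, validationdata)
## turn probabilities into 0/1 class predictions (0.5 cut-off assumed)
pred <- ifelse(PP > 0.5, 1, 0)
## 2x2 confusion table: observed outcome vs. predicted class
conf <- table(observed = validationdata$y, predicted = pred)
conf
## overall success rate = proportion of cases on the diagonal
sum(diag(conf)) / sum(conf)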
Antti Arppe
2007-May-08 11:36 UTC
[R] Neural Nets (nnet) - evaluating success rate of predictions
Nathaniel,

On Mon, 7 May 2007, r-help-request at stat.math.ethz.ch wrote:
> Date: Sun, 6 May 2007 12:02:31 +0000 (GMT)
> From: nathaniel Grey <nathaniel.grey at yahoo.co.uk>
>
> However what I really want to know is how well my neural net is
> doing at classifying my binary output variable. I am new to R and I
> can't figure out how you can assess the success rates of
> predictions.

I've recently been tackling this myself, though with respect to polytomous (>2) outcomes. The following approaches are based on Menard (1995), Cohen et al. (2003) and Manning & Schütze (1999).

First you have to decide the critical probability that you use to classify the cases into class A (and consequently not(class[A])). The simplest level is 0.5, but other levels might also be motivated, see e.g. Cohen et al. (2003: 516-519).

You can then treat the classification task as one of two distinct types, namely prediction models and classification models, which has an effect on how the efficiency and accuracy of prediction is exactly measured (Menard 1995: 24-26). In a pure prediction model, we set no a priori expectation or constraint on the overall frequencies of the predicted classes. To the contrary, in a classification model our expectation is that the predicted outcome classes will in the long run end up having the same proportions as are evident in the training data.

The starting point for evaluating prediction efficiency is to compile a 2x2 prediction/classification table. Frequency counts on the (descending) diagonal of the table indicate correctly predicted and classified cases, whereas all counts off the diagonal are incorrect. Overall, we can divide the predicted classifications into the four types presented below, on which the basic measures of prediction efficiency are based (Manning and Schütze 1999: 267-271).

Original/Predicted    Class                  not(Class) (=Other)
Class                 TP ~ True Positive     FN ~ False Negative
not(Class) (=Other)   FP ~ False Positive    TN ~ True Negative

You can then go on to calculate recall and precision, or sensitivity and specificity. Recall is the proportion of original occurrences of some particular class for which the prediction is correct (formula 1 below, see Manning and Schütze 1999: 269, formula 8.4), whereas precision is the proportion of all the predictions of some particular class which turn out to be correct (formula 2 below, see Manning and Schütze 1999: 268, formula 8.3). Sensitivity is in fact exactly equal to recall, whereas specificity is understood as the proportion of non-cases correctly predicted or classified as non-cases, i.e. rejected (formula 3 below). Furthermore, there is a third pair of evaluation measures that one could also calculate, namely accuracy and error (formula 4 below) (Manning and Schütze 1999: 268-270).

(1) Recall      = TP / (TP + FN)  (= Sensitivity)
(2) Precision   = TP / (TP + FP)
(3) Specificity = TN / (TN + FP)
(4) Accuracy    = (TP + TN) / N = SUM{k=1...K}n[k,k] / N
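A minimal sketch of (1)-(4) in R, assuming a 2x2 table laid out as above (observed classes by rows, predicted classes by columns; the object name conf is purely illustrative):

TP <- conf[1, 1]; FN <- conf[1, 2]
FP <- conf[2, 1]; TN <- conf[2, 2]
N  <- sum(conf)
recall      <- TP / (TP + FN)   # (1), equal to sensitivity
precision   <- TP / (TP + FP)   # (2)
specificity <- TN / (TN + FP)   # (3)
accuracy    <- (TP + TN) / N    # (4)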
However, as has been noted in some earlier responses, these general measures do not in any way take into consideration whether prediction and classification according to a model, with the help of explanatory variables, performs any better than simply knowing the overall proportions of the outcome classes.

For this purpose, the asymmetric summary measures of association based on the Proportionate Reduction of Error (PRE) are good candidates for evaluating prediction accuracy, where we expect that the prediction or classification process on the basis of the model should exceed some baseline or threshold. However, one cannot use the Goodman-Kruskal lambda and tau as such, but must make some adjustments to account for the possibility of incorrect prediction. With this approach one compares the prediction/classification errors with the model, error(model), to the baseline level of prediction/classification errors without the model, error(baseline), according to formula (8) below (Menard 1995: 28-30).

The formula for the error with the model remains the same irrespective of whether we are evaluating prediction or classification accuracy, presented in (5), but the errors without the model vary according to the intended objective, presented in (6) and (7). Subsequently, the measure for the proportionate reduction of prediction error is presented in (9) below, and being analogous to the Goodman-Kruskal lambda it is designated lambda(prediction). Similarly, the measure for the proportionate reduction of classification error is presented in (10), and being analogous to the Goodman-Kruskal tau it is likewise designated tau(classification). For both measures, positive values indicate better-than-baseline performance, while negative values indicate worse performance.

(5)  error(model) = N - SUM{k=1...K}n[k,k] = N - SUM{diag(n)},
     where n is the 2x2 prediction/classification matrix

(6)  error(baseline, prediction) = N - max(R[k]),
     with R[k] = marginal row sum for each row k of altogether K classes
     and N the sum total of cases

(7)  error(baseline, classification) = SUM{k=1...K}(R[k] * (N - R[k]) / N),
     with R[k] = marginal row sum for each row k of altogether K classes
     and N the sum total of cases

(8)  PRE = (error(baseline) - error(model)) / error(baseline, pred.|class.)

(9)  lambda(prediction) = 1 - error(model) / error(baseline, prediction)

(10) tau(classification) = 1 - error(model) / error(baseline, classification)

REFERENCES:

Cohen, Jacob, Patricia Cohen, Stephen G. West, and Leona S. Aiken. 2003. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (3rd edition). Lawrence Erlbaum Associates, Mahwah, New Jersey.

Manning, Christopher D., and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. Cambridge, Massachusetts: MIT Press.

Menard, Scott. 1995. Applied Logistic Regression Analysis. Sage University Paper Series on Quantitative Applications in the Social Sciences 07-106. Sage Publications, Thousand Oaks, California.
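To make (5)-(10) concrete, here is a small example worked through in R with a hypothetical 2x2 table (the counts are invented purely for illustration; observed classes by rows, predicted classes by columns):

n <- matrix(c(40, 20, 10, 30), nrow = 2,
            dimnames = list(observed = c("A", "B"), predicted = c("A", "B")))
N <- sum(n)                                        # 100 cases in total
error.model <- N - sum(diag(n))                    # (5): 100 - 70 = 30
error.base.pred <- N - max(rowSums(n))             # (6): 100 - 50 = 50
error.base.class <- sum(rowSums(n) * (N - rowSums(n)) / N)  # (7): 50
1 - error.model / error.base.pred                  # (9):  lambda(prediction)   = 0.4
1 - error.model / error.base.class                 # (10): tau(classification)  = 0.4

Both PRE measures come out at 0.4 here, i.e. the hypothetical model makes 40% fewer errors than the respective baselines.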
Antti Arppe
2007-May-10 09:51 UTC
[R] Neural Nets (nnet) - evaluating success rate of predictions
All,

As an addition to my earlier posting, I've now implemented the PRE measures of prediction accuracy suggested by Menard (1995) as an R function, which is not a lengthy one and is thus attached below. With respect to the P-values one has the option of testing for either 1) significantly better prediction results or 2) significantly different (better or worse) results, so one can/should adjust the interpretation of the standardized d-value in the code accordingly. In the former case one should use the one-tailed value, and in the latter case the two-tailed value.

-Antti Arppe

# Formulas for assessing prediction efficiency
#
# (C) Antti Arppe 2007
#
# Observations by rows, predictions by columns
#
# All formulas according to the following reference:
#
# Menard, Scott. 1995. Applied Logistic Regression Analysis. Sage
# University Paper Series on Quantitative Applications in the Social
# Sciences 07-106. Sage Publications, Thousand Oaks, California.

model.prediction.efficiency <- function(dat)
{
  # total number of cases
  N <- sum(dat)
  # observed as row margins, predicted as column margins,
  # according to Menard (1995: 24-32)
  sum.row <- apply(dat, 1, sum)
  sum.col <- apply(dat, 2, sum)
  correct.with.model <- sum(diag(dat))
  errors.with.model <- N - correct.with.model
  # baseline errors for the prediction and classification perspectives
  errors.without.model.prediction <- N - max(sum.row)
  errors.without.model.classification <- sum(sum.row * ((N - sum.row) / N))
  # lambda(prediction): PRE against the prediction baseline,
  # with its standardized d-value and one-tailed P-value
  lambda.p <- 1 - (errors.with.model / errors.without.model.prediction)
  d.lambda.p <- (errors.without.model.prediction/N - errors.with.model/N) /
    sqrt((errors.without.model.prediction/N) *
         (1 - errors.without.model.prediction/N) / N)
  p.lambda.p <- 1 - pnorm(d.lambda.p)
  # tau(classification): PRE against the classification baseline,
  # with its standardized d-value and one-tailed P-value
  tau.p <- 1 - (errors.with.model / errors.without.model.classification)
  d.tau.p <- (errors.without.model.classification/N - errors.with.model/N) /
    sqrt((errors.without.model.classification/N) *
         (1 - errors.without.model.classification/N) / N)
  p.tau.p <- 1 - pnorm(d.tau.p)
  # return the results as a list (multi-argument return() is not valid R)
  return(list(lambda.p = lambda.p, tau.p = tau.p,
              d.lambda.p = d.lambda.p, d.tau.p = d.tau.p,
              p.lambda.p = p.lambda.p, p.tau.p = p.tau.p))
}
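For example, sticking with the (assumed) object names from the original question, a 0/1 outcome column here called validationdata$y, and a 0.5 cut-off, the function could be called on a confusion table built like this (observed outcome by rows, predicted class by columns, as the function expects):

PP <- predict(nnetobject, validationdata)
conf <- table(observed = validationdata$y,
              predicted = ifelse(PP > 0.5, 1, 0))
model.prediction.efficiency(conf)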