Saeed Abu Nimeh
2009-Mar-27 01:32 UTC
[R] ROCR package finding maximum accuracy and optimal cutoff point
If we use the ROCR package to find the accuracy of a classifier:

    pred <- prediction(svm.pred, testset[,2])
    perf.acc <- performance(pred, "acc")

do we find the maximum accuracy as follows (is there a simpler way?):

    max(perf.acc@y.values[[1]])

Then, to find the cutoff point that maximizes the accuracy, do we do the following (is there a simpler way?):

    cutoff.list <- unlist(perf.acc@x.values[[1]])
    cutoff.list[which.max(perf.acc@y.values[[1]])]

If the above is correct, how can we find the average false positive and false negative rates from the following?

    perf.fpr <- performance(pred, "fpr")
    perf.fnr <- performance(pred, "fnr")

The dataset consists of two columns, a score and a binary response, similar to this:

    2.5, 0
    -1, 0
    2, 1
    6.3, 1
    4.1, 0
    3.3, 1

Thanks,
Saeed
---
R 2.8.1 Win XP Pro SP2
ROCR package v1.0-2
e1071 v1.5-19
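To make the question concrete, the quantities involved can be computed by hand for the six-row toy dataset above. The sketch below uses base R only (no packages) and sweeps over candidate cutoffs, reporting accuracy, false positive rate (FP / (FP + TN)) and false negative rate (FN / (FN + TP)) at each one. It assumes ROCR's convention that a score at or above the cutoff counts as a positive prediction, with the cutoffs taken as Inf followed by each distinct score in decreasing order; the variable names are illustrative only.

```r
# Toy data from the question: a score and a binary response (1 = positive)
scores <- c(2.5, -1, 2, 6.3, 4.1, 3.3)
labels <- c(0, 0, 1, 1, 0, 1)

# Candidate cutoffs (assumed to mirror ROCR): Inf, then each distinct
# score in decreasing order; scores >= cutoff are predicted positive
cutoffs <- c(Inf, sort(unique(scores), decreasing = TRUE))

# One row per cutoff: accuracy, false positive rate, false negative rate
rates <- t(sapply(cutoffs, function(cut) {
  pred.pos <- scores >= cut
  tp <- sum(pred.pos & labels == 1)
  fp <- sum(pred.pos & labels == 0)
  tn <- sum(!pred.pos & labels == 0)
  fn <- sum(!pred.pos & labels == 1)
  c(cutoff = cut,
    acc = (tp + tn) / length(labels),
    fpr = fp / (fp + tn),
    fnr = fn / (fn + tp))
}))

best <- which.max(rates[, "acc"])  # first row with the highest accuracy
rates[best, ]
```

On this toy data the best accuracy, 4/6, is reached at three cutoffs (6.3, 3.3 and 2). which.max() returns the first of them, 6.3, where the fpr is 0 and the fnr is 2/3, so ties are worth checking before reporting a single "optimal" cutoff.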
Saeed Abu Nimeh
2009-Mar-28 08:38 UTC
[R] ROCR package finding maximum accuracy and optimal cutoff point
Found the solution to my own question. To find the false positive rate and the false negative rate that correspond to a given cutoff point using the ROCR package, one can do the following (there are surely simpler ways, but this works):

    library(ElemStatLearn)
    library(rpart)
    data(spam)

    ##################################
    # create train and test sets     #
    ##################################
    index <- 1:nrow(spam)
    testindex <- sample(index, trunc(length(index)/3))
    testset <- spam[testindex, ]
    trainset <- spam[-testindex, ]
    rpart.model <- rpart(spam ~ ., data = trainset)  # train the model

    ##################################
    # use ROCR to calculate accuracy #
    # and the fp and fn rates        #
    ##################################
    library(ROCR)
    # test the model: predicted probability of the second class level
    rpart.pred2 <- predict(rpart.model, testset)[,2]
    pred <- prediction(rpart.pred2, testset[,58])  # ROCR prediction object
    perf.acc <- performance(pred, "acc")  # accuracy at each cutoff
    perf.fpr <- performance(pred, "fpr")  # fp rate at each cutoff
    perf.fnr <- performance(pred, "fnr")  # fn rate at each cutoff
    acc.rocr <- max(perf.acc@y.values[[1]])  # maximum accuracy

    # list of cutoffs for accuracy
    cutoff.list.acc <- unlist(perf.acc@x.values[[1]])
    # cutoff that maximizes accuracy
    optimal.cutoff.acc <- cutoff.list.acc[which.max(perf.acc@y.values[[1]])]

    # index of the optimal cutoff in the fpr cutoff list
    optimal.cutoff.fpr <- which(perf.fpr@x.values[[1]] == as.numeric(optimal.cutoff.acc))
    # list of fp rates
    cutoff.list.fpr <- unlist(perf.fpr@y.values[[1]])
    # fpr at the optimal cutoff
    fpr.rocr <- cutoff.list.fpr[as.numeric(optimal.cutoff.fpr)]

    # index of the optimal cutoff in the fnr cutoff list
    optimal.cutoff.fnr <- which(perf.fnr@x.values[[1]] == as.numeric(optimal.cutoff.acc))
    # list of fn rates
    cutoff.list.fnr <- unlist(perf.fnr@y.values[[1]])
    # fnr at the optimal cutoff
    fnr.rocr <- cutoff.list.fnr[as.numeric(optimal.cutoff.fnr)]

Now acc.rocr, fpr.rocr, and fnr.rocr hold the accuracy, fpr, and fnr at the optimal cutoff, expressed as proportions between 0 and 1.
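A possibly shorter variant of the same idea, sketched here on the toy data from the original question rather than the spam set: perf.acc, perf.fpr and perf.fnr are all computed from the same prediction object, so their x.values should hold the same cutoff vector, and the index returned by which.max() can then be reused directly instead of searching each cutoff list with which(). That shared-cutoff behaviour is an assumption, so the sketch checks it with stopifnot() before relying on it.

```r
library(ROCR)

# Toy data from the question: a score and a binary response (1 = positive)
scores <- c(2.5, -1, 2, 6.3, 4.1, 3.3)
labels <- c(0, 0, 1, 1, 0, 1)

pred <- prediction(scores, labels)
perf.acc <- performance(pred, "acc")
perf.fpr <- performance(pred, "fpr")
perf.fnr <- performance(pred, "fnr")

# Assumption: all three objects share one cutoff vector in x.values
stopifnot(isTRUE(all.equal(perf.acc@x.values[[1]], perf.fpr@x.values[[1]])),
          isTRUE(all.equal(perf.acc@x.values[[1]], perf.fnr@x.values[[1]])))

# Position of the (first) maximum accuracy, reused for every measure
idx <- which.max(perf.acc@y.values[[1]])

best <- c(cutoff = perf.acc@x.values[[1]][idx],
          acc    = perf.acc@y.values[[1]][idx],
          fpr    = perf.fpr@y.values[[1]][idx],
          fnr    = perf.fnr@y.values[[1]][idx])
best
```

For the spam example, the same pattern would apply with rpart.pred2 and testset[,58] in place of scores and labels.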