Andra Isan
2011-Sep-03 00:32 UTC
[R] ROCR package question for evaluating two regression models
Hello All,

I have fitted two logistic regression models with glm in R, using different predictors, and I am evaluating them:

    model1 <- glm(Y ~ x4 + x5 + x6 + x7, data = dat, family = binomial(link = logit))
    model2 <- glm(Y ~ x1 + x2 + x3, data = dat, family = binomial(link = logit))

I would like to compare these two models based on the predictions I get from each model:

    pred1 = predict(model1, test.data, type = "response")
    pred2 = predict(model2, test.data, type = "response")

I have used the ROCR package to compare them:

    pr1 = prediction(pred1, test.y)
    pf1 = performance(pr1, measure = "prec", x.measure = "rec")
    plot(pf1)

Which cutoff is this plot based on?

    pr2 = prediction(pred2, test.y)
    pf2 = performance(pr2, measure = "prec", x.measure = "rec")
    pf2_roc = performance(pr2, measure = "err")
    plot(pf2)

First, I would like to use cutoff = 0.5 and plot the ROC and precision-recall curves based on that cutoff value. In other words, how do I define a cutoff value in the performance function? For example, with pf2_roc = performance(pr2, measure = "err"), plot(pf2_roc) plots the error for every single cutoff point. I only want one cutoff point; is there any way to do that?

Second, I would like to see the performance of the two models, based on the above measures, on the same plot so that the comparison is easier. In other words, how can I plot pf1 and pf2 together? plot(pf1, pf2) gives me the following error:

    Error in as.double(x) : cannot coerce type 'S4' to vector of type 'double'

Could you please help me with that?

Thanks a lot,
Andra
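Editor's sketch of the two things asked above, under stated assumptions: pred1, pred2 and test.y are simulated stand-ins (the poster's data are not available), and metrics_at is a hypothetical helper, not a ROCR function. A fixed cutoff gives one point per measure rather than a curve, so the first part computes precision, recall and error rate at 0.5 directly in base R. The second part overlays the two precision-recall curves using the add = TRUE argument of ROCR's plot method, and only runs if ROCR is installed.

```r
# Simulated stand-ins for the poster's objects (assumptions, not the original data)
set.seed(1)
n <- 200
test.y <- rbinom(n, 1, 0.5)
pred1 <- plogis(rnorm(n, mean = ifelse(test.y == 1, 1.0, -1.0)))
pred2 <- plogis(rnorm(n, mean = ifelse(test.y == 1, 0.3, -0.3)))

# Hypothetical helper: classify at a single cutoff and summarise the result
metrics_at <- function(pred, y, cutoff = 0.5) {
  yhat <- as.integer(pred >= cutoff)
  tp <- sum(yhat == 1 & y == 1)
  fp <- sum(yhat == 1 & y == 0)
  fn <- sum(yhat == 0 & y == 1)
  c(precision = tp / (tp + fp),
    recall    = tp / (tp + fn),
    error     = (fp + fn) / length(y))
}

m1 <- metrics_at(pred1, test.y)  # one number per measure, not a curve
m2 <- metrics_at(pred2, test.y)
print(rbind(model1 = m1, model2 = m2))

# Overlay both precision-recall curves on one plot (requires ROCR)
if (requireNamespace("ROCR", quietly = TRUE)) {
  library(ROCR)
  pf1 <- performance(prediction(pred1, test.y), measure = "prec", x.measure = "rec")
  pf2 <- performance(prediction(pred2, test.y), measure = "prec", x.measure = "rec")
  plot(pf1, col = "blue")
  plot(pf2, col = "red", add = TRUE)  # add = TRUE draws on the existing plot
  legend("topright", legend = c("model1", "model2"),
         col = c("blue", "red"), lty = 1)
}
```

The plot(pf1, pf2) call fails because the second argument is treated as y values; each performance object has to be plotted in its own call, with add = TRUE for the second.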
Frank Harrell
2011-Sep-03 13:24 UTC
[R] ROCR package question for evaluating two regression models
It is not possible to have one cutoff point unless you have a very strange utility function. Nor is there a need for a cutoff when using a probability model. It is not advisable to compare models based on ROC area, as this loses power. A likelihood-based approach is recommended.

Frank

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
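Editor's sketch of one possible reading of "a likelihood-based approach"; the reply does not name a specific method, so the choices here are assumptions. The data frame dat is simulated, and the coefficients used to generate Y are arbitrary. Because the two models use different predictors they are not nested, so a likelihood ratio test between them does not apply directly. The sketch compares them by AIC, then tests each predictor set against a combined model with a likelihood ratio (chi-square) test.

```r
# Simulated data (an assumption; the poster's 'dat' is not available)
set.seed(42)
n <- 500
dat <- as.data.frame(matrix(rnorm(n * 7), ncol = 7))
names(dat) <- paste0("x", 1:7)
dat$Y <- rbinom(n, 1, plogis(0.8 * dat$x1 - 0.5 * dat$x4))

model1 <- glm(Y ~ x4 + x5 + x6 + x7, data = dat, family = binomial(link = logit))
model2 <- glm(Y ~ x1 + x2 + x3,      data = dat, family = binomial(link = logit))

# Non-nested models: compare by AIC (lower is better)
aic <- AIC(model1, model2)
print(aic)

# Combined model containing both predictor sets
full <- glm(Y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7,
            data = dat, family = binomial(link = logit))

# Likelihood ratio tests: does each set add information beyond the other?
lrt_add_x4_x7 <- anova(model2, full, test = "Chisq")  # x4..x7 given x1..x3
lrt_add_x1_x3 <- anova(model1, full, test = "Chisq")  # x1..x3 given x4..x7
print(lrt_add_x4_x7)
print(lrt_add_x1_x3)
```

Both comparisons use the data the models were fitted on, so no cutoff and no separate test-set classification step is involved.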
RockO
2011-Sep-04 20:38 UTC
[R] ROCR package question for evaluating two regression models
Hi Andra,

I have been doing some ROC analysis for a new diagnostic test. I used the pROC package to assess thresholds and compare different diagnostic tests to a "gold standard". In your case, let's say the gold standard is the observed values y0. Here is an example:

    y0 <- sample(0:1, 50, replace = TRUE)      # Make observed binomial values
    test1 <- sample(0:100, 50, replace = TRUE) / 100
    y1 <- ifelse(y0 == 0, test1, 1 - test1)    # Make first predicted model values
    test2 <- sample(0:100, 50, replace = TRUE) / 100
    y2 <- ifelse(y0 == 0, test2, 1 - test2)    # Make second predicted model values

    library(pROC)
    i1 <- roc(response = y0, predictor = y1, percent = TRUE, plot = TRUE,
              of = "thresholds", ci = TRUE, lwd = 1, lty = 2,
              thresholds = "best", asp = 1)
    i2 <- roc(response = y0, predictor = y2, percent = TRUE, plot = TRUE,
              of = "thresholds", ci = TRUE, lwd = 1, lty = 3,
              thresholds = "best", add = TRUE)  # add = TRUE overlays on the first plot

    coords(i1, x = "best", best.method = "youden")  # Best threshold of y1 with the Youden index
    coords(i2, x = "best", best.method = "youden")  # Best threshold of y2 with the Youden index
    roc.test(i1, i2)  # Compare the two ROC curves (by default, a test on their AUCs)

See ?pROC for more details.

Hope this helps,
Rock