Christof Bigler
2004-Mar-24 22:49 UTC
[R] Adapting thresholds for predictions of ordinal logistic regression
I'm dealing with a classification problem using ordinal logistic regression. In the case of binary logistic regression with unequal proportions of 0's and 1's, a threshold in the interval [0,1] has to be adapted to transform back the predicted probabilities into 0 and 1. This can be done quite straightforward using e.g. the Kappa statistics as accuracy criterion. With ordinal logistic regression this seems to be more cumbersome, since several thresholds have to be adapted. Here, the Gamma statistics could be used as accuracy criterion. Below is an example showing individual response probabilities when you have equal and unequal proportions of four response categories. In the case of equal proportions (upper panel), one would reasonably assign the category with the highest probability. However, using the highest probability for unequal proportions (lower panel) would result in too many observations of class 2 being predicted as class 1. Is there any objective way to select the thresholds for assigning the categories in the case of unequal proportions? Thanks for your help! Christof ## R code library(Design) # Data set with equal proportions df1 <- cbind.data.frame(y=factor(c(rep(1,50),rep(2,50),rep(3,50),rep(4,50)))) df1$x <- c(rnorm(50,50,30),rnorm(50,100,30),rnorm(50,150,30),rnorm(50,200,30)) # Data set with unequal proportions df2 <- cbind.data.frame(y=factor(c(rep(1,200),rep(2,50),rep(3,30),rep(4,20)))) df2$x <- c(rnorm(200,50,30),rnorm(50,100,30),rnorm(30,150,30),rnorm(20,200,30)) # Fitting ordinal logistic regression models (proportional odds) f1 <- lrm(y ~ x, data=df1, x=TRUE, y=TRUE) f2 <- lrm(y ~ x, data=df2, x=TRUE, y=TRUE) # Individual response probabilities f.seq <- seq(-50,300) f1.pred <- predict.lrm(f1,newdata=f.seq,type="fitted.ind") f2.pred <- predict.lrm(f2,newdata=f.seq,type="fitted.ind") par(mfrow=c(2,1)) # First plot (equal proportions) plot(f.seq, f1.pred[,1],ylim=c(0,1),type="l",xlab="x",ylab="Pr(Y=j)",xlim=c( -50,300)) lines(f.seq,f1.pred[,2],col="red") lines(f.seq,f1.pred[,3],col="blue") lines(f.seq,f1.pred[,4],col="green") abline(v=c(50,100,150,200),lty=3) par(new=T) plot(df1$x,df1$y,xlab="",ylab="",axes=F,bty="n",xlim=c(-50,300)) axis(4,at=pretty(range(as.numeric(df1$y)))) # Second plot (unequal proportions) plot(f.seq, f2.pred[,1],ylim=c(0,1),type="l",xlab="x",ylab="Pr(Y=j)",xlim=c( -50,300)) lines(f.seq,f2.pred[,2],col="red") lines(f.seq,f2.pred[,3],col="blue") lines(f.seq,f2.pred[,4],col="green") abline(v=c(50,100,150,200),lty=3) par(new=T) plot(df2$x,df2$y,xlab="",ylab="",axes=F,bty="n",xlim=c(-50,300)) axis(4,at=pretty(range(as.numeric(df2$y))))