Hi,
I was working on a classification problem using the pamr package. I used the
pamr.adaptthresh() function to find the threshold that gives the best
classification accuracy, but I must not be doing it right: it doesn't return
the threshold value for optimum classification.
For example, if I run it on a dataset, I get the following result using
pamr.adaptthresh():
          predicted
true      (1)   (2)
  (1)      32     8
  (2)       5    17
i.e. a misclassification rate of (5 + 8) / (32 + 8 + 5 + 17) = 13/62, about 0.21.
However, if I just use an arbitrary threshold (in this case, I chose 2),
I get the following result:
          predicted
true      (1)   (2)
  (1)      35     5
  (2)       5    17
i.e. a misclassification rate of (5 + 5) / (35 + 5 + 5 + 17) = 10/62, about 0.16,
which is clearly better than the one I got from using pamr.adaptthresh().
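The two error rates above can be checked directly from the confusion matrices: the misclassification rate is the off-diagonal count divided by the total. A small base-R sketch; `misclass_rate` is a hypothetical helper, not a pamr function, and the matrices are just the two tables above typed in by hand.

```r
# Misclassification rate of a confusion matrix: off-diagonal count / total count.
# misclass_rate is a hypothetical helper, not part of pamr.
misclass_rate <- function(cm) {
  1 - sum(diag(cm)) / sum(cm)
}

# The two tables above: rows = true class, columns = predicted class.
# matrix() fills column by column, so c(32, 5, 8, 17) gives rows (32, 8) and (5, 17).
cm_adapt <- matrix(c(32, 5, 8, 17), nrow = 2,
                   dimnames = list(true = c("1", "2"), predicted = c("1", "2")))
cm_thr2  <- matrix(c(35, 5, 5, 17), nrow = 2,
                   dimnames = list(true = c("1", "2"), predicted = c("1", "2")))

round(misclass_rate(cm_adapt), 3)  # 13/62, about 0.210
round(misclass_rate(cm_thr2), 3)   # 10/62, about 0.161
```

Both tables contain the same 62 samples (40 in class 1, 22 in class 2), so the rates are directly comparable.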
Am I doing something wrong? What do I need to do so that
pamr.adaptthresh() gives the lowest misclassification error rate?
I have tried different values for 'ntries' and 'reduction.factor' in
pamr.adaptthresh(), without any success.
I have reproduced my code below. Any comments would be appreciated!
Thanks.
########################### CODE #################################
library(multtest) # golub
library(siggenes) # SAM
library(e1071) # support vector m/c
library(pamr)
library(bootstrap)
rm(list = ls())
gc()
makeColon <- function(){
  # This dataset has 40 tumour and 22 normal samples (62 in total).
  # Row 1 of the file holds the class label of each sample (one sample per column).
  n2 <- read.table("data/Colon.data", header = FALSE, sep = ",")
  labels <- as.character(unlist(n2[1, ]))
  cancdat <- n2[-1, labels == "tumor"]
  normdat <- n2[-1, labels == "normal"]
  mat <- as.matrix(cbind(cancdat, normdat))
  # class 1 = tumour, class 2 = normal
  actclass <- rep(c(1, 2), c(ncol(cancdat), ncol(normdat)))
  return(list(mat, actclass))
}
m <- makeColon()
mat <- m[[1]]
actclass <- m[[2]]
mat <- matrix(as.numeric(mat),nrow(mat),ncol(mat))
geneid <- as.character(1:nrow(mat))
gs <- as.character(1:nrow(mat))
mydata <- list(x = mat, y = factor(actclass), geneid = geneid, genenames = gs)
mytrain <- pamr.train(mydata)
# threshold scale factors from pamr.adaptthresh, then retrain with them
new.scales <- pamr.adaptthresh(mytrain, ntries = 10, reduction.factor = 0.9)
mytrain2 <- pamr.train(mydata, threshold.scale = new.scales)
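mycv <- pamr.cv(mytrain2, mydata, nfold = 10)
# confusion matrix using the value stored by pamr.adaptthresh
res1 <- pamr.confusion(mycv, threshold = mytrain2$threshold.scale, extra = FALSE)
print(res1)
# confusion matrix at the hand-picked threshold of 2
res2 <- pamr.confusion(mycv, threshold = 2, extra = FALSE)
print(res2)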
########################### END CODE ###############################
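For comparison with the hand-picked threshold of 2, the threshold can also be read off the cross-validation error curve directly. A minimal sketch, assuming the pamr.cv result exposes per-threshold `threshold` and `error` vectors (component names assumed, check the pamr docs); `best_threshold` is a hypothetical helper, and the numbers in the example are made up for illustration, not taken from the Colon data.

```r
# Pick the threshold with the lowest cross-validated error.
# Ties are broken by taking the largest such threshold (fewest genes kept);
# this tie-break is an assumption of the sketch, not something pamr enforces.
# best_threshold is a hypothetical helper, not part of pamr.
best_threshold <- function(thresholds, errors) {
  stopifnot(length(thresholds) == length(errors))
  candidates <- thresholds[errors == min(errors)]
  max(candidates)
}

# Made-up values for illustration only (not from the Colon data):
thr <- c(0, 0.5, 1, 1.5, 2, 2.5, 3)
err <- c(0.25, 0.21, 0.19, 0.16, 0.16, 0.18, 0.30)
best_threshold(thr, err)  # 2

# With the objects from the code above (component names assumed):
# best <- best_threshold(mycv$threshold, mycv$error)
# pamr.confusion(mycv, threshold = best, extra = FALSE)
```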