Hi,

I am working on a classification problem using the pamr package. I
used pamr.adaptthresh() to find the threshold that gives the best
classification accuracy. I must not be using it correctly, since it
does not seem to return the threshold values for optimum
classification.

For example, if I run it on a dataset, I get the following result
using pamr.adaptthresh():

          predicted
    true  (1)  (2)
     (1)   32    8
     (2)    5   17

i.e. a misclassification rate of (5 + 8) / (32 + 8 + 5 + 17) = 13/62.

However, if I just use an arbitrary threshold (in this case, I chose
2), I get the following result:

          predicted
    true  (1)  (2)
     (1)   35    5
     (2)    5   17

i.e. a misclassification rate of (5 + 5) / (35 + 5 + 5 + 17) = 10/62,
which is clearly better than the one I got from using
pamr.adaptthresh().

Am I doing something wrong? What do I need to do to ensure that
pamr.adaptthresh() gives me the lowest misclassification rate? I have
tried different values for 'ntries' and 'reduction.factor' in
pamr.adaptthresh(), without any success. I have reproduced my code
below. Any comments would be appreciated! Thanks.
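As a quick check on the arithmetic above, the two error rates can be recomputed from the confusion tables in base R. This is only a sketch: the names adapt_tab, fixed_tab and misclass_rate are made up for illustration and are not part of pamr.

```r
# Confusion tables copied from the post (rows = true class, cols = predicted).
adapt_tab <- matrix(c(32, 8,
                       5, 17),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(true = c("1", "2"),
                                    predicted = c("1", "2")))
fixed_tab <- matrix(c(35, 5,
                       5, 17),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(true = c("1", "2"),
                                    predicted = c("1", "2")))

# Hypothetical helper: fraction of samples off the diagonal.
misclass_rate <- function(tab) 1 - sum(diag(tab)) / sum(tab)

misclass_rate(adapt_tab)  # 13/62, about 0.21
misclass_rate(fixed_tab)  # 10/62, about 0.16
```

Both tables sum to 62 samples, so the two rates are directly comparable.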
########################### CODE #################################
library(pamr)

rm(list = ls())
gc()

# Read the colon data and split the columns by the class label in row 1.
makeColon <- function() {
    # This dataset has 24 cancer, and 9 normal samples
    n2 <- read.table("data/Colon.data", header = FALSE, sep = ",")
    cancdat <- n2[, n2[1, ] == 'tumor']
    normdat <- n2[, n2[1, ] == 'normal']
    cancdat <- cancdat[-1, ]   # drop the label row
    normdat <- normdat[-1, ]
    mat <- as.matrix(cbind(cancdat, normdat))
    actclass <- rep(c(1, 2), c(ncol(cancdat), ncol(normdat)))
    return(list(mat, actclass))
}

m <- makeColon()
mat <- m[[1]]
actclass <- m[[2]]
mat <- matrix(as.numeric(mat), nrow(mat), ncol(mat))
geneid <- as.character(1:nrow(mat))
gs <- as.character(1:nrow(mat))
mydata <- list(x = mat, y = factor(actclass), geneid = geneid, genenames = gs)

mytrain <- pamr.train(mydata)
new.scales <- pamr.adaptthresh(mytrain, ntries = 10, reduction.factor = 0.9)
mytrain2 <- pamr.train(mydata, threshold.scale = new.scales)
mycv <- pamr.cv(mytrain2, mydata, nfold = 10)

# Confusion matrix using the output of pamr.adaptthresh()
res1 <- pamr.confusion(mycv, threshold = mytrain2$threshold.scale, extra = FALSE)
print(res1)

# Confusion matrix using an arbitrary threshold of 2
res2 <- pamr.confusion(mycv, threshold = 2, extra = FALSE)
print(res2)
########################### END CODE ###############################
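One thing that may be worth checking, offered as a sketch rather than a confirmed fix: as far as I can tell from the pamr documentation, pamr.adaptthresh() returns a vector of per-class threshold scales for pamr.train(), not a threshold, so passing it as 'threshold' to pamr.confusion() may not mean what the code above intends. The example below picks the threshold with the lowest cross-validated error from the pamr.cv() result instead. The data are simulated and every object name (toy, toy_fit, toy_cv, best_threshold) is an assumption made for illustration.

```r
library(pamr)

set.seed(1)

# Simulated data (an assumption): 200 genes, 40 samples, two classes.
n_genes <- 200
labels <- factor(rep(c(1, 2), each = 20))
x <- matrix(rnorm(n_genes * length(labels)), nrow = n_genes)
# Shift the first 10 genes in class 2 so the classes are separable.
x[1:10, labels == 2] <- x[1:10, labels == 2] + 2

toy <- list(x = x, y = labels,
            geneid = as.character(1:n_genes),
            genenames = as.character(1:n_genes))

# Same pipeline as in the post: train, adapt the scales, retrain, cross-validate.
toy_fit <- pamr.train(toy)
toy_scales <- pamr.adaptthresh(toy_fit, ntries = 10, reduction.factor = 0.9)
toy_fit2 <- pamr.train(toy, threshold.scale = toy_scales)
toy_cv <- pamr.cv(toy_fit2, toy, nfold = 10)

# Pick the threshold with the smallest cross-validated error count.
# which.min() returns the first minimum, i.e. the smallest such threshold.
best_threshold <- toy_cv$threshold[which.min(toy_cv$error)]

print(pamr.confusion(toy_cv, threshold = best_threshold, extra = FALSE))
```

With the objects from the post, the equivalent lookup would be mycv$threshold[which.min(mycv$error)]. Several thresholds can tie for the minimum error, so it may be preferable to take the largest tied threshold if a smaller gene list is wanted.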