Eva May
2009-Jun-25 14:18 UTC
[R] [e1071] Inconsistent results when using matrix.csr for svm() - possibly scaling problem
Dear all,

I'm training an SVM with default settings on a matrix.csr (SparseM package). I noticed that if I train the SVM on the (hopefully) equivalent dense matrix (via as.matrix()), the returned models and predictions sometimes differ. I expected both representations of the same data to give the same results. It may be a scaling problem, because the results are equal when scaling is disabled (see below). I'm using SparseM_0.80 and e1071_1.5-19.

This is what I do (details can be found below):

> #run model on csr matrices
> model <- svm(matrixcsrTraining, classfactorTraining)
> predict(model, matrixcsrTest)
       1
0.944625

> #run model on the dense matrix representation of the csr matrix from above
> model <- svm(as.matrix(matrixcsrTraining), classfactorTraining)
> predict(model, as.matrix(matrixcsrTest))
        1
0.8325838

Possibly this is a scaling problem with sparse matrices, because the results are equal if scaling is disabled.

> #run model on csr matrices without scaling
> model <- svm(matrixcsrTraining, classfactorTraining, scale = FALSE)
> predict(model, matrixcsrTest)
       1
0.944625

> #run model on dense matrices without scaling
> model <- svm(as.matrix(matrixcsrTraining), classfactorTraining, scale = FALSE)
> predict(model, as.matrix(matrixcsrTest))
       1
0.944625

Is scaling handled differently for the two formats? Or is there no scaling at all for SparseM matrices?
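To illustrate why scaling could matter here, the following is a minimal base-R sketch of what centering and scaling does to a 0/1 indicator matrix. The toy matrix and all variable names are made up for illustration (they are not the attached files), and the sketch makes no claim about what svm() does internally with matrix.csr input.

```r
# Toy 0/1 indicator matrix (hypothetical data, not the attached files).
x <- matrix(c(1, 0, 0,
              0, 1, 0,
              1, 1, 0,
              0, 0, 1),
            nrow = 4, byrow = TRUE)

# scale() centers each column to mean 0 and divides by its standard deviation.
x_scaled <- scale(x)

# Zeros do not stay zero after centering, so the scaled matrix is no longer sparse.
zeros_before <- sum(x == 0)
zeros_after  <- sum(x_scaled == 0)

# A test row has to be transformed with the training centers and scales.
test_row <- matrix(c(0, 0, 1), nrow = 1)
test_scaled <- scale(test_row,
                     center = attr(x_scaled, "scaled:center"),
                     scale  = attr(x_scaled, "scaled:scale"))

print(zeros_before)
print(zeros_after)
print(test_scaled)
```

If the dense input is scaled like this while the matrix.csr input is left as it is, the two models would be trained on different numbers, which would be consistent with the results above. That is only an assumption and would need to be checked against the e1071 documentation or source.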
Thank you very much for your help,
Eva
CS bachelor student

---------------------------------------------------------------------------
Details: code below, files attached

#load packages (SparseM for matrix.coo/matrix.csr, e1071 for svm)
library(SparseM)
library(e1071)

#read in data
coordinates <- read.csv('vector.data',head=TRUE)
j <- subset(coordinates,select=c(j))
ja <- as.integer(j[1:dim(j)[1],])
i <- subset(coordinates,select=c(i))
ia <- as.integer(i[1:dim(i)[1],])
classes <- read.csv("classes.data",head=TRUE)
classfactorTraining <- classes[1:(max(ia)),]

#build matrixcoo first, then matrixcsr for training
dim <- as.integer(c(max(ia),max(ja)))
matrixcoo = new("matrix.coo",ra=rep(1,dim(j)[1]),ja=ja,ia=ia,dim=dim)
matrixcsrTraining = as.matrix.csr(matrixcoo)

#build a simple matrix for testing
matrixcoo = new("matrix.coo",ra=rep(1,1),ja=as.integer(c(13)),ia=as.integer(c(1)),dim=as.integer(c(1,max(ja))))
matrixcsrTest = as.matrix.csr(matrixcoo)

#run model on csr matrices
model <- svm(matrixcsrTraining,classfactorTraining, scale = FALSE)
predict(model, matrixcsrTest)

#run model on dense matrices
model <- svm(as.matrix(matrixcsrTraining),classfactorTraining, scale = FALSE)
predict(model, as.matrix(matrixcsrTest))

---------------------------------------------------------------------------
Masked methods:

The following object(s) are masked from package:stats :
 model.response
The following object(s) are masked from package:base :
 backsolve, chol

---------------------------------------------------------------------------
Session info:

> sessionInfo()
R version 2.9.0 (2009-04-17)
x86_64-pc-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] methods stats graphics grDevices utils datasets base

other attached packages:
[1] SparseM_0.80 e1071_1.5-19 class_7.2-47