Dear all,

I have a question about using categorical predictors for SVM, using "svm" from library(e1071). If I have multiple categorical predictors, should they just be included as factors? Take a simple artificial data example:

x1 <- rnorm(500)
x2 <- rnorm(500)
# Categorical predictor 1, with 5 levels
x3 <- as.factor(rep(c(1,2,3,4,5), c(50,150,130,70,100)))
# Categorical predictor 2, with 3 levels
x4 <- as.factor(rep(c("R","B","G"), c(100,200,200)))
# Response
y <- rep(c(-1,1), c(275,225))
class <- as.factor(y)
# data.frame() keeps x3 and x4 as factors; cbind() would coerce them to their integer codes
svmdata <- data.frame(class, x1, x2, x3, x4)
mod1 <- svm(class ~ ., data = svmdata, type = "C-classification")

OR should each factor be coded as an indicator variable? E.g. for categorical predictor 2, since there are 3 levels, we code:

(R,R,B,G,G) = ( (1,0,0),(1,0,0),(0,1,0),(0,0,1),(0,0,1) )

There are no errors when I run the model using either method, but I'm unsure which is correct for svm in 'e1071'.

Many thanks.
V.V.
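To make the two options in the question concrete, here is a minimal sketch in base R. It rebuilds the example as a data frame and writes out the indicator coding of x4 with model.matrix(). The level order c("R","B","G") and the name x4_ind are choices made for this illustration, not part of the original post. As far as I can tell, the formula interface of svm() expands factors into indicator columns in a similar way, but that is an assumption worth checking against ?svm. The model fit is guarded because e1071 may not be installed.

```r
# Rebuild the example as a data frame so that x3 and x4 stay factors.
set.seed(1)
svmdata <- data.frame(
  class = factor(rep(c(-1, 1), c(275, 225))),
  x1 = rnorm(500),
  x2 = rnorm(500),
  x3 = factor(rep(1:5, c(50, 150, 130, 70, 100))),
  # Level order fixed to R, B, G to match the coding shown in the question
  x4 = factor(rep(c("R", "B", "G"), c(100, 200, 200)),
              levels = c("R", "B", "G"))
)

# Explicit indicator coding of x4: one 0/1 column per level.
# The "- 1" drops the intercept so that no level is absorbed into it.
x4_ind <- model.matrix(~ x4 - 1, data = svmdata)
head(x4_ind, 2)

# Option 1 from the question: pass the factors directly via the formula.
# Guarded, since e1071 is an add-on package.
if (requireNamespace("e1071", quietly = TRUE)) {
  mod1 <- e1071::svm(class ~ ., data = svmdata, type = "C-classification")
  print(mod1)
}
```

Each row of x4_ind has exactly one 1, which is the (1,0,0) / (0,1,0) / (0,0,1) pattern described above.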
Dear all,

how can I get the exact p-value of a statistical test like cor.test() if the p-value is below the default machine epsilon value of .Machine$double.eps = 2.220446e-16? At the moment smaller p-values are reported as "p-value < 2.2e-16". Setting .Machine$double.eps <- 1E-100 does not solve this issue, although this value should be used by the format.pval() function.

Knowing the exact p-values down to 1E-200 is very important, since I have multiple tests which require an alpha error threshold below 2.2E-16.

Thanks in advance,
Will
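A minimal sketch of one way to look behind the "< 2.2e-16" display, using simulated data (the variables x, y, and the names p_stored, log_p, log10_p are made up for this illustration). It assumes the default Pearson method of cor.test(), where the statistic is a t value with the reported degrees of freedom. The cutoff is applied only when the result is printed; the object returned by cor.test() still carries the numeric p-value, and the tail probability can also be computed on the log scale with pt(..., log.p = TRUE) for cases where the plain value would underflow.

```r
# Simulated data with a strong correlation, so the p-value is tiny.
set.seed(42)
n <- 200
x <- rnorm(n)
y <- x + rnorm(n, sd = 0.5)

ct <- cor.test(x, y)  # printing ct would show the truncated "p-value < 2.2e-16"

# The stored value is an ordinary double, not cut off at .Machine$double.eps.
p_stored <- ct$p.value
format.pval(p_stored, eps = 1e-300)  # eps is the display threshold of format.pval()

# Two-sided p-value on the log scale, assuming the Pearson t statistic.
t_stat <- unname(ct$statistic)
df <- unname(ct$parameter)
log_p <- log(2) + pt(abs(t_stat), df, lower.tail = FALSE, log.p = TRUE)
log10_p <- log_p / log(10)  # base-10 exponent of the p-value
log10_p
```

The log-scale form stays finite even when the p-value itself is too small to represent as a double and ct$p.value would come back as 0. Whether p-values of that size are meaningful is a separate question from how to display them.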
If you believe P-values that small have any meaning at all, I have a bridge to sell you...

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Will Eagle
Sent: Wednesday, May 19, 2010 1:53 AM
To: r-help at r-project.org
Subject: [R] p-values < 2.2e-16 not reported