thr3ads.net - R help - [R] Categorical Predictors for SVM (e1071) [May 2010]

Vanilla Sky

2010-May-14 13:56 UTC

[R] Categorical Predictors for SVM (e1071)

Dear all,

I have a question about using categorical predictors for SVM, using "svm"
from library(e1071). If I have multiple categorical predictors, should they
just be included as factors? Take a simple artificial data example:

library(e1071)

x1<-rnorm(500)
x2<-rnorm(500)

#Categorical Predictor 1, with 5 levels
x3<-as.factor(rep(c(1,2,3,4,5),c(50,150,130,70,100)))

#Categorical Predictor 2, with 3 levels
x4<-as.factor(rep(c("R","B","G"),c(100,200,200)))

#Response
y<-rep(c(-1,1),c(275,225))
class<-as.factor(y)

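#data.frame() keeps class, x3 and x4 as factors;
#cbind() would coerce them to a numeric matrix of integer codes
svmdata<-data.frame(class,x1,x2,x3,x4)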

mod1<-svm(class~.,data=svmdata,type="C-classification")

OR

should each factor be coded as indicator variables? E.g. for categorical
predictor 2, since there are 3 levels, we code:

(R,R,B,G,G) = ( (1,0,0),(1,0,0),(0,1,0),(0,0,1),(0,0,1) )

There are no errors when I run the model using either method, but I'm unsure
which is correct for svm in 'e1071'.

Many thanks.

V.V.
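
As a hedged illustration of the two options in the question: in R, a factor in a model formula is normally expanded into indicator columns by model.matrix(), so the two codings tend to coincide as long as the predictors really are factors in a data frame. The sketch below uses only base R. The five-value vector mirrors the (R,R,B,G,G) example, and the explicit level order c("R","B","G") is an assumption made so the columns line up with the post. Whether svm() in e1071 applies exactly this expansion internally is best checked against the package documentation.

```r
# Five observations matching the (R,R,B,G,G) example; the level order is
# fixed explicitly (an assumption) so the columns come out as R, B, G.
x4 <- factor(c("R", "R", "B", "G", "G"), levels = c("R", "B", "G"))

# Full indicator coding: "- 1" drops the intercept so that every level
# of the factor gets its own 0/1 column.
ind <- model.matrix(~ x4 - 1)
print(ind)

# Caveat when assembling the data: cbind() on factors returns a numeric
# matrix of integer level codes, while data.frame() keeps factors.
m  <- cbind(x4)
df <- data.frame(x4)
is.numeric(m)     # TRUE: the factor became integer codes
is.factor(df$x4)  # TRUE: the factor survived
```

If the predictors are stored as integer codes rather than factors, a model would treat level 5 as "five times" level 1, which is rarely what is intended for an unordered category.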

Will Eagle

2010-May-19 08:53 UTC

[R] p-values < 2.2e-16 not reported

Dear all,

how can I get the exact p-value of a statistical test like cor.test() if 
the p-value is below the default machine epsilon value of 
.Machine$double.eps =  2.220446e-16?

At the moment smaller p-values are reported as "p-value < 2.2e-16".
.Machine$double.eps <- 1E-100 does not solve this issue, although this 
value should be used by the format.pval() function.

Knowing the exact p-values down to 1E-200 is very important, since I have
multiple tests that require an alpha error threshold below 2.2E-16.

Thanks in advance,

Will
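
One possible way to look below the printed cutoff, sketched with base R only: the "< 2.2e-16" text comes from how the test result is printed, while the p.value component of the object returned by cor.test() is an ordinary double. For values too small even for a double, the tail probability can be computed on the log scale. The simulated data, the seed and the variable names below are illustrative assumptions, and the log-scale step assumes the default Pearson test, whose statistic is t-distributed.

```r
set.seed(1)                 # illustrative data, not from the post
x <- rnorm(1000)
y <- x + rnorm(1000)        # clearly correlated with x

ct <- cor.test(x, y)        # Pearson test by default

# The stored p-value is a plain double; only the printed summary
# replaces small values with "< 2.2e-16".
p <- ct$p.value
print(p)

# Two-sided p-value on the natural-log scale, from the t statistic and
# its degrees of freedom; this stays finite even if p underflows to 0.
t_stat  <- unname(ct$statistic)
df      <- unname(ct$parameter)
log_p   <- log(2) + pt(-abs(t_stat), df, log.p = TRUE)
log10_p <- log_p / log(10)  # p is roughly 10^log10_p
print(log10_p)
```

Working with log10_p allows thresholds such as 1E-200 to be compared directly, for example log10_p < -200, without relying on the printed form.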

Bert Gunter

2010-May-19 15:16 UTC

[R] p-values < 2.2e-16 not reported

If you believe P-values that small have any meaning at all, I have a bridge
to sell you... 


Bert Gunter
Genentech Nonclinical Statistics
