Hello,
I have trained an SVM on some training data and would like to use the model to
predict a binary outcome for new data.
The input data frame contains several numeric and factor variables. I usually
construct the input matrix for the cases to be predicted with a Perl script that
writes it to a file, because the data comes from different sources and needs some
text processing. This file is then read into R via read.table.
I may want to predict many new cases at once, or just a single new case.
There are now two problems:
1. If the constructed matrix for the cases to be predicted does not contain all
the factor levels that were used to build the model (the factor levels found in
the training set), the svm throws an error (Error in scale ...).
I've tried converting the columns to factors with an explicit set of levels, but
instead of the level labels I get the numeric codes:
> tmp <- sapply(11:15, function(i) factor(new.dat[,i],
                 levels=c('A','C','G','T')))
> tmp
      [,1] [,2] [,3] [,4] [,5]
 [1,]    3    4    4    2    2
 [2,]    4    2    2    1    1
 [3,]    2    1    1    1    1
 [4,]    1    1    1    1    1
 [5,]    1    1    2    1    3
 [6,]    2    1    3    4    3
 [7,]    3    4    3    3    1
 [8,]    3    3    1    4    1
 [9,]    1    4    1    1    4
[10,]    1    1    4    4    4
> new.dat[,14]
[1] "C" "A" "A" "A" "A" "T" "G" "T" "A" "T"
2. When I read a data frame with the variables and factors for a single new case
(one row), read.table always treats the variables (numeric and factor alike) as
strings. Worse, one of the factors contains a level named 'T', which read.table
converts to TRUE. I've tried as.is = T and as.is = F, and the result for the
single-row data frame is the same (T is replaced by TRUE).
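A minimal sketch of one way around problem 2, assuming the file has a header line and the column layout is known in advance. The column names x1, b1, b2 and the inline data are hypothetical; in practice the file name would replace textConnection(txt). Passing colClasses makes read.table skip its type guessing (type.convert), so a lone 'T' is no longer turned into TRUE and the nucleotide columns come back as factors:

```r
# Hypothetical one-row input, as the Perl script might write it:
# one numeric column (x1) and two nucleotide columns (b1, b2)
txt <- c("x1 b1 b2",
         "0.75 T G")

# Default reading: type.convert() sees only "T" in b1 and turns it into logical TRUE
bad <- read.table(textConnection(txt), header = TRUE)
class(bad$b1)

# Declaring the column classes up front keeps "T" as a string,
# and "factor" makes b1 and b2 factors regardless of the number of rows
one <- read.table(textConnection(txt), header = TRUE,
                  colClasses = c("numeric", "factor", "factor"))
str(one)
```

Note that a factor read this way only has the levels that actually occur in the file (here just "T" for b1), so its levels still have to be matched to the training data before prediction.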
I'm running R 2.1.0.
Any suggestions on how to read a data frame (even one with only a single row) so
that factor columns are treated as factors, and how to adjust the factor levels
before passing the data frame to predict.svm?
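A minimal sketch of one common approach to problem 1, assuming the training data frame (here a hypothetical `train`) is still available: rebuild every factor column of the new data against the levels of the same-named training column. The helper name `match_levels` and the columns x1, b1, b2 are made up for illustration. The helper works column by column rather than with sapply(), because sapply() simplifies the list of factors into a matrix, and a matrix cannot hold factors, which is why the tmp output above shows only codes.

```r
# Hypothetical helper: give every factor column in `newdata` the level set
# that the same-named column has in the training data `train`.
match_levels <- function(newdata, train) {
  for (v in names(newdata)) {
    if (v %in% names(train) && is.factor(train[[v]])) {
      # Re-create the factor from its labels so the codes follow the training levels;
      # labels that never occurred in training become NA
      newdata[[v]] <- factor(as.character(newdata[[v]]), levels = levels(train[[v]]))
    }
  }
  newdata
}

# Hypothetical training data with all four nucleotides present
train <- data.frame(x1 = c(0.1, 2.3, 1.7, 0.9),
                    b1 = factor(c("A", "C", "G", "T")),
                    b2 = factor(c("G", "T", "A", "C")))

# A single new case whose factors only know their own level
one <- data.frame(x1 = 0.75, b1 = factor("T"), b2 = factor("G"))
one <- match_levels(one, train)

levels(one$b1)      # "A" "C" "G" "T"
as.integer(one$b1)  # 4

# The dummy-variable expansion now has the same columns as for the training data
identical(colnames(model.matrix(~ ., one)), colnames(model.matrix(~ ., train)))

# With e1071 loaded and a fitted model `fit`, one would then call predict(fit, one)
```

The idea is that model.matrix() now produces the same dummy columns at prediction time as at training time; whether that is exactly what removes the "Error in scale" message depends on e1071's internals, so it is worth checking on the real model. Labels that never occurred in training end up as NA and should be inspected before calling predict().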
Thanks in advance,
Kind regards,
Arne