Hi,
On Thu, Jun 24, 2010 at 1:22 PM, Changbin Du <changbind at gmail.com> wrote:
> Hi guys,
>
> I used the following code to run an SVM and get predictions on a new
> data set, hh.
>
> > dim(all_h)
> [1] 2034   24
> > dim(hh)  # it contains all the variables besides the variables in the all_h data set
> [1] 640 415
If I understand you correctly, this is wrong.
You are supposed to hold out *observations* (rows) when doing
training/testing, not variables/predictors/features (cols).
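
To make the row-wise hold-out concrete, here is a minimal sketch in base R. The data frame `toy` and its columns `x1` and `x2` are made up for illustration (they are not your `all_h`); only the outcome column name `out` is borrowed from your post.

```r
# Toy stand-in for all_h (hypothetical data): 10 observations,
# two features plus the outcome column `out`.
set.seed(1)
toy <- data.frame(x1 = rnorm(10), x2 = rnorm(10), out = rep(0:1, 5))

# Hold out rows, not columns: pick 3 of the 10 row indices for testing.
test_idx <- sample(nrow(toy), size = 3)
train <- toy[-test_idx, ]
test  <- toy[test_idx, ]

dim(train)  # 7 rows, 3 columns
dim(test)   # 3 rows, 3 columns
```

Both pieces keep every column; only the observations are divided between them.
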
Assuming e1071::svm doesn't do anything fancy to match column names
between training and test data, then put simply: the number of columns
(features per observation) you use in training should be the same as
the number of columns in your test set.
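
One way to check this before calling predict is to compare column names directly. The sketch below uses only base R; `missing_predictors` is a hypothetical helper written for this example (it is not part of e1071), and the small data frames are made-up stand-ins for `all_h` and `hh`.

```r
# Hypothetical helper: list the training predictors that are absent
# from the new data. `response` names the outcome column to ignore.
missing_predictors <- function(train, newdata, response = "out") {
  predictors <- setdiff(names(train), response)
  setdiff(predictors, names(newdata))
}

# Toy stand-ins (hypothetical): this test set has an extra column
# and lacks x2.
train   <- data.frame(x1 = 1:3, x2 = 4:6, out = c(0, 1, 0))
newdata <- data.frame(x1 = 7:8, extra = c("a", "b"))

missing_predictors(train, newdata)  # "x2"

# When nothing is missing, keep only the training predictors,
# in the same order as in the training set.
complete <- data.frame(extra = c("a", "b"), x2 = 9:10, x1 = 7:8)
stopifnot(length(missing_predictors(train, complete)) == 0)
aligned <- complete[, setdiff(names(train), "out"), drop = FALSE]
names(aligned)  # "x1" "x2"
```

With your objects the call would be `missing_predictors(all_h, hh)`; a non-empty result would mean hh lacks features the model was trained on.
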
-steve
> require(e1071)
>
> svm.tune <- tune(svm, as.factor(out) ~ ., data=all_h,
>                  ranges=list(gamma=2^(-5:5), cost=2^(-5:5)))  # find the best parameters
>
> bestg <- svm.tune$best.parameters[[1]]
> bestc <- svm.tune$best.parameters[[2]]
>
> svm.fit <- svm(as.factor(out) ~ ., data=all_h, method="C-classification",
>                kernel="radial", probability=TRUE, cost=bestc, gamma=bestg,
>                cross=10)  # model fitting
>
> svm.pred <- predict(svm.fit, hh, decision.values=TRUE, probability=TRUE)  # find the probability
>
> Error in matrix(ret$dec, nrow = nrow(newdata), byrow = TRUE, dimnames = list(rowns,  :
>   invalid 'ncol' value (too large or NA)
>
>
> > head(all_h)
>        DD HK HQ      IL      LP      NE      NP      TA      TP      WA WC
> 1 0.00543  0  0 0.00815 0.00272 0.00543 0.00000 0.00000 0.00000 0.00000  0
> 3 0.00000  0  0 0.00890 0.00890 0.00712 0.00534 0.00000 0.00890 0.00178  0
> 4 0.00448  0  0 0.00448 0.00299 0.00448 0.00149 0.00299 0.00000 0.00149  0
> 5 0.00312  0  0 0.00467 0.00467 0.00000 0.00156 0.00467 0.00312 0.00467  0
> 6 0.00587  0  0 0.02053 0.00587 0.00000 0.00293 0.00587 0.00293 0.00000  0
> 7 0.00000  0  0 0.02422 0.00346 0.00000 0.00346 0.00346 0.00000 0.00346  0
>        WD      WG      WN      YW   acid_per   base_per  charge_per
> 1 0.00000 0.00000 0.00000 0.00000 0.14402174 0.12228261 0.019021739
> 3 0.00178 0.00178 0.00534 0.00178 0.12277580 0.09252669 0.016014235
> 4 0.00149 0.00448 0.00448 0.00000 0.16591928 0.11509716 0.022421525
> 5 0.00000 0.00156 0.00000 0.00156 0.13084112 0.10903427 0.009345794
> 6 0.00293 0.00000 0.00000 0.00000 0.07038123 0.08797654 0.002932551
> 7 0.00000 0.00346 0.00000 0.00346 0.05536332 0.08650519 0.010380623
>   hydrophob_per polar_per num_cell num_genes position out
> 1     0.3804348 0.1929348        1         4        1   0
> 3     0.3540925 0.2508897        1         4        3   0
> 4     0.3393124 0.2032885        1         4        4   1
> 5     0.3753894 0.2305296        2         7        1   0
> 6     0.4868035 0.1964809        2         7        2   0
> 7     0.4878893 0.1522491        2         7        3   0
>
> > quantile(hh$HK)
>      0%     25%     50%     75%    100%
> 0.00000 0.00000 0.00000 0.00000 0.02703
> > quantile(hh$HQ)
>    0%   25%   50%   75%  100%
> 0.000 0.000 0.000 0.000 0.025
> > quantile(hh$WC)
>      0%     25%     50%     75%    100%
> 0.00000 0.00000 0.00000 0.00000 0.01266
>
> Can someone give some suggestions?
>
> Thanks!
>
> --
> Sincerely,
> Changbin
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact