mcbride at duke.edu
2006-Mar-30 22:20 UTC
[R] Predict function for 'newdata' of different dimension in svm
I am using the "predict" function on a support vector machine (svm) object, and I don't understand why I can't predict on a dataset with more observations than the training dataset. I think this problem is a generic "predict" problem, but I'm not sure.

The original svm was fit on 50 observations:

    cd1.svm<-svm(boot.dist.dat$Acode~boot.dist.dat$EXT+boot.dist.dat$TOF,cost=100,gamma=20)

    ## for these training data,
    > names(boot.dist.dat)
    [1] "TOF"   "EXT"   "Acode"
    > dim(boot.dist.dat)
    [1] 50  3

Now I want to use the svm classifier on a new dataset with 175 observations:

    new.dat<-data.frame(TOF=Cd1[cand.adult,]$TOF,EXT=Cd1[cand.adult,]$EXT,Acode=rep(0,175),row.names=NULL)

    ## for the new dataset,
    > names(new.dat)
    [1] "TOF"   "EXT"   "Acode"
    > dim(new.dat)
    [1] 175   3

Now try to predict:

    > predict(cd1.svm,newdata=new.dat)
    Error in "names<-.default"(`*tmp*`, value = c("1", "2", "3", "4", "5", :
      'names' attribute [175] must be the same length as the vector [50]

What am I missing? Why would the row names have to be the same?

Thanks so much,
Sandra McBride

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sandra McBride
Research Scientist
Nicholas School of the Environment and Earth Sciences (NSEES)
Box 90328
Duke University
Levine Science Research Center
Durham, NC 27708-0328
(919) 622 3663
David Meyer
2006-Mar-31 21:56 UTC
[R] Predict function for 'newdata' of different dimension in svm
Sandra,

It is hard to tell where the error message originates without having the data at hand (perhaps you could provide that to me off-list?), but I am almost sure things will work when you train the model the "standard" way:

    cd1.svm<-svm(Acode~EXT+TOF, data = boot.dist.dat, cost=100, gamma=20)

and then do the predictions.

Best,
David

--
Dr. David Meyer
Department of Information Systems and Operations
Vienna University of Economics and Business Administration
Augasse 2-6, A-1090 Wien, Austria, Europe
Tel: +43-1-313 36 4393
Fax: +43-1-313 36 90 4393
HP: http://wi.wu-wien.ac.at/~meyer/
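
A small sketch of why the "standard" formula form is likely to matter. It uses lm() from base R rather than svm(), so that it runs without the original data or extra packages. The data are simulated and the names train, bad and good are made up for illustration. That svm() resolves formula variables in the same way is an assumption, although it is consistent with the 175-versus-50 mismatch in the error message above.

```r
set.seed(1)

# Simulated stand-ins for the training data (50 rows) and new data (175 rows).
train <- data.frame(TOF = runif(50), EXT = runif(50))
train$Acode <- 2 * train$TOF - train$EXT + rnorm(50, sd = 0.1)
new.dat <- data.frame(TOF = runif(175), EXT = runif(175))

# Formula written with '$': the terms are literally named "train$EXT" and
# "train$TOF", so predict() looks them up again in 'train', not in 'newdata'.
bad <- lm(train$Acode ~ train$EXT + train$TOF)

# Formula with bare column names plus 'data =': the terms are "EXT" and "TOF",
# which predict() can find as columns of 'newdata'.
good <- lm(Acode ~ EXT + TOF, data = train)

# lm() only warns about the row mismatch and silently returns predictions for
# the 50 training rows; the warning is suppressed here to keep the output short.
p.bad  <- suppressWarnings(predict(bad, newdata = new.dat))
p.good <- predict(good, newdata = new.dat)

length(p.bad)   # one prediction per training row
length(p.good)  # one prediction per row of new.dat
```

With the '$' form, newdata is effectively ignored, so any later step that attaches the 175 row names of new.dat to a result of length 50 would fail in the way reported.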