Martin Tomko
2010-Oct-21 13:42 UTC
[R] SVM classification based on pairwise distance matrix
Dear all,

I am exploring the possibilities for automated classification of my data. I have successfully used KNN, but was thinking about looking at SVM (which I have not used before).

I have a pairwise distance matrix of training observations, which are classified into set classes, and a distance matrix of new observations to the training ones.

Is it possible to use distance matrices for SVM, and if yes, which package would do so (e1071?).

I have little experience with SVM, and I had the impression that
a/ it is usually used with data that have observations in terms of a number of variables (hence, not pairwise distances);
b/ it is not well suited for large multidimensional spaces (I have a distance matrix of 200*200 observations, a part of this could be used as training data, but still, we are looking at say 50 distances per observation).

Thanks
Martin
Steve Lianoglou
2010-Oct-21 15:42 UTC
[R] SVM classification based on pairwise distance matrix
Hi,

On Thu, Oct 21, 2010 at 9:42 AM, Martin Tomko <martin.tomko at geo.uzh.ch> wrote:
> Dear all,
> I am exploring the possibilities for automated classification of my
> data. I have successfully used KNN, but was thinking about looking at
> SVM (which I have not used before).
> I have a pairwise distance matrix of training observations which are
> classified in set classes, and a distance matrix of new observations to
> the training ones.

It seems to me that since you have some pairwise distance metric, your original data is in some "vector form". Why not just try using your original data (forget the pairwise distance for now) and try a few different kernels for the SVM, such as a linear kernel or an RBF/Gaussian.

> Is it possible to use distance matrices for SVM, and if yes, which
> package would do so (e1071?).

I guess you can think of a "kernel matrix" as something like a distance matrix -- actually, it's more like a similarity matrix. I don't recall if e1071 allows you to use a kernel matrix as input, but I'm pretty sure the svm functions from kernlab do. It was a pain to use, though.

But anyway -- don't use your distance matrix :-)

> I have little experience with SVM, and I had the impression that it is
> a/ usually used with data that have observations in terms of a number of
> variables (hence, not pairwise distances);

With the exception of "plugging in" a kernel matrix (which was calculated from data in its original feature space) that's pretty much correct.

> b/ it is not well suited for large multidimensional spaces (I have a
> distance matrix of 200*200 observations, a part of this could be used as
> training data, but still, we are looking at say 50 distances per
> observation).

But your distance matrix isn't really the same multidimensional space your data lives in, right?

Anyway, like I said before, try the SVM on your original data with some different kernels. I think the RBF kernel should be closest in spirit to your distance matrix, and will likely perform better than your kNN ;-).

Hope that helps,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
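To illustrate the remark that a kernel matrix is "more like a similarity matrix", and that the RBF kernel is closest in spirit to a distance matrix, here is a minimal sketch of turning pairwise distances d into RBF-style similarities exp(-gamma * d^2). It is written in plain Python rather than R purely so that it is self-contained. The function name, the toy distances and the value of gamma are all assumptions made for illustration and do not come from the thread. The result is only guaranteed to be a valid (positive semi-definite) kernel when the distances are Euclidean, so this should be read as an illustration and not as a recommended recipe.

```python
import math


def rbf_kernel_from_distances(dist, gamma):
    """Map each pairwise distance d to the similarity exp(-gamma * d^2).

    `dist` is a list of rows of distances. Identical points (d = 0) get
    similarity 1; the similarity decays towards 0 as the distance grows.
    """
    return [[math.exp(-gamma * d * d) for d in row] for row in dist]


# Hypothetical toy data: distances among 3 training observations ...
train_dist = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 3.0],
    [4.0, 3.0, 0.0],
]
# ... and distances from 2 new observations to those 3 training observations.
new_to_train_dist = [
    [0.5, 0.5, 3.5],
    [3.0, 2.0, 1.0],
]

gamma = 0.5  # assumed value; in practice it would be tuned, e.g. by cross-validation

K_train = rbf_kernel_from_distances(train_dist, gamma)       # 3 x 3, symmetric
K_new = rbf_kernel_from_distances(new_to_train_dist, gamma)  # 2 x 3

for row in K_train:
    print(["%.4f" % value for value in row])
```

An SVM implementation that accepts a precomputed kernel matrix would be trained on something like `K_train` and would predict from the rows of `K_new`, each of which holds the similarities of one new observation to the training observations.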