Hello List,

I am having great trouble using the svm function in the e1071 package. I have 4 GB of data that I want to use to train an SVM. I am using the Amazon cloud; my Amazon Machine Image (AMI) has 34.2 GB of memory. My R process was killed several times when I tried to use the full 4 GB of data for svm. Now I am using a subset of that data, which is only 1.4 GB. I remove all unnecessary objects before calling svm(). I have monitored memory consumption and found that before I call svm() my AMI has 25 GB of free memory. After calling svm(), this free memory starts going down, and at the end I have only 1.7 GB left and R gives me an error that it cannot allocate a vector of size 3.4 GB. It is true that if I do not have enough memory then R cannot create the vector, but my question is: how is the svm function eating up those 25 GB of memory? Is there anything I can do to solve this problem, or is it a problem in the e1071 package? By "problem in e1071 package", I mean: does svm() in e1071 normally consume that much memory? If svm() really consumes this much memory then I have to think of some other way to train the SVM. If 34 GB of RAM is not enough for 1.4 GB of data then I am in trouble; Amazon offers at most 68.4 GB of RAM.

Please help. Thanks in advance.

Regards,
Shyama
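A minimal sketch of the cleanup and memory check described in the message above (the object names ycln, pClass, idxtrn and idxtst are taken from the code posted later in this thread; adapt to your own workspace):

    rm(list = setdiff(ls(), c("ycln", "pClass", "idxtrn", "idxtst")))  # keep only the objects svm() needs
    gc()                                                               # force a garbage collection and report memory use
    print(object.size(ycln), units = "Gb")                             # in-memory size of the feature matrix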
Hi,

On Tue, Apr 6, 2010 at 8:07 AM, Shyamasree Saha [shs] <shs at aber.ac.uk> wrote:
> I am having great trouble using the svm function in the e1071 package. I have 4 GB of data that I want to use to train an SVM. [...] After calling svm(), this free memory starts going down, and at the end I have only 1.7 GB left and R gives me an error that it cannot allocate a vector of size 3.4 GB. [...] If 34 GB of RAM is not enough for 1.4 GB of data then I am in trouble; Amazon offers at most 68.4 GB of RAM.

I think we need more info regarding your problem. I'm guessing the answer must be yes since you're chewing up all that memory, but are you sure you're running R in 64-bit mode? What do you get when you type the following in the R console:

R> .Machine$sizeof.pointer  ## it should be 8

* What type of kernel are you using? Have you tried different ones?
* Are you doing classification or regression?
* Is your data/feature matrix sparse? If so, are you passing libsvm a SparseM matrix? (See the sketch after this message.)
* Have you tried playing with some of the params in the svm call, like the values for tolerance, epsilon, cost/nu/etc.?
* Try an even smaller subset of your data (< 1.4 GB).
* What is the dimensionality of your X matrix -- how many examples, and how many features does each example have?
* Include sessionInfo() -- we don't know what version of R/e1071 etc. you are running.
* There is a kernlab package that also implements the svm; try that.
* You can also try to precompute a kernel matrix and send that into kernlab's ksvm function; maybe that helps?

Don't know, lots of things ... and you didn't provide any code, so it's hard to figure out what's up.

If your problem is really too huge, there are other svm implementations you might consider looking into, such as Pegasos SVM, liblinear, svm^perf, etc., depending on the problem you're trying to solve.

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
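For the SparseM suggestion above, a minimal sketch of what the call could look like (x and y are placeholder names for the feature matrix and class labels, not objects from the original post):

    library(e1071)
    library(SparseM)

    ## convert the dense 0/1 feature matrix to compressed sparse row form;
    ## e1071::svm accepts a SparseM matrix.csr object directly
    xs  <- as.matrix.csr(x)
    fit <- svm(x = xs, y = as.factor(y), kernel = "linear", probability = FALSE)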
I think the problem is that you have R configured as 32-bit. If that is the case, then you will only have access to 4 GB of RAM (see http://www.brianmadden.com/blogs/brianmadden/archive/2004/02/19/the-4gb-windows-memory-limit-what-does-it-really-mean.aspx). Try booting up an Ubuntu instance in the cloud and then install R using the 64-bit configuration. I am interested to know if this solves the problem. Let me know.

Thanks,
Saeed

On Tue, Apr 6, 2010 at 5:07 AM, Shyamasree Saha [shs] <shs at aber.ac.uk> wrote:
> I am having great trouble using the svm function in the e1071 package. I have 4 GB of data that I want to use to train an SVM. I am using the Amazon cloud; my Amazon Machine Image (AMI) has 34.2 GB of memory. My R process was killed several times when I tried to use the full 4 GB of data for svm. [...] If 34 GB of RAM is not enough for 1.4 GB of data then I am in trouble; Amazon offers at most 68.4 GB of RAM.
>
> Please help. Thanks in advance.
>
> Regards
> Shyama
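A quick way to check which build of R is actually running, from the R console:

    .Machine$sizeof.pointer   # 8 on a 64-bit build of R, 4 on a 32-bit build
    R.version$arch            # e.g. "x86_64" on a 64-bit build
    sessionInfo()             # the platform string also shows the architecture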
Hi Shyama,

Don't forget to CC the r-help list in your discussions so that there are more eyes on this problem, and others might potentially benefit from the discussion. Comments inline.

On Tue, Apr 6, 2010 at 4:06 PM, Shyamasree Saha [shs] <shs at aber.ac.uk> wrote:
> Dear Steve,
>
> Thanks a lot for your reply. As you suggested the kernlab and SparseM packages, we have now installed them and are reading about these packages. I am trying to answer your questions. I have also added a bit of code. Please let me know whether you need to know more and what your suggestions are.
>
> Thanks again for your help.
>
> Regards,
> Shyamasree
>
>> R> .Machine$sizeof.pointer ## it should be 8
> Yes, it is indeed 8.

OK

>> * What type of kernel are you using? Have you tried different ones?
> Just tried the linear kernel, haven't tried other kernels.

OK, let's stick with that for now.

>> * Are you doing classification or regression?
> We are doing multi-class classification. There are 11 classes.

Is it any better if you just do 1-vs-all? Also (from your code at the end of the email), what if you train the model with `probability=FALSE`?

>> * Is your data/feature matrix sparse? If so, are you passing libsvm a
>> SparseM matrix?
> Yes, the feature matrix is indeed very sparse. Just passing a matrix
> at the moment.
> Not sure how to define it as a SparseM matrix.

R> library(SparseM)
R> ?as.matrix.csr

>> * Have you tried playing with some of the params in the svm call, like
>> the values for tolerance, epsilon, cost/nu/etc.
> No, have not played with these at all. What do you recommend?

Try increasing (I think (maybe decreasing??)) the tolerance from its default value. Moving this in one direction or the other allows the solver to converge to a less precise solution -- haven't read the source in a while, though, so test it.

>> * Try an even smaller subset of your data (< 1.4 GB)
> It works fine with a much smaller subset but have not tried
> intermediate sizes.

OK. Can you give an idea of how long it takes for your call to `svm` to return with different data sizes? How do its memory stats look?

>> * What is the dimensionality of your X matrix -- how many examples,
>> and how many features does each example have
> X matrix dimensionality: 35,500 rows x 52,058 cols. All features are
> binary.

I think that's quite large. This might be a good reason to try liblinear, as it is more appropriate for large feature spaces and is made by the same libsvm folks:

http://www.csie.ntu.edu.tw/~cjlin/liblinear/
http://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf

>> * Include sessionInfo() -- we don't know what version of R/e1071 etc.
> R version 2.10.1 (2009-12-14)
> e1071 "1.5-23"
> running on
> Linux version 2.6.32-303-ec2 (buildd at crested) (gcc version 4.4.3
> (Ubuntu 4.4.3-3ubuntu1) ) #7-Ubuntu SMP Wed Mar 10 11:23:24 UTC 2010
> on an m2.2xlarge Amazon instance with
> 34.2 GB of memory, 13 EC2 Compute Units (4 virtual cores with 3.25 EC2
> Compute Units each)

>> * There is a kernlab package that also implements the svm, try that.
> Thanks. Does kernlab implement libsvm as well? What is the difference
> between the two packages?

libsvm is at the core of kernlab as well, but it's used a bit differently.

>> * You can also try to precompute a kernel matrix and send that into
>> kernlab's ksvm function, maybe that helps?
> Any starting tips for this?

R> library(kernlab)
R> ?kernelMatrix

>> Don't know, lots of things ... and you didn't provide any code, so
>> it's hard to figure out what's up.
>>
>> If your problem is really too huge, there are other svm
>> implementations you might consider looking into, such as Pegasos SVM,
>> liblinear, svm^perf, etc., depending on the problem you're trying to
>> solve.
> Which of these do you recommend for the problem at hand and the size
> of the matrix?

As mentioned above, you can try liblinear. There is no R wrapper, so you can either write out the input files and run liblinear's train from the command line (a sketch of that route follows at the end of this message), or you can try one of the wrappers for another language (maybe you're familiar with Python?). I reckon it wouldn't hurt for someone to make an R wrapper for liblinear, though ...

> code:::
>
> svm_learn <- function(pClass){
>   sink(logfile, append=T)
>   print("In svm_learn function")
>   sink(NULL)
>
>   multi.svm <- svm(x=as.matrix(ycln[idxtrn, ]), y=as.factor(pClass)[idxtrn],
>                    kernel='linear', probability=T)
>
>   summary(multi.svm)
>
>   # do prediction
>   svmpredtrn <- predict(multi.svm, newdata=as.matrix(ycln[idxtrn, ]), decision.values=T)
>   svmpredtst <- predict(multi.svm, newdata=as.matrix(ycln[idxtst, ]), decision.values=T)
>
>   # Check accuracy for training data:
>   # Check accuracy for testing data:
>
>   print("Finished svm_learn function")
>   list(tabtrn=table(pClass[idxtrn], svmpredtrn), tabtst=table(pClass[idxtst], svmpredtst))
> }

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
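A rough sketch of the write-out-the-input-files route mentioned above, using e1071's write.matrix.csr to dump the data in the sparse svmlight/libsvm text format that liblinear's train reads (object names follow the code above; the particular train options shown are only illustrative):

    library(e1071)
    library(SparseM)

    xs <- as.matrix.csr(as.matrix(ycln[idxtrn, ]))   # sparse training matrix
    write.matrix.csr(xs, file = "train.dat", y = as.factor(pClass)[idxtrn])

    ## then, outside R, something along the lines of:
    ##   ./train -s 1 train.dat svm.model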