Hello List,

I am having great trouble using the svm function in the e1071 package. I have 4 GB of data that I want to use to train an SVM. I am using the Amazon cloud; my Amazon Machine Image (AMI) has 34.2 GB of memory. My R process was killed several times when I tried to use the full 4 GB of data for svm. Now I am using a subset of that data, which is only 1.4 GB. I remove all unnecessary objects before calling svm(). I have monitored memory consumption and found that before I call svm() my AMI has 25 GB of free memory. After calling svm(), this free memory starts going down, and at the end I have only 1.7 GB left and R gives me an error that it cannot allocate a vector of size 3.4 GB. It is true that if I do not have enough memory then R cannot create the vector, but my question is: how is the svm function eating up those 25 GB of memory? Is there anything I can do to solve this problem, or is it a problem in the e1071 package? By "problem in e1071 package", I mean: does svm() in e1071 normally consume that much memory? If svm() really consumes this much memory then I have to think of some other way to train the SVM. If 34 GB of RAM is not enough for 1.4 GB of data then I am in trouble; Amazon offers at most 68.4 GB of RAM.

Please help. Thanks in advance.

Regards,
Shyama
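A minimal sketch of the cleanup and memory check described in the message above (the object names ycln, pClass, idxtrn and idxtst are taken from the code posted later in this thread; adapt to your own workspace):

    rm(list = setdiff(ls(), c("ycln", "pClass", "idxtrn", "idxtst")))  # keep only the objects svm() needs
    gc()                                                               # force a garbage collection and report memory use
    print(object.size(ycln), units = "Gb")                             # in-memory size of the feature matrix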
Hi,

On Tue, Apr 6, 2010 at 8:07 AM, Shyamasree Saha [shs] <shs at aber.ac.uk> wrote:
> I am having great trouble using the svm function in the e1071 package. I have 4 GB of data that I want to use to train an SVM. [...] After calling svm(), this free memory starts going down, and at the end I have only 1.7 GB left and R gives me an error that it cannot allocate a vector of size 3.4 GB. [...] If 34 GB of RAM is not enough for 1.4 GB of data then I am in trouble; Amazon offers at most 68.4 GB of RAM.

I think we need more info regarding your problem. I'm guessing the answer must be yes since you're chewing up all that memory, but are you sure you're running R in 64-bit mode? What do you get when you type the following in the R console:

R> .Machine$sizeof.pointer  ## it should be 8

* What type of kernel are you using? Have you tried different ones?
* Are you doing classification or regression?
* Is your data/feature matrix sparse? If so, are you passing libsvm a SparseM matrix? (See the sketch after this message.)
* Have you tried playing with some of the params in the svm call, like the values for tolerance, epsilon, cost/nu/etc.?
* Try an even smaller subset of your data (< 1.4 GB).
* What is the dimensionality of your X matrix -- how many examples, and how many features does each example have?
* Include sessionInfo() -- we don't know what version of R/e1071 etc. you are running.
* There is a kernlab package that also implements the svm; try that.
* You can also try to precompute a kernel matrix and send that into kernlab's ksvm function; maybe that helps?

Don't know, lots of things ... and you didn't provide any code, so it's hard to figure out what's up.

If your problem is really too huge, there are other svm implementations you might consider looking into, such as Pegasos SVM, liblinear, svm^perf, etc., depending on the problem you're trying to solve.

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
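For the SparseM suggestion above, a minimal sketch of what the call could look like (x and y are placeholder names for the feature matrix and class labels, not objects from the original post):

    library(e1071)
    library(SparseM)

    ## convert the dense 0/1 feature matrix to compressed sparse row form;
    ## e1071::svm accepts a SparseM matrix.csr object directly
    xs  <- as.matrix.csr(x)
    fit <- svm(x = xs, y = as.factor(y), kernel = "linear", probability = FALSE)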
I think the problem is that you have R configured as 32-bit. If that is the case, then you will only have access to 4 GB of RAM (see http://www.brianmadden.com/blogs/brianmadden/archive/2004/02/19/the-4gb-windows-memory-limit-what-does-it-really-mean.aspx). Try booting up an Ubuntu instance in the cloud and then install R using the 64-bit configuration. I am interested to know if this solves the problem. Let me know.

Thanks,
Saeed

On Tue, Apr 6, 2010 at 5:07 AM, Shyamasree Saha [shs] <shs at aber.ac.uk> wrote:
> I am having great trouble using the svm function in the e1071 package. I have 4 GB of data that I want to use to train an SVM. I am using the Amazon cloud; my Amazon Machine Image (AMI) has 34.2 GB of memory. My R process was killed several times when I tried to use the full 4 GB of data for svm. [...] If 34 GB of RAM is not enough for 1.4 GB of data then I am in trouble; Amazon offers at most 68.4 GB of RAM.
>
> Please help. Thanks in advance.
>
> Regards
> Shyama
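A quick way to check which build of R is actually running, from the R console:

    .Machine$sizeof.pointer   # 8 on a 64-bit build of R, 4 on a 32-bit build
    R.version$arch            # e.g. "x86_64" on a 64-bit build
    sessionInfo()             # the platform string also shows the architecture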
Hi Shyama,

Don't forget to CC the r-help list in your discussions so that there are more eyes on this problem, and others might potentially benefit from the discussion. Comments inline.

On Tue, Apr 6, 2010 at 4:06 PM, Shyamasree Saha [shs] <shs at aber.ac.uk> wrote:
> Dear Steve,
>
> Thanks a lot for your reply. As you suggested the kernlab and SparseM packages, we have now installed them and are reading about these packages. I am trying to answer your questions. I have also added a bit of code. Please let me know whether you need to know more and what your suggestions are.
>
> Thanks again for your help.
>
> Regards,
> Shyamasree
>
>> R> .Machine$sizeof.pointer ## it should be 8
> Yes, it is indeed 8.

OK

>> * What type of kernel are you using? Have you tried different ones?
> Just tried the linear kernel, haven't tried other kernels.

OK, let's stick with that for now.

>> * Are you doing classification or regression?
> We are doing multi-class classification. There are 11 classes.

Is it any better if you just do 1-vs-all? Also (from your code at the end of the email), what if you train the model with `probability=FALSE`?

>> * Is your data/feature matrix sparse? If so, are you passing libsvm a
>> SparseM matrix?
> Yes, the feature matrix is indeed very sparse. Just passing a matrix
> at the moment.
> Not sure how to define it as a SparseM matrix.

R> library(SparseM)
R> ?as.matrix.csr

>> * Have you tried playing with some of the params in the svm call, like
>> the values for tolerance, epsilon, cost/nu/etc.
> No, have not played with these at all. What do you recommend?

Try increasing (I think (maybe decreasing??)) the tolerance from its default value. Moving this in one direction or the other allows the solver to converge to a less precise solution -- haven't read the source in a while, though, so test it.

>> * Try an even smaller subset of your data (< 1.4 GB)
> It works fine with a much smaller subset but have not tried
> intermediate sizes.

OK. Can you give an idea of how long it takes for your call to `svm` to return with different data sizes? How do its memory stats look?

>> * What is the dimensionality of your X matrix -- how many examples,
>> and how many features does each example have
> X matrix dimensionality: 35,500 rows x 52,058 cols. All features are
> binary.

I think that's quite large. This might be a good reason to try liblinear, as it is more appropriate for large feature spaces and is made by the same libsvm folks:

http://www.csie.ntu.edu.tw/~cjlin/liblinear/
http://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf

>> * Include sessionInfo() -- we don't know what version of R/e1071 etc.
> R version 2.10.1 (2009-12-14)
> e1071 "1.5-23"
> running on
> Linux version 2.6.32-303-ec2 (buildd at crested) (gcc version 4.4.3
> (Ubuntu 4.4.3-3ubuntu1) ) #7-Ubuntu SMP Wed Mar 10 11:23:24 UTC 2010
> on an m2.2xlarge Amazon instance with
> 34.2 GB of memory, 13 EC2 Compute Units (4 virtual cores with 3.25 EC2
> Compute Units each)

>> * There is a kernlab package that also implements the svm, try that.
> Thanks. Does kernlab implement libsvm as well? What is the difference
> between the two packages?

libsvm is at the core of kernlab as well, but it's used a bit differently.

>> * You can also try to precompute a kernel matrix and send that into
>> kernlab's ksvm function, maybe that helps?
> Any starting tips for this?

R> library(kernlab)
R> ?kernelMatrix

>> Don't know, lots of things ... and you didn't provide any code, so
>> it's hard to figure out what's up.
>>
>> If your problem is really too huge, there are other svm
>> implementations you might consider looking into, such as Pegasos SVM,
>> liblinear, svm^perf, etc., depending on the problem you're trying to
>> solve.
> Which of these do you recommend for the problem at hand and the size
> of the matrix?

As mentioned above, you can try liblinear. There is no R wrapper, so you can either write out the input files and run liblinear's train from the command line (a sketch of that route follows at the end of this message), or you can try one of the wrappers for another language (maybe you're familiar with Python?). I reckon it wouldn't hurt for someone to make an R wrapper for liblinear, though ...

> code:::
>
> svm_learn <- function(pClass){
>   sink(logfile, append=T)
>   print("In svm_learn function")
>   sink(NULL)
>
>   multi.svm <- svm(x=as.matrix(ycln[idxtrn, ]), y=as.factor(pClass)[idxtrn],
>                    kernel='linear', probability=T)
>
>   summary(multi.svm)
>
>   # do prediction
>   svmpredtrn <- predict(multi.svm, newdata=as.matrix(ycln[idxtrn, ]), decision.values=T)
>   svmpredtst <- predict(multi.svm, newdata=as.matrix(ycln[idxtst, ]), decision.values=T)
>
>   # Check accuracy for training data:
>   # Check accuracy for testing data:
>
>   print("Finished svm_learn function")
>   list(tabtrn=table(pClass[idxtrn], svmpredtrn), tabtst=table(pClass[idxtst], svmpredtst))
> }

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
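A rough sketch of the write-out-the-input-files route mentioned above, using e1071's write.matrix.csr to dump the data in the sparse svmlight/libsvm text format that liblinear's train reads (object names follow the code above; the particular train options shown are only illustrative):

    library(e1071)
    library(SparseM)

    xs <- as.matrix.csr(as.matrix(ycln[idxtrn, ]))   # sparse training matrix
    write.matrix.csr(xs, file = "train.dat", y = as.factor(pClass)[idxtrn])

    ## then, outside R, something along the lines of:
    ##   ./train -s 1 train.dat svm.model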