thr3ads.net - R help - [R] classification for huge datasets: SVM yields memory troubles [Dec 2004]

Christoph Lehmann

2004-Dec-13 12:27 UTC

[R] classification for huge datasets: SVM yields memory troubles

Hi,
I have a matrix with 30 observations and roughly 30000 variables; each
observation belongs to one of two groups. With svm and slda I run into memory
trouble ('cannot allocate vector of size' roughly 2G). PCA LDA runs
fine. Is there any way to get around the memory issue with SVMs? Or can you
recommend any other classification method for such huge datasets?


P.S. I run SuSE 9.1 on a Pentium IV machine with 2 GB of RAM.
Thanks for any hint.

Christoph
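
[Editor's illustration, not part of the original post.] A rough back-of-the-envelope check of the sizes involved may help here. It assumes values are stored as 8-byte doubles, and the helper names are made up for this sketch. It does not diagnose which step inside svm or slda asked for the roughly 2G vector, since the thread does not say. It only shows that the 30 x 30000 data matrix is small, that a matrix of pairwise inner products between observations (which is all a kernel method needs) is tiny, and that any variables-by-variables matrix would not fit in 2 GB of RAM.

```python
BYTES_PER_DOUBLE = 8  # assumption: numeric values stored as 8-byte doubles


def matrix_bytes(rows, cols):
    """Bytes needed for a dense rows x cols matrix of doubles."""
    return rows * cols * BYTES_PER_DOUBLE


def as_mb(n_bytes):
    """Convert bytes to megabytes (1 MB = 10**6 bytes)."""
    return n_bytes / 1e6


n_obs, n_vars = 30, 30000  # dimensions given in the post

data_mb = as_mb(matrix_bytes(n_obs, n_vars))   # the data matrix itself
gram_mb = as_mb(matrix_bytes(n_obs, n_obs))    # observations x observations
cov_mb = as_mb(matrix_bytes(n_vars, n_vars))   # variables x variables

print(f"data matrix (30 x 30000):      {data_mb:.4f} MB")
print(f"obs x obs matrix (30 x 30):    {gram_mb:.4f} MB")
print(f"vars x vars (30000 x 30000):   {cov_mb:.1f} MB")
```

Under these assumptions, any approach that works through the observations rather than the variables stays far below the machine's memory limit.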

Andreas

2004-Dec-13 20:56 UTC

Hi,

I'm a beginner with the SVM module, but I have seen there is a parameter called:
cachesize  # cache memory in MB (default 40)

Please let me know whether this parameter solves your problem; I may have to
deal with the same number of samples in the near future.

regards Andreas

"Christoph Lehmann" <christoph.lehmann at gmx.ch> wrote in message
news:41BD8A9F.4040509 at gmx.ch...
> Hi
> I have a matrix with 30 observations and roughly 30000 variables; each
> observation belongs to one of two groups. With svm and slda I run into memory
> trouble ('cannot allocate vector of size' roughly 2G). PCA LDA runs
> fine. Is there any way to get around the memory issue with SVMs? Or can you
> recommend any other classification method for such huge datasets?
>
>
> P.S. I run suse 9.1 on a 2G RAM PIV machine.
> thanks for a hint
>
> Christoph

John Maindonald

2004-Dec-14 23:59 UTC

While it is true that the large number of variables relative to
the number of observations restricts what can be inferred,
the situation is not as hopeless as Bert seems to suggest.
If it were, attempts at the analysis of expression array data
would be a waste of time.  Methods developed for that
general area may well be relevant to other data where the
number of variables is similarly far larger than the number
of observations.

See Ambroise, C. and McLachlan, G.J. 2002.  Selection bias
in gene extraction on the basis of microarray gene-expression
data.  PNAS 99: 6562-6566.

This discusses some of the literature on the use of SVMs.

The selection bias that these authors discuss also affects
plots, even principal components and other ordination-based
plots where features have been selected on the basis of their
ability to separate into known groups.  I have draft versions
of code that addresses this selection bias as it affects the
plotting of graphs, which (along with a paper that has been
submitted for inclusion in a conference proceedings) I am
happy to make available to anyone who wants to experiment.
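
[Editor's illustration, not part of the original post and not the draft code mentioned above.] The selection bias can be sketched on pure noise. The example below is a minimal sketch in standard-library Python; every function name, the choice of 2000 noise variables, the cutoff of 10 selected variables, and the nearest-centroid classifier (swapped in for an SVM to keep the code dependency-free) are assumptions made for illustration. Group labels are unrelated to the data, so an honest accuracy estimate should hover around chance. Selecting variables once on all observations and then cross-validating tends to report much higher accuracy than repeating the selection inside each leave-one-out fold.

```python
import random


def make_noise_data(n_obs, n_vars, seed):
    """Gaussian noise with two balanced groups that are unrelated to the data."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n_obs)]
    y = [i % 2 for i in range(n_obs)]
    return X, y


def class_means(X, y, rows, j):
    """Mean of variable j within each group, using only the given rows."""
    sums, counts = [0.0, 0.0], [0, 0]
    for i in rows:
        sums[y[i]] += X[i][j]
        counts[y[i]] += 1
    return sums[0] / counts[0], sums[1] / counts[1]


def top_features(X, y, rows, k):
    """Indices of the k variables with the largest absolute group-mean difference."""
    def score(j):
        m0, m1 = class_means(X, y, rows, j)
        return abs(m0 - m1)
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]


def predict(X, y, rows, feats, i):
    """Nearest-centroid prediction for observation i, trained on `rows`."""
    d0 = d1 = 0.0
    for j in feats:
        m0, m1 = class_means(X, y, rows, j)
        d0 += (X[i][j] - m0) ** 2
        d1 += (X[i][j] - m1) ** 2
    return 0 if d0 <= d1 else 1


def loo_accuracy(X, y, k, select_inside):
    """Leave-one-out accuracy; selection is redone per fold only if select_inside."""
    all_rows = list(range(len(X)))
    fixed = None if select_inside else top_features(X, y, all_rows, k)
    hits = 0
    for i in all_rows:
        train = [r for r in all_rows if r != i]
        feats = top_features(X, y, train, k) if select_inside else fixed
        hits += predict(X, y, train, feats, i) == y[i]
    return hits / len(X)


X, y = make_noise_data(n_obs=30, n_vars=2000, seed=1)
biased = loo_accuracy(X, y, k=10, select_inside=False)  # selection saw the test case
honest = loo_accuracy(X, y, k=10, select_inside=True)   # selection redone in each fold
print(f"selection outside cross-validation: {biased:.2f}")
print(f"selection inside cross-validation:  {honest:.2f}")
```

The only difference between the two estimates is whether the held-out observation took part in choosing the variables, which is the point the cited paper makes.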

Another good place to look, as a starting point, may be
Gordon Smyth's LIMMA User's Guide.  This can be a bit
hard to find. With limma installed, type help.start().
After some time a browser window should open. Click on
Packages | limma | Overview | LIMMA User's Guide (pdf)

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Bioinformation Science, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.

On 14 Dec 2004, at 10:09 PM, r-help-request at stat.math.ethz.ch wrote:
> From: Berton Gunter <gunter.berton at gene.com>
> Date: 14 December 2004 9:23:08 AM
> To: "'Andreas'" <wolf.privat at gmx.de>, <r-help at stat.math.ethz.ch>
> Cc:
> Subject: RE: [R] classification for huge datasets: SVM yields memory troubles
>
> "I have a matrix with 30 observations and roughly 30000
> variables, ... <snipped>"
>
> Comment: This is ** not ** a "huge" data set -- it is a tiny one with a
> large number of covariates. The difference is: If it were truly huge, SVM
> and/or LDA or ... might actually be able to produce useful results. With so
> few data and so many variables, it is hard to see how any approach that one
> uses is not simply a fancy random number generator.
