domenico pestalozzi
2007-Jan-04 16:30 UTC
[R] memory limits in R loading a dataset and using the package tree
An attached text whose character set was not specified was scrubbed...
Name: not available
URL: https://stat.ethz.ch/pipermail/r-help/attachments/20070104/6e94ce08/attachment.pl
Prof Brian Ripley
2007-Jan-04 23:25 UTC
[R] memory limits in R loading a dataset and using the package tree
Please read the rw-FAQ Q2.9. There are ways to raise the limit, and you have not told us that you used them (nor the version of R you used, which matters, as the limits are version-specific). Beyond that, there are ways to use read.table more efficiently: see its help page and the 'R Data Import/Export' manual. In particular, did you set nrows and colClasses? But for the size of problem you have, I would use a 64-bit build of R.

On Thu, 4 Jan 2007, domenico pestalozzi wrote:

> I think the question is discussed in other threads, but I can't find exactly
> what I want.
> I'm working in Windows XP with 2 GB of memory and a Pentium 4 - 3.00 GHz.
> I need to work with large datasets, generally from 300,000 to 800,000 records
> (depending on the project), and about 300 variables (...though a dataset with
> 800,000 records may not be "large" in your opinion...). Because we are
> deciding whether R will be the official software in our company, I'd like to
> know whether the possibility of using R with these datasets depends only on
> the characteristics of the "engine" (memory and processor).
> In that case we can improve the machine (for example, what memory do you
> recommend?).
>
> For example, I have a dataset of 200,000 records and 211 variables, but I
> can't load it because R stops responding: I monitor the loading procedure
> (read.table in R) with the Windows task manager, and R hangs when the paging
> file reaches 1.10 GB.
> After this I tried a sample of 100,000 records and could load the dataset
> correctly. I'd then like to use the package tree, but after some seconds
> (I use tree(variable1 ~ ., myDataset)) I get the message "Reached total
> allocation of 1014Mb".
>
> I'd like your opinion and suggestions, considering that I could upgrade the
> memory of my computer.
>
> pestalozzi

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
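For illustration, a minimal sketch of the two suggestions in the message above. The file name, row count, limit size and column classes are assumptions for the example, not values taken from the thread:

    ## Check / raise the Windows memory limit (rw-FAQ Q2.9); size is in Mb
    memory.limit()
    memory.limit(size = 2000)   # assumes a Windows build of R and enough address space

    ## Tell read.table what to expect so it does not have to guess types
    ## and repeatedly re-allocate while reading
    col.types <- c("factor", rep("numeric", 210))    # assumed: 211 columns, first a factor
    dat <- read.table("mydata.txt", header = TRUE,   # "mydata.txt" is a placeholder
                      nrows = 200000,                # known number of records
                      colClasses = col.types,
                      comment.char = "")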
Weiwei Shi
2007-Jan-05 20:11 UTC
[R] memory limits in R loading a dataset and using the package tree
IMHO, R is not good at really large-scale data mining, esp. when the algorithm is complicated. The alternatives are 1. sampling your data; sometimes you really do not need that large number of records and the accuracy might already be good enough when you load less. 2. find an alternative (commercial software) to do this job if you really need to load all. 3. make a wrapper function, sampling your data and load it into R and build model and repeat this process until you get n models. Then you can do like meta-learning or simply majority-win if your problem is classification. HTH, On 1/4/07, domenico pestalozzi <statadat at gmail.com> wrote:> I think the question is discussed in other thread, but I don't exactly find > what I want . > I'm working in Windows XP with 2GB of memory and a Pentium 4 - 3.00Ghx. > I have the necessity of working with large dataset, generally from 300,000 > records to 800,000 (according to the project), and about 300 variables > (...but a dataset with 800,000 records could not be "large" in your > opinion...). Because of we are deciding if R will be the official software > in our company, I'd like to say if the possibility of using R with these > datasets depends only by the characteristics of the "engine" (memory and > processor). > In this case we can improve the machine (for example, what memory you > reccomend?). > > For example, I have a dataset of 200,000 records and 211 variables but I > can't load the dataset because R doesn't work : I control the loading > procedure (read.table in R) by using the windows task-manager and R is > blocked when the file paging is 1.10 GB. > After this I try with a sample of 100,000 records and I can correctly load > tha dataset, but I'd like to use the package tree, but after some seconds ( > I use this tree(variable1~., myDataset) ) I obtain the message "Reached > total allocation of 1014Mb". > > I'd like your opinion and suggestion, considering that I could improve (in > memory) my computer. > > pestalozzi > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Weiwei Shi, Ph.D Research Scientist GeneGO, Inc. "Did you always know?" "No, I did not. But I believed..." ---Matrix III