R-devel now has some improved versions of read.table and write.table.

For a million-row data frame containing one number, one factor with few
levels and one logical column, a 56Mb object.

generating it takes 4.5 secs.

calling summary() on it takes 2.2 secs.

writing it takes 8 secs and an additional 10Mb.

saving it in .rda format takes 4 secs.

reading it naively takes 28 secs and an additional 240Mb

reading it carefully (using nrows, colClasses and comment.char) takes 16
secs and an additional 150Mb (56Mb of which is for the object read in).
(The overhead of read.table over scan was about 2 secs, mainly in the
conversion back to a factor.)

loading from .rda format takes 3.4 secs.

[R 2.0.1 read in 23 secs using an additional 210Mb, and wrote in 50 secs
using an additional 450Mb.]

Will Frank Harrell or someone else please explain to me a real
application in which this is not fast enough?

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
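[Editorial note: a minimal sketch of the "naive" versus "careful" read.table
calls being compared above. This is not Brian's actual test script; the file
name "big.txt" and the data-generation step are illustrative assumptions.]

    ## Generate a data frame like the one described above and write it out.
    n <- 1e6
    d <- data.frame(x = rnorm(n),
                    f = factor(sample(letters[1:4], n, replace = TRUE)),
                    l = sample(c(TRUE, FALSE), n, replace = TRUE))
    write.table(d, "big.txt", row.names = FALSE)

    ## Naive read: column types are guessed and the result cannot be
    ## pre-allocated, so extra scanning, copying and conversion happen.
    system.time(d1 <- read.table("big.txt", header = TRUE))

    ## Careful read: declare the row count and column classes, and turn
    ## off comment processing.
    system.time(d2 <- read.table("big.txt", header = TRUE,
                                 nrows = n,
                                 colClasses = c("numeric", "factor", "logical"),
                                 comment.char = ""))

    ## Binary .rda image for comparison.
    save(d, file = "big.rda"); system.time(load("big.rda"))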
>>>>> "BDR" == Prof Brian Ripley <ripley@stats.ox.ac.uk>
>>>>>     on Sun, 26 Dec 2004 10:03:30 +0000 (GMT) writes:

    BDR> R-devel now has some improved versions of read.table
    BDR> and write.table.  For a million-row data frame
    BDR> containing one number, one factor with few levels and
    BDR> one logical column, a 56Mb object.

    BDR> generating it takes 4.5 secs.
    BDR> calling summary() on it takes 2.2 secs.
    BDR> writing it takes 8 secs and an additional 10Mb.
    BDR> saving it in .rda format takes 4 secs.
    BDR> reading it naively takes 28 secs and an additional 240Mb
    BDR> reading it carefully (using nrows, colClasses and
    BDR> comment.char) takes 16 secs and an additional 150Mb
    BDR> (56Mb of which is for the object read in).  (The
    BDR> overhead of read.table over scan was about 2 secs,
    BDR> mainly in the conversion back to a factor.)
    BDR> loading from .rda format takes 3.4 secs.

    BDR> [R 2.0.1 read in 23 secs using an additional 210Mb, and
    BDR> wrote in 50 secs using an additional 450Mb.]

Excellent!  Thanks a lot Brian (for this and much more)!
I wish you continued merry holidays!

Martin
Brian Ripley wrote:

R-devel now has some improved versions of read.table and write.table.

For a million-row data frame containing one number, one factor with few
levels and one logical column, a 56Mb object.

generating it takes 4.5 secs.

calling summary() on it takes 2.2 secs.

writing it takes 8 secs and an additional 10Mb.

saving it in .rda format takes 4 secs.

reading it naively takes 28 secs and an additional 240Mb

reading it carefully (using nrows, colClasses and comment.char) takes 16
secs and an additional 150Mb (56Mb of which is for the object read in).
(The overhead of read.table over scan was about 2 secs, mainly in the
conversion back to a factor.)

loading from .rda format takes 3.4 secs.

[R 2.0.1 read in 23 secs using an additional 210Mb, and wrote in 50 secs
using an additional 450Mb.]

Will Frank Harrell or someone else please explain to me a real
application in which this is not fast enough?

---------------------------------------------------------------------------

Brian - I really appreciate your work on this, and the data.  The wise
use of read.table that you mentioned should be fine for almost everything
I do.  There may be other users who need to read larger datasets for
which memory usage is an issue.  They can speak for themselves though.

Sincerely,
Frank

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
On a ~1.45 million row x 122 column data frame (one "character", one
"factor", and the rest "numeric" columns) I can read it into R 2.0.1
using read.csv() in about 150 seconds; memory usage is ~1.5 GB.  This is
read in using the `nrows', `comment.char = ""', and `colClasses'
arguments.

On R-devel (2004-12-31), it takes about 120 seconds; memory usage is the
same.

Not too shabby!

-roger

Prof Brian Ripley wrote:
> R-devel now has some improved versions of read.table and write.table.
> 
> For a million-row data frame containing one number, one factor with few 
> levels and one logical column, a 56Mb object.
> 
> generating it takes 4.5 secs.
> 
> calling summary() on it takes 2.2 secs.
> 
> writing it takes 8 secs and an additional 10Mb.
> 
> saving it in .rda format takes 4 secs.
> 
> reading it naively takes 28 secs and an additional 240Mb
> 
> reading it carefully (using nrows, colClasses and comment.char) takes 16 
> secs and an additional 150Mb (56Mb of which is for the object read in). 
> (The overhead of read.table over scan was about 2 secs, mainly in the 
> conversion back to a factor.)
> 
> loading from .rda format takes 3.4 secs.
> 
> [R 2.0.1 read in 23 secs using an additional 210Mb, and wrote in 50 secs 
> using an additional 450Mb.]
> 
> 
> Will Frank Harrell or someone else please explain to me a real 
> application in which this is not fast enough?
> 
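[Editorial note: a sketch of the kind of call Roger describes.  The file
name "big.csv", the exact row count, and the column order (one character
column, one factor column, then 120 numeric columns) are assumptions
standing in for his actual data.]

    n_rows  <- 1450000
    classes <- c("character", "factor", rep("numeric", 120))

    system.time(
      dat <- read.csv("big.csv",
                      nrows        = n_rows,
                      colClasses   = classes,
                      comment.char = "")
    )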
A technical question here: how does one measure the memory overhead
mentioned below?  I have a set of functions of my own and would like to
profile them.

Thanks,
Vadim

> -----Original Message-----
> From: r-devel-bounces@stat.math.ethz.ch 
> [mailto:r-devel-bounces@stat.math.ethz.ch] On Behalf Of Prof 
> Brian Ripley
> Sent: Sunday, December 26, 2004 2:04 AM
> To: R-devel@r-project.org
> Subject: [Rd] R's IO speed
> 
> R-devel now has some improved versions of read.table and write.table.
> 
> For a million-row data frame containing one number, one 
> factor with few levels and one logical column, a 56Mb object.
> 
> generating it takes 4.5 secs.
> 
> calling summary() on it takes 2.2 secs.
> 
> writing it takes 8 secs and an additional 10Mb.
> 
> saving it in .rda format takes 4 secs.
> 
> reading it naively takes 28 secs and an additional 240Mb
> 
> reading it carefully (using nrows, colClasses and 
> comment.char) takes 16 secs and an additional 150Mb (56Mb of 
> which is for the object read in).
> (The overhead of read.table over scan was about 2 secs, 
> mainly in the conversion back to a factor.)
> 
> loading from .rda format takes 3.4 secs.
> 
> [R 2.0.1 read in 23 secs using an additional 210Mb, and wrote 
> in 50 secs using an additional 450Mb.]
> 
> 
> Will Frank Harrell or someone else please explain to me a 
> real application in which this is not fast enough?
> 
> -- 
> Brian D. Ripley,                  ripley@stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> 
> ______________________________________________
> R-devel@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
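[Editorial note: one possible way to gauge the extra memory a call uses,
sketched under assumptions; the thread does not say how the figures above
were obtained.  gc(reset = TRUE) clears the garbage collector's "max used"
statistics, so a gc() report after the call shows the peak reached during
it, while object.size() gives the size of the resulting object itself.
The file name and read.table arguments repeat the hypothetical example
from the earlier note.]

    gc(reset = TRUE)                   # clear the "max used" columns
    d <- read.table("big.txt",         # hypothetical file, as in the sketch above
                    header = TRUE, nrows = 1e6,
                    colClasses = c("numeric", "factor", "logical"),
                    comment.char = "")
    print(gc())                        # "max used" (Mb) columns show the peak
    round(as.numeric(object.size(d)) / 1024^2)   # object's own size in Mb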