Hi all, please don't ask me why I tried this but.......

I have observed some odd behaviour in the time taken to read a file. I
tried searching the archives without much success, but that could be me.

The first time I read a (60Mb) CSV file takes a certain amount of time.
The second time takes appreciably longer, and the third and subsequent
times take very much less. See below:

$ R2.1.1

R : Copyright 2005, The R Foundation for Statistical Computing
Version 2.1.1 (2005-06-20), ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for a HTML browser interface to help.
Type 'q()' to quit R.

> system.time(temp <- read.csv("Mapping50K_Xba240_annot.csv", header=TRUE, as.is=TRUE))
[1] 32.55  0.30 33.46  0.00  0.00
> system.time(temp <- read.csv("Mapping50K_Xba240_annot.csv", header=TRUE, as.is=TRUE))
[1] 45.32  0.24 45.72  0.00  0.00
> system.time(temp <- read.csv("Mapping50K_Xba240_annot.csv", header=TRUE, as.is=TRUE))
[1] 11.73  0.17 11.94  0.00  0.00
> system.time(temp <- read.csv("Mapping50K_Xba240_annot.csv", header=TRUE, as.is=TRUE))
[1]  8.58  0.28  8.96  0.00  0.00
> system.time(temp <- read.csv("Mapping50K_Xba240_annot.csv", header=TRUE, as.is=TRUE))
[1]  8.80  0.16  9.02  0.00  0.00

This is a relatively quiet Opteron running Red Hat Linux and using R 2.1.1.
The same pattern is repeatable, and it also occurs under R 2.0.1 and on a
Dell laptop running Windows XP.

I guess it is probably something to do with the garbage collector? Can
anyone explain further? Particularly the first increase.

Thanks....

--
Glenn Stone
CSIRO Bioinformatics
http://www.bioinformatics.csiro.au
Please see the gcFirst argument to system.time, which you should set to
TRUE for such timings. Your second run is most likely paying to GC the
results of the first.

Beyond that, R adjusts its GC triggers based on usage, and when you first
start using large objects the trigger levels will grow and things will
generally speed up. Set gcinfo(TRUE) to watch what is happening.

On Thu, 4 Aug 2005, Glenn Stone wrote:

> Hi all, please don't ask me why I tried this but.......
>
> The first time I read a (60Mb) CSV file takes a certain amount of time.
> The second time takes appreciably longer, and the third and subsequent
> times take very much less.
>
> [...]
>
> I guess it is probably something to do with the garbage collector? Can
> anyone explain further? Particularly the first increase.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
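A minimal sketch of the timing approach described above, assuming the CSV
file from the original post is in the working directory (any large file
would do):

  ## Report each garbage collection as it happens.
  gcinfo(TRUE)

  ## Force a full collection before timing, so reclaiming the results of an
  ## earlier run is not charged to this one.
  system.time(temp <- read.csv("Mapping50K_Xba240_annot.csv",
                               header = TRUE, as.is = TRUE),
              gcFirst = TRUE)

  ## Switch the GC reports back off.
  gcinfo(FALSE)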
Thanks, very helpful.

Is there some way to adjust those GC triggers in advance?

-----Original Message-----
From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
Sent: Thu 8/4/2005 4:15 PM
To: Stone, Glenn (CMIS, North Ryde)
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Odd timing behaviour in reading a file

Please see the gcFirst argument to system.time, which you should set to
TRUE for such timings. Your second run is most likely paying to GC the
results of the first.

Beyond that, R adjusts its GC triggers based on usage, and when you first
start using large objects the trigger levels will grow and things will
generally speed up. Set gcinfo(TRUE) to watch what is happening.

[...]
On Sun, 7 Aug 2005, Glenn.Stone at csiro.au wrote:

> Thanks, very helpful.
>
> Is there some way to adjust those GC triggers in advance?

Almost: that is what the startup flags such as --min-vsize help do.

[...]

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
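A sketch of what that looks like in practice; the sizes here are purely
illustrative, not recommendations:

  $ R --min-vsize=200M --min-nsize=2M    # raise the initial heap sizes at startup

  ## Inside R, the resulting trigger levels can be inspected directly:
  > gc()            # the "gc trigger" columns show the current trigger levels
  > mem.limits()    # reports any --max-nsize / --max-vsize limits in force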