On 20/06/2014 15:37, Ista Zahn wrote:
> Hello,
>
> I've noticed that dget() is much slower in the current and devel R
> versions than in previous versions. In 2.15 reading a 10000-row
> data.frame takes less than half a second:
>
>> (which.r <- R.Version()$version.string)
> [1] "R version 2.15.2 (2012-10-26)"
>> x <- data.frame(matrix(sample(letters, 100000, replace = TRUE), ncol = 10))
>> dput(x, which.r)
>> system.time(y <- dget(which.r))
> user system elapsed
> 0.546 0.033 0.586
>
> While in 3.1.0 and r-devel it takes around 7 seconds.
>
>> (which.r <- R.Version()$version.string)
> [1] "R version 3.1.0 (2014-04-10)"
>> x <- data.frame(matrix(sample(letters, 100000, replace = TRUE), ncol = 10))
>> dput(x, which.r)
>> system.time(y <- dget(which.r))
> user system elapsed
> 6.920 0.060 7.074
>
>> (which.r <- R.Version()$version.string)
> [1] "R Under development (unstable) (2014-06-19 r65979)"
>> x <- data.frame(matrix(sample(letters, 100000, replace = TRUE), ncol = 10))
>> dput(x, which.r)
>> system.time(y <- dget(which.r))
> user system elapsed
> 6.886 0.047 6.943
>>
>
> I know dput/dget is probably not the right tool for this job:
> nevertheless the slowdown is quite dramatic so I thought it was worth
> calling attention to.
This is completely the wrong way to do this. See ?dump.
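
As a rough illustration of the dump() route, here is a minimal sketch on a smaller stand-in data frame. The temporary file and the environment `env` are assumptions made for the example, not anything from the original post, and source() is called with keep.source = FALSE so that the parser does not record source references.

```r
# Smaller stand-in for the data frame in the quoted message
set.seed(1)
x <- data.frame(matrix(sample(letters, 1000, replace = TRUE), ncol = 10))

# Hypothetical temporary file, used only for this sketch
f <- tempfile(fileext = ".R")

# dump() writes an assignment of the form `x <- structure(...)`
dump("x", file = f)

# Read it back into a separate environment so the original x is untouched
env <- new.env()
source(f, local = env, keep.source = FALSE)
y <- get("x", envir = env)

unlink(f)
```

Unlike dput(), dump() stores the object's name along with its value, so source() recreates `x` under that name in whichever environment is supplied.
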
dget() basically calls eval(parse()). parse() is much slower in R >=
3.0 mainly because it keeps more information. Using keep.source=FALSE
here speeds things up a lot.
> system.time(y <- dget(which.r))
user system elapsed
3.233 0.012 3.248
> options(keep.source=FALSE)
> system.time(y <- dget(which.r))
user system elapsed
0.090 0.001 0.092
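
If changing the global option is undesirable, the same effect can probably be had for a single call. The sketch below assumes, as described above, that reading a dput() file amounts to eval(parse()); the helper name `read_dput` and the temporary file are made up for the example.

```r
# Smaller stand-in for the data frame in the quoted message
set.seed(1)
x <- data.frame(matrix(sample(letters, 1000, replace = TRUE), ncol = 10))

# Hypothetical temporary file, used only for this sketch
f <- tempfile(fileext = ".R")
dput(x, f)

# Hypothetical helper: parse the file without source references,
# then evaluate the resulting expression
read_dput <- function(file) {
  exprs <- parse(file = file, keep.source = FALSE)
  eval(exprs)
}

y <- read_dput(f)
unlink(f)
```

Wrapping the call to read_dput() in system.time() on the full 10000-row data frame would show whether the gain matches the timings above on a given installation.
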
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595