thr3ads.net - R devel - [Rd] proposal for adapting code of function gl() [Apr 2011]

Joris Meys

2011-Apr-11 21:53 UTC

[Rd] proposal for adapting code of function gl()

Based on a discussion on SO I ran some tests and found that converting
to a factor is best done early in the process. Hence, I propose to
rewrite the gl() function as :

gl2 <- function(n, k, length = n * k, labels = 1:n, ordered = FALSE){
  rep(
      rep(
        factor(1:n,levels=1:n,labels=labels, ordered=ordered),rep.int(k,n)
      ),length.out=length
  )
}

Some test results  :
> system.time(X1 <- gl(5,1e7))
   user  system elapsed
  29.21    0.30   29.58
> system.time(X2 <- gl2(5,1e7))
   user  system elapsed
   1.87    0.45    2.37
> all.equal(X1,X2)
[1] TRUE

> system.time(X1 <- gl(5,100,1e7))
   user  system elapsed
   5.98    0.05    6.05
> system.time(X2 <- gl2(5,100,1e7))
   user  system elapsed
   0.21    0.03    0.25
> all.equal(X1,X2)
[1] TRUE

> system.time(X1 <- gl(5,100,1e7,labels=letters[1:5]))
   user  system elapsed
   5.88    0.02    5.98
> system.time(X2 <- gl2(5,100,1e7,labels=letters[1:5]))
   user  system elapsed
   0.20    0.05    0.25
> all.equal(X1,X2)
[1] TRUE

> system.time(X1 <- gl(5,100,1e7,labels=letters[1:5],ordered=T))
   user  system elapsed
   5.82    0.03    5.89
> system.time(X2 <- gl2(5,100,1e7,labels=letters[1:5],ordered=T))
   user  system elapsed
   0.22    0.04    0.25
> all.equal(X1,X2)
[1] TRUE

reference to SO :
http://stackoverflow.com/questions/5627264/how-can-i-efficiently-construct-a-very-long-factor-with-few-levels

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
Joris.Meys at Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

peter dalgaard

2011-Apr-12 06:51 UTC

[Rd] proposal for adapting code of function gl()

On Apr 11, 2011, at 23:53 , Joris Meys wrote:
> Based on a discussion on SO I ran some tests and found that converting
> to a factor is best done early in the process. Hence, I propose to
> rewrite the gl() function as :
> 
> gl2 <- function(n, k, length = n * k, labels = 1:n, ordered = FALSE){
>  rep(
>      rep(
>        factor(1:n,levels=1:n,labels=labels, ordered=ordered),rep.int(k,n)
>      ),length.out=length
>  )
> }
> 
That's bizarre! You are relying on an optimization in rep.factor whereby it
replicates the internal codes and exploits that the result has the same
structure as the input. I.e., it just tacks on class and levels attributes
rather than call match() as factor() does internally.
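
To make the point above concrete, here is a minimal sketch (the object names f, r and manual are only illustrative, not from the original posts). It shows that a factor is an integer vector of codes plus "levels" and "class" attributes, so replicating the factor and replicating the bare codes before re-attaching the attributes should yield the same values:

```r
# A small factor: integer codes plus "levels" and "class" attributes.
f <- factor(1:3, levels = 1:3, labels = c("a", "b", "c"))

# Inspect the underlying representation: integer codes with a levels attribute.
str(unclass(f))

# Replicating the factor repeats the codes and keeps the levels.
r <- rep(f, times = c(2, 2, 2))

# The same object built by hand: replicate the codes, then attach attributes.
manual <- structure(rep.int(1:3, c(2L, 2L, 2L)),
                    levels = c("a", "b", "c"),
                    class = "factor")

# Compare the two constructions.
print(all.equal(r, manual))
```

Neither construction calls match() on the long vector, which is where the saving comes from.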

However, you can do the same thing straight away: 
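gl2 <- function (n, k, length = n * k, labels = 1:n, ordered = FALSE)
{
   # replicate the integer codes directly, without building a factor first
   y <- rep(rep.int(1:n, rep.int(k, n)), length.out = length)
   # attach levels and class by hand instead of calling factor()
   structure(y, levels = as.character(labels),
             class = c(if (ordered) "ordered", "factor"))
}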

I get this to be a bit faster than your version, although with a smaller speedup
factor, which probably just indicates that match() is faster on this machine.
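
For anyone who wants to repeat the comparison, the following is a rough sketch rather than a definitive benchmark. The names gl_factor_first and gl_codes_first are hypothetical labels for the two variants in this thread, and the vector length is deliberately smaller than in the original tests. Timings depend on the machine and the R version, so no expected numbers are given:

```r
# Variant 1 (hypothetical name): build a short factor, then replicate it.
gl_factor_first <- function(n, k, length = n * k, labels = 1:n, ordered = FALSE) {
  f <- factor(1:n, levels = 1:n, labels = labels, ordered = ordered)
  rep(rep(f, rep.int(k, n)), length.out = length)
}

# Variant 2 (hypothetical name): replicate integer codes, then add attributes.
gl_codes_first <- function(n, k, length = n * k, labels = 1:n, ordered = FALSE) {
  y <- rep(rep.int(1:n, rep.int(k, n)), length.out = length)
  structure(y, levels = as.character(labels),
            class = c(if (ordered) "ordered", "factor"))
}

n <- 5; k <- 100; len <- 1e6   # sizes chosen for a quick run

# Time the built-in gl() and both variants on the same arguments.
t_base   <- system.time(x0 <- gl(n, k, len, labels = letters[1:5]))
t_factor <- system.time(x1 <- gl_factor_first(n, k, len, labels = letters[1:5]))
t_codes  <- system.time(x2 <- gl_codes_first(n, k, len, labels = letters[1:5]))

# Elapsed seconds for each approach.
print(c(gl = t_base[["elapsed"]],
        factor_first = t_factor[["elapsed"]],
        codes_first = t_codes[["elapsed"]]))

# Both variants should reproduce the result of gl().
stopifnot(isTRUE(all.equal(x0, x1)), isTRUE(all.equal(x0, x2)))
```

A single system.time() call is noisy at this size; repeating each call a few times and comparing medians gives a steadier picture.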
> Some test results  :
> 
>> system.time(X1 <- gl(5,1e7))
>   user  system elapsed
>  29.21    0.30   29.58
> 
>> system.time(X2 <- gl2(5,1e7))
>   user  system elapsed
>   1.87    0.45    2.37
> 
>> all.equal(X1,X2)
> [1] TRUE
> 
>> system.time(X1 <- gl(5,100,1e7))
>   user  system elapsed
>   5.98    0.05    6.05
> 
>> system.time(X2 <- gl2(5,100,1e7))
>   user  system elapsed
>   0.21    0.03    0.25
> 
>> all.equal(X1,X2)
> [1] TRUE
> 
>> system.time(X1 <- gl(5,100,1e7,labels=letters[1:5]))
>   user  system elapsed
>   5.88    0.02    5.98
> 
>> system.time(X2 <- gl2(5,100,1e7,labels=letters[1:5]))
>   user  system elapsed
>   0.20    0.05    0.25
> 
>> all.equal(X1,X2)
> [1] TRUE
> 
>> system.time(X1 <- gl(5,100,1e7,labels=letters[1:5],ordered=T))
>   user  system elapsed
>   5.82    0.03    5.89
> 
>> system.time(X2 <- gl2(5,100,1e7,labels=letters[1:5],ordered=T))
>   user  system elapsed
>   0.22    0.04    0.25
> 
>> all.equal(X1,X2)
> [1] TRUE
> 
> reference to SO :
> http://stackoverflow.com/questions/5627264/how-can-i-efficiently-construct-a-very-long-factor-with-few-levels
> 
> -- 
> Joris Meys
> Statistical consultant
> 
> Ghent University
> Faculty of Bioscience Engineering
> Department of Applied mathematics, biometrics and process control
> 
> tel : +32 9 264 59 87
> Joris.Meys at Ugent.be
> -------------------------------
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
