thr3ads.net - R devel - [Rd] R-devel Digest, Vol 133, Issue 23 [Mar 2014]

Radford Neal

2014-Mar-26 17:24 UTC

[Rd] R-devel Digest, Vol 133, Issue 23

> From: Richard Cotton <richierocks at gmail.com>
> 
> The rep function is very versatile, but that versatility comes at a
> cost: it takes a bit of effort to learn (and remember) its syntax.
> This is a problem, since rep is one of the first functions many
> beginners will come across.  Of the three main uses of rep, two have
> simpler alternatives.
> 
> rep(x, times = ) has rep.int
> rep(x, length.out  = ) has rep_len
> 
> I think that a rep_each function would be a worthy addition for the
> third use case
> 
> rep(x, each = )
> 
> (It might also be worth having rep_times as a synonym for rep.int.)

I think this is exactly the wrong approach.  Indeed, the aim should be
to get rid of functions like rep.int (or at least discourage their
use, even if they have to be kept for compatibility).

Why is rep_each(x,n) better than rep(x,each=n)?  There is no saving in
typing (which would be trivial anyway).  There *ought* to be no
significant difference in speed (though that seems to have been the
motive for rep.int).  Are you trying to let students learn R without
ever learning about specifying arguments by name?
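
For concreteness, a minimal sketch of the three uses of rep under discussion and of what the proposed function would amount to. Note that rep_each is hypothetical here: it is not part of base R, and the one-line wrapper below is only an assumption about how such a function might be defined.

```r
x <- c("a", "b", "c")

# The three main uses of rep(), each selected by a named argument.
by_times <- rep(x, times = 2)       # whole vector repeated twice
by_each  <- rep(x, each = 2)        # every element repeated twice in place
by_len   <- rep(x, length.out = 5)  # recycled until the result has length 5

# The existing specialised functions cover the first and third cases.
stopifnot(identical(rep.int(x, 2), by_times))
stopifnot(identical(rep_len(x, 5), by_len))

# HYPOTHETICAL: rep_each() does not exist in base R. This thin wrapper only
# illustrates the proposal; it adds nothing beyond rep(x, each = n).
rep_each <- function(x, each) rep(x, each = each)
stopifnot(identical(rep_each(x, 2), by_each))
```

Written this way, the wrapper makes the point of the question visible: the call rep_each(x, 2) and the call rep(x, each = 2) differ only in where the word "each" appears.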

And where would you stop?  How about seq_by(a,b,s) rather than having
to arduously type seq(a,b,by=s)?  Maybe we should have glm_binomial,
glm_poisson, etc. so we don't have to remember the "family" argument?
This way lies madness...

   Radford Neal

peter dalgaard

2014-Mar-26 22:00 UTC

[Rd] R-devel Digest, Vol 133, Issue 23

On 26 Mar 2014, at 18:24 , Radford Neal <radford at cs.toronto.edu> wrote:
>> From: Richard Cotton <richierocks at gmail.com>
>> 
>> The rep function is very versatile, but that versatility comes at a
>> cost: it takes a bit of effort to learn (and remember) its syntax.
>> This is a problem, since rep is one of the first functions many
>> beginners will come across.  Of the three main uses of rep, two have
>> simpler alternatives.
>> 
>> rep(x, times = ) has rep.int
>> rep(x, length.out  = ) has rep_len
>> 
>> I think that a rep_each function would be a worthy addition for the
>> third use case
>> 
>> rep(x, each = )
>> 
>> (It might also be worth having rep_times as a synonym for rep.int.)
> 
> I think this is exactly the wrong approach.  Indeed, the aim should be
> to get rid of functions like rep.int (or at least discourage their
> use, even if they have to be kept for compatibility).
> 
> Why is rep_each(x,n) better than rep(x,each=n)?  There is no saving in
> typing (which would be trivial anyway).  There *ought* to be no
> significant difference in speed (though that seems to have been the
> motive for rep.int).  Are you trying to let students learn R without
> ever learning about specifying arguments by name?
> 
> And where would you stop?  How about seq_by(a,b,s) rather than having
> to arduously type seq(a,b,by=s)?  Maybe we should have glm_binomial,
> glm_poisson, etc. so we don't have to remember the "family" argument?
> This way lies madness...

Spot on.

Well, maybe a slight disagreement: In a weakly typed language like R, you will
always have performance losses due to type testing and dispatching, and no
compiler/interpreter is intelligent enough to predict the types so that this can
be avoided. Some amount of hinting is needed for reliable speedups, either by
having special functions for simple cases (allowed to make assumptions on their
inputs), or some sort of #pragma-like construction.

Actually, rep.int seems to be a poor example of this since the speedup is pretty
negligible unless you do huge amounts of short replicates. I expect that the
S-PLUS compatibility was the main reason to have it. Case in point:
> system.time(for(i in 1:10000000) rep("a",10))
   user  system elapsed 
 16.721   0.125  19.037 
> system.time(for(i in 1:10000000) rep.int("a",10))
   user  system elapsed 
 14.356   0.050  14.611 
> system.time(for(i in 1:1000000) rep("a",1000))
   user  system elapsed 
 11.655   2.157  14.263 
> system.time(for(i in 1:1000000) rep.int("a",1000))
   user  system elapsed 
 10.957   1.708  12.917 

For more spectacular speedups compare seq(1,10) to seq_len(10) or even just to
1:10. Then again, the slowdown in seq() is so large that it is hard to believe
it to be completely unavoidable.
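
A rough way to try the seq comparison yourself, as a sketch only: the iteration count n is an arbitrary choice, and the timings will depend on the machine and the R version, so no particular figures are claimed here.

```r
# Three ways to build the sequence 1, 2, ..., 10.
via_seq     <- seq(1, 10)
via_seq_len <- seq_len(10)
via_colon   <- 1:10

# All three produce the same values.
stopifnot(all(via_seq == via_colon), identical(via_seq_len, via_colon))

# Time many short calls, in the same style as the rep() timings.
# n is an arbitrary illustrative choice, kept small so this runs quickly.
n <- 100000L
time_seq     <- system.time(for (i in seq_len(n)) seq(1, 10))
time_seq_len <- system.time(for (i in seq_len(n)) seq_len(10))
time_colon   <- system.time(for (i in seq_len(n)) 1:10)

# Collect elapsed seconds for a side-by-side look.
elapsed <- c(seq     = time_seq[["elapsed"]],
             seq_len = time_seq_len[["elapsed"]],
             colon   = time_colon[["elapsed"]])
print(elapsed)
```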
  

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

Hervé Pagès

2014-Mar-26 23:43 UTC

[Rd] R-devel Digest, Vol 133, Issue 23

Hi,

On 03/26/2014 10:24 AM, Radford Neal wrote:
>> From: Richard Cotton <richierocks at gmail.com>
>>
>> The rep function is very versatile, but that versatility comes at a
>> cost: it takes a bit of effort to learn (and remember) its syntax.
>> This is a problem, since rep is one of the first functions many
>> beginners will come across.  Of the three main uses of rep, two have
>> simpler alternatives.
>>
>> rep(x, times = ) has rep.int
>> rep(x, length.out  = ) has rep_len
>>
>> I think that a rep_each function would be a worthy addition for the
>> third use case
>>
>> rep(x, each = )
>>
>> (It might also be worth having rep_times as a synonym for rep.int.)
>
> I think this is exactly the wrong approach.  Indeed, the aim should be
> to get rid of functions like rep.int (or at least discourage their
> use, even if they have to be kept for compatibility).
>
> Why is rep_each(x,n) better than rep(x,each=n)?

According to the NEWS file, it seems that R core felt that having
rep_len() was a good idea.

   There is a new function rep_len() analogous to rep.int() for when
   speed is required (and names are not).

Now one might wonder (and your students might wonder too) why having
rep_each() "for when speed is required (and names are not)" is not a
good idea.
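
The "(and names are not)" part of that NEWS entry can be seen in a small sketch like the following. It uses only base R functions; the example vector x is made up for illustration.

```r
# A small named vector (illustrative data only).
x <- c(a = 1, b = 2)

# rep() carries the names of x into the result ...
with_names <- rep(x, times = 2)
print(names(with_names))

# ... whereas rep.int() and rep_len() return the values without names.
no_names_int <- rep.int(x, 2)
no_names_len <- rep_len(x, 3)
print(names(no_names_int))
print(names(no_names_len))

# Apart from the names, the values agree.
stopifnot(identical(unname(with_names), no_names_int))
stopifnot(identical(unname(rep(x, length.out = 3)), no_names_len))
```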

By having rep_len(), rep_each(), and rep_times(), the 3 extra arguments
in rep(x, ...) would be covered. Plus, when I use tab completion after
typing rep_, I would get a nice summary and would be able to quickly
choose. Right now, when I do this, one function is missing, and one has
a misleading name. So I'd rather have no specialized function at all,
or have the 3. Would be cleaner and less confusing than the current
situation.

Cheers,
H.

> There is no saving in
> typing (which would be trivial anyway).  There *ought* to be no
> significant difference in speed (though that seems to have been the
> motive for rep.int).  Are you trying to let students learn R without
> ever learning about specifying arguments by name?
>
> And where would you stop?  How about seq_by(a,b,s) rather than having
> to arduously type seq(a,b,by=s)?  Maybe we should have glm_binomial,
> glm_poisson, etc. so we don't have to remember the "family" argument?
> This way lies madness...
>
>     Radford Neal
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
