thr3ads.net - R devel - [Rd] A few suggestions and perspectives from a PhD student [May 2017]

Hilmar Berger

2017-May-09 07:47 UTC

[Rd] A few suggestions and perspectives from a PhD student

Hi,

On 08/05/17 16:37, Ista Zahn wrote:
> One of the key strengths of R is that packages are not akin to "fan
> created mods". They are a central and necessary part of the R system.

I would tend to disagree here. R packages are in their majority not
maintained by the core R developers. Concepts, features and lifetime
depend mainly on the maintainers of the package (even though in theory
the GPL allows somebody to take over at any time). Several packages that
are critical for processing big data and providing "modern"
visualizations introduce concepts quite different from the legacy S/R
language. I do feel that in a way, current core R strongly shows its
origin in S, while modern concepts (e.g. data.table, dplyr, ggplot, ...)
are often only available via extension packages. This is fine if one
considers R to be a statistical toolkit; as a programming language,
however, it introduces inconsistencies and uncertainties which could be
avoided if some of the "modern" parts (including language concepts)
could be more integrated in core-R.

Best regards,
Hilmar

-- 
Dr. Hilmar Berger, MD
Max Planck Institute for Infection Biology
Charitéplatz 1
D-10117 Berlin
GERMANY

Phone:  + 49 30 28460 430
Fax:    + 49 30 28460 401
  
E-Mail: berger at mpiib-berlin.mpg.de
Web   : www.mpiib-berlin.mpg.de

Joris Meys

2017-May-09 09:22 UTC

On Tue, May 9, 2017 at 9:47 AM, Hilmar Berger <berger at mpiib-berlin.mpg.de> wrote:
> Hi,
>
> On 08/05/17 16:37, Ista Zahn wrote:
>
>> One of the key strengths of R is that packages are not akin to "fan
>> created mods". They are a central and necessary part of the R system.
>
> I would tend to disagree here. R packages are in their majority not
> maintained by the core R developers. Concepts, features and lifetime depend
> mainly on the maintainers of the package (even though in theory the GPL
> allows somebody to take over at any time). Several packages that are
> critical for processing big data and providing "modern" visualizations
> introduce concepts quite different from the legacy S/R language. I do feel
> that in a way, current core R strongly shows its origin in S, while modern
> concepts (e.g. data.table, dplyr, ggplot, ...) are often only available via
> extension packages. This is fine if one considers R to be a statistical
> toolkit; as a programming language, however, it introduces inconsistencies
> and uncertainties which could be avoided if some of the "modern" parts
> (including language concepts) could be more integrated in core-R.
>
> Best regards,
> Hilmar
>
And I would tend to disagree here. R is built upon the paradigm of a
functional programming language, and falls in the same group as Clojure,
Haskell and the like. It is a Turing-complete programming language in its
own right. That's quite a bit more than "a statistical toolkit". You can
say that about e.g. the macro language of SPSS, but not about R.

Second, there's little "modern" about the ideas behind the tidyverse.
Piping is about as old as Unix itself. The grammar of graphics, on which
ggplot is based, stems from the SYSTAT graphics system from the nineties.
Hadley and colleagues did (and do) a great job implementing these ideas in
R, but the ideas do have a respectable age.

Third, there's a lot of nonstandard evaluation going on in all these
packages. Using them inside your own functions requires serious attention
(e.g. the difference between aes() and aes_() in ggplot2). Actually, even
though I definitely see the merits of these packages in data analysis, the
tidyverse feels like a (clean and powerful) macro language on top of R. And
that's good, but that doesn't mean these parts are essential to transform R
into a programming language. Rather the contrary, actually: relying too
heavily on these packages does complicate things when you start to develop
your own packages in R.
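
The nonstandard-evaluation problem described above is not specific to ggplot2 and can be sketched in base R alone. The function names `col_mean_nse()`, `col_mean_se()`, `wrapper_nse()` and `wrapper_se()`, and the column `body_mass`, are hypothetical and exist only for this illustration; they are not part of any tidyverse package. The sketch assumes no variable called `body_mass` exists in the calling environment.

```r
# Illustration data; the column name `body_mass` is made up for this sketch.
df <- data.frame(body_mass = c(60, 70, 80))

# Nonstandard evaluation: capture the argument unevaluated, then evaluate it
# with the data frame's columns in scope.
col_mean_nse <- function(data, expr) {
  mean(eval(substitute(expr), data, parent.frame()))
}

# Standard evaluation: the column is named by an ordinary character string.
col_mean_se <- function(data, name) {
  mean(data[[name]])
}

# Called directly, the NSE version is convenient.
direct <- col_mean_nse(df, body_mass)

# Wrapped naively, substitute() sees the symbol `col`, not `body_mass`, so the
# lookup falls through to the caller's environment and fails there.
wrapper_nse <- function(data, col) col_mean_nse(data, col)
wrapped_nse <- tryCatch(wrapper_nse(df, body_mass), error = function(e) NA)

# The SE version composes without any special handling.
wrapper_se <- function(data, col) col_mean_se(data, col)
wrapped_se <- wrapper_se(df, "body_mass")
```

This is roughly why packages built on nonstandard evaluation tend to ship a second, string- or formula-based entry point for use inside functions.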

Fourth, the tidyverse masks quite a few native R functions. Obviously they
took great care to keep the functionality as close as possible to what one
would expect, but that's not always the case. The lag() function of dplyr
masks an S3 generic from the stats package, for example. So if you work
with time series in the stats package, loading the tidyverse gives you
trouble.
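
A minimal sketch of the lag() clash mentioned above, assuming a small made-up monthly series. The dplyr call is guarded with requireNamespace() so the block still runs where dplyr is not installed; the variable names are illustrative only.

```r
# A short monthly series; the values are arbitrary illustration data.
x <- ts(c(10, 20, 30, 40, 50), start = c(2017, 1), frequency = 12)

# stats::lag() on a ts object shifts the time index and keeps the values.
shifted <- stats::lag(x, k = -1)
time_shift <- as.numeric(time(shifted))[1] - as.numeric(time(x))[1]  # one month
same_values <- identical(as.numeric(shifted), as.numeric(x))

# dplyr::lag() instead shifts the values of a plain vector and pads with NA.
if (requireNamespace("dplyr", quietly = TRUE)) {
  padded <- dplyr::lag(c(10, 20, 30, 40, 50))
  print(padded)
}
```

Writing the namespace explicitly, as in `stats::lag(x, k = -1)`, gives the time-series behaviour regardless of which packages are attached.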

Fifth, many of the tidyverse packages are at version 0.x.y: they're still
in beta development and their functionality might (and will) change.
Functions disappear, arguments get renamed, tags change,... Often the
changes improve the packages, but they did break older code for me more
than once. You can't expect the R core team to incorporate something that
is bound to change.

Last but not least, the tidyverse actually sometimes works against new R
users. At least R users that go beyond the classic data workflow. I
literally rewrote some code (from a consultant) that abused the _ply
functions to create nested loops. Removing all that stuff and rewriting the
code using a simple list in combination with a simple for-loop sped up the
code by a factor of 150. That has nothing to do with dplyr, which is very
fast. That has everything to do with that person having a hammer and
thinking everything he sees is a nail. The tidyverse is no reason not to
learn the concepts of the language it's built upon.
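
The "simple list plus simple for-loop" pattern mentioned above might look like the following sketch. The input data and all variable names are invented for illustration and are unrelated to the consultant's code; no timing claim is attached to this toy example.

```r
# Hypothetical inputs: a named list of numeric vectors to summarise.
inputs <- list(a = c(1, 2, 3), b = c(4, 5), c = c(6, 7, 8, 9))

# Preallocate the result list so it is not regrown on every iteration.
results <- vector("list", length(inputs))
names(results) <- names(inputs)

# One plain loop: each iteration stores a one-row data frame in its slot.
for (i in seq_along(inputs)) {
  v <- inputs[[i]]
  results[[i]] <- data.frame(name = names(inputs)[i],
                             n = length(v),
                             total = sum(v))
}

# Bind the per-item rows together once, after the loop.
summary_table <- do.call(rbind, results)
```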

The one thing I would like to see though, is the adaptation of the
statistical toolkit so that it can work with data.table and tibble objects
directly, as opposed to having to convert to a data.frame once you start
building the models. And I believe that eventually there will be a
replacement for the data.frame that increases R's performance and lessens
its burden on the memory.

So all in all, I do admire the tidyverse and how it speeds up data
preparation for analysis. But the tidyverse is a powerful data toolkit, not
a programming language. And it won't make R a programming language either,
because R already is one.

Cheers
Joris
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel :  +32 (0)9 264 61 79
Joris.Meys at Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

Lionel Henry

2017-May-09 09:56 UTC

> Third, there's a lot of nonstandard evaluation going on in all these
> packages. Using them inside your own functions requires serious attention
> (e.g. the difference between aes() and aes_() in ggplot2). Actually, even
> though I definitely see the merits of these packages in data analysis, the
> tidyverse feels like a (clean and powerful) macro language on top of R.

That is going to change as we have put a lot of effort into learning
how to deal with capturing functions. See the tidyeval framework which
will enable full and flexible programmability of tidyverse grammars.

That said I agree that data analysis and package programming often
require different sets of tools.

Lionel

Hilmar Berger

2017-May-09 10:31 UTC

On 09/05/17 11:22, Joris Meys wrote:
> And I would tend to disagree here. R is built upon the paradigm of a
> functional programming language, and falls in the same group as
> Clojure, Haskell and the like. It is a Turing-complete programming
> language in its own right. That's quite a bit more than "a statistical
> toolkit". You can say that about e.g. the macro language of SPSS, but
> not about R.

My point was that inconsistencies are harder to tolerate when using R as
a programming language as opposed to a toolkit that just has to do a
job.

> Second, there's little "modern" about the ideas behind the tidyverse.
> Piping is about as old as Unix itself. The grammar of graphics, on
> which ggplot is based, stems from the SYSTAT graphics system from the
> nineties. Hadley and colleagues did (and do) a great job implementing
> these ideas in R, but the ideas do have a respectable age.

Those ideas still seem to be more modern than e.g. stock R graphics,
designed probably in the seventies or eighties. These still do their job
for lots and lots of applications; however, the fact that many newer
packages use ggplot instead of plot() forces users to learn and use
different paradigms for things as simple as drawing a line.

I also would like to make clear that I do not advocate for including the
whole tidyverse in core R. I just believe that having core concepts well
supported in core R instead of implemented in a package might make
things more consistent. E.g. method chaining ("%>%") is a core language
feature in many languages.
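
For readers unfamiliar with how a package can supply such chaining at all: R lets any package or user define infix operators of the form `%name%`. The sketch below uses a hypothetical operator `%then%`, chosen so it does not clash with magrittr's `%>%`. It only accepts a bare function on the right-hand side, whereas the real `%>%` also accepts calls with further arguments, so it should be read as an illustration of the mechanism and not of magrittr's implementation.

```r
# Hypothetical operator: pass the left-hand value to the right-hand function.
`%then%` <- function(lhs, rhs) rhs(lhs)

# User-defined infix operators associate left to right, so this reads as
# sum(sqrt(c(1, 4, 9))).
total <- c(1, 4, 9) %then% sqrt %then% sum
```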
> The one thing I would like to see though, is the adaptation of the
> statistical toolkit so that it can work with data.table and tibble
> objects directly, as opposed to having to convert to a data.frame once
> you start building the models. And I believe that eventually there
> will be a replacement for the data.frame that increases R's
> performance and lessens its burden on the memory.

Which is a perfect example of what I mean: improved functionality should
find its way into core R at some point, replacing or extending
outdated functionality. Otherwise, I don't know how hard it will be to
develop 21st century methods on top of a 1980s/90s language core.
Although I admit that the R developers are doing a great job to make it
possible.

Best,
Hilmar
-- 
Dr. Hilmar Berger, MD
Max Planck Institute for Infection Biology
Charitéplatz 1
D-10117 Berlin
GERMANY

Phone:  + 49 30 28460 430
Fax:    + 49 30 28460 401
  
E-Mail: berger at mpiib-berlin.mpg.de
Web   : www.mpiib-berlin.mpg.de

