Hilmar Berger
2017-May-09 07:47 UTC
[Rd] A few suggestions and perspectives from a PhD student
Hi,

On 08/05/17 16:37, Ista Zahn wrote:
> One of the key strengths of R is that packages are not akin to "fan
> created mods". They are a central and necessary part of the R system.

I would tend to disagree here. The majority of R packages are not maintained by the core R developers. Their concepts, features and lifetime depend mainly on the package maintainers (even though in theory the GPL allows somebody else to take over at any time). Several packages that are critical for processing big data and providing "modern" visualizations introduce concepts quite different from the legacy S/R language. I do feel that, in a way, current core R strongly shows its origin in S, while modern concepts (e.g. data.table, dplyr, ggplot, ...) are often only available via extension packages. This is fine if one considers R to be a statistical toolkit; as a programming language, however, it introduces inconsistencies and uncertainties which could be avoided if some of the "modern" parts (including language concepts) were more integrated into core R.

Best regards,
Hilmar

--
Dr. Hilmar Berger, MD
Max Planck Institute for Infection Biology
Charitéplatz 1
D-10117 Berlin
GERMANY
Phone: + 49 30 28460 430
Fax: + 49 30 28460 401
E-Mail: berger at mpiib-berlin.mpg.de
Web : www.mpiib-berlin.mpg.de
Joris Meys
2017-May-09 09:22 UTC
[Rd] A few suggestions and perspectives from a PhD student
On Tue, May 9, 2017 at 9:47 AM, Hilmar Berger <berger at mpiib-berlin.mpg.de> wrote:
> I would tend to disagree here. The majority of R packages are not
> maintained by the core R developers. Their concepts, features and
> lifetime depend mainly on the package maintainers (even though in
> theory the GPL allows somebody else to take over at any time). Several
> packages that are critical for processing big data and providing
> "modern" visualizations introduce concepts quite different from the
> legacy S/R language. I do feel that, in a way, current core R strongly
> shows its origin in S, while modern concepts (e.g. data.table, dplyr,
> ggplot, ...) are often only available via extension packages. This is
> fine if one considers R to be a statistical toolkit; as a programming
> language, however, it introduces inconsistencies and uncertainties
> which could be avoided if some of the "modern" parts (including
> language concepts) were more integrated into core R.

And I would tend to disagree here. R is built upon the paradigm of a functional programming language, and falls in the same group as Clojure, Haskell and the like. It is a Turing-complete programming language in its own right. That's quite a bit more than "a statistical toolkit". You can say that about e.g. the macro language of SPSS, but not about R.

Second, there's little "modern" about the ideas behind the tidyverse. Piping is about as old as Unix itself. The grammar of graphics, on which ggplot is based, stems from the SYSTAT graphics system from the nineties. Hadley and colleagues did (and do) a great job implementing these ideas in R, but the ideas do have a respectable age.

Third, there's a lot of nonstandard evaluation going on in all these packages.
Using them inside your own functions requires serious attention (e.g. the difference between aes() and aes_() in ggplot2). Actually, even though I definitely see the merits of these packages in data analysis, the tidyverse feels like a (clean and powerful) macro language on top of R. And that's good, but that doesn't mean these parts are essential to transform R into a programming language. Rather the contrary, actually: relying too heavily on these packages does complicate things when you start to develop your own packages in R.

Fourth, the tidyverse masks quite a few native R functions. Obviously they took great care to keep the functionality as close as one would expect, but that's not always the case. The lag() function of dplyr masks an S3 generic from the stats package, for example. So if you work with time series in the stats package, loading the tidyverse gives you trouble.

Fifth, many of the tidyverse packages are at version 0.x.y: they're still in beta development and their functionality might (and will) change. Functions disappear, arguments get renamed, tags change, ... Often the changes improve the packages, but they did break older code for me more than once. You can't expect the R core team to incorporate something that is bound to change.

Last but not least, the tidyverse actually sometimes works against new R users, at least R users who go beyond the classic data workflow. I literally rewrote some code, from a consultant, that abused the _ply functions to create nested loops. Removing all that stuff and rewriting the code using a simple list in combination with a simple for-loop sped up the code by a factor of 150. That has nothing to do with dplyr; it's very fast. It has everything to do with that person having a hammer and thinking everything he sees is a nail. The tidyverse is no reason not to learn the concepts of the language it's built upon.
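To make the masking point above concrete, here is a minimal base-R sketch. It does not load dplyr; instead it defines a hypothetical stand-in function named `lag` in the global environment, on the assumption that a masking function shifts values rather than the time base. Only `stats::lag()`, `ts()`, `tsp()` and `is.ts()` are real stats/base functions here; the stand-in is purely illustrative.

```r
# stats::lag() works on the time base of a ts object: k = -1 moves the
# series one period later and leaves the values untouched.
x <- ts(c(10, 20, 30, 40, 50), start = 2000)
shifted <- stats::lag(x, k = -1)

# Hypothetical stand-in for a masking function with the same name:
# it shifts the values themselves, pads with NA, and drops the ts attributes.
lag <- function(x, n = 1L) c(rep(NA, n), head(as.numeric(x), -n))

masked <- lag(x)                    # unqualified call now finds the stand-in
explicit <- stats::lag(x, k = -1)   # qualified call still reaches stats

is.ts(masked)    # FALSE: the time-series structure is gone
is.ts(explicit)  # TRUE
```

Qualifying the call with the namespace (`stats::lag`) is one way to keep time-series code working regardless of which packages are attached.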
The one thing I would like to see, though, is the adaptation of the statistical toolkit so that it can work with data.table and tibble objects directly, as opposed to having to convert to a data.frame once you start building the models. And I believe that eventually there will be a replacement for the data.frame that increases R's performance and lessens its burden on memory.

So all in all, I do admire the tidyverse and how it speeds up data preparation for analysis. But the tidyverse is a powerful data toolkit, not a programming language. And it won't make R a programming language either, because R already is one.

Cheers
Joris

> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics
tel : +32 (0)9 264 61 79
Joris.Meys at Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
Lionel Henry
2017-May-09 09:56 UTC
[Rd] A few suggestions and perspectives from a PhD student
> Third, there's a lot of nonstandard evaluation going on in all these
> packages. Using them inside your own functions requires serious attention
> (eg the difference between aes() and aes_() in ggplot2). Actually, even
> though I definitely see the merits of these packages in data analysis, the
> tidyverse feels like a (clean and powerful) macro language on top of R.

That is going to change, as we have put a lot of effort into learning how to deal with capturing functions. See the tidyeval framework, which will enable full and flexible programmability of tidyverse grammars.

That said, I agree that data analysis and package programming often require different sets of tools.

Lionel
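The difficulty with capturing functions discussed in this exchange can be sketched in base R, without ggplot2 or the tidyeval framework. The function names `keep_rows`, `keep_rows_`, `naive_wrapper` and `wrapper`, and the column `score`, are hypothetical and chosen only for illustration; the trailing underscore mimics the aes()/aes_() naming but is not meant to reproduce ggplot2's implementation.

```r
df <- data.frame(score = 1:5, group = c("a", "b", "a", "b", "a"))

# Capturing function: the condition is captured unevaluated and then
# evaluated with the data frame's columns in scope.
keep_rows <- function(data, condition) {
  expr <- substitute(condition)
  data[eval(expr, data, parent.frame()), , drop = FALSE]
}
direct <- keep_rows(df, score > 2)

# A naive wrapper fails: keep_rows() captures the symbol `cond`, and
# forcing it evaluates `score > 2` in a scope where no `score` exists.
naive_wrapper <- function(data, cond) keep_rows(data, cond)
naive_failed <- inherits(
  try(naive_wrapper(df, score > 2), silent = TRUE),
  "try-error"
)

# Standard-evaluation counterpart: takes an already quoted expression,
# so it can be passed through any number of wrapper functions.
keep_rows_ <- function(data, expr) {
  data[eval(expr, data, parent.frame()), , drop = FALSE]
}
wrapper <- function(data, expr) keep_rows_(data, expr)
wrapped <- wrapper(df, quote(score > 2))
```

The capturing version is convenient at the console, while the quoted-expression version is the one that composes inside other functions, which is roughly the trade-off the thread describes.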
Hilmar Berger
2017-May-09 10:31 UTC
[Rd] A few suggestions and perspectives from a PhD student
On 09/05/17 11:22, Joris Meys wrote:
> And I would tend to disagree here. R is built upon the paradigm of a
> functional programming language, and falls in the same group as
> Clojure, Haskell and the like. It is a Turing-complete programming
> language in its own right. That's quite a bit more than "a statistical
> toolkit". You can say that about e.g. the macro language of SPSS, but
> not about R.

My point was that inconsistencies are harder to tolerate when using R as a programming language, as opposed to a toolkit that just has to do a job.

> Second, there's little "modern" about the ideas behind the tidyverse.
> Piping is about as old as Unix itself.
> The grammar of graphics, on which ggplot is based, stems from the
> SYSTAT graphics system from the nineties. Hadley and colleagues did
> (and do) a great job implementing these ideas in R, but the ideas do
> have a respectable age.

Those ideas still seem to be more modern than e.g. stock R graphics, which were probably designed in the seventies or eighties. Those still do their job for lots and lots of applications; however, the fact that many newer packages use ggplot instead of plot() forces users to learn and use different paradigms for things as simple as drawing a line.

I would also like to make clear that I do not advocate including the whole tidyverse in core R. I just believe that having core concepts well supported in core R, instead of implemented in a package, might make things more consistent. E.g. method chaining ("%>%") is a core language feature in many languages.

> The one thing I would like to see, though, is the adaptation of the
> statistical toolkit so that it can work with data.table and tibble
> objects directly, as opposed to having to convert to a data.frame once
> you start building the models. And I believe that eventually there
> will be a replacement for the data.frame that increases R's
> performance and lessens its burden on memory.

Which is a perfect example of what I mean: improved functionality should find its way into core R at some point, replacing or extending outdated functionality. Otherwise, I don't know how hard it will be to develop 21st century methods on top of a 1980s/90s language core. Although I admit that the R developers are doing a great job of making it possible.

Best,
Hilmar

> So all in all, I do admire the tidyverse and how it speeds up data
> preparation for analysis. But the tidyverse is a powerful data toolkit,
> not a programming language. And it won't make R a programming language
> either, because R already is one.
>
> Cheers
> Joris
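For readers unfamiliar with how a chaining operator such as "%>%" can live in a package rather than in the language core, here is a minimal base-R sketch. The operator name `%then%` is hypothetical, and this is a deliberately simplified stand-in: it only accepts a bare function on the right-hand side and is not how any particular package implements its pipe.

```r
# Hypothetical infix operator: pass the left-hand value to the function
# on the right. Names of the form %name% are parsed as binary operators.
`%then%` <- function(lhs, rhs) rhs(lhs)

# Reads left to right: take the square roots, then sum them.
total <- c(1, 4, 9) %then% sqrt %then% sum

# Equivalent nested call, read inside out.
nested <- sum(sqrt(c(1, 4, 9)))
```

Because user-defined `%name%` operators are ordinary functions, chaining of this kind can be added by any package, which is one reason it did not need to be part of the language core.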