Dear all,

I'm working as a data scientist in a major tech company. I have been using R for almost 20 years now and there's one issue that's been bugging me of late. I apologize in advance if this has been discussed before.

R has traditionally been used for running short scripts or data analysis notebooks, but there's recently been a growing interest in developing full applications in the language. Three examples come to mind:

1) The Shiny web application framework, which facilitates the development of rich, interactive web applications
2) The httr package, which provides lower-level facilities than Shiny for writing web services
3) Batch jobs run by data scientists according to, say, a cron schedule

Compared with other languages, R's support for such applications is rather poor. The Rscript program is generally used to run an R script or an arbitrary R expression, but I feel it suffers from a few problems:

1) It encourages developers of batch jobs to provide their code in a single R file (bad for code structure and unit-testability)
2) It provides no way to deal with dependencies on other packages
3) It provides no way to "run" an application provided as an R package

For example, let's say I want to run a Shiny application that I provide as an R package (to keep the code modular, to benefit from unit tests, and to declare dependencies properly). I would then need to a) uncompress my R package, b) somehow ensure my dependencies are installed, and c) call runApp(). This can get tedious, fast.

Other languages let the developer package their code in "runnable" artefacts, and let the developer specify the main entry point. The mechanics depend on the language but are remarkably similar, and suggest a way to implement this in R. Through declarations in some file, the developer can often specify dependencies and declare where the program's "main" function resides.

Consider Java:

  Artefact: .jar file
  Declarations file: Manifest file
  Entry point: declared as 'Main-Class'
  Executed as: java -jar <jarfile>

Or Python:

  Artefact: Python package, typically as .tar.gz source distribution file
  Declarations file: setup.py (which specifies dependencies)
  Entry point: special __main__() function
  Executed as: python -m <package>

R already has much of this machinery:

  Artefact: R package
  Declarations file: DESCRIPTION
  Entry point: ?
  Executed as: ?

I feel that R could benefit from letting the developer specify, possibly in DESCRIPTION, how to "run" the package. The package could then be run through a new R CMD command, for example:

  R CMD RUN <package> <args>

I'm sure there are plenty of wrinkles in this idea that need to be ironed out, but is this something that has ever been considered, or that is on R's roadmap?

Thanks for reading so far,

David Lindelöf, Ph.D.
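To make the proposal a little more concrete, here is a minimal sketch of how a runner might resolve an entry point declared in DESCRIPTION. The `Main` field name and both helper functions are assumptions made up for illustration; R does not recognise such a field today.

```r
# Sketch only: "Main" is a hypothetical DESCRIPTION field; R itself ignores it.

# Read the package name and the declared entry point from a DESCRIPTION file.
read_entry_point <- function(description_file, field = "Main") {
  dcf <- read.dcf(description_file, fields = c("Package", field))
  entry <- unname(dcf[1, field])
  if (is.na(entry)) {
    stop("no '", field, "' field found in ", description_file)
  }
  list(package = unname(dcf[1, "Package"]), fun = entry)
}

# Look up the exported function in an installed package and call it,
# passing the command-line arguments as a character vector.
run_entry_point <- function(package, fun, args = character()) {
  main <- getExportedValue(package, fun)
  main(args)
}
```

With a DESCRIPTION that contained `Main: main`, a runner could call `read_entry_point()` on it and hand the result to `run_entry_point()`. Installing the dependencies first is a separate problem that this sketch does not address.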
Dear David, sharing some related (subjective) thoughts below.

On Mon, Jan 7, 2019 at 9:53 PM David Lindelof <lindelof at ieee.org> wrote:
> Compared with other languages, R's support for such applications is rather
> poor. The Rscript program is generally used to run an R script or an
> arbitrary R expression, but I feel it suffers from a few problems:
>
> 1) It encourages developers of batch jobs to provide their code in a single
> R file (bad for code structure and unit-testability)

I think it rather encourages developers to create (internal) R packages and use those from the batch jobs. This way the structure is pretty clean, sharing code between scripts is easy, unit testing can be done within the package etc.

> 2) It provides no way to deal with dependencies on other packages

See above: create R package(s) and use those from the scripts.

> 3) It provides no way to "run" an application provided as an R package
>
> For example, let's say I want to run a Shiny application that I provide as
> an R package (to keep the code modular, to benefit from unit tests, and to
> declare dependencies properly). I would then need to a) uncompress my R
> package, b) somehow, ensure my dependencies are installed, and c) call
> runApp().
> This can get tedious, fast.

You can provide your app as a Docker image, so that the end user simply calls "docker pull" and then "docker run" -- that can be done from a user-friendly script as well. Of course, this requires Docker to be installed, but if that's a problem, it is probably better to "ship" the app as a web application and share a URL with the user, e.g. backed by shinyproxy.io

> ______________________________________________
> R-devel at r-project.org mailing list
> stat.ethz.ch/mailman/listinfo/r-devel
On 3 January 2019 at 11:43, David Lindelof wrote:
| R has traditionally been used for running short scripts or data analysis
| notebooks, but there's recently been a growing interest in developing full
| applications in the language. Three examples come to mind:
|
| 1) The Shiny web application framework, which facilitates the development of
| rich, interactive web applications
| 2) The httr package, which provides lower-level facilities than Shiny for
| writing web services
| 3) Batch jobs run by data scientists according to, say, a cron schedule

That is a bit of a weird classification of "full applications". I have done this about as long as you but I also provided (at least as tests and demos) i) GUI apps using tcl/tk (which comes with R) and ii) GUI apps with Qt (or even Wt), see my RInside package.

But my main weapon for 3) is littler. See cran.r-project.org/package=littler and particularly the many examples at github.com/eddelbuettel/littler/tree/master/inst/examples

| Compared with other languages, R's support for such applications is rather
| poor. The Rscript program is generally used to run an R script or an
| arbitrary R expression, but I feel it suffers from a few problems:
|
| 1) It encourages developers of batch jobs to provide their code in a single
| R file (bad for code structure and unit-testability)
| 2) It provides no way to deal with dependencies on other packages
| 3) It provides no way to "run" an application provided as an R package

Err, no. See the examples/ directory above. About every single one uses packages.

As illustrations I have long-running and somewhat visible cronjobs that are implemented the same way: CRANberries (since 2007, now running hourly) and CRAN Policy Watch (running once a day).
Because both are 'hacks' I never published the code but there is not that much to it. CRANberries just queries CRAN, compares to what it had last, and writes out variants of the DESCRIPTION file to text where a static blog engine (like Hugo, but older) makes a feed and html pages out of it. Oh, and we tweet because "why not?".

| For example, let's say I want to run a Shiny application that I provide as
| an R package (to keep the code modular, to benefit from unit tests, and to
| declare dependencies properly). I would then need to a) uncompress my R
| package, b) somehow, ensure my dependencies are installed, and c) call
| runApp(). This can get tedious, fast.

Disagree here too. At work, I just write my code, organize it in packages, update the packages and have shiny expose whatever makes sense.

| I feel that R could benefit from letting the developer specify, possibly in
| DESCRIPTION, how to "run" the package.
| The package could then be run
| through, for example, a new R CMD command, for example:
|
| R CMD RUN <package> <args>
|
| I'm sure there are plenty of wrinkles in this idea that need to be ironed
| out, but is this something that has ever been considered, or that is on R's
| roadmap?

Hm. If _you_ have an itch to scratch here why don't _you_ implement a draft.

Dirk

--
dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
Some other major tech companies have in the past widely used Runnable R Archives (".Rar" files), similar to Python .par files [1], and integrated them completely into the proprietary R package build system in use there. I thought there were a few systems like this that had made their way to CRAN or the useR! conferences, but I don't have a link.

Building something specific to your organization on top of the Python .par framework to archive up R, your needed packages/shared libraries, and other dependencies, with a runner script to R CMD RUN your entry point in a sandbox, is a pretty straightforward way to have control in a way that makes sense for your environment.

- Murray

[1] google.github.io/subpar/subpar.html
On 7 January 2019 at 22:09, Gergely Daróczi wrote:
| You can provide your app as a Docker image, so that the end-user
| simply calls a "docker pull" and then "docker run" -- that can be done
| from a user-friendly script as well.
| Of course, this requires Docker to be installed, but if that's a
| problem, probably better to "ship" the app as a web application and
| share a URL with the user, eg backed by shinyproxy.io

Excellent suggestion.

Dirk
On Mon, 7 Jan 2019 at 22:09, Gergely Daróczi <daroczig at rapporter.net> wrote:
> You can provide your app as a Docker image, so that the end-user
> simply calls a "docker pull" and then "docker run" -- that can be done
> from a user-friendly script as well.
> Of course, this requires Docker to be installed, but if that's a
> problem, probably better to "ship" the app as a web application and
> share a URL with the user, eg backed by shinyproxy.io

If Docker is a problem, you can also try podman: same usage, compatible with Dockerfiles and daemon-less, no admin rights required.

podman.io

Iñaki
On Thu, Jan 31, 2019 at 3:14 PM David Lindelof <lindelof at ieee.org> wrote:
> In summary, I'm convinced R would benefit from something similar to Java's
> `Main-Class` header or Python's `__main__()` function. A new R CMD command
> would take a package, install its dependencies, and run its "main"
> function.

I just created and built a very boilerplate R package called "runme". I can install its dependencies and run its "main" function with:

  $ R CMD INSTALL runme_0.0.0.9000.tar.gz
  $ R -e 'runme::main()'

No new R CMDs needed. Now my choice of "main" is arbitrary, whereas with Python and Java and C the entry point is more tightly specified (__name__ == "__main__" in Python, int main(..) in C and so on). But I don't think that's much of a problem.

Does that not satisfy your requirements closely enough? If you want it in one line then:

  R CMD INSTALL runme_0.0.0.9000.tar.gz && R -e 'runme::main()'

will do the second if the first succeeds (Unix shells).

You could write a script for $RHOME/bin/RUN which would be a two-liner and that could mandate the use of "main" as an entry point. But good luck getting anything into base R.

Barry

> If we have this machinery available, we could even consider
> reaching out to Spark (and other tech stacks) developers and make it easier
> to develop R applications for those platforms.
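Seen from the package side, the "main" convention might look like the sketch below. This is a guess at what an `R/main.R` in a package such as "runme" could contain; the greeting behaviour and the exit-status convention are assumptions for illustration, not Barry's actual code.

```r
# Hypothetical R/main.R of the "runme" package.
# By default the arguments come from the command line, so the same function
# works both interactively and when launched through Rscript.
main <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) == 0L) {
    message("usage: Rscript -e 'runme::main()' <name>...")
    return(invisible(1L))  # non-zero status signals a usage error
  }
  cat(sprintf("Hello, %s!\n", args), sep = "")
  invisible(0L)
}
```

Anything after the expression on an Rscript command line ends up in `commandArgs(trailingOnly = TRUE)`, so a caller could run `Rscript -e 'quit(status = runme::main())' Alice Bob` and get the return value back as the process exit status.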
Would you care to share how your package installs its own dependencies? I assume this is done during the call to `main()`? (Last time I checked, R CMD INSTALL would not install a package's dependencies...)

On Thu, Jan 31, 2019 at 4:38 PM Barry Rowlingson <b.rowlingson at lancaster.ac.uk> wrote:
> I just created and built a very boilerplate R package called "runme". I
> can install its dependencies and run its "main" function with:
>
> $ R CMD INSTALL runme_0.0.0.9000.tar.gz
> $ R -e 'runme::main()'
Quoting:

"In summary, I'm convinced R would benefit from something similar to Java's `Main-Class` header or Python's `__main__()` function. A new R CMD command would take a package, install its dependencies, and run its "main" function."

This rather increases the scope of your idea. A new R CMD command that redirects to "main" is an interesting idea. On the other hand, it would impose a limitation on users compared with what you can already do now:

  Rscript -e 'mypkg::mymain("myparam")'

(or with littler, which should be shipped with R IMO).

For a production system one doesn't want to just "install its dependencies". First, the dependencies have to be mirrored and their versions frozen. Then your package has to be tested against that set of dependencies. Once that has succeeded, the same set of packages should be used for the production deployment. For those processes you might find the tools4pkgs branch of base R useful (packages.dcf, mirror.packages functions); unfortunately it has not been merged: github.com/wch/r-source/compare/tools4pkgs

Jan Gorecki
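The freeze-then-verify step described above can be approximated in base R without the unmerged tools4pkgs functions. A minimal sketch follows; `snapshot_versions()` and `verify_versions()` are hypothetical helper names, and mirroring the package sources themselves is left out.

```r
# Record the installed version of each named package (the "freeze" step).
# Errors if any of the packages is not installed.
snapshot_versions <- function(pkgs) {
  vapply(pkgs,
         function(p) as.character(utils::packageVersion(p)),
         character(1))
}

# Return the names of packages whose installed version differs from the
# frozen snapshot; an empty result means the environment matches.
verify_versions <- function(frozen) {
  current <- snapshot_versions(names(frozen))
  names(frozen)[current != frozen]
}
```

The snapshot could be taken on the machine where the tests passed, stored next to the application, and checked with `verify_versions()` before the production job starts.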
@Barry I'm not sure your proposal would work, since `R CMD INSTALL` won't install a package's dependencies. Indeed it will fail with an error unless all the dependencies are met before calling it.

Speaking of which, why doesn't R CMD INSTALL install a package's dependencies? Would it make sense to submit this as a desirable feature?

Cheers,
David
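One workaround that is available today: `install.packages()` does resolve dependencies when the package comes from a repository, so a source tarball can first be placed in a throwaway local repository. The sketch below assumes a Unix-alike (the `file://` URL is built naively) and `make_local_repo()` is a hypothetical helper, not an existing function.

```r
# Build a minimal source repository around one package tarball and
# return its file:// URL for use as a 'repos' entry.
make_local_repo <- function(tarball, repo_dir = tempfile("repo-")) {
  contrib <- file.path(repo_dir, "src", "contrib")
  dir.create(contrib, recursive = TRUE, showWarnings = FALSE)
  if (!file.copy(tarball, contrib)) stop("could not copy ", tarball)
  # Writes the PACKAGES index that install.packages() reads.
  tools::write_PACKAGES(contrib, type = "source")
  paste0("file://", normalizePath(repo_dir, winslash = "/"))
}
```

With the URL in hand, something like `install.packages("runme", repos = c(make_local_repo("runme_0.0.0.9000.tar.gz"), "https://cloud.r-project.org"), type = "source")` should install the package from the local repository and fetch its declared dependencies from the CRAN mirror.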
Ummm oops. Magic pixies? It assumed all of CRAN was installed?

Maybe I'll write something that could go in /usr/lib/R/bin/RUN that checks and gets deps, installs the package, and runs package::main, which I think is what the OP wants - you could do R CMD RUN foo_1.0.0.tar.gz and away it goes...

B
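A rough sketch of what such a RUN helper could look like, following the steps named in the message: check and get the dependencies, install the package, run package::main. Everything here is an assumption for illustration: the function names are made up, only direct dependencies listed in DESCRIPTION are inspected, and "main" is taken to be an exported function that accepts a character vector of arguments.

```r
#!/usr/bin/env Rscript
# Hypothetical RUN helper; not part of R.

# Package name from a source tarball name such as "foo_1.0.0.tar.gz".
tarball_package <- function(tarball) sub("_.*$", "", basename(tarball))

# Direct dependencies declared in a DESCRIPTION file, without version
# constraints and without the pseudo-dependency on R itself.
direct_deps <- function(description_file) {
  dcf <- read.dcf(description_file,
                  fields = c("Depends", "Imports", "LinkingTo"))
  vals <- dcf[1, ]
  parts <- unlist(strsplit(vals[!is.na(vals)], ","), use.names = FALSE)
  deps <- trimws(gsub("\\([^)]*\\)", "", parts))
  setdiff(unique(deps[nzchar(deps)]), "R")
}

run_tarball <- function(tarball, args = character(),
                        repos = getOption("repos")) {
  pkg <- tarball_package(tarball)
  tmp <- tempfile("RUN-")
  dir.create(tmp)
  utils::untar(tarball, files = file.path(pkg, "DESCRIPTION"), exdir = tmp)
  needed <- setdiff(direct_deps(file.path(tmp, pkg, "DESCRIPTION")),
                    rownames(utils::installed.packages()))
  # install.packages() also pulls in the dependencies of 'needed'.
  if (length(needed) > 0L) utils::install.packages(needed, repos = repos)
  utils::install.packages(tarball, repos = NULL, type = "source")
  getExportedValue(pkg, "main")(args)
}

# When executed as a script: RUN foo_1.0.0.tar.gz [args...]
if (sys.nframe() == 0L && !interactive()) {
  cli <- commandArgs(trailingOnly = TRUE)
  if (length(cli) >= 1L) run_tarball(cli[1], cli[-1])
}
```

Version pinning, already-installed but outdated dependencies, and a configurable entry point are all left out; this only covers the happy path of a package tarball with CRAN dependencies.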