thr3ads.net - R devel - [Rd] Conventions: Use of globals and main functions [Aug 2019]

Cyclic Group Z_1

2019-Aug-28 03:56 UTC

[Rd] Conventions: Use of globals and main functions

> That being said, I think the main task of scripts is to get things done by
> running them end to end in a fresh session. Now, it may very well happen that
> a lot of stuff has to be done. Then splitting up scripts into subscripts and
> sourcing them from a meta script is a straightforward solution. It might also
> be that some functionality is put into functions to be reused in other places.
> This can be done by putting those function definitions into separate files.
> Then one can use source() wherever those functions are needed. Now, putting
> code that runs things and code that defines/provides functions into the same
> script is a bad idea. Using the main() idiom described might prevent the
> problems stemming from mixing function definitions and function execution.
> But it would also encourage this mixing, which is, I think, a bad idea anyway.

I actually agree entirely that files should not serve both as source files for
reused functions and as application code. The suggestion of a main() idiom is
merely to reduce variable scope and bring R practices more in line with
generally recommended programming practices, not so that scripts can act as
packages/modules/libraries. When I compared R scripts containing main functions
to packages, I only meant it in the sense that both help manage scope (the
latter through package namespaces). Any other named functions besides main would
be functions specifically tied to the script.

I do see your point, though, that this could result in bad practice, namely the
usage mixing you described.
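
As a rough illustration of the main() idiom under discussion, a script might be
laid out as below. This is only a sketch: the names summarise_values() and
main() and the sample data are hypothetical and not taken from the thread.

```r
# Hypothetical script layout; all names are illustrative assumptions.

# Helper tied to this script: depends only on its argument.
summarise_values <- function(values) {
  c(mean = mean(values), sd = sd(values))
}

# Working variables live in main()'s environment, not in the global one.
main <- function() {
  values <- c(2, 4, 6, 8)
  result <- summarise_values(values)
  print(result)
  invisible(result)
}

# Entry point: the only top-level code that does work.
main()
```

After the script has run, only the two function names are bound at top level;
`values` and `result` disappear when main() returns.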

Best,
CG

Peter Meissner

2019-Aug-28 07:24 UTC

First, I think that thinking about best-practice advice and being able to
accommodate different usage scenarios is a good thing, despite my arguing
against introducing the main() idiom.

Let's have another turn at the "global environment is bad" argument.

It has two parts:

(1) Glattering namespace. Glattering the namespace might become a problem
because you might end up having used all reasonable words already, so one
has to extend the space for names with new namespaces. For scripting, this
usually should be no problem, since one can always create more space through
the use of environments - put code into a function, put objects into
environments, or write a package. Glattering the namespace might also become
a problem if things get complex. If your code base gets larger, one name might
be overwritten by another by accident. This is a problem that can be
solved not by simply extending the namespace (more space) but by
structuring it - keeping related things together and hiding unused helpers,
e.g. by putting them in a function or an environment, or by writing a
package.
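
The structuring options mentioned in (1) can be sketched roughly as follows. All
names here (normalise, scale01, config) are hypothetical and chosen only for
illustration.

```r
# Hypothetical names throughout; a sketch of structuring, not a prescription.

# local() evaluates its body in a fresh environment, so the helper
# `scale01` stays hidden and only the returned function gets a top-level name.
normalise <- local({
  scale01 <- function(x) (x - min(x)) / (max(x) - min(x))
  function(x) round(scale01(x), 2)
})

# An explicit environment keeps related objects together under one name.
config <- new.env()
config$input_file <- "data.csv"
config$n_rows <- 100L

normalise(c(10, 15, 20))
exists("scale01")  # the helper is not visible from here
ls(config)         # the grouped objects are listed under `config`
```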

Now, if we put everything into main() we have not solved much. Instead
of 100 objects glattering the global environment, we now have e.g. 5 objects in
the global environment and 95 objects in the main() function environment.

(2) Changing global state. A thing that is a little bit related to the
global environment is the idea of global state and the problems that arise
when changing global state. But the global environment in R is not the same
as global state. First, all normal stuff in R (except environments, R6
objects, data.tables) is passed by copy (never mind how it is implemented
under the hood). So when I assign a value to a new name, this will behave
as if I had made a copy - thus I simply do not care what happens to the value
of the original, because my copy's value is independent. Next, it is
possible to misuse the global environment (or any parent environment) as
global state, either by explicitly using
assign(..., ..., envir = globalenv()) or by using the <<- operator. Also, one
has access to objects of the enclosing environment when e.g. executing code in
a function environment, but this is read-only by default. Although this is
possible and is done from time to time, it is not how things are done 99% of
the time. The common practice - and I would say best practice also - is to use
pure functions that depend only on their inputs and do not change anything
except returning a value. Using pure functions prevents most (say 99%) of the
problems with global state, while using more namespaces only chops these
kinds of problems into smaller and thus more numerous problems.
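
The behaviours described in (2) can be checked with a small sketch. The variable
and function names are hypothetical and serve only as illustration.

```r
# Copy semantics: modifying the copy leaves the original untouched.
original <- c(1, 2, 3)
copy <- original
copy[1] <- 99
original[1]  # still 1

# Environments are the exception: both names refer to the same object.
e1 <- new.env()
e2 <- e1
e2$value <- "shared"
e1$value     # "shared"

# Misusing the enclosing environment as global state via <<-.
counter <- 0
increment_impure <- function() {
  counter <<- counter + 1  # side effect outside the function
  invisible(NULL)
}
increment_impure()
counter      # now 1

# Plain assignment inside a function only creates a local variable.
try_overwrite <- function() {
  counter <- 100  # local; the outer `counter` is not changed
  counter
}
try_overwrite()
counter      # still 1

# Pure alternative: depends only on its input and returns a value.
increment_pure <- function(n) n + 1
counter <- increment_pure(counter)
counter      # now 2
```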


Best, Peter


Cyclic Group Z_1

2019-Aug-28 15:58 UTC

I appreciate the well-thought-out comments.

To your first point, I am not sure what "glattering" means precisely
(a Google search revealed nothing useful), but I assume it means something to
the effect of overfilling the main namespace with too many names. Per Norm
Matloff's counterpoint in The Art of R Programming regarding this issue,
this is mostly avoided by well-defined, (sufficiently) long names. Also, when a
program is properly modularized, one generally wouldn't have this many
objects at the same time unless the complexity of a program demands it. You can,
for example, use named function scope outside main or anonymous functions to
limit variable scope to operations that need a given variable. Using main() with
any named functions closely tied to a script defined outside it actually
addresses this "glattering namespace" issue, since, if we treat the
global scope as a main function instead of using a main() idiom, any functions
that are defined in global scope will have all global variables within their
search path. Alternatively, one can put all named functions in a package; in
some cases, however, it will make more sense to keep a function defined within
the script. Unless you never modularize your code into functions and flatten
everything out into a common namespace, using main would be helpful to avoid
namespace-glattering. Maybe I'm missing something, but I'm not sure how
namespace-glattering favors not using a main() idiom, since avoiding globals
doesn't mean not structuring your code properly; it actually seems to favor
using main(). Given any properly structured program (organizing functions as
needed), the implementation that puts all variables into the global workspace
(same as the top-level functions) will be less safe, since all functions will
have all globals within their search path (unless, of course, every single
function is put into a package).
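
To make the search-path argument concrete, here is a minimal sketch comparing
the two layouts. The names threshold, cutoff_value, count_above_a and
count_above_b are hypothetical.

```r
# Layout A: working variables at top level.
threshold <- 10

# The author forgot to pass `threshold` as an argument. The call still
# runs, because the name is found in the environment where the function
# was defined (here, the top level).
count_above_a <- function(x) sum(x > threshold)
count_above_a(c(5, 15, 25))

# Layout B: working variables inside main().
# The same mistake: `cutoff_value` is not an argument.
count_above_b <- function(x) sum(x > cutoff_value)

main <- function() {
  cutoff_value <- 10
  # count_above_b() was defined outside main(), so lexical scoping
  # cannot see main()'s `cutoff_value`; the mistake surfaces as an error.
  tryCatch(
    count_above_b(c(5, 15, 25)),
    error = function(e) conditionMessage(e)
  )
}
main()
```

In layout A the missing argument goes unnoticed; in layout B the call fails and
main() returns the error message instead of a count.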

To your second point, I agree that many of the issues associated with global
state/environment are generally less problematic when using pure (or as pure as
possible) functions. On a related note, lexically scoped functional languages
(especially pure functional ones) generally encourage modularizing everything
into functions, rather than having a lot of objects exposed to the top level
(not to say that globals are not used, only that they are not the default
choice). So the typical R way of doing this tends to disagree with how things
are normally done in functional programming. Chopping our code into
well-abstracted functions (and therefore namespaces) is the functional way to do
things and helps to minimize the state to which any particular function has
access. Organizing the functions we want to be pure so that they are not defined
in the same environment in which they are called actually helps to ensure
function purity in the input direction, since those functions will not have
lexical-scope access to the caller's variables. (That is, you may have written
an impure function without realizing it; organizing functions so that they are
not defined in the same environment as the one in which they are called helps
to ensure purity.)

Perhaps I am mistaken, but in either case, your points actually favor a main()
idiom, unless you take using main() to mean using main() with extra bits (e.g.,
flattening your code structure).

Admittedly, putting every single function into a package and not having any
named functions in your script generally addresses all of these issues.

Best,
CG
