thr3ads.net - R devel - [Rd] Conventions: Use of globals and main functions [Aug 2019]

Cyclic Group Z_1

2019-Aug-28 15:58 UTC

[Rd] Conventions: Use of globals and main functions

I appreciate the well-thought-out comments.

To your first point, I am not sure what "glattering" means precisely
(a Google search revealed nothing useful), but I assume it means something to
the effect of overfilling the main namespace with too many names. Per Norm
Matloff's counterpoint in The Art of R Programming regarding this issue,
this is mostly avoided by well-defined, (sufficiently) long names. Also, when a
program is properly modularized, one generally wouldn't have this many
objects at the same time unless the complexity of a program demands it. You can,
for example, use named function scope outside main or anonymous functions to
limit variable scope to operations that need a given variable. Using main() with
any named functions closely tied to a script defined outside it actually
addresses this "glattering namespace" issue, since, if we treat the
global scope as a main function instead of using a main() idiom, any functions
that are defined in global scope will have all global variables within their
search path. Alternatively, one can put all named functions in a package; in
some cases, however, it will make more sense to keep a function defined within
the script. Unless you never modularize your code into functions and flatten
everything out into a common namespace, using main would be helpful to avoid
namespace-glattering. Maybe I'm missing something, but I'm not sure how
namespace-glattering favors not using a main() idiom, since avoiding globals
doesn't mean not structuring your code properly; it actually seems to favor
using main(). Given any properly structured program (organizing functions as
needed), the implementation that puts all variables into the global workspace
(same as the top-level functions) will be less safe since all functions will
have all globals within their search path. (Unless, of course, every single
function is put into a package.)
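
To make the scoping argument above concrete, here is a minimal sketch in R. The names threshold, count_above, and main are hypothetical and chosen only for illustration. It assumes a helper that forgets to take one of its inputs as an argument: with the script body at top level the omission goes unnoticed, while with the same logic inside main() it surfaces as an error.

```r
# Style A: the script body runs at top level, so 'threshold' is a global.
threshold <- 10

# Hypothetical helper that forgets to take 'threshold' as an argument.
# It still works here, because the global environment is its enclosing
# environment and 'threshold' happens to live there.
count_above <- function(x) sum(x > threshold)

style_a_result <- count_above(c(5, 12, 20))

# Remove the global so that Style B starts from a clean workspace.
rm(threshold)

# Style B: the same logic inside main(); 'threshold' is now local to main().
main <- function() {
  threshold <- 10
  # count_above() was defined at top level, so it looks up free variables
  # in the global environment, not in main()'s frame.
  count_above(c(5, 12, 20))
}

# The hidden dependency now fails loudly instead of passing silently.
style_b_result <- tryCatch(main(), error = function(e) conditionMessage(e))
print(style_a_result)
print(style_b_result)
```

Under these assumptions, the fix in Style B is to pass threshold to count_above() explicitly, which is the kind of structure the main() idiom nudges you toward.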

To your second point, I agree that many of the issues associated with global
state/environment are generally less problematic when using pure (or as pure as
possible) functions. On a related note, lexically scoped functional languages
(especially pure functional ones) generally encourage modularizing everything
into functions, rather than having a lot of objects exposed to the top level
(not to say that globals are not used, only that they are not the default
choice). So the typical R way of doing this tends to disagree with how things
are normally done in functional programming. Chopping our code into
well-abstracted functions (and therefore namespaces) is the functional way to do
things and helps to minimize the state to which any particular function has
access. Organizing the functions we want to be pure so that they are not defined
in the same environment in which they are called actually helps to ensure
function purity in the input direction, since those functions will not have
lexical-scope access to the caller's variables. (That is, you may have written an
impure function without realizing it; organizing functions so they are not
defined in the same environment in which they are called helps to ensure purity.)
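
As a rough illustration of "impure without realizing it", the sketch below lists the free variables of two functions using findGlobals() from the codetools package. It assumes codetools is installed (it normally ships with R as a recommended package). The function names scale_impure, scale_pure, and free_vars are hypothetical.

```r
library(codetools)  # assumption: the recommended package 'codetools' is available

scale_factor <- 2

# Hypothetical impure version: silently reads 'scale_factor' from an
# enclosing environment.
scale_impure <- function(x) x * scale_factor

# Hypothetical pure version: every input is an explicit argument.
scale_pure <- function(x, scale_factor) x * scale_factor

# Helper: names of non-function free variables referenced by a function.
free_vars <- function(f) findGlobals(f, merge = FALSE)$variables

impure_vars <- free_vars(scale_impure)
pure_vars <- free_vars(scale_pure)

print(impure_vars)
print(pure_vars)
```

This is only a static check of the function body; it does not prove purity in general (for example, it says nothing about side effects), but it can flag the input-direction dependency discussed above.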

Perhaps I am mistaken, but in either case, your points actually favor a main()
idiom, unless you take using main() to mean using main() with extra bits (e.g.,
flattening your code structure).

Admittedly, putting every single function into a package and not having any
named functions in your script generally addresses all of these issues.

Best,
CG

Peter Meissner

2019-Aug-28 16:21 UTC

[Rd] Conventions: Use of globals and main functions

The point is that there are several possible problems.

But.

On the one hand, they are not really problematic in my opinion (I do not
care if my function has potential access to objects outside of its
environment, because this access is read-only at worst and it's not common
practice to use this potential anyway).
On the other hand, I am not sure what the main() idiom would actually bring to
the table other than allowing for the dual use of function definition and
function execution code in the same script, which we agreed is bad
practice.

Best, Peter

Cyclic Group Z_1

2019-Aug-28 16:31 UTC

[Rd] Conventions: Use of globals and main functions

I meant that using a script both as a script and a library (sourcing into other
files to serve as a package) is bad practice. I don't think having any
functions in a script is necessarily bad practice.

Best,
CG
