thr3ads.net - R devel - [Rd] names function for environments? [Jan 2015]

Michael Lawrence

2015-Jan-27 15:26 UTC

[Rd] names function for environments?

I think ls(env, sorted=FALSE) would be more explicit and thus clearer. There is
much precedent for arguments that request less work to be done, e.g.
unlist(use.names=FALSE).  Yes, the extra typing is a bit painful, but there
is no intuitive reason why names() would be unsorted while ls() would be
sorted. While it is tempting to reuse an existing function for this, the word
"names" is somewhat loaded. For example, one might expect
identical(names(env), names(as.list(env))) to be TRUE. I see no problem
with making names() a simple alias of ls(), as long as the behavior is the
same. Maybe a different name would be less "loaded" and imply lack of
order, something like keySet(). But do we really need this?
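
The trade-off can be sketched in a few lines of R. This is only an illustration: the environment and its key names are made up, and the unsorted order is whatever the hash table happens to yield, so nothing here should be read as a guarantee about ordering.

```r
# Hypothetical environment used as a key/value store.
env <- new.env(hash = TRUE)
assign("zeta", 1, envir = env)
assign("alpha", 2, envir = env)
assign("mid", 3, envir = env)

# Default behaviour: ls() sorts the names before returning them.
sorted_keys <- ls(env)

# Explicitly skipping the sort: the order is unspecified.
unsorted_keys <- ls(env, sorted = FALSE)

# Both calls return the same set of keys; only the order may differ.
setequal(sorted_keys, unsorted_keys)
identical(sorted_keys, sort(unsorted_keys))
```

Code that only needs the set of keys, not their order, is the case where skipping the sort would pay off.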






On Tue, Jan 27, 2015 at 7:11 AM, Martin Maechler
<maechler at lynne.stat.math.ethz.ch> wrote:
> >>>>> Peter Haverty <haverty.peter at gene.com>
> >>>>>     on Sun, 25 Jan 2015 12:21:04 -0800 writes:
>
>     > Hi all,
>     > The "ls" function wears two hats. It allows users to inspect an
>     > environment interactively and also serves deeper in code as the
>     > accessor for an environment's names/keys. I propose that we separate
>     > these two conflicting goals, keeping ls for interactive use and adding
>     > names for a quick listing of the hash keys. This involves adding two
>     > lines to do_names in attrib.c.
>
>     > The 'ls' function and its 'objects' synonym appear very frequently in
>     > performance-critical code like base/R/namespace.R and throughout the
>     > methods package. These functions are currently among the major
>     > contributors to execution time in package loading.
>
>     > This two-line addition to attrib.c gives a significant speedup for
>     > listing an environment's names/keys (2-60X depending on the 'sorted'
>     > argument). It also simplifies the environment API by making it more
>     > like the other basic types. We already have $ and [[ after all.
>
>     > Rather than sprinkling sorted=FALSE throughout the methods and base
>     > code, let's use names.
>
> as for list()s and other (generalized) vectors.
>
> This sounds appealing at first, and I have heard/seen others propose
> it.  I see one good reason *not* to allow it (and you mention the
> reason by mentioning 'sorted') :
>
> The contents of an environment are inherently unordered, and
> even if the order stays fixed for a while, no code should rely
> on the ordering of the objects, and for that reason,
>  <env>[1]  etc do not make sense and are not allowed.
>
>     > Would you be open to this change?
>
> I'm undecided currently:
>  "-": reason above;
>  "+": convenience, compacter R code using it;
>       very simple and natural change to src/main/attrib.c
>
> and waiting for other comments, not the least from other members of R core
> ..
>
> Martin Maechler, ETH Zurich
>
>
>     > I have submitted a patch and some timings to the bug tracker as
>     > https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16170
>
>     > Regards,
>     > Pete
>
>     > ____________________
>     > Peter M. Haverty, Ph.D.
>     > Genentech, Inc.
>     > phaverty at gene.com
>
>     > ______________________________________________
>     > R-devel at r-project.org mailing list
>     > https://stat.ethz.ch/mailman/listinfo/r-devel
>

Peter Haverty

2015-Jan-27 15:44 UTC

[Rd] names function for environments?

I think that the "sorted" and "all.names" arguments are really only
appropriate for pretty printing to the screen. I think it is a bit
unfortunate that environments have a names accessor that is 60X slower
than that of all the other types. This is likely due to the history of
environments, which were originally just for behind-the-scenes tasks.

Now that users can use environments as hashes, we really need
something like a "keys" function. We don't want programmers depending
on the sorted-ness, as Martin mentioned.  Also, I think it helps users
when objects share as many of the key API functions as possible.
"names" is natural. "ls" was certainly confusing for me when I
started. Having to supply two additional arguments to get the desired
output doesn't help there.  Think of all the Perl programmers
struggling to switch to R.  Let's help them out.
Pete

____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phaverty at gene.com
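
A rough sketch of the "keys" idea, for illustration only. Note that keys() below is a hypothetical helper, not a base R function, and the hash contents are invented. It simply wraps ls() with the two extra arguments mentioned above, so callers get every key with no sorting.

```r
# keys() is a hypothetical helper, not part of base R: it returns every
# key of an environment, in unspecified order.
keys <- function(env) ls(env, all.names = TRUE, sorted = FALSE)

# An environment used as a hash, with made-up entries.
h <- new.env(hash = TRUE)
h[["pear"]] <- 3
h[["apple"]] <- 1
h[[".hidden"]] <- 0  # dot-prefixed keys are skipped by a plain ls(h)

k <- keys(h)

# The key that a plain ls(h) would have missed.
missed <- setdiff(k, ls(h))

# Look up every value without relying on the order of the keys.
vals <- vapply(k, function(key) h[[key]], numeric(1))
total <- sum(vals)
```

Because the caller never depends on the order of `k`, the sort that ls() performs by default would be wasted work here.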



Michael Lawrence

2015-Jan-27 15:59 UTC

[Rd] names function for environments?

Since the contract of ls() is to sort, there is nothing wrong with
programmers depending on it. And there are many functions that could be
made 60X faster, but is it worth it? But I did notice that
as.list.environment has a sorted=FALSE argument already, so I guess
identical(names(x), names(as.list(x))) could be made to be TRUE, assuming
the order is at least persistent, if undefined. That is a nice property.
I guess I'm OK with it.
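
The property in question can be checked along these lines. This is a sketch that assumes an R version in which as.list() for environments accepts the sorted argument mentioned above; the environment x and its contents are invented. It uses ls() rather than names() on the environment, since names() for environments is the change being proposed.

```r
# Hypothetical environment with a few entries.
x <- new.env()
x$b <- 2
x$a <- 1
x$c <- 3

# With sorting requested on both sides the order is well defined,
# so the two listings can be compared exactly.
same_sorted <- identical(ls(x, sorted = TRUE),
                         names(as.list(x, sorted = TRUE)))

# Without sorting, this sketch only checks that the same keys come back;
# it deliberately makes no claim about their order.
same_keys <- setequal(ls(x, sorted = FALSE),
                      names(as.list(x, sorted = FALSE)))
```

Whether the unsorted listings also agree in order is exactly the "persistent, if undefined" assumption.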



