Hi all,

The "ls" function wears two hats. It allows users to inspect an environment interactively and also serves deeper in code as the accessor for an environment's names/keys. I propose that we separate these two conflicting goals, keeping ls for interactive use and adding names for a quick listing of the hash keys. This involves adding two lines to do_names in attrib.c.

The 'ls' function and its 'objects' synonym appear very frequently in performance-critical code like base/R/namespace.R and throughout the methods package. These functions are currently among the major contributors to execution time in package loading.

This two-line addition to attrib.c gives a significant speedup for listing an environment's names/keys (2-60X depending on the 'sorted' argument). It also simplifies the environment API by making it more like the other basic types. We already have $ and [[ after all.

Rather than sprinkling sorted=FALSE throughout the methods and base code, let's use names.

Would you be open to this change?

I have submitted a patch and some timings to the bug tracker as
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16170

Regards,
Pete

____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phaverty at gene.com
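To make the sorted versus unsorted distinction concrete, here is a minimal sketch that lists the keys of one environment both ways and times each call. It assumes an R version in which ls() accepts the 'sorted' argument mentioned above; the environment size and key names are made up for illustration, and no particular speedup is implied.

```r
# Build a hashed environment with many bindings (size chosen arbitrarily).
n <- 5000L
env <- new.env(hash = TRUE)
for (i in seq_len(n)) assign(paste0("key", i), i, envir = env)

# Default listing: keys are returned in sorted order.
sorted_keys <- ls(env)

# Unsorted listing: keys come back in the environment's internal order.
# Assumes ls() supports sorted = FALSE in the running R version.
unsorted_keys <- ls(env, sorted = FALSE)

# Both calls return the same set of keys; only the order may differ.
same_keys <- setequal(sorted_keys, unsorted_keys)

# Rough timings; the actual ratio depends on the machine and on n.
t_sorted   <- system.time(for (j in 1:50) ls(env))
t_unsorted <- system.time(for (j in 1:50) ls(env, sorted = FALSE))
print(rbind(sorted = t_sorted, unsorted = t_unsorted)[, "elapsed"])
```

Code that only needs the key set, not its order, is the case where skipping the sort pays off.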
>>>>> Peter Haverty <haverty.peter at gene.com>
>>>>>     on Sun, 25 Jan 2015 12:21:04 -0800 writes:

> Hi all,
> The "ls" function wears two hats. It allows users to inspect an
> environment interactively and also serves deeper in code as the
> accessor for an environment's names/keys. I propose that we separate
> these two conflicting goals, keeping ls for interactive use and adding
> names for a quick listing of the hash keys. This involves adding two
> lines to do_names in attrib.c.

> The 'ls' function and its 'objects' synonym appear very frequently in
> performance-critical code like base/R/namespace.R and throughout the
> methods package. These functions are currently among the major
> contributors to execution time in package loading.

> This two-line addition to attrib.c gives a significant speedup for
> listing an environment's names/keys (2-60X depending on the 'sorted'
> argument). It also simplifies the environment API by making it more
> like the other basic types. We already have $ and [[ after all.

> Rather than sprinkling sorted=FALSE throughout the methods and base
> code, let's use names.

as for list()s and other (generalized) vectors.

This sounds appealing at first, and I have heard/seen others propose it. I see one good reason *not* to allow it (and you mention the reason by mentioning 'sorted'):

The contents of an environment are inherently unordered, and even if the order stays fixed for a while, no code should rely on the ordering of the objects, and for that reason, <env>[1] etc do not make sense and are not allowed.

> Would you be open to this change?

I'm undecided currently:
 "-": reason above;
 "+": convenience, compacter R code using it;
      very simple and natural change to src/main/attrib.c

and waiting for other comments, not the least from other members of R core ..

Martin Maechler, ETH Zurich

> I have submitted a patch and some timings to the bug tracker as
> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16170
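The point about ordering can be illustrated with a short sketch: access by name works on an environment much as it does on a list, while positional subsetting is rejected. The variable names are illustrative only, and the exact wording of the error message is not shown because it may vary between R versions.

```r
env <- new.env()
assign("b", 2, envir = env)
assign("a", 1, envir = env)

# Access by name works, as with lists.
val_a <- env$a
val_b <- env[["b"]]

# Positional subsetting has no meaning for an unordered container,
# so R signals an error; capture its message instead of stopping.
pos_result <- tryCatch(env[1], error = function(e) conditionMessage(e))
print(pos_result)
```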
I think ls(, sort=FALSE) would be more explicit and thus clearer. There is much precedent for having arguments that request less work to be done, e.g. unlist(use.names=FALSE). Yes, the extra typing is a bit painful, but there is no intuitive reason why names() would be unsorted, while ls() would be sorted.

While it is tempting to use an existing function for this, the word "names" is somewhat loaded. For example, one might expect identical(names(env), names(as.list(env))) to be TRUE. I see no problem with making names() a simple alias of ls(), as long as the behavior is the same. Maybe a different name would be less "loaded" and imply lack of order, something like keySet(). But do we really need this?

On Tue, Jan 27, 2015 at 7:11 AM, Martin Maechler <maechler at lynne.stat.math.ethz.ch> wrote:

> as for list()s and other (generalized) vectors.
>
> This sounds appealing at first, and I have heard/seen others propose
> it. I see one good reason *not* to allow it (and you mention the
> reason by mentioning 'sorted') :
>
> The contents of an environment are inherently unordered, and
> even if the order stays fixed for a while, no code should rely
> on the ordering of the objects, and for that reason,
> <env>[1] etc do not make sense and are not allowed.
>
> I'm undecided currently:
> "-": reason above;
> "+": convenience, compacter R code using it;
>      very simple and natural change to src/main/attrib.c
>
> and waiting for other comments, not the least from other members of R core
> ..
>
> Martin Maechler, ETH Zurich
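As a rough sketch of the alternative floated above, an explicitly named accessor can be written as a thin wrapper over ls(). The keySet() function below is hypothetical (it is not part of base R), it assumes ls() accepts sorted = FALSE, and it compares key sets with setequal() so that nothing depends on ordering. The last lines show the unlist(use.names = FALSE) precedent for an argument that requests less work.

```r
# Hypothetical accessor: every key of an environment, in no particular order.
keySet <- function(env) ls(env, all.names = TRUE, sorted = FALSE)

env <- new.env()
assign("b", 2, envir = env)
assign("a", 1, envir = env)
assign(".hidden", 0, envir = env)   # dot-names are skipped unless all.names = TRUE

keys <- keySet(env)

# Names obtained by converting the environment to a list first.
list_names <- names(as.list(env, all.names = TRUE))

# Compare as sets; no claim is made here about the order of either vector.
same_set <- setequal(keys, list_names)

# Precedent for "do less work" arguments: drop names while flattening.
x <- list(a = 1, b = 2)
plain <- unlist(x, use.names = FALSE)
```

Whether such a wrapper is worth a new name is exactly the question raised above; the sketch only shows that the explicit form is short.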