Hi everyone,

There is no mention in ?new.env (R-1.6.0) of what the effect of setting
the hash argument of new.env() actually does. What does it mean in
performance terms to say that "the environment will be hashed"?

Thanks, Jonathan.
-- 
Jonathan Rougier                       Science Laboratories
Department of Mathematical Sciences    South Road
University of Durham                   Durham DH1 3LE
tel: +44 (0)191 374 2361, fax: +44 (0)191 374 7388
http://www.maths.dur.ac.uk/stats/people/jcr/jcr.html
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Jonathan Rougier <J.C.Rougier at durham.ac.uk> writes:

> Hi everyone,
>
> There is no mention in ?new.env (R-1.6.0) of what the effect of setting
> the hash argument of new.env() actually does. What does it mean in
> performance terms to say that "the environment will be hashed"?

It's difficult to say, since there are various tradeoffs (especially
between speed and space). Empirically, I seem to recall that we did some
benchmarking and that the effect of hashing was essentially nil for
typical environments created by R function calls.

However, that's not the only potential use of environments in R; they can
also be used for what other languages call associative arrays or hashes.
For large environments, the potential speedup comes from replacing a
linear search, which is O(N) in the number of objects, with a lookup by
hash key, which takes roughly constant time as long as the table is
nearly empty. Against that, there is administrative overhead, such as
re-hashing an environment when it outgrows its hash table.

There are several webpages that discuss hashing (although not all of them
relate to this particular meaning of the word...), e.g.

http://www-theory.dcs.st-and.ac.uk/~mda/cs2001/hashing/general.html

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
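To make the associative-array use mentioned above concrete, here is a minimal sketch in R. It assumes a recent version of R, in which new.env() accepts a size argument and hash = TRUE is already the default (it is spelled out only for clarity). The key names and the table size are invented for illustration. Passing a size close to the expected number of entries is one way to reduce the re-hashing overhead, since the table then has less need to grow.

```r
# Sketch only: an environment used as an associative array (hash table).
# The keys "key0001", "key0002", ... are hypothetical.
n    <- 1000L
keys <- sprintf("key%04d", seq_len(n))

# 'size' is a hint for the initial hash table size; the table can still
# grow if more objects are added later.
tab <- new.env(hash = TRUE, parent = emptyenv(), size = n)

# Store one value per key.
for (i in seq_len(n)) {
  assign(keys[i], i * 2, envir = tab)
}

# Look up single entries by name.
value   <- get("key0042", envir = tab)                          # 84
present <- exists("key0042", envir = tab, inherits = FALSE)     # TRUE
absent  <- exists("no_such_key", envir = tab, inherits = FALSE) # FALSE
count   <- length(ls(tab))                                      # 1000
```

Using parent = emptyenv() keeps lookups from falling through to enclosing environments, which is usually what one wants for a pure key/value table.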
As Peter said, hashing tends to be effective if

1) you have lots of things (most R environments in practice hold very few
   objects, so hashing is not so important for them, but for base it is);
2) you access them individually.

If, for example, you will always apply some operation to all of the
objects (or even most, I think), then a hash table doesn't really help and
can hurt. But for some operations (especially in computational biology)
hash tables are very helpful and substantially reduce the time spent on
lookups.

On Tue, Oct 08, 2002 at 11:58:24AM +0100, Jonathan Rougier wrote:
> Hi everyone,
>
> There is no mention in ?new.env (R-1.6.0) of what the effect of setting
> the hash argument of new.env() actually does. What does it mean in
> performance terms to say that "the environment will be hashed"?
>
> Thanks, Jonathan.

-- 
+---------------------------------------------------------------------------+
| Robert Gentleman                 phone : (617) 632-5250                   |
| Associate Professor              fax:   (617) 632-2444                    |
| Department of Biostatistics      office: M1B20                            |
| Harvard School of Public Health  email: rgentlem at jimmy.dfci.harvard.edu |
+---------------------------------------------------------------------------+
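The two access patterns contrasted above can be sketched as follows. This is an illustration only, assuming a recent version of R; the gene names and the simulated values are made up. The first pattern fetches one object by name, which is where a hashed environment can pay off. The second touches every object, so each entry is visited regardless of how it is stored.

```r
# Sketch only: hypothetical data, one numeric vector per gene.
genes <- new.env(hash = TRUE, parent = emptyenv())
set.seed(1)
for (id in paste0("gene", 1:200)) {
  assign(id, rnorm(10), envir = genes)
}

# Pattern 1: individual access by key.
one_mean <- mean(get("gene57", envir = genes))

# Pattern 2: apply an operation to all objects. Every entry is visited
# anyway, so hashing offers no shortcut here.
all_means <- vapply(mget(ls(genes), envir = genes), mean, numeric(1))

# For whole-collection work, converting to a plain list is an alternative.
gene_list <- as.list(genes)
```

If the second pattern dominates, a named list may be the simpler data structure; the environment is mainly worth it when lookups by name are frequent.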
On Tue, 8 Oct 2002, Jonathan Rougier wrote:

> Hi everyone,
>
> There is no mention in ?new.env (R-1.6.0) of what the effect of setting
> the hash argument of new.env() actually does. What does it mean in
> performance terms to say that "the environment will be hashed"?

If the environment has a lot of things in it, then lookup should be faster
with hashing: the lookup time should be roughly independent of the number
of things in the environment. I believe that in the Bioconductor project
the difference is actually noticeable when you have data on 5000 genes in
an environment.

	-thomas
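One way to check this claim on your own machine is a rough timing comparison like the sketch below. It assumes a recent version of R; the 5000 identifiers are invented stand-ins for gene names, and the repetition count is arbitrary. Timings vary with hardware and R version, so no particular result is promised here.

```r
# Sketch only: compare repeated by-name lookups in a hashed and an
# unhashed environment that hold the same 5000 (hypothetical) entries.
n   <- 5000L
ids <- sprintf("gene%05d", seq_len(n))

fill <- function(hash) {
  e <- new.env(hash = hash, parent = emptyenv())
  for (i in seq_len(n)) assign(ids[i], i, envir = e)
  e
}

hashed   <- fill(TRUE)
unhashed <- fill(FALSE)

# Look up every identifier 'reps' times and sum the stored values.
lookup_all <- function(e, reps = 5L) {
  total <- 0
  for (r in seq_len(reps)) {
    for (id in ids) total <- total + get(id, envir = e)
  }
  total
}

t_hashed   <- system.time(s_hashed   <- lookup_all(hashed))
t_unhashed <- system.time(s_unhashed <- lookup_all(unhashed))

# Elapsed seconds for each variant.
print(c(hashed   = t_hashed[["elapsed"]],
        unhashed = t_unhashed[["elapsed"]]))
```

Both variants return the same values; only the time taken to find them should differ. Raising n is a simple way to see how each variant scales.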