Hi all,

The dict package provides a dictionary (hashtable) data structure much
like R's built-in environment objects, but with the following
differences:

  - The Dict class can be subclassed.

  - Four different hashing functions are implemented and the user can
    specify which to use when creating an instance.

I'm sending this here as opposed to R-packages because this package
will only be of interest to developers and because I'd like to get
feedback from a slightly smaller community before either putting it on
CRAN or retiring it to /dev/null.

The design makes it fairly easy to add additional hashing functions,
although currently this must be done in C. If nothing else, this
package should be useful for evaluating hashing functions (see the
vignette for some examples).

Source:
  R-2.6.x: http://userprimary.net/software/dict_0.1.0.tar.gz
  R-2.5.x: http://userprimary.net/software/dict_0.0.4.tar.gz

Windows binary:
  R-2.5.x: http://userprimary.net/software/dict_0.0.4.zip

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org
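For readers who have not seen the package source, here is a rough C sketch of the general idea of a dictionary whose hash function is chosen when the instance is created. It is not the dict package's code: every name in it (toy_dict, toy_hash_fn, hash_djb2, hash_sum and so on) is hypothetical, and the two hash functions are stand-ins picked for brevity, not necessarily any of the four the package ships.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* All names here are hypothetical; this is not the dict package's API. */
typedef unsigned long (*toy_hash_fn)(const char *key);

typedef struct toy_entry {
    char *key;
    int value;
    struct toy_entry *next;   /* next entry in the same bucket (chaining) */
} toy_entry;

typedef struct {
    toy_hash_fn hash;         /* chosen once, when the dict is created */
    size_t nbuckets;
    toy_entry **buckets;
} toy_dict;

/* djb2-style string hash. */
static unsigned long hash_djb2(const char *key) {
    unsigned long h = 5381;
    while (*key) h = h * 33 + (unsigned char)*key++;
    return h;
}

/* Deliberately weak hash (sum of bytes), useful only for comparison. */
static unsigned long hash_sum(const char *key) {
    unsigned long h = 0;
    while (*key) h += (unsigned char)*key++;
    return h;
}

static toy_dict *toy_dict_new(size_t nbuckets, toy_hash_fn hash) {
    assert(nbuckets > 0 && hash != NULL);
    toy_dict *d = malloc(sizeof *d);
    if (!d) return NULL;
    d->buckets = calloc(nbuckets, sizeof *d->buckets);
    if (!d->buckets) { free(d); return NULL; }
    d->hash = hash;
    d->nbuckets = nbuckets;
    return d;
}

/* Insert or overwrite; returns 0 on success, -1 on allocation failure. */
static int toy_dict_set(toy_dict *d, const char *key, int value) {
    size_t i = d->hash(key) % d->nbuckets;
    for (toy_entry *e = d->buckets[i]; e; e = e->next)
        if (strcmp(e->key, key) == 0) { e->value = value; return 0; }
    toy_entry *e = malloc(sizeof *e);
    if (!e) return -1;
    e->key = malloc(strlen(key) + 1);
    if (!e->key) { free(e); return -1; }
    strcpy(e->key, key);
    e->value = value;
    e->next = d->buckets[i];
    d->buckets[i] = e;
    return 0;
}

/* Returns 1 and stores the value in *out if key is present, else 0. */
static int toy_dict_get(const toy_dict *d, const char *key, int *out) {
    size_t i = d->hash(key) % d->nbuckets;
    for (const toy_entry *e = d->buckets[i]; e; e = e->next)
        if (strcmp(e->key, key) == 0) { *out = e->value; return 1; }
    return 0;
}

static void toy_dict_free(toy_dict *d) {
    if (!d) return;
    for (size_t i = 0; i < d->nbuckets; i++) {
        toy_entry *e = d->buckets[i];
        while (e) {
            toy_entry *next = e->next;
            free(e->key);
            free(e);
            e = next;
        }
    }
    free(d->buckets);
    free(d);
}
```

Creating one table with toy_dict_new(8, hash_djb2) and another with toy_dict_new(8, hash_sum) gives two dictionaries that behave identically from the caller's side but distribute keys differently, which is the kind of comparison the post suggests the package is useful for.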
Gabor Grothendieck
2007-Jul-22 00:43 UTC
[Rd] dict package: dictionary data structure for R
Although the proto package is not particularly aimed at hashing, note
that it covers some of the same ground and is also based on a
well-thought-out object model (known as object-based programming or
prototype programming).

Here is an example where we create two proto objects (which could be
regarded as hash tables) in which q is a child of p and so inherits a:

   library(proto)
   p <- proto(a = 1, b = 2)
   q <- p$proto(c = 3)
   q$a  # 1
Duncan Temple Lang
2007-Jul-23 11:11 UTC
[Rd] dict package: dictionary data structure for R
Hi Seth.

Glad you did this. As you know, I think we need more specialized data
structures and the ability to introduce them easily into R
computations, both internally and at the R language level.

A few things that come to mind after a quick initial look.

The HashFunc typedef in hashfuncs.h would be more flexible if it took
an additional argument of type void * to allow for user-defined data.
Alternatively, it might take the hash table object itself. The function
might want to do some updating of the table itself, or look at some
table (e.g. for perfect hashing). And if we had a place to provide
additional information, it would be easy to allow the hash function
object to be an R function.

Also, you are using a "global" table of hash functions (i.e.
Dict_HashFunctions) and looking up the C routine using GET_HASHFUN,
which is tied to the integer indexing for this global table. Why not
use the C routines directly from R, i.e. look them up with
getNativeSymbolInfo and pass the result from R to the newly created
dict? This avoids the lookup and the global table, makes things
extensible with routines in other packages, and extends simply to
allowing R functions to be passed instead of C routines. It also
removes the need to synchronize the labeling system in R and in C,
i.e. that 0L corresponds to PJW. The reliance on synchronized names
rather than direct handles is unnecessary, although widely used in S/R
code.

I'm more than happy to give some code to illustrate what I mean more
precisely if you'd like it.

 D.
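The two suggestions above (an extra void * argument, and a direct handle to the routine instead of an index into a global table) can be sketched in plain C as follows. This is only an illustration under stated assumptions: the thread does not show the real HashFunc typedef, so the signature HashFuncEx, the struct dict_hasher, and the functions hash_pjw_ex, hash_seeded and dict_bucket are all hypothetical names, and the PJW-style routine assumes a 32-bit unsigned int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature: the package's actual HashFunc typedef is not
 * shown in the thread. The extra user_data argument carries whatever
 * state a particular hash function needs. */
typedef unsigned int (*HashFuncEx)(const char *key, size_t len, void *user_data);

/* The dict holds the function pointer itself (a direct handle) rather
 * than an integer index into a global table of hash functions. */
typedef struct {
    HashFuncEx hash;
    void *user_data;
    size_t nbuckets;
} dict_hasher;

/* PJW/ELF-style hash; it needs no extra state, so user_data is ignored.
 * Assumes unsigned int is 32 bits wide. */
static unsigned int hash_pjw_ex(const char *key, size_t len, void *user_data) {
    (void)user_data;
    unsigned int h = 0, g;
    for (size_t i = 0; i < len; i++) {
        h = (h << 4) + (unsigned char)key[i];
        g = h & 0xF0000000u;
        if (g) h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* A hash that reads its seed through user_data (an unsigned int *). */
static unsigned int hash_seeded(const char *key, size_t len, void *user_data) {
    unsigned int h = *(const unsigned int *)user_data;
    for (size_t i = 0; i < len; i++)
        h = h * 31u + (unsigned char)key[i];
    return h;
}

/* Map a key to a bucket using whichever routine the dict was given. */
static size_t dict_bucket(const dict_hasher *d, const char *key, size_t len) {
    assert(d->nbuckets > 0);
    return d->hash(key, len, d->user_data) % d->nbuckets;
}
```

With this shape, a routine supplied by another package only has to match the function-pointer type, and no numbering scheme has to be kept in sync between the R and C sides. The same user_data slot is where a pointer to the table itself, or (with R's headers) a reference to an R function, could be carried.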