thr3ads.net - R help - [R] Looking for Feature Hashing [Oct 2014]

Wush Wu

2014-Oct-25 09:25 UTC

[R] Looking for Feature Hashing

Dear all,

Sorry, I am not sure whether I should ask this question here or on
R-devel. Is there an existing package that implements, or is in the process
of implementing, feature hashing or a similar function?

For those who do not know "feature hashing", let me give a brief
explanation.

Feature hashing is a technique for quickly converting a large number of
strings to dummy variables (similar to `stats::contrasts`). For example, if
I want to convert a character vector `x <- c("asdfa", "adsfausd", .....)`
to dummy variables, I need to construct a mapping between each string and
its index (`base::factor`). However, if `x` has many distinct elements and
is very large, the overhead of constructing that index is large. The
overhead is larger still in a distributed environment.
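
To make the index-based approach concrete, here is a minimal sketch in base
R. The fourth string and the use of `model.matrix` are assumptions added for
illustration; only `base::factor` is named above.

```r
# Illustrative data: the first two strings come from the example above,
# the rest are made up.
x <- c("asdfa", "adsfausd", "asdfa", "qwerty")

# factor() has to scan all of x to build the level table,
# i.e. the string -> index mapping.
f <- factor(x)
idx <- as.integer(f)   # index of each element within levels(f)

# One indicator column per distinct string (no intercept column).
dummies <- model.matrix(~ f - 1)
```

The level table has one entry per distinct string, so it has to be built
(and, in a distributed setting, agreed on by every worker) before any row
can be encoded.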

A good hash function can map each string to an index quickly, without the
overhead of constructing the index. The probability of a "collision" should
be small if we pick a good hash function. For details, please see
en.wikipedia.org/wiki/Feature_hashing
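
As a rough sketch of the idea, assuming a fixed number of buckets: the
helper `hash_index`, the polynomial hash it uses, and the bucket count of 16
are all hypothetical choices made for illustration, not something from an
existing package. A real implementation would use a stronger hash function.

```r
# Hypothetical helper: a simple polynomial string hash, reduced modulo the
# number of buckets at every step so the intermediate value stays small.
hash_index <- function(s, n_buckets) {
  h <- 0
  for (code in utf8ToInt(s)) {
    h <- (h * 31 + code) %% n_buckets
  }
  h + 1  # shift to R's 1-based indexing
}

x <- c("asdfa", "adsfausd", "asdfa", "qwerty")
n_buckets <- 16  # assumed size; more buckets means fewer collisions

# Each string is mapped to a column directly, with no level table.
cols <- vapply(x, hash_index, numeric(1), n_buckets = n_buckets,
               USE.NAMES = FALSE)

# Fill in the indicator matrix: row i gets a 1 in column cols[i].
hashed <- matrix(0, nrow = length(x), ncol = n_buckets)
hashed[cbind(seq_along(x), cols)] <- 1
```

Equal strings always land in the same column, and each element can be
encoded independently of the others. The price is that two different
strings may share a column.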

Best,
Wush Wu
PhD Student Graduate Institute of Electrical Engineering, National Taiwan
University

Duncan Murdoch

2014-Oct-26 10:43 UTC

On 25/10/2014, 5:25 AM, Wush Wu wrote:
> Dear all,
>
> Sorry that I am not sure that whether I should ask the question here or
> R-devel. Is there any existed packages which implements or is implementing
> feature hashing or similar function?
>
> For who does not know "feature hashing", please let me give a brief
> explanation here.
>
> Feature hashing is a technique to convert a large amount of string to dummy
> variables quickly( similar to `stats::contrasts` ). For example, if I want
> to convert a character vector `x <- c("asdfa", "adsfausd", .....)` to dummy
> variable, I need to construct a mapping between the string and the index
> (`base::factor`). However, if the `x` has lots of different elements and
> the size of `x` is huge, the overhead of constructing index is large.
> Moreover, the overhead is larger for the distributed environment.
>
> A good hashing function could be used to map the string to the index
> quickly without the overhead of constructing the index. The probability of
> "collision" might be small if we pick a good hashing function. For details,
> please see en.wikipedia.org/wiki/Feature_hashing

The "digest" package implements several different hash functions.  You
could use the hash values as names in an environment to index arbitrary
objects associated with the values.
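
A minimal sketch of that suggestion, assuming the "digest" package is
installed. The choice of MD5, the variable names, and the final step that
turns a hash into a column index are assumptions added for illustration.

```r
library(digest)

x <- c("asdfa", "adsfausd", "asdfa")

# Hash the raw strings (serialize = FALSE hashes the characters themselves
# rather than R's serialized representation of the object).
keys <- vapply(x, digest, character(1), algo = "md5", serialize = FALSE,
               USE.NAMES = FALSE)

# Use the hash values as names in an environment.
store <- new.env(hash = TRUE)
for (i in seq_along(x)) {
  assign(keys[i], x[i], envir = store)
}

# Assumed extra step: turn the leading 7 hex digits (28 bits, which fits
# in an R integer) into a 1-based column index for feature hashing.
n_buckets <- 16L
cols <- strtoi(substr(keys, 1, 7), base = 16L) %% n_buckets + 1L
```

Duplicated strings produce the same key, so the environment ends up with
one entry per distinct string.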

Duncan Murdoch
