thr3ads.net - R help - [R] Compact Patricia Trees (Tries) [Apr 2010]

If this information is useful, please help other people find it:
Share via:

Richard R. Liu

2010-Apr-29 08:05 UTC

[R] Compact Patricia Trees (Tries)

I have an application that a long list of character strings to determine which
occur at the beginning of a given word. ?A straight forward R script using grep
takes a long time to run. ?Rewriting it to use substr and match might be an
option, but I have the impression that preparing the list as a trie and
performing trie searches might lead to dramatic improvements in performance. 


I have searched the CRAN packages and find no packages that support Compact
Patricia Trees. ?Does anybody know of such?


Thanks,
Richard

Richard R. Liu
richard.liu at pueo-owl.ch

Gabor Grothendieck

2010-Apr-29 11:06 UTC

head link

[R] Compact Patricia Trees (Tries)

Using charmatch partial matches of 10,000 5 leters words to the same
list can be done in 10 seconds on my machine and 10,000 5 letter words
to 100,000 10 letter words in 1 minute.  Is that good enough?  Try
this simulation:

# generate N random words each k long
rwords <- function(N, k) {
   L <- sample(letters, N*k, replace = TRUE)
   apply(matrix(L, k), 2, paste, collapse = "")
}
w1 <- rwords(1e5, 10)
w2 <- rwords(1e4, 5)

system.time(charmatch(w2, w2))

system.time(charmatch(w2, w1))


On Thu, Apr 29, 2010 at 4:05 AM, Richard R. Liu <richard.liu at
pueo-owl.ch> wrote:> I have an application that a long list of character strings to determine
which
> occur at the beginning of a given word. ?A straight forward R script using
grep
> takes a long time to run. ?Rewriting it to use substr and match might be an
> option, but I have the impression that preparing the list as a trie and
> performing trie searches might lead to dramatic improvements in
performance.
>
>
> I have searched the CRAN packages and find no packages that support Compact
> Patricia Trees. ?Does anybody know of such?
>
>
> Thanks,
> Richard
>
> Richard R. Liu
> richard.liu at pueo-owl.ch
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

R help - Apr 2010 - Compact Patricia Trees (Tries)

[R] Compact Patricia Trees (Tries)

[R] Compact Patricia Trees (Tries)