Hi,

I'm using the StandardAnalyzer to build an index, and passing in Documents that have Fields containing large tokens (22+ characters) interspersed with normal English words. This seems to cause the IndexWriter to slow to a crawl. Is this a known issue, or am I doing something wrong?

If it is a known issue, I have no problem simply not indexing tokens longer than a certain length, but what's the best way to eliminate them? Using a TokenFilter in my own Analyzer? Sorry for the newbish questions; I'm new to Ferret, having never used Lucene.

Thanks in advance,
Justin
On 6/17/06, Justin Kan <justin.kan at gmail.com> wrote:
> I'm using the StandardAnalyzer to build an index, and passing in Documents
> that have Fields containing large tokens (22+ characters) interspersed with
> normal English words. This seems to cause the IndexWriter to slow to a
> crawl. Is this a known issue, or am I doing something wrong?

Hi Justin,

I haven't come across this problem. Are you on Windows by any chance? Currently Ferret is just generally slow on Windows because it is pure Ruby code there. One problem large tokens may cause is a general increase in the number of terms in the index, which can slow down indexing a little, but it would surprise me if that made a huge difference unless there were a particularly large number of them.

> If this is a known issue I have no problem just not indexing tokens longer
> than a certain length, but what's the best way to eliminate them? Using a
> TokenFilter on my own Analyzer?

Yes, using a token filter will do the job. Have a look in the analysis module of Ferret for some examples. I'd be interested to hear if it makes any difference.

Cheers,
Dave
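For anyone else landing on this thread, below is a rough sketch of the kind of filter Dave is describing: a custom TokenFilter that drops over-long tokens, wrapped around the StandardAnalyzer's stream. The names used here (Ferret::Analysis::TokenFilter with an @input stream, StandardAnalyzer#token_stream, Token#text) and the 20-character cut-off are assumptions based on Ferret's Lucene-style analysis API, not a verified implementation, so check them against the analysis module of your Ferret version before relying on it.

require 'ferret'

# Hypothetical filter that skips over-long tokens. Assumes a Lucene-style
# API: a TokenFilter wraps another token stream in @input, #next returns
# the next Token or nil, and the term text is available via Token#text
# (it may be named differently, e.g. #term_text, in your Ferret version).
class LongTokenFilter < Ferret::Analysis::TokenFilter
  MAX_TOKEN_LENGTH = 20  # arbitrary cut-off, tune to taste

  def next
    # Pull tokens from the wrapped stream, dropping any longer than the
    # limit and passing the rest straight through.
    while (token = @input.next)
      return token if token.text.length <= MAX_TOKEN_LENGTH
    end
    nil
  end
end

# Analyzer that behaves like StandardAnalyzer but filters out long tokens.
class LengthLimitedAnalyzer < Ferret::Analysis::StandardAnalyzer
  def token_stream(field, string)
    LongTokenFilter.new(super(field, string))
  end
end

# Usage (also hypothetical): hand it to the index in place of the default.
# index = Ferret::Index::Index.new(:analyzer => LengthLimitedAnalyzer.new)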
David,

I was running on Windows, and when I moved to Linux the problem disappeared (I'm assuming because Linux automatically uses cferret?). Thanks for the help!

Justin

On 6/16/06, David Balmain <dbalmain.ml at gmail.com> wrote:
> I haven't come across this problem. Are you on Windows by any chance?
> Currently Ferret is just generally slow on Windows because it is pure
> Ruby code there.