thr3ads.net - Ferret talk - [Ferret-talk] Getting non-stemmed terms from IndexReader [Mar 2007]

Ted

2007-Mar-04 13:02 UTC

[Ferret-talk] Getting non-stemmed terms from IndexReader

I need to get the set of terms being indexed using Ferret. I used
IndexReader.terms and it returns a TermEnum nicely. The only
problem is that my analyzer includes a stemming filter,
so the terms I'm getting back are all stemmed. Is there any way to
get the original unstemmed terms back from the index somehow? Thanks.

-- 
Posted via http://www.ruby-forum.com/.

David Balmain

2007-Mar-06 02:13 UTC

On 3/5/07, Ted <admin at mightytofu.com> wrote:
> I need to get the set of terms being indexed using Ferret. I used
> IndexReader.terms and it returns a TermEnum nicely. The only
> problem is that my analyzer includes a stemming filter,
> so the terms I'm getting back are all stemmed. Is there any way to
> get the original unstemmed terms back from the index somehow? Thanks.

Hi Ted,

Unfortunately this isn't really possible. What I'd recommend is
indexing the field twice: once with a stemming analyzer and once
without. See PerFieldAnalyzer:

    http://ferret.davebalmain.com/api/classes/Ferret/Analysis/PerFieldAnalyzer.html

Hope that helps.

Cheers,
Dave

-- 
Dave Balmain
http://www.davebalmain.com/
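
The "index the field twice" idea can be illustrated with a toy inverted index in plain Ruby. This is only a sketch under stated assumptions: it does not call Ferret at all, the field names content and content_exact are made up for the example, and naive_stem is a crude stand-in for a real stemming filter. In Ferret itself the per-field routing would be handled by PerFieldAnalyzer, as linked above.

```ruby
require "set"

# Deliberately naive suffix stripper standing in for a real stemming filter.
# (Hypothetical helper, not part of Ferret.)
def naive_stem(token)
  token.sub(/(ing|ed|s)\z/, "")
end

# Tokenizer shared by both fields: lowercase, then split on non-letters.
def tokenize(text)
  text.downcase.scan(/[a-z]+/)
end

# One analyzer per field name, in the spirit of a per-field analyzer.
# Both field names are assumptions made for this example.
ANALYZERS = {
  content:       ->(text) { tokenize(text).map { |t| naive_stem(t) } },
  content_exact: ->(text) { tokenize(text) }
}.freeze

# Toy inverted index: field => term => set of doc ids.
def build_index(docs)
  index = Hash.new { |h, f| h[f] = Hash.new { |t, k| t[k] = Set.new } }
  docs.each_with_index do |text, doc_id|
    # The same text is indexed once per field, each with its own analyzer.
    ANALYZERS.each do |field, analyzer|
      analyzer.call(text).each { |term| index[field][term] << doc_id }
    end
  end
  index
end

index = build_index(["Indexing stemmed terms", "The index stems a term"])
stemmed_terms = index[:content].keys.sort        # search against these
exact_terms   = index[:content_exact].keys.sort  # list these to the user
```

Searching would use the stemmed field, while the unstemmed field exists only so that the original word forms can be enumerated. The cost is a larger index, since every token is stored twice.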

Ted

2007-Mar-06 02:46 UTC

Thanks for the response. This is exactly what I did: I index the
field twice, with a different analyzer for each copy.

Ted

2007-Mar-06 02:58 UTC

I encountered another problem:

After I remove docs from the index, the doc_freq returned by
IndexReader.terms is not updated. It always shows the old number, or a
bigger number after more docs with that term are added.
So it looks like the doc_freq is not updated correctly on removal of a
doc.

David Balmain

2007-Mar-06 03:25 UTC

On 3/6/07, Ted <admin at mightytofu.com> wrote:
> I encountered another problem:
>
> After I remove docs from the index, the doc_freq returned by
> IndexReader.terms is not updated. It always shows the old number, or a
> bigger number after more docs with that term are added.
> So it looks like the doc_freq is not updated correctly on removal of a
> doc.

This is impossible to fix without ruining performance. To fix this
problem I would basically need to optimize the index after every
deletion. In fact, you can do this yourself if you like. Just optimize
the index whenever you need to rely on the doc frequency being correct
and there may be deletions in the index.

Cheers,
Dave

-- 
Dave Balmain
http://www.davebalmain.com/
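
Why doc_freq lags behind deletions can be shown with a toy model in plain Ruby. This is a sketch, not Ferret's actual storage format: it assumes that a deletion only marks a document and that the per-term counts are rewritten when the index is optimized, which is consistent with Dave's explanation above. The ToyIndex class and all of its methods are hypothetical.

```ruby
# Toy index: deleting only marks a document, so per-term document
# counts stay stale until the postings are rewritten by optimize.
class ToyIndex
  def initialize
    @postings = Hash.new { |h, term| h[term] = [] } # term => doc ids
    @deleted  = {}                                   # doc id => true
    @next_id  = 0
  end

  # Adds a document and returns its id.
  def add(text)
    doc_id = @next_id
    @next_id += 1
    text.downcase.scan(/[a-z]+/).uniq.each { |term| @postings[term] << doc_id }
    doc_id
  end

  # Cheap: just flag the document, the postings are left untouched.
  def delete(doc_id)
    @deleted[doc_id] = true
  end

  # Reads the stored postings size; deletion marks are not consulted.
  def doc_freq(term)
    @postings.fetch(term, []).size
  end

  # Expensive: rewrite every postings list without the deleted documents.
  def optimize
    @postings.each_value { |ids| ids.reject! { |id| @deleted[id] } }
    @postings.delete_if { |_term, ids| ids.empty? }
    @deleted.clear
  end
end

index = ToyIndex.new
first = index.add("ferret index")
index.add("ferret search")
index.delete(first)

stale_freq = index.doc_freq("ferret") # still counts the deleted document
index.optimize
fresh_freq = index.doc_freq("ferret") # deleted document no longer counted
```

In this model, keeping doc_freq exact on every delete would mean touching the postings of every term in the deleted document, which is the kind of cost Dave describes. Calling optimize only before the counts are actually needed keeps that cost off the deletion path.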

Ted

2007-Mar-06 08:24 UTC

Got it. I had thought that 'flush' would do the trick, but I guess
not. I think I will have to call optimize, but only when necessary.
Thanks for your response.
