thr3ads.net - Ferret talk - [Ferret-talk] retrieving search result positions [Mar 2007]


Alex Fenton

2007-Mar-28 18:30 UTC

[Ferret-talk] retrieving search result positions

Hi

I'm considering using Ferret in v2 of Weft QDA, a wxruby desktop
application for textual analysis in social science.

Ferret seems a very impressive package that meets and exceeds my
requirements, but I can't find how to retrieve specific details about
the results.

I'd like to be able to run fairly simple queries. I then need to look at
each term match, and get its document id and the character (not byte)
position at which it occurs in the source document.

My semi-illiterate reading of search.c suggests this is available, but
looking at the SearchHits returned by a SpanTermQuery, they don't seem
to contain the methods I'm looking for.

Thanks for any help.

alex

Jens Kraemer

2007-Mar-29 08:11 UTC


On Wed, Mar 28, 2007 at 07:30:36PM +0100, Alex Fenton wrote:
> Hi
>
> I'm considering using Ferret in v2 of Weft QDA, a wxruby desktop
> application for textual analysis in social science.
>
> Ferret seems a very impressive package that meets and exceeds my
> requirements, but I can't find how to retrieve specific details about
> the results.
>
> I'd like to be able to run fairly simple queries. I then need to look at
> each term match, and get its document id and the character (not byte)
> position at which it occurs in the source document.
>
> My semi-illiterate reading of search.c suggests this is available, but
> looking at the SearchHits returned by a SpanTermQuery, they don't seem
> to contain the methods I'm looking for.

Without fully understanding what you want to achieve, I guess
TermVectors are what you're looking for. I'm not sure if they're working
on characters or bytes, though.

Jens


-- 
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Phone +49 351 46766-0 | Fax +49 351 46766-66
kraemer at webit.de | www.webit.de

Amtsgericht Dresden | HRB 15422
Managing directors: Sven Haubold, Hagen Malessa

Alex Fenton

2007-Mar-29 18:28 UTC


Jens Kraemer wrote:
> On Wed, Mar 28, 2007 at 07:30:36PM +0100, Alex Fenton wrote:
>> I'd like to be able to run fairly simple queries. I then need to look at
>> each term match, and get its document id and the character (not byte)
>> position at which it occurs in the source document.
>
> Without fully understanding what you want to achieve, I guess
> TermVectors are what you're looking for.

Thank you - that class has exactly the data I need. Is there any way to
extract the individual TermVectors implied by a set of search results?

#highlight seems to do this internally, but the only Ruby way I've found
to access TVs is via index.reader.term_vector(doc_id, :field). I'd
like to be able to find the terms in results of e.g. a fuzzy or phrase
search.

> I'm not sure if they're working
> on characters or bytes, though.

Looks like bytes, but I can probably work round that.

thanks
alex
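If the term vector offsets are byte offsets, as Alex suspects, they need converting before they can serve as character positions in UTF-8 text. Here is a minimal sketch in plain Ruby. It assumes Ruby 1.9.3 or later (for `String#byteslice`) and that every offset falls on a character boundary. The helper names, sample text and offsets are made up for illustration:

```ruby
# Convert a byte offset into a character offset for UTF-8 text.
# Assumes Ruby 1.9.3+ (String#byteslice) and that byte_offset lies on a
# character boundary, not inside a multi-byte sequence.
def byte_to_char_offset(text, byte_offset)
  # Take the bytes before the offset, read them as UTF-8 (this also covers
  # text that was loaded as binary), and count the characters.
  text.byteslice(0, byte_offset).force_encoding(Encoding::UTF_8).length
end

# Convert a half-open [start, end) byte range into a character range.
def byte_range_to_char_range(text, byte_start, byte_end)
  [byte_to_char_offset(text, byte_start), byte_to_char_offset(text, byte_end)]
end

text = "naïve café au lait"
start_char, end_char = byte_range_to_char_range(text, 13, 15)
puts text[start_char...end_char] # prints "au"
```

This rescans the text from the start for every offset. For many matches in a long document, a cheaper approach would be to sort the offsets and walk the text once.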

Jens Kraemer

2007-Mar-30 07:39 UTC


On Thu, Mar 29, 2007 at 07:28:36PM +0100, Alex Fenton wrote:
> Jens Kraemer wrote:
> > On Wed, Mar 28, 2007 at 07:30:36PM +0100, Alex Fenton wrote:
> >> I'd like to be able to run fairly simple queries. I then need to look at
> >> each term match, and get its document id and the character (not byte)
> >> position at which it occurs in the source document.
> >
> > Without fully understanding what you want to achieve, I guess
> > TermVectors are what you're looking for.
>
> Thank you - that class has exactly the data I need. Is there any way to
> extract the individual TermVectors implied by a set of search results?
>
> #highlight seems to do this internally, but the only Ruby way I've found
> to access TVs is via index.reader.term_vector(doc_id, :field). I'd
> like to be able to find the terms in results of e.g. a fuzzy or phrase
> search.

You will get the doc_ids back from your search, so wouldn't it work to
just do a search_each and retrieve the term vectors inside the block?

index.search_each(query) do |doc_id, score|
  # :field stands in for the name of the field whose term vectors were stored
  tv = index.reader.term_vector(doc_id, :field)
end

Jens

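The term vector fetched inside that search_each loop lists the terms of the field in that document. For a fuzzy query, though, it does not say which of those terms the query actually matched. One approximation is to re-check the document's terms against the query term by edit distance.

Here is a minimal sketch. It assumes the term texts have already been pulled out of the term vector into an array of strings. `fuzzy_candidates` and its `max_distance` cutoff are hypothetical and do not reproduce Ferret's own fuzzy-matching threshold:

```ruby
# Standard dynamic-programming Levenshtein (edit) distance.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      # Cheapest of deletion, insertion and substitution.
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper: keep the document terms within max_distance edits
# of the fuzzy query's term. The cutoff is an illustrative stand-in, not
# Ferret's own fuzzy-matching rule.
def fuzzy_candidates(query_term, doc_terms, max_distance = 2)
  doc_terms.select { |term| levenshtein(query_term, term) <= max_distance }
end

# doc_terms stands in for the term texts taken from one document's term vector.
doc_terms = %w[analyses analyst synthesis analysis]
p fuzzy_candidates("analysis", doc_terms, 1) # prints ["analyses", "analysis"]
```

Because the cutoff here is not Ferret's, this can list terms the real query did not match, or miss ones it did. It narrows the candidates rather than giving a definitive answer.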

Alex Fenton

2007-Mar-30 08:03 UTC


Jens Kraemer wrote:
>> #highlight seems to do this internally, but the only Ruby way I've found
>> to access TVs is via index.reader.term_vector(doc_id, :field). I'd
>> like to be able to find the terms in results of e.g. a fuzzy or phrase
>> search.
>
> You will get the doc_ids back from your search, so wouldn't it work to
> just do a search_each and retrieve the term vectors inside the block?
>
> index.search_each(query) do |doc_id, score|
>   tv = index.reader.term_vector(doc_id, :field)
> end

I'll give it a try, but if it was a fuzzy match I'm not sure I would
know the exact term that was matched. Similarly with a phrase match, I
think I would have to manually verify that a particular occurrence of
one term met the phrase criteria.

thanks
alex
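For phrase matches, the manual check Alex describes amounts to confirming that the phrase's terms occur at consecutive token positions. Here is a minimal sketch. It assumes the positions have been gathered from a term vector stored with positions into a hash of term => array of positions. That hash is a hypothetical structure, not Ferret's own TermVector object. The sketch handles exact phrases only, without slop:

```ruby
# positions_by_term: hypothetical input mapping each term to the token
# positions where it occurs in one document.
# Returns each start position where the phrase terms appear consecutively,
# i.e. term k of the phrase sits at start + k. Exact phrases only; no slop.
def phrase_starts(positions_by_term, phrase_terms)
  return [] if phrase_terms.empty?

  first_positions = positions_by_term.fetch(phrase_terms.first, [])
  first_positions.select do |start|
    phrase_terms.each_with_index.all? do |term, k|
      positions_by_term.fetch(term, []).include?(start + k)
    end
  end
end

positions = { "social" => [3, 10], "science" => [4, 15], "research" => [5] }
p phrase_starts(positions, %w[social science]) # prints [3]
p phrase_starts(positions, %w[science social]) # prints []
```

The start positions returned are token positions. They would still need mapping to the stored offsets, and from bytes to characters, before they can locate text in the source document.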
