Hi

I'm considering using Ferret in v2 of Weft QDA, a wxruby desktop application for textual analysis in social science.

Ferret seems a very impressive package that meets and exceeds my requirements, but I can't find how to retrieve specific details about the results.

I'd like to be able to run fairly simple queries. I then need to look at each term match, and get its document id and the character (not byte) position at which it occurs in the source document.

My semi-illiterate reading of search.c suggests this is available, but looking at the SearchHits returned by a SpanTermQuery, they don't seem to contain the methods I'm looking for.

Thanks for any help.
alex
On Wed, Mar 28, 2007 at 07:30:36PM +0100, Alex Fenton wrote:
> Hi
>
> I'm considering using Ferret in v2 of Weft QDA, a wxruby desktop
> application for textual analysis in social science.
>
> Ferret seems a very impressive package that meets and exceeds my
> requirements, but I can't find how to retrieve specific details about
> the results.
>
> I'd like to be able to run fairly simple queries. I then need to look at
> each term match, and get its document id and the character (not byte)
> position at which it occurs in the source document.
>
> My semi-illiterate reading of search.c suggests this is available, but
> looking at the SearchHits returned by a SpanTermQuery, they don't seem
> to contain the methods I'm looking for.

Without fully understanding what you want to achieve, I guess TermVectors are what you're looking for. I'm not sure if they're working on characters or bytes, though.

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de
Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa
Jens Kraemer wrote:
> On Wed, Mar 28, 2007 at 07:30:36PM +0100, Alex Fenton wrote:
>
>> I'd like to be able to run fairly simple queries. I then need to look at
>> each term match, and get its document id and the character (not byte)
>> position at which it occurs in the source document.
>>
> Without fully understanding what you want to achieve, I guess
> TermVectors are what you're looking for.

Thank you - that class has exactly the data I need. Is there any way to extract the individual TermVectors implied by a set of search results?

#highlight seems to do this internally, but the only Ruby way I've found to access TVs is via index.reader.term_vector(doc_id, :field). I'd like to be able to find the terms in the results of, e.g., a fuzzy or phrase search.

> I'm not sure if they're working
> on characters or bytes, though.

Looks like bytes, but I can probably work round that.

thanks
alex
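If the offsets do turn out to be byte offsets, one possible workaround is to convert them against the original document text. The following is a minimal sketch in plain Ruby, assuming the source text is available as a valid UTF-8 String and that the offset falls on a character boundary. The helper name byte_to_char_offset is hypothetical and not part of Ferret.

```ruby
# Hypothetical helper (not part of Ferret): converts a byte offset into a
# character offset by counting the characters that precede that byte.
# Assumes `text` is valid UTF-8 and `byte_offset` lies on a character boundary.
def byte_to_char_offset(text, byte_offset)
  text.byteslice(0, byte_offset).length
end

text = "naïve café"
byte_to_char_offset(text, 7) # => 6 ("ï" takes two bytes in UTF-8)
```

For a document with many matches, it may be cheaper to walk the text once and build a byte-to-character lookup table rather than slicing per match.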
On Thu, Mar 29, 2007 at 07:28:36PM +0100, Alex Fenton wrote:
> Jens Kraemer wrote:
> > On Wed, Mar 28, 2007 at 07:30:36PM +0100, Alex Fenton wrote:
> >
> >> I'd like to be able to run fairly simple queries. I then need to look at
> >> each term match, and get its document id and the character (not byte)
> >> position at which it occurs in the source document.
> >>
> > Without fully understanding what you want to achieve, I guess
> > TermVectors are what you're looking for.
> Thank you - that class has exactly the data I need. Is there any way to
> extract the individual TermVectors implied by a set of search results?
>
> #highlight seems to do this internally, but the only Ruby way I've found
> to access TVs is via index.reader.term_vector(doc_id, :field). I'd
> like to be able to find the terms in the results of, e.g., a fuzzy or phrase search.

You will get the doc_ids back from your search, so wouldn't it work to just do a search_each and retrieve the term vectors inside the block?

    index.search_each(query) do |doc_id, score|
      tv = index.reader.term_vector(doc_id, :field)
    end

Jens
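As a rough sketch of how that loop might be extended to collect the offsets of one known term in each hit: the code below assumes that the term vector exposes terms (each with text and positions) and offsets (each with start and end, indexed by position), and that the field was indexed with positions and offsets stored. These accessor names are assumptions based on the Ferret TermVector documentation and should be checked against the installed version. The method name term_offsets_for_hits is hypothetical.

```ruby
# Hypothetical helper: returns [doc_id, start_offset, end_offset] triples for
# every occurrence of `term` in `field` across the documents matching `query`.
# Assumed (unverified) Ferret API: tv.terms, term.text, term.positions,
# tv.offsets[pos].start and tv.offsets[pos].end.
def term_offsets_for_hits(index, query, field, term)
  hits = []
  index.search_each(query) do |doc_id, _score|
    tv = index.reader.term_vector(doc_id, field)
    next if tv.nil? # no term vector stored for this field

    tv_term = tv.terms.find { |t| t.text == term }
    next if tv_term.nil? # the document matched, but not via this exact term

    tv_term.positions.each do |pos|
      offset = tv.offsets[pos]
      hits << [doc_id, offset.start, offset.end]
    end
  end
  hits
end
```

The term passed in must be the analyzed form (for example lowercased or stemmed, depending on the analyzer), since that is what the index stores.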
Jens Kraemer wrote:
>> #highlight seems to do this internally, but the only Ruby way I've found
>> to access TVs is via index.reader.term_vector(doc_id, :field). I'd
>> like to be able to find the terms in the results of, e.g., a fuzzy or phrase search.
>>
>
> You will get the doc_ids back from your search, so wouldn't it work to
> just do a search_each and retrieve the term vectors inside the block?
>
>     index.search_each(query) do |doc_id, score|
>       tv = index.reader.term_vector(doc_id, :field)
>     end
>

I'll give it a try, but if it was a fuzzy match I'm not sure I would know the exact term that was matched. Similarly with a phrase match: I think I would have to manually verify that a particular occurrence of one term met the phrase criteria.

thanks
alex
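The manual phrase check described above could be done on term positions alone. The sketch below is plain Ruby and does not call Ferret. It assumes the positions for each term have already been pulled out of a term vector into a Hash mapping term text to an array of token positions, and that the phrase requires strictly adjacent terms with no slop. The method name phrase_start_positions is hypothetical.

```ruby
# Hypothetical helper: given token positions per term, returns the positions
# of the first term at which the whole phrase occurs with consecutive terms.
# Assumes an exact phrase (no slop) and one position step per token.
def phrase_start_positions(positions_by_term, phrase_terms)
  first, *rest = phrase_terms
  (positions_by_term[first] || []).select do |start|
    rest.each_with_index.all? do |term, i|
      (positions_by_term[term] || []).include?(start + i + 1)
    end
  end
end

positions = { "quick" => [1, 7], "brown" => [2], "fox" => [3, 8] }
phrase_start_positions(positions, %w[quick brown fox]) # => [1]
```

This does not address the fuzzy case, where the matched term itself is unknown; there the candidate terms in each document would still have to be identified by some other means.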