Stian Grytøyr
2007-Apr-28 10:26 UTC
[Ferret-talk] Determine how many documents a term occurs in
Is there a fast way to determine how many documents a term occurs in, besides iterating through every document with TermDocEnum?

--
Best regards, Stian Grytøyr
John Leach
2007-Apr-28 17:14 UTC
[Ferret-talk] Determine how many documents a term occurs in
Hi,

when you do a search, the array you get back is actually a TopDocs object, which has a total_hits value.

http://ferret.davebalmain.com/api/classes/Ferret/Search/TopDocs.html

I guess you'd just need to make sure all the fuzzy search stuff is off. Or am I missing something here?

John.

On Sat, 2007-04-28 at 12:26 +0200, Stian Grytøyr wrote:
> Is there a fast way to determine how many documents a term occurs in,
> besides iterating through every document with TermDocEnum?

--
http://johnleach.co.uk
Stian Grytøyr
2007-Apr-28 17:45 UTC
[Ferret-talk] Determine how many documents a term occurs in
On 4/28/07, John Leach <john at johnleach.co.uk> wrote:
> when you do a search, the array you get back is actually a TopDocs
> object, which has a total_hits value.

Sorry, my question was imprecise. I meant how many documents in the entire corpus (or index), not for a particular query.

--
Best regards, Stian Grytøyr
John Leach
2007-Apr-29 20:39 UTC
[Ferret-talk] Determine how many documents a term occurs in
Ah ok, then IndexWriter.doc_count

http://ferret.davebalmain.com/api/classes/Ferret/Index/IndexWriter.html#M000089

so something like:

    myindex.writer.doc_count

John.

On Sat, 2007-04-28 at 19:45 +0200, Stian Grytøyr wrote:
> On 4/28/07, John Leach <john at johnleach.co.uk> wrote:
>
> > when you do a search, the array you get back is actually a TopDocs
> > object, which has a total_hits value.
>
> Sorry, my question was imprecise. I meant how many documents in the
> entire corpus (or index), not for a particular query.

--
http://johnleach.co.uk
Stian Grytøyr
2007-Apr-29 20:59 UTC
[Ferret-talk] Determine how many documents a term occurs in
On 4/29/07, John Leach <john at johnleach.co.uk> wrote:
> then IndexWriter.doc_count
>
> http://ferret.davebalmain.com/api/classes/Ferret/Index/IndexWriter.html#M000089
>
> so something like: myindex.writer.doc_count

Thanks, but I still don't think we're quite there. I'm looking for the number of documents (in the index) that, say, "foo" occurs in.

--
Best regards, Stian Grytøyr
John Leach
2007-Apr-29 21:49 UTC
[Ferret-talk] Determine how many documents a term occurs in
Hi Stian,

then I'm confused, because what you're describing is the total hits of a one-term search. You just need to watch out for fuzziness, like case sensitivity.

An alternative is to use the TermEnum methods, though they work on one field at a time:

http://ferret.davebalmain.com/api/classes/Ferret/Index/TermEnum.html

something like:

    te = index_reader.terms(:content)
    te.skip_to("monkey")
    puts "The term 'monkey' occurs in #{te.doc_freq} documents in the index"

Am I warmer? ;)

John.

On Sun, 2007-04-29 at 22:59 +0200, Stian Grytøyr wrote:
> On 4/29/07, John Leach <john at johnleach.co.uk> wrote:
>
> > then IndexWriter.doc_count
> >
> > http://ferret.davebalmain.com/api/classes/Ferret/Index/IndexWriter.html#M000089
> >
> > so something like: myindex.writer.doc_count
>
> Thanks, but I still don't think we're quite there. I'm looking for the number
> of documents (in the index) that, say, "foo" occurs in.

--
http://johnleach.co.uk
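For readers following along: the reason a per-term document count can be cheap is that an inverted index keeps, for each field, a dictionary of terms with the documents each one occurs in, so the count is a lookup rather than a scan. The sketch below is a toy model in plain Ruby, not Ferret itself; ToyTermIndex and its methods are hypothetical names made up for illustration. Its skip_to assumes the behaviour that, as far as I can tell from the Ferret API docs, TermEnum#skip_to has: it lands on the first term greater than or equal to the target, which may not be the term you asked for.

```ruby
# Toy model of a per-field term dictionary. This is NOT Ferret's
# implementation; class and method names are hypothetical.
class ToyTermIndex
  def initialize
    # field => { term => list of ids of documents containing the term }
    @postings = Hash.new do |fields, field|
      fields[field] = Hash.new { |terms, term| terms[term] = [] }
    end
  end

  # Tokenize each field naively (lowercase, word characters only) and
  # record every distinct term once per document.
  def add_document(doc_id, fields)
    fields.each do |field, text|
      text.downcase.scan(/\w+/).uniq.each do |term|
        @postings[field][term] << doc_id
      end
    end
  end

  # Number of documents in which `term` occurs in `field`.
  def doc_freq(field, term)
    return 0 unless @postings.key?(field)
    @postings[field].fetch(term, []).size
  end

  # First term in `field` that sorts at or after `target`, or nil.
  # Assumed to mirror the "greater than or equal" skip semantics.
  def skip_to(field, target)
    return nil unless @postings.key?(field)
    @postings[field].keys.sort.find { |term| term >= target }
  end
end

index = ToyTermIndex.new
index.add_document(0, content: "The monkey ate a banana")
index.add_document(1, content: "Monkey see, monkey do")
index.add_document(2, content: "No primates here")

puts index.doc_freq(:content, "monkey")       # 2
puts index.skip_to(:content, "mono").inspect  # "no", not "mono"
```

If the real skip_to behaves the same way, it is probably worth comparing te.term with the term you asked for before trusting te.doc_freq, since a missing term would otherwise report the count of its neighbour.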
Stian Grytøyr
2007-Apr-30 11:23 UTC
[Ferret-talk] Determine how many documents a term occurs in
On 4/29/07, John Leach <john at johnleach.co.uk> wrote:
> then I'm confused, because what you're describing is the total hits of a
> one term search. You just need to watch out for fuzziness, like case
> sensitivity.

Aha, I finally get it. I dismissed that option right away, thinking that since I need to look up the total number of occurrences for quite a few terms for each search, a full search for each term would become way too slow as the index grew. But I see now that that's not the case, so this looks like a good solution.

Thanks, John!

--
Best regards, Stian Grytøyr