James Kim
2007-Apr-03 10:15 UTC
[Ferret-talk] How can I count frequency of terms in a document?
Hi there. I need some help. Is there a way to count the frequencies of terms in a document with Ferret? I know that Ferret has the IndexReader#term_docs_for method, which counts across all documents, but I need to count the frequencies of terms in a specific document. Is there some way to do this?
Caleb Clausen
2007-Apr-03 19:43 UTC
[Ferret-talk] How can I count frequency of terms in a document?
James Kim wrote:
> Is there a way to count frequencies of terms in a document on Ferret?
> I know that Ferret has IndexReader#term_docs_for method which counts
> all documents.
> I need to count frequencies of terms in a specific document.

I believe that IndexReader#term_vector is the method that you're looking for. It gives you some information about each term in one document. If you stored positions when you indexed, each term will have a list of positions associated with it. The size of that list is the term frequency.
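To make the "size of the positions list is the frequency" idea concrete, here is a minimal plain-Ruby sketch that models what a term vector with positions holds for one document. It does not call Ferret: `build_term_vector` and `term_frequency` are hypothetical helpers, and the sketch assumes simple lowercase whitespace tokenization, which is far cruder than a real Ferret analyzer. In Ferret itself the equivalent data would come from IndexReader#term_vector, and only for fields indexed with term vectors.

```ruby
# Hypothetical model of a per-document term vector: term text => positions.
# Assumes lowercase whitespace tokenization; real analyzers do much more.
def build_term_vector(text)
  vector = {}
  text.downcase.split.each_with_index do |term, position|
    (vector[term] ||= []) << position
  end
  vector
end

# The term frequency is the number of positions recorded for the term.
def term_frequency(vector, term)
  vector.fetch(term, []).size
end

tv = build_term_vector('one two one three one four one')
puts tv['one'].inspect           # prints [0, 2, 4, 6]
puts term_frequency(tv, 'one')   # prints 4
puts term_frequency(tv, 'five')  # prints 0
```

Storing positions costs index space, but it buys phrase and proximity information on top of the frequency, which is why the count falls out of the positions list for free.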
David Balmain
2007-Apr-06 05:13 UTC
[Ferret-talk] How can I count frequency of terms in a document?
On 4/4/07, Caleb Clausen <caleb at inforadical.net> wrote:
> James Kim wrote:
> > Is there a way to count frequencies of terms in a document on Ferret?
> > I know that Ferret has IndexReader#term_docs_for method which counts
> > all documents.
> > I need to count frequencies of terms in a specific document.
>
> I believe that IndexReader#term_vector is the method that you're looking
> for. This gives you some information about each term in one document...
> If you stored positions when you indexed, the individual terms will
> have a list of positions associated. The size of that list is the term
> frequency.

This is definitely one way of doing it. You can also find the frequency without storing term vectors: simply use the TermDocEnum and skip to the document you are interested in.

    tde = index.reader.term_docs_for(:field, 'term')
    found = tde.skip_to(100)
    # now check that we are at the correct document. If there are no
    # instances of 'term' in document 100 then it will skip to the next
    # document with an instance of the term 'term' (skip_to returns
    # false if there is no such document at all)
    frequency = (found && tde.doc == 100) ? tde.freq : 0
    puts "frequency of field:term in document 100 is #{frequency}"

Here is a full working example:

    require 'rubygems'
    require 'ferret'

    index = Ferret::I.new
    index << 'one'
    index << 'one two one three one four one' # doc 1
    index << 'one'
    index << 'no 1s'                          # doc 3
    index << 'one'

    # Plain strings added with << go into the default :id field.
    def get_frequency(index, doc_num, term, field = :id)
      tde = index.reader.term_docs_for(field, term)
      # No document >= doc_num contains the term at all.
      return 0 unless tde.skip_to(doc_num)
      # Landing on a later document means doc_num lacks the term.
      tde.doc == doc_num ? tde.freq : 0
    end

    puts get_frequency(index, 1, 'one') #=> 4
    puts get_frequency(index, 3, 'one') #=> 0

-- 
Dave Balmain
http://www.davebalmain.com/
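The subtle part of the TermDocEnum approach is that skip_to lands on the first document at or after the target, so the `doc == doc_num` check is what distinguishes "zero occurrences" from "found it". Here is a minimal plain-Ruby model of that behaviour that runs without Ferret. `PostingsEnum` and `frequency_in_doc` are hypothetical names, not Ferret classes; the model assumes one term's postings are kept as `[doc_num, freq]` pairs sorted by document number, as in an inverted index.

```ruby
# Hypothetical stand-in for a TermDocEnum over a single term's postings.
class PostingsEnum
  attr_reader :doc, :freq

  # postings: array of [doc_num, freq] pairs sorted by doc_num
  def initialize(postings)
    @postings = postings
    @index = 0
    @doc = nil
    @freq = 0
  end

  # Move forward to the first posting whose doc_num >= target.
  # Returns true if one exists, false once the postings are exhausted.
  def skip_to(target)
    @index += 1 while @index < @postings.size && @postings[@index][0] < target
    return false if @index >= @postings.size
    @doc, @freq = @postings[@index]
    true
  end
end

def frequency_in_doc(postings, doc_num)
  tde = PostingsEnum.new(postings)
  (tde.skip_to(doc_num) && tde.doc == doc_num) ? tde.freq : 0
end

# Postings for "one" across the five documents in the example above.
one_postings = [[0, 1], [1, 4], [2, 1], [4, 1]]
puts frequency_in_doc(one_postings, 1) # prints 4
puts frequency_in_doc(one_postings, 3) # prints 0 (skip_to lands on doc 4)
puts frequency_in_doc(one_postings, 9) # prints 0 (skip_to returns false)
```

Because postings are sorted, skipping is a forward-only scan; real indexes add skip lists so that jumping to a far-away document does not visit every posting in between.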