Sergei Serdyuk
2006-Jun-15 20:17 UTC
[Ferret-talk] Finding out all terms from search results. How?
Hi everybody,

I need to find out all terms (field values) of one field across the set of documents returned by a search. In other words, I have indexed documents with two fields. I search on one field and then want to know all of the other field's values from the documents found. How?

--
Sergei Serdyuk
Red Leaf Software LLC
web: http://redleafsoft.com

--
Posted via http://www.ruby-forum.com/.
Neville Burnell
2006-Jun-15 23:48 UTC
[Ferret-talk] Finding out all terms from search results. How?
How about something like this, where "field2" is the field you want to collect:

    values = []
    index.search_each(query) do |doc, score|
      values.push index[doc]["field2"]
    end
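If only the distinct values matter, the same loop can collect into a Set so duplicates are dropped as they arrive. Below is a minimal sketch in plain Ruby. It assumes a stand-in for the index: `FakeIndex` is hypothetical and only mimics the two calls used in the snippet above (`search_each` yielding a document id and a score, and `index[doc]` returning the stored fields). It is not Ferret's implementation, and the `stock` and `category` field names and the data are made up.

```ruby
require 'set'

# Hypothetical stand-in for a Ferret index, for illustration only.
class FakeIndex
  def initialize(docs)
    @docs = docs # array of hashes; the array position is the document id
  end

  # Yields (doc_id, score) for every document the query lambda accepts.
  def search_each(query)
    @docs.each_with_index do |doc, id|
      yield id, 1.0 if query.call(doc)
    end
  end

  # Returns the stored fields of the document with the given id.
  def [](id)
    @docs[id]
  end
end

index = FakeIndex.new([
  { "stock" => "yes", "category" => "poetry" },
  { "stock" => "no",  "category" => "history" },
  { "stock" => "yes", "category" => "poetry" },
  { "stock" => "yes", "category" => "fiction" }
])
query = ->(doc) { doc["stock"] == "yes" } # stand-in for a real query

# Same loop as above, but a Set keeps each value only once.
values = Set.new
index.search_each(query) do |doc, _score|
  values << index[doc]["category"]
end
```

This still visits every hit once, so the cost grows with the size of the result set.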
Sergei Serdyuk
2006-Jun-16 13:59 UTC
[Ferret-talk] Finding out all terms from search results. How?
Hi Neville,

It would work for a small result set, but that is not an assumption I would want to make. I hope there is a way to get this info from Ferret directly.

Sergei.
Lee Marlow
2006-Jun-16 14:46 UTC
[Ferret-talk] Finding out all terms from search results. How?
Why would this only work for a small result set? Are you looking for a list of terms from the other field as tokenized by Ferret, or just for the value you put in that field during indexing?

-Lee
Jeremy Bensley
2006-Jun-16 15:11 UTC
[Ferret-talk] Finding out all terms from search results. How?
While I don't completely understand all the constraints, it seems as though a generalized version of Neville's solution that goes through all fields in the document would work just fine, i.e.

    fields = []
    index.search_each(query) do |doc, score|
      # doc is a document id, so fetch the document first, as in Neville's snippet
      fields += index[doc].all_fields
    end
    values = fields.collect { |f| f.string_value }

I don't really know what part of 'Ferret doing this' would be ... the information would have to be stored and retrieved from the index. Please elaborate if we do not seem to completely understand the problem.
Sergei Serdyuk
2006-Jun-16 16:45 UTC
[Ferret-talk] Finding out all terms from search results. How?
Let me illustrate my problem a bit more.

There is an index with 1.2M books in it. Every book has a category field, and whether a book is currently in stock is stored in a stock field. I generally expect 50-60% of the books to be in stock, which leaves me with at least 600,000 books to iterate over just to find out which categories are currently stocked.

It sounds like a borderline task where one would think a database would be more appropriate, but the ability to do advanced search over this collection of books is a top priority, and a database would not provide that.

Sergei.
Sergei Serdyuk
2006-Jun-16 16:51 UTC
[Ferret-talk] Finding out all terms from search results. How?
Jeremy Bensley wrote:
> I don't really know what part of 'Ferret doing this' would be ... the
> information would have to be stored and retrieved from the index. Please
> elaborate if we do not seem to completely understand the problem.

I would think that Ferret could provide the set of terms connected to a set of documents without pulling out those documents one by one.

Sergei.
Erik Hatcher
2006-Jun-16 16:54 UTC
[Ferret-talk] Finding out all terms from search results. How?
I'm not familiar enough with Ferret, but I do this sort of filtering and set intersection with Java Lucene, primarily using Solr, from a Ruby on Rails front-end.

I build up bit sets (using Solr's new OpenBitSet class) that represent "all items collected", apply that filter to searches, and also intersect it (using bit set ANDing) with other sets such as "all objects from 1861" and "all poetry genre objects", and so on. I've also customized Solr to return facet counts, so given your example it could show how many books were in stock in each category and allow you to filter to see all those books easily too. Using these types of set intersection operations even bypasses the traditional Lucene search by simply dealing with efficiently structured sets of document ids.

Erik
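The intersection and facet-count idea can be sketched in plain Ruby, independent of Ferret, Lucene, or Solr. The sketch below assumes a Ruby Integer is used as a bit set (bit n is 1 when document id n is in the set). The helper names `bitset` and `cardinality`, the category names, and the document ids are all made up for illustration.

```ruby
# Build a bit set (an Integer) from a list of document ids.
def bitset(doc_ids)
  doc_ids.reduce(0) { |bits, id| bits | (1 << id) }
end

# Count the set bits, i.e. the number of documents in the set.
def cardinality(bits)
  bits.to_s(2).count("1")
end

# Made-up data: which documents are in stock, and which belong to each category.
in_stock = bitset([0, 2, 3, 5])

by_category = {
  "poetry"  => bitset([0, 1, 2]),
  "history" => bitset([3, 4]),
  "fiction" => bitset([6])
}

# Facet counts: AND each category's set with the in-stock set and count the result.
facet_counts = by_category.transform_values { |bits| cardinality(bits & in_stock) }

# Categories with at least one book in stock.
stocked_categories = facet_counts.select { |_, count| count > 0 }.keys
```

With sets like these, answering "which categories are stocked" costs one AND per category rather than one document fetch per hit, provided the per-category sets are built once and cached.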
Sergei Serdyuk
2006-Jun-16 19:08 UTC
[Ferret-talk] Finding out all terms from search results. How?
Thank you Erik. It is not clear to me what it would look like in Ferret, but it sounds like a good direction to dig in.

Sergei.
Erik Hatcher
2006-Jun-16 20:45 UTC
[Ferret-talk] Finding out all terms from search results. How?
In Java, building up such filters is done with code like this:

    // Assumes `reader` is an open IndexReader and `field` is the field name.
    TermDocs termDocs = reader.termDocs();
    // Position the enumeration at the first term of the field.
    TermEnum termEnum = reader.terms(new Term(field, ""));
    while (true) {
      Term term = termEnum.term();
      // Stop once the enumeration runs out or moves on to another field.
      if (term == null || !term.field().equals(field)) break;

      // Collect the ids of all documents that contain this term.
      termDocs.seek(term);
      OpenBitSet bitSet = new OpenBitSet(reader.numDocs());
      while (termDocs.next()) {
        bitSet.set(termDocs.doc());
      }

      // ... cache bitSet for future use ...

      if (!termEnum.next()) break;
    }

Ferret has a comparable API underneath that should make this sort of thing feasible in pure Ruby somehow.

Erik
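The shape of that loop carries over to Ruby. The sketch below is plain Ruby and makes no Ferret calls: the term enumeration is modeled as an array of `[field, term, doc_ids]` rows sorted by field and then term, which is an assumption made for illustration, as are the `bitsets_for` helper and the data. Each bit set is a Ruby Integer with bit n set for document id n.

```ruby
# Made-up postings, sorted by field and then term, as a term enumeration would be.
postings = [
  ["category", "fiction", [6]],
  ["category", "history", [3, 4]],
  ["category", "poetry",  [0, 1, 2]],
  ["stock",    "yes",     [0, 2, 3, 5]]
]

# Returns { term => Integer bit set } for every term of one field.
def bitsets_for(field, postings)
  cache = {}
  postings.each do |row_field, term, doc_ids|
    next if row_field < field    # not yet at the first term of the field
    break if row_field != field  # moved past the last term of the field
    bits = 0
    doc_ids.each { |id| bits |= (1 << id) } # set one bit per matching document
    cache[term] = bits
  end
  cache
end

category_sets = bitsets_for("category", postings)
stock_sets    = bitsets_for("stock", postings)
```

The resulting hashes play the role of the cached bit sets in the Java version: they are built once per field and can then be combined with `&` for each search.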
David Balmain
2006-Jun-17 00:27 UTC
[Ferret-talk] Finding out all terms from search results. How?
It is similar in Ferret. Have a look here to see the solution to a similar problem:

http://www.ruby-forum.com/topic/56232#40931

Hope that helps.

Cheers,
Dave