Neville Burnell
2006-Aug-28 07:26 UTC
[Ferret-talk] Help with Multiple Readers, 1 Writer scenario
Hi,

I'm building a web server application using Ferret [thanks so much Dave], Mongrel and Camping. It works fine servicing one request at a time, but serialises searches if more than one request arrives, so I'd like some advice please about the best way to use multiple readers and one writer.

Some background ... query requests, which in my case are always read only, arrive via Mongrel, which allocates a thread for each request. Should I create a new IndexReader for each request also, or can I use one IndexReader concurrently?

Index updates on the other hand are coordinated by a special Update Thread which runs every 10 minutes or so. I'm guessing that the best approach is to create an IndexWriter for each update run, which can be closed and discarded at the end of the update run. Or can I close and reuse a single IndexWriter?

I searched http://ferret.davebalmain.com/api for details on the MultiReader, but I couldn't find any. If someone could post a link to point me in the right direction that would be great.

Thanks so much

Neville
David Balmain
2006-Sep-01 10:18 UTC
[Ferret-talk] Help with Multiple Readers, 1 Writer scenario
On 8/28/06, Neville Burnell <Neville.Burnell at bmsoft.com.au> wrote:
> Hi,
>
> I'm building a web server application using Ferret [thanks so much
> Dave], Mongrel and Camping which works fine servicing one request at a
> time, but serialises searches if more than one request arrives, so I'd
> like some advice please about the best way to use multiple readers and
> one writer.
>
> Some background ... query requests which in my case are always read
> only, arrive via Mongrel, which allocates a thread for each request.
> Should I create a new IndexReader for each request also, or can I use
> one IndexReader concurrently?

Creating a new reader per request is not a good idea since creating a new IndexReader is an expensive operation (although it has been significantly improved in version 0.10). A lot of data needs to be read into memory for fast access. In most situations the ideal solution is to have a single IndexReader per thread. You can have as many IndexReaders open on an index as your operating system will allow.

The one situation where you might be better off using a single IndexReader is when you are relying on caching. Filters and Sorts are cached per IndexReader, and Sorts in particular can take up a fair chunk of memory, so if you have a large index (large as in number of documents, not size of data) then you may be better off with a single IndexReader. IndexReader is thread-safe so using it concurrently should be fine.

> Index updates on the other hand are coordinated by a special Update
> Thread which runs every 10 minutes or so. I'm guessing that the best
> approach is to create an IndexWriter for each update run, which can be
> closed and discarded at the end of the update run. Or can I close and
> reuse a single IndexWriter?

You can't reuse an IndexWriter after it has been closed. But you can commit the changes to disk:

    writer.commit()

IndexWriter#optimize will also commit all changes to disk as an optimal index, but depending on the size of your index you may only want to call optimize once a day, if at all. For a small index, however, calling it every ten minutes is definitely possible.

> I searched http://ferret.davebalmain.com/api for details on the
> MultiReader, but I couldn't find any details. If someone could post a
> link to point me in the right direction that would be great.

You can actually pass an array of readers as the first (only) parameter to IndexReader.new.

    reader = IndexReader.new([reader1, reader2, reader3])

In the current working version of Ferret you can also pass Directory objects or paths:

    reader = IndexReader.new([dir, dir2, dir3])
    reader = IndexReader.new(["/path/to/index1", "/path/to/index2"])

You'll need to wait for 0.10.2 for this functionality (and an update to include this info in the API docs).

Cheers,
Dave
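One way to picture what a reader opened over several indexes does is the plain-Ruby model below. It is a conceptual sketch only, based on the assumption that the combined reader presents its sub-indexes as one continuous range of document IDs (as Lucene-style multi-readers are usually described). It is not taken from Ferret's source, and `TinyMultiReader`, `locate` and the arrays standing in for indexes are hypothetical names.

```ruby
# Conceptual model only; none of this is Ferret code.
# Each "sub-index" is an array of documents, and the combined reader
# exposes them as one continuous range of document ids.
class TinyMultiReader
  attr_reader :size

  def initialize(sub_indexes)
    @subs = sub_indexes
    @starts = []            # @starts[i] = global id of the first doc in sub-index i
    total = 0
    @subs.each do |sub|
      @starts << total
      total += sub.size
    end
    @size = total
  end

  # Translate a global doc id into [sub-index number, local doc id].
  def locate(doc_id)
    raise IndexError, "doc_id #{doc_id} out of range" unless (0...@size).cover?(doc_id)

    i = @starts.rindex { |start| start <= doc_id }
    [i, doc_id - @starts[i]]
  end

  # Fetch a document by its global id.
  def [](doc_id)
    i, local = locate(doc_id)
    @subs[i][local]
  end
end

reader = TinyMultiReader.new([%w[a b], %w[c], %w[d e f]])
puts reader.size              # 6
puts reader.locate(2).inspect # [1, 0]
puts reader[5]                # f
```

If this assumption holds for Ferret, a search over the combined reader simply sees more documents; the caller does not need to know which underlying index a hit came from.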
Neville Burnell
2006-Sep-04 01:40 UTC
[Ferret-talk] Help with Multiple Readers, 1 Writer scenario
Thanks for your reply Dave,

> The one situation where you might be better off using
> a single IndexReader is when you are relying on caching.
> Filters and Sorts are cached per IndexReader and Sorts
> in particular can take up a fair chunk of memory so if
> you have a large index (large as in number of documents,
> not size of data) then you may be better off with a single
> IndexReader. IndexReader is thread-safe so using it concurrently
> should be fine.

Just to clarify, I'm using Ferret::Index::Index concurrently at the moment, and I'm not getting concurrent searches via #search_each. I.e., if a slow wild-card search arrives first, all subsequent searches wait until the wild-card search completes.

So I guess #search_each is "synchronised"?

Therefore to have multiple searches on an index concurrently, I really need an IndexReader per thread and I would need to manage a pool of reusable IndexReaders?

Any pointers on how other web apps [not using Rails] handle multiple Ferret readers?

> You can actually pass an array of readers as the first (only)
> parameter to IndexReader.new.
>
> reader = IndexReader.new([reader1, reader2, reader3])

Interesting ... I had a look, but I don't really understand what this does. Would you elaborate please :D

Thanks for your help,

Neville
David Balmain
2006-Sep-04 04:05 UTC
[Ferret-talk] Help with Multiple Readers, 1 Writer scenario
On 9/4/06, Neville Burnell <Neville.Burnell at bmsoft.com.au> wrote:
> Thanks for your reply Dave,
>
> > The one situation where you might be better off using
> > a single IndexReader is when you are relying on caching.
> > Filters and Sorts are cached per IndexReader and Sorts
> > in particular can take up a fair chunk of memory so if
> > you have a large index (large as in number of documents,
> > not size of data) then you may be better off with a single
> > IndexReader. IndexReader is thread-safe so using it concurrently
> > should be fine.
>
> Just to clarify, I'm using Ferret::Index::Index concurrently at the
> moment, and I'm not getting concurrent searches via #search_each. IE, if
> a slow wild-card search arrives first, all subsequent searches wait
> until the wild-card search completes.
>
> So I guess #search_each is "synchronised"?

That's correct. Otherwise it would be possible for the document IDs of the documents to change between the time the search is run and the time the document is referenced. For the benefit of those who don't know this, document IDs are not constant. They represent the position of the document in the index. Think of it like an array. Let's add 5 documents to the index.

    [0,1,2,3,4]

Now let's delete documents 1 and 2:

    [0,3,4]

So document 4 now has a doc_id of 2. If this happened in the middle of a search you'd have a problem. So instead we synchronize the Index#search and Index#search_each methods.

Now this isn't the case for Searcher#search and Searcher#search_each, since the IndexReader that Searcher uses remains consistent, so you should be able to use Searcher concurrently.

> Therefore to have multiple searches on an index concurrently, I really
> need an IndexReader per thread and I would need to manage a pool of
> reusable IndexReaders?

Using Ferret::Index::Index this would be true. But if performance is a concern you should definitely use a Ferret::Search::Searcher object instead anyway, and you'll be able to use it concurrently.

> Any pointers on how other web apps [not using Rails] handle multiple
> Ferret readers?

Let us know if using the Searcher object isn't adequate.

> > You can actually pass an array of readers as the first (only)
> > parameter to IndexReader.new.
> >
> > reader = IndexReader.new([reader1, reader2, reader3])
>
> Interesting ... I had a look, but I don't really understand what this
> does? Would you elaborate please :D

A MultiReader object was initially what was used to read and search multiple indexes at a time. This functionality is now simply handled by the IndexReader object. There are several uses for this. One is to store each model in a separate index; you can then offer search across multiple models using a MultiReader.

Another use-case might be to have multiple indexes to speed up indexing. If for example you are scraping websites it is a very good idea to have multiple scraping processes. The best way to do this is to have each process indexing to its own index. You could then search all indexes at once using a MultiReader, or you could merge all indexes into a single index.

Hope that makes sense.

Cheers,
Dave
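The array analogy for document IDs can be run directly in plain Ruby. This is only a model of the idea: `compact_after_delete` is a hypothetical helper and nothing here touches Ferret.

```ruby
# Model an index as an array in which a doc_id is simply a position.
# Hypothetical helper for illustration; not part of Ferret.
def compact_after_delete(docs, deleted_ids)
  docs.reject.with_index { |_doc, id| deleted_ids.include?(id) }
end

docs = %w[doc0 doc1 doc2 doc3 doc4]
hit_id = docs.index("doc4")        # position recorded by a search: 4

# Delete documents 1 and 2; the survivors close ranks.
docs = compact_after_delete(docs, [1, 2])

puts docs.inspect                  # ["doc0", "doc3", "doc4"]
puts docs.index("doc4")            # 2, the document has moved
puts docs[hit_id].inspect          # nil, the id recorded earlier is stale
```

A search result that recorded position 4 before the deletion no longer points at anything afterwards, which is the hazard the synchronized Index methods guard against.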
Neville Burnell
2006-Sep-06 05:06 UTC
[Ferret-talk] Help with Multiple Readers, 1 Writer scenario
> Otherwise it would be possible for the document IDs of the
> documents to change between the time the search is run and
> the time the document is referenced.

Well, I started coding to use Searcher#search_each and found myself recoding most of the infrastructure of Index#search_each (and its friends) simply to avoid its @dir.synchronize when what you were saying above started to sink in. I.e., as I understand it, I can have concurrent searchers if the index is read-only but not if I have a writer.

So while it's possible to have multiple readers, 1 writer, the 1 writer requirement forces use of synchronized, which means that the readers must be serialised and not concurrent - is this correct?

Kind Regards

Neville
Neville Burnell
2006-Sep-06 06:28 UTC
[Ferret-talk] Help with Multiple Readers, 1 Writer scenario
I've whipped up this script to demonstrate what I'm trying [and failing] to achieve. The idea is that thread t1 adds docs to the index over time, while threads t2 and t3 search the same index for the new docs. Unfortunately the script doesn't work, as t2 and t3 don't find the docs that t1 has added.

Can anyone point out where I am going wrong? Thanks so much.

Neville

================================

require 'rubygems'
require 'ferret'

p Ferret::VERSION

@dir = Ferret::Store::RAMDirectory.new
@writer = Ferret::Index::IndexWriter.new(:dir => @dir)
@searcher = Ferret::Search::Searcher.new(@dir)
@parser = Ferret::QueryParser.new

@docs = []
@docs << {:id => 1, :name => 'Fred', :occupation => 'Toon'}
@docs << {:id => 2, :name => 'Barney', :occupation => 'Toon'}
@docs << {:id => 3, :name => 'Wilma', :occupation => 'Toon'}
@docs << {:id => 4, :name => 'Betty', :occupation => 'Toon'}
@docs << {:id => 5, :name => 'Pebbles', :occupation => 'Toon'}
@docs << {:id => 6, :name => 'Superman', :occupation => 'Hero'}
@docs << {:id => 7, :name => 'Batman', :occupation => 'Hero'}
@docs << {:id => 8, :name => 'Spiderman', :occupation => 'Hero'}
@docs << {:id => 9, :name => 'Green Lantern', :occupation => 'Hero'}
@docs << {:id => 10, :name => 'Dr Strange', :occupation => 'Hero'}
@docs << {:id => 11, :name => 'Phantom', :occupation => 'Hero'}

#populate index over time
t1 = Thread.new do
  @docs.each do |doc|
    p "t1: adding #{doc[:id]} to index"
    @writer << doc
    sleep(10)
  end
end

#search for heroes over time
t2 = Thread.new do
  query_txt = 'occupation:hero'
  query = @parser.parse(query_txt)
  while true do
    hits = @searcher.search(query)
    p "t2: searching for #{query_txt} found #{hits.total_hits}"
    return if hits.total_hits == 6
    sleep(5)
  end
end

#search for toons over time
t3 = Thread.new do
  query_txt = 'occupation:toon'
  query = @parser.parse(query_txt)
  while true do
    hits = @searcher.search(query)
    p "t3: searching for #{query_txt} found #{hits.total_hits}"
    return if hits.total_hits == 5
    sleep(5)
  end
end

t1.join; t2.join; t3.join
David Balmain
2006-Sep-06 06:40 UTC
[Ferret-talk] Help with Multiple Readers, 1 Writer scenario
On 9/6/06, Neville Burnell <Neville.Burnell at bmsoft.com.au> wrote:
> > Otherwise it would be possible for the document IDs of the
> > documents to change between the time the search is run and
> > the time the document is referenced.
>
> Well, I started coding to use Searcher#search_each and found myself
> recoding most of the infrastructure of Index#search_each (and its
> friends) simply to avoid its @dir.synchronize when what you were saying
> above started to sink in. Ie, as I understand it, I can have concurrent
> searchers if the index is read-only but not if I have a writer.
>
> So while it's possible to have multiple readers, 1 writer, the 1 writer
> requirement forces use of synchronized, which means that the readers
> must be serialised and not concurrent - is this correct?

Close. When you open an IndexReader on the index it is opened on that particular version (or state) of the index. So any operations on the IndexReader (like searches) will only show what was in the index at the time you opened it. Any modifications to the index (usually through an IndexWriter) that occur after you open the IndexReader will not appear in your searches. So to keep searches up to date you need to close and reopen your IndexReader every time you commit changes to the index.

So the writer doesn't force the use of synchronized. Rather, it forces you to decide whether searches need to return the most up-to-date results available or whether there can be a short delay between changes being written to the index and changes appearing in the search results. The Index class makes it as simple as possible to always search the latest index, but there is a performance hit. Most of the time performance should be fine. The Ferret C core has been highly optimized and will still beat most other solutions hands down, even when used in this way.

Now, if I were writing an application where search performance is a big issue (as it seems to be in your case) then I would start by using the base classes like IndexReader and IndexWriter (as we've already discussed). As I just mentioned, you might allow a delay between the time the index is modified and the time those modifications appear in search results. This would allow you to update the IndexReader every minute/hour/day/week without regard to what the IndexWriter is doing. This solution works well when scraping webpages. Google's results, for example, aren't always completely up to date with the pages they index. If one of their results is a dead link it isn't the end of the world.

If, however, you are indexing data in a database it often isn't this simple. If you use the previous solution with a database that allows deletes then you need some way to handle results that reference objects that have been deleted from the database. Otherwise you will need some way to synchronize on the index (probably on the Ferret::Store::Directory, like Ferret::Index::Index does) so that no searches are done while the deletion is committed to the index and the IndexReaders are updated.

Another solution, which I'm going to experiment with, is using the index as your database. You may still keep your original database but store any data in the index that will be shown back to the user as the result of a search. That way you don't need to worry about synchronization with the database.

I don't think I've explained this very clearly here, so feel free to ask me to clarify. I will be endeavoring to write this all down in a clearer and more comprehensible manner so that everyone can work out the solution that best fits their needs.

Cheers,
Dave

PS: The ideal solution for me would be an object database with Ferret-like full-text search built in. I've been thinking about this a lot lately. It would certainly fit the style of development used in many Rails apps. That is to say, all access to the database must go through the model, as that is where all the validation is. If you are developing this way, why bother with the relational database and ORM solution? A good object database would serve the same purpose and would be a LOT more performant. Obviously this solution wouldn't be for everybody though, so enterprise developers feel free to ignore. ;-)
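The point-in-time behaviour described at the start of this message can be modelled in a few lines of plain Ruby. This is a sketch under stated assumptions, not Ferret's implementation: `TinyIndex` and `SnapshotReader`, along with their methods, are hypothetical stand-ins that only imitate the idea of a reader pinned to the version it was opened on.

```ruby
# Hypothetical stand-ins; neither class is part of Ferret.
class TinyIndex
  attr_reader :version

  def initialize
    @committed = []   # documents visible to newly opened readers
    @pending = []     # documents added but not yet committed
    @version = 0
  end

  def add(doc)
    @pending << doc
  end

  # Publish pending documents as a new version of the index.
  def commit
    @committed = (@committed + @pending).freeze
    @pending = []
    @version += 1
  end

  # The committed documents and version number as of right now.
  def snapshot
    [@committed, @version]
  end
end

class SnapshotReader
  def initialize(index)
    @index = index
    @docs, @version = index.snapshot   # pinned at open time
  end

  def count(field, value)
    @docs.count { |doc| doc[field] == value }
  end

  # True while no commit has happened since this reader was opened.
  def latest?
    @version == @index.version
  end
end

index = TinyIndex.new
reader = SnapshotReader.new(index)      # opened on version 0

index.add({ occupation: "Hero" })
index.commit                            # index is now at version 1

puts reader.count(:occupation, "Hero")  # 0, the old reader cannot see the commit
puts reader.latest?                     # false

reader = SnapshotReader.new(index)      # "reopen" by creating a new reader
puts reader.count(:occupation, "Hero")  # 1
```

How often the application replaces its reader is then exactly the freshness-versus-cost decision discussed above.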
David Balmain
2006-Sep-06 06:43 UTC
[Ferret-talk] Help with Multiple Readers, 1 Writer scenario
On 9/6/06, Neville Burnell <Neville.Burnell at bmsoft.com.au> wrote:
> I've whipped up this script to demonstrate what I'm trying [and failing]
> to achieve. The idea is that thread t1 adds docs to the index over time,
> while threads t2 and t3 search the same index for the new docs.
> Unfortunately the script doesn't work, as t2 and t3 don't find the docs
> that t1 has added.
>
> Can anyone point out where I am going wrong. Thanks so much.

Please let me know if the first paragraph of my previous email doesn't explain this.

Cheers,
Dave
Neville Burnell
2006-Sep-06 07:07 UTC
[Ferret-talk] Help with Multiple Readers, 1 Writer scenario
Thanks Dave, I think I understand now ...

FWIW, the following script works now that I have read your responses. I've posted it here for others to read.

=================

require 'rubygems'
require 'ferret'

p Ferret::VERSION

@dir = Ferret::Store::RAMDirectory.new
@writer = Ferret::Index::IndexWriter.new(:dir => @dir)
@searcher = Ferret::Search::Searcher.new(@dir)
@parser = Ferret::QueryParser.new

@docs = []
@docs << {:id => 1, :name => 'Fred', :occupation => 'Toon'}
@docs << {:id => 2, :name => 'Barney', :occupation => 'Toon'}
@docs << {:id => 3, :name => 'Wilma', :occupation => 'Toon'}
@docs << {:id => 4, :name => 'Betty', :occupation => 'Toon'}
@docs << {:id => 5, :name => 'Pebbles', :occupation => 'Toon'}
@docs << {:id => 6, :name => 'Superman', :occupation => 'Hero'}
@docs << {:id => 7, :name => 'Batman', :occupation => 'Hero'}
@docs << {:id => 8, :name => 'Spiderman', :occupation => 'Hero'}
@docs << {:id => 9, :name => 'Green Lantern', :occupation => 'Hero'}
@docs << {:id => 10, :name => 'Dr Strange', :occupation => 'Hero'}
@docs << {:id => 11, :name => 'Phantom', :occupation => 'Hero'}

#populate index over time
t1 = Thread.new do
  @docs.each do |doc|
    p "t1: adding #{doc[:id]} to index"
    @writer << doc
    sleep(10)
  end
end

#search for heroes over time
t2 = Thread.new do
  query_txt = 'occupation:hero'
  query = @parser.parse(query_txt)
  while true do
    hits = @searcher.search(query)
    p "t2: searching for #{query_txt} found #{hits.total_hits}"
    return if hits.total_hits == 6
    sleep(5)
  end
end

#search for toons over time
t3 = Thread.new do
  query_txt = 'occupation:toon'
  query = @parser.parse(query_txt)
  while true do
    hits = @searcher.search(query)
    p "t3: searching for #{query_txt} found #{hits.total_hits}"
    return if hits.total_hits == 5
    sleep(5)
  end
end

t1.join; t2.join; t3.join
Neville Burnell
2006-Sep-06 07:16 UTC
[Ferret-talk] Help with Multiple Readers, 1 Writer scenario
Oops ... My cut & paste buffer was old!

The key difference between this script and the old script is that the writer thread, t1, replaces the searcher after each index update, and each reader thread, t2 and t3, grabs a reference to the current searcher, which it uses for the duration of a search. So the old searchers are GC'd when no longer required.

==================

require 'rubygems'
require 'ferret'

p Ferret::VERSION

@dir = Ferret::Store::RAMDirectory.new
@writer = Ferret::Index::IndexWriter.new(:dir => @dir)
@searcher = Ferret::Search::Searcher.new(@dir)
@parser = Ferret::QueryParser.new

@docs = []
@docs << {:id => 1, :name => 'Fred', :occupation => 'Toon'}
@docs << {:id => 2, :name => 'Barney', :occupation => 'Toon'}
@docs << {:id => 3, :name => 'Wilma', :occupation => 'Toon'}
@docs << {:id => 4, :name => 'Betty', :occupation => 'Toon'}
@docs << {:id => 5, :name => 'Pebbles', :occupation => 'Toon'}
@docs << {:id => 6, :name => 'Superman', :occupation => 'Hero'}
@docs << {:id => 7, :name => 'Batman', :occupation => 'Hero'}
@docs << {:id => 8, :name => 'Spiderman', :occupation => 'Hero'}
@docs << {:id => 9, :name => 'Green Lantern', :occupation => 'Hero'}
@docs << {:id => 10, :name => 'Dr Strange', :occupation => 'Hero'}
@docs << {:id => 11, :name => 'Phantom', :occupation => 'Hero'}

#@docs.each {|doc| @writer << doc}
#@writer.commit
#@searcher = Ferret::Search::Searcher.new(@dir)

#populate index over time
t1 = Thread.new do
  @docs.each do |doc|
    p "t1: adding #{doc[:id]} to index"
    @writer << doc
    @writer.commit
    #new searcher
    @searcher = Ferret::Search::Searcher.new(@dir)
    sleep(10)
  end
end

#search for heroes over time
t2 = Thread.new do
  query_txt = 'occupation:hero'
  query = @parser.parse(query_txt)
  while true do
    mysearcher = @searcher
    hits = mysearcher.search(query)
    p "t2: searching for #{query_txt} found #{hits.total_hits}"
    break if hits.total_hits == 6
    sleep(5)
  end
end

#search for toons over time
t3 = Thread.new do
  query_txt = 'occupation:toon'
  query = @parser.parse(query_txt)
  while true do
    mysearcher = @searcher
    hits = mysearcher.search(query)
    p "t3: searching for #{query_txt} found #{hits.total_hits}"
    break if hits.total_hits == 5
    sleep(5)
  end
end

t1.join; t2.join; t3.join
Neville Burnell
2006-Sep-07 03:56 UTC
[Ferret-talk] Help with Multiple Readers, 1 Writer scenario
Thanks for your email Dave,

I've thought about this overnight, and I've got a few questions please.

> When you open an IndexReader on the index it is opened up on
> that particular version (or state) of the index

Would you elaborate on how Ferret manages versions please. For example, can I have two readers open, one which accesses the old version of the index, and the second which accesses the latest version?

> So to keep searches up to date you need to close and reopen
> your IndexReader every time you commit changes to the index.

I guess by reopen you mean IndexReader.new ?

I proceeded to replace my Index usage with an IndexReader and Searcher which are closed and recreated after each IndexWriter pass, and the result seems to be that searches are still serialised - i.e., a long running query on thread t1 "blocks" the normally very fast query on thread t2.

Might I be seeing another point of synchronisation, or am I just observing a characteristic of ruby threads ?

Kind Regards,

Neville
David Balmain
2006-Sep-07 06:07 UTC
[Ferret-talk] Help with Multiple Readers, 1 Writer scenario
On 9/7/06, Neville Burnell <Neville.Burnell at bmsoft.com.au> wrote:
> Thanks for your email Dave,
>
> I've thought about this overnight, and I've got a few questions please.
>
> > When you open an IndexReader on the index it is opened up on
> > that particular version (or state) of the index
>
> Would you elaborate on how Ferret manages versions please. For example,
> can I have two readers open, one which accesses the old version of the
> index, and the second which accesses the latest version?

When you open an IndexReader it opens all the files that it needs to read the index and it keeps all of the file handles. Even after the index is updated and those files are deleted they are not actually freed by the operating system. If you then open an IndexReader on a later version it holds file handles to all the files needed for that version. So the answer is yes, you can have multiple IndexReaders open on an index at the same time, all reading different versions. Each version of the index has an internal version number and there is an IndexReader#latest? method to determine if the version of the index that you are reading is the current version.

> > So to keep searches up to date you need to close and reopen
> > your IndexReader every time you commit changes to the index.
>
> I guess by reopen you mean IndexReader.new ?

That's correct. Don't forget to close the old IndexReader. The garbage collector will do this for you, but IndexReaders hold a lot of resources so it's best to close them as soon as you no longer need them.

> I proceeded to replace my Index usage with an IndexReader and Searcher
> which are closed and recreated after each IndexWriter pass, and the
> result seems to be that searches are still serialised - ie, a long
> running query on thread t1 "blocks" the normally very fast query on
> thread t2.
>
> Might I be seeing another point of synchronisation, or am I just
> observing a characteristic of ruby threads ?

I think it's probably a symptom of using ruby threads. I don't think they can swap threads in the middle of a call to a C function. It's unusual, though, for a search to take long enough to be a problem. What kind of search is it? If it's a PrefixQuery, FuzzyQuery or WildCardQuery you'll get much better performance on an optimized index. If you are making heavy use of any of these queries it is the one time I'd recommend always keeping the index in an optimized state.

Cheers,
Dave
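The advice to close old readers promptly raises a practical question: when is it safe to close a reader that another thread may still be searching with? One possible approach, sketched below in plain Ruby, is to count in-flight users and close a replaced reader only when the last one finishes. This is an illustration under stated assumptions, not something from Ferret: `ReaderHolder`, `with_reader`, `swap` and the `FakeReader` stand-in are hypothetical names, and the only thing assumed of a real reader is that it responds to `close`.

```ruby
# Stand-in for a real reader; only #close is assumed of the real thing.
class FakeReader
  def initialize
    @closed = false
  end

  def close
    @closed = true
  end

  def closed?
    @closed
  end
end

# Hypothetical helper: hands out the current reader and closes a
# replaced reader only after its last in-flight user has finished.
class ReaderHolder
  def initialize(reader)
    @lock = Mutex.new
    @current = reader
    @users = Hash.new(0).compare_by_identity  # reader => in-flight count
    @retired = []                             # replaced, still in use
  end

  # Yield the current reader; it stays open for the whole block.
  def with_reader
    reader = @lock.synchronize do
      @users[@current] += 1
      @current
    end
    begin
      yield reader
    ensure
      release(reader)
    end
  end

  # Install a new reader, typically right after the writer commits.
  def swap(new_reader)
    to_close = nil
    @lock.synchronize do
      old = @current
      @current = new_reader
      if @users[old].zero?
        @users.delete(old)
        to_close = old
      else
        @retired << old
      end
    end
    to_close&.close   # close outside the lock
  end

  private

  def release(reader)
    to_close = nil
    @lock.synchronize do
      @users[reader] -= 1
      retired = @retired.any? { |r| r.equal?(reader) }
      if @users[reader].zero? && retired
        @retired.delete_if { |r| r.equal?(reader) }
        @users.delete(reader)
        to_close = reader
      end
    end
    to_close&.close
  end
end
```

A search thread would wrap its work in `holder.with_reader { |reader| ... }`, while the update thread would call `holder.swap(new_reader)` after each commit.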
Neville Burnell
2006-Sep-10 23:40 UTC
[Ferret-talk] Help with Multiple Readers, 1 Writer scenario
> It's unusual, however for a search to take long
> enough to be a problem though. What kind of search
> is it?

Actually I'm misleading you. The searches are very fast, i.e., 0.1 sec or faster on my 30,000 doc index.

By "slow query" I really mean my "#search_each do" which fetches each doc from the index and appends it to an xml or html response. This is clearly not a Ferret issue, I think.

Thanks for all your help Dave,

Regards

Neville