aslak hellesoy
2005-Nov-17 02:57 UTC
[Ferret-talk] lock problems from concurrent processes.
Hi!

First, thanks a LOT for ferret. The API and documentation is great.

I'm trying to integrate ferret into a RoR app (DamageControl) and have run into a problem with locks. DamageControl consists of two processes that start up and run in parallel. The first one is the webapp (which is just a plain RoR app). The second is a daemon process that runs in the background.

The daemon process writes to the index, and the webapp reads from it. It's the same index, stored in the same directory.

My problem is that the webapp gets lock errors:

    /usr/lib/ruby/gems/1.8/gems/ferret-0.2.1/lib/ferret/store/fs_store.rb:226:in `obtain': could not obtain lock: /Users/aslakhellesoy/scm/dc_svn/branches/damagecontrol_active_record/testdata/index/ferret-11f222dc32bbe019198a2b42644196f9write.lock (RuntimeError)
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.2.1/lib/ferret/index/index_writer.rb:100:in `initialize'
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.2.1/lib/ferret/index/index.rb:111:in `new'
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.2.1/lib/ferret/index/index.rb:111:in `initialize'
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.2.1/lib/ferret/index/index.rb:109:in `synchronize'
        from /usr/lib/ruby/gems/1.8/gems/ferret-0.2.1/lib/ferret/index/index.rb:109:in `initialize'
        from ./lib/damagecontrol/ferret_config.rb:6:in `new'
        from ./lib/damagecontrol/ferret_config.rb:6:in `get_index'
        from ./lib/damagecontrol/build_daemon.rb:10

Is it possible to create a 'read-only' index that doesn't try to acquire a lock? Or is there a different way to achieve concurrent access to an index from different processes where one of them is only writing and the other is only reading?

Cheers,
Aslak
David Balmain
2005-Nov-17 13:03 UTC
[Ferret-talk] lock problems from concurrent processes.
Hi Aslak,

Great to hear you are integrating Ferret into DamageControl. I'll try to be as much help as possible. Ferret is designed to work with multiple processes accessing the index (hence the locks) so this problem shouldn't be too hard to solve. You have two options.

The first might be a little easier but performance won't be as good. That is to flush the index after you do a write so that it won't hold the lock for an extended period of time. See Ferret::Index::Index#flush(). This is the best solution if multiple processes are reading and writing.

The second option shouldn't be too difficult either, although I haven't documented it very well yet. That is to use Index::IndexWriter for writing to the index and Search::IndexSearcher for searching the index. Actually, you could continue to use Index::Index for the process that is writing to the index and Search::IndexSearcher for the read-only process. Search::IndexSearcher will never obtain any locks. Probably the best place to look for examples of how to use Index::IndexWriter and Search::IndexSearcher is actually within the Index::Index class itself.

Hope this helps.

Cheers,
Dave

PS: One thing I should mention is that deletes actually happen through Index::IndexReader. This probably seems a little confusing. It did to me to start with anyway. Again, check out the code in Index::Index to see how it handles deletes.
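As a rough illustration of the first option, here is a minimal sketch of an exclusive lock file. It assumes nothing about how Ferret implements its locks: `ToyWriteLock`, the lock file name, and `lock_demo` are hypothetical names made up for this example. It only shows why a second process fails while the first still holds the lock, and why releasing the lock promptly (which is what flushing after each write is described as achieving) lets the other process in.

```ruby
require "tmpdir"

# Toy stand-in for an index write lock. This is NOT Ferret's implementation;
# the class name and the lock file name are hypothetical.
class ToyWriteLock
  def initialize(dir)
    @path = File.join(dir, "toy-write.lock")
  end

  # Creates the lock file atomically and fails if it already exists.
  def obtain
    File.open(@path, File::WRONLY | File::CREAT | File::EXCL) do |f|
      f.write(Process.pid.to_s)
    end
    true
  rescue Errno::EEXIST
    raise "could not obtain lock: #{@path}"
  end

  # Removes the lock file so that another holder can obtain it.
  def release
    File.delete(@path) if File.exist?(@path)
  end
end

# Simulates two lock holders working on the same directory.
def lock_demo
  Dir.mktmpdir do |dir|
    writer = ToyWriteLock.new(dir)
    other  = ToyWriteLock.new(dir)
    results = []

    writer.obtain
    begin
      other.obtain
      results << :second_obtained
    rescue RuntimeError
      results << :second_blocked
    end

    writer.release # the effect a flush after each write is meant to have
    other.obtain
    results << :second_obtained_after_release
    other.release
    results
  end
end

p lock_demo
```

The second option avoids the contention altogether on the reading side, since a searcher that never asks for the lock cannot be blocked by it.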
aslak hellesoy
2005-Nov-17 14:35 UTC
[Ferret-talk] lock problems from concurrent processes.
On 11/17/05, David Balmain <dbalmain.ml at gmail.com> wrote:
> The second option shouldn't be too difficult either, although I haven't
> documented it very well yet. That is to use Index::IndexWriter for
> writing to the index and Search::IndexSearcher for searching the
> index. [...]

Great - this makes a lot of sense.

> PS: One thing I should mention is that deletes actually happen through
> Index::IndexReader. This probably seems a little confusing. It did to
> me to start with anyway.

Something you wrote was confusing for you? Would it be possible to make it a bit more intuitive?

Ferret is one of the best-written Ruby frameworks I have seen so far (both in implementation and API design), so you might as well shoot for complete excellence :-)
On 17 Nov 2005, at 09:35, aslak hellesoy wrote:
>> PS: One thing I should mention is that deletes actually happen
>> through Index::IndexReader. This probably seems a little confusing.
>> It did to me to start with anyway.
>
> Something you wrote was confusing for you? Would it be possible to
> make it a bit more intuitive?
>
> Ferret is one of the best-written Ruby frameworks I have seen so far
> (both in implementation and API design), so you might as well shoot
> for complete excellence :-)

Dave has indeed done an amazing job of porting Lucene! The confusing aspect here is that his port is faithful enough to pass through a confusing piece of the Java Lucene API. His Index::Index class is not really part of Java Lucene, so you're getting an extra bonus there instead of dealing with IndexWriter, IndexReader, and IndexSearcher directly.

IndexReader in Java Lucene is used for reading _and_ deleting documents - this is just the nature of the beast. Deleting a document in Lucene merely flags it as deleted and doesn't actually remove anything. That is why the IndexReader facility is used for this operation rather than the IndexWriter, which has a much more substantial role in indexing new documents.

Erik
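The "flag it as deleted" idea can be sketched with a toy model. This is an illustration only, not Lucene's or Ferret's actual data structures; `ToySegment` and its methods are hypothetical names.

```ruby
# Toy model of "delete = mark as deleted". Not Lucene's or Ferret's real
# data structures; every name here is hypothetical.
class ToySegment
  def initialize(docs)
    @docs = docs
    @deleted = Array.new(docs.size, false)
  end

  # Deleting only sets a flag; the stored document stays where it is.
  def delete(doc_id)
    @deleted[doc_id] = true
  end

  def deleted?(doc_id)
    @deleted[doc_id]
  end

  # Documents a search would still be allowed to return.
  def live_docs
    @docs.each_index.reject { |i| @deleted[i] }.map { |i| @docs[i] }
  end

  # Number of documents physically stored, deleted or not.
  def stored_count
    @docs.size
  end
end

segment = ToySegment.new(["a.rb", "b.rb", "c.rb"])
segment.delete(1)
p segment.live_docs
p segment.stored_count
```

In this model a delete is a small update to existing data rather than the creation of new index data, which is one way to read Erik's point about why the operation sits with the reader.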
aslak hellesoy
2005-Nov-18 03:29 UTC
[Ferret-talk] lock problems from concurrent processes.
On 11/17/05, David Balmain <dbalmain.ml at gmail.com> wrote:
> The second option shouldn't be too difficult either, although I haven't
> documented it very well yet. That is to use Index::IndexWriter for
> writing to the index and Search::IndexSearcher for searching the
> index. [...]

Reader/Searcher sounds like the best option for me. I still have some questions though:

Search::IndexSearcher.search_query doesn't understand String queries like Index::Index does - I need a Search::Query object. Since I still want my API to be able to use FQL, I need a QueryParser.

In order to understand how to use QueryParser I peeked at Index::Index's use of QueryParser. I see:

    if @qp.nil?
      @qp = Ferret::QueryParser.new(@default_search_field, @options)
    end
    # we need to set this every time, in case a new field has been added
    @qp.fields = @reader.get_field_names.to_a
    query = @qp.parse(query)

So to the best of my judgement it looks like I need an Index::IndexReader in order to use QueryParser. This is where I run into problems.

I'm using IndexReader.open, passing in a dir as a String. What happens then is that my existing index directory gets wiped out. It happens when open invokes Store::FSDirectory.new(directory, true).

I'm now sufficiently deep into my rabbit hole that I'm not sure when I dug too deep :-)

So my question is: How do I create and use a QueryParser without wiping out the existing index files?

Cheers,
Aslak
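To make the effect of that second argument concrete, here is a minimal sketch of a directory opener with a `create` flag. It is a hypothetical stand-in that only mimics the behaviour described in the message above (passing `true` empties the directory); `open_toy_directory` and `create_flag_demo` are invented names, and nothing here is taken from Ferret's source.

```ruby
require "fileutils"
require "tmpdir"

# Toy directory opener with a `create` flag. Hypothetical stand-in that only
# mimics the behaviour described above; it is not Ferret's FSDirectory.
def open_toy_directory(path, create)
  if create
    FileUtils.rm_rf(path)   # create = true starts from an empty directory
    FileUtils.mkdir_p(path)
  end
  raise "no such directory: #{path}" unless File.directory?(path)
  Dir.children(path).sort   # names of the files now present
end

# Opens the same directory twice, first preserving and then recreating it.
def create_flag_demo
  Dir.mktmpdir do |tmp|
    index_dir = File.join(tmp, "index")
    FileUtils.mkdir_p(index_dir)
    File.write(File.join(index_dir, "segments"), "data")

    kept  = open_toy_directory(index_dir, false) # existing files survive
    wiped = open_toy_directory(index_dir, true)  # existing files are gone
    [kept, wiped]
  end
end

p create_flag_demo
```

A read-only consumer would always want the `false` behaviour, since it has no reason to create or reset anything.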
David Balmain
2005-Nov-18 03:51 UTC
[Ferret-talk] lock problems from concurrent processes.
Hi Aslak,

Probably the easiest way to get the reader is straight from the Search::IndexSearcher object. reader is one of its attributes so there is no need to open a new one. That should solve your problem, i.e.:

    if @qp.nil?
      @qp = Ferret::QueryParser.new(@default_search_field, @options)
    end
    # we need to set this every time, in case a new field has been added
    @qp.fields = @searcher.reader.get_field_names.to_a
    query = @qp.parse(query)

However, you don't really need the "@qp.fields" line unless you want to allow multi-field queries with the '*' symbol. For example:

    index.search_each("*:Customer") do |doc, score|
      # ...
    end

That searches for the word Customer in all fields in the document. But in your case you may have only one field, in which case it isn't necessary. Or you may know beforehand which fields exist, so you can just feed them in when you create the query parser like this:

    if @qp.nil?
      options[:analyzer] = @analyzer
      options[:fields] = ["source", "comments"]
      @qp = Ferret::QueryParser.new(@default_search_field, options)
    end
    query = @qp.parse(query)

Having said all this, I have to admit that you've actually found a bug, so it'll be fixed in the next version. IndexReader#open should invoke Store::FSDirectory.new(directory, false). Thanks.

Cheers,
Dave
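The reason a parser needs the field list for '*' queries can be shown with a small toy function. This is not Ferret's QueryParser and says nothing about how it works internally; `expand_field_query` and its "field:term" output format are made up for the example.

```ruby
# Toy illustration of why a parser needs the field list to handle "*:term".
# This is not Ferret's QueryParser; the method name and the output format
# are invented for this example.
def expand_field_query(query, known_fields, default_field)
  # Split "field:term"; a bare term falls back to the default field.
  field, term = query.include?(":") ? query.split(":", 2) : [default_field, query]
  # "*" can only be expanded if the parser knows which fields exist.
  fields = field == "*" ? known_fields : [field]
  fields.map { |f| "#{f}:#{term}" }
end

fields = ["source", "comments"]
p expand_field_query("*:Customer", fields, "source")
p expand_field_query("Customer", fields, "source")
```

With a fixed, known set of fields the list can be supplied once, which matches the suggestion above of passing the fields in when the parser is created.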
aslak hellesoy
2005-Nov-18 04:06 UTC
[Ferret-talk] lock problems from concurrent processes.
> So my question is: How do I create and use a QueryParser without
> wiping out the existing index files?

Please disregard my previous question. I did it like this:

    module RevisionFileSearching
      # Searches for RevisionFile instances using the Ferret index
      def search_each(query) # :yield: revision_file
        dir = Ferret::Store::FSDirectory.new("my_index_dir", false)
        @index_searcher ||= Ferret::Search::IndexSearcher.new(dir)
        @index_reader ||= Ferret::Index::IndexReader.open(dir, false)
        @query_parser ||= Ferret::QueryParser.new("data", {})
        @query_parser.fields = @index_reader.get_field_names.to_a
        query = @query_parser.parse(query)
        @index_searcher.search_each(query) do |doc, score|
          id = @index_reader.get_document(doc)["id"]
          yield RevisionFile.find(id)
        end
      end
    end

I'll knock up a blog entry about this and join the Ferret propaganda machine :-)

Cheers,
Aslak