I run a script that imports a few thousand records into the database. The script runs once for each of several XML files. What it does is parse the XML and for each element of a certain type creates a record in a rails database that gets indexed with acts_as_ferret. This worked fine before but today after a few files (70000 records) this started to happen for any file I tried: ./imports/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:510:in `<<'': : Error occured at <index_rw.c>:703 (StandardError) Error: exception 6 not handled: Could not obtain write lock when trying to write index from ./imports/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:510:in `ferret_create'' from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/callbacks.rb:333:in `callback'' from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/callbacks.rb:330:in `callback'' from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/callbacks.rb:262:in `create_without_timestamps'' from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/timestamp.rb:30:in `create'' from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/base.rb:1718:in `create_or_update_without_callbacks'' from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/callbacks.rb:249:in `create_or_update'' from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/base.rb:1392:in `save_without_validation'' ... 24 levels... from /usr/lib/ruby/1.8/rexml/document.rb:173:in `parse_stream'' from /home/pedrocr/xmlcodec/lib/stream_parser.rb:74:in `parse'' from ./imports/import-ead:118 from ./imports/import-ead:115 The index is now in this state. There is no process running that could be holding a lock. Seems like some kind of race condition. I can provide the index (12MB) if it would help debug this. The only thing that I can see that might have changed behaviour is that I added some untokenized fields to sort by. Any ideas what it might be? Pedro.
On 8/1/06, Pedro C?rte-Real <Pedro.CorteReal at iantt.pt> wrote:> I run a script that imports a few thousand records into the database. > The script runs once for each of several XML files. What it does is > parse the XML and for each element of a certain type creates a record in > a rails database that gets indexed with acts_as_ferret. This worked fine > before but today after a few files (70000 records) this started to > happen for any file I tried: > > ./imports/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:510:in `<<'': : Error occured at <index_rw.c>:703 (StandardError) > Error: exception 6 not handled: Could not obtain write lock when trying > to write index > > from ./imports/../config/../vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:510:in `ferret_create'' > > from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/callbacks.rb:333:in `callback'' > > from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/callbacks.rb:330:in `callback'' > > from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/callbacks.rb:262:in `create_without_timestamps'' > > from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/timestamp.rb:30:in `create'' > > from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/base.rb:1718:in `create_or_update_without_callbacks'' > > from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/callbacks.rb:249:in `create_or_update'' > > from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/base.rb:1392:in `save_without_validation'' > ... 24 levels... > from /usr/lib/ruby/1.8/rexml/document.rb:173:in `parse_stream'' > from /home/pedrocr/xmlcodec/lib/stream_parser.rb:74:in `parse'' > from ./imports/import-ead:118 > from ./imports/import-ead:115 > > > The index is now in this state. There is no process running that could > be holding a lock. Seems like some kind of race condition. I can provide > the index (12MB) if it would help debug this. > > The only thing that I can see that might have changed behaviour is that > I added some untokenized fields to sort by. > > Any ideas what it might be?This error occurs when there is a lock file open in the index. The first thing to do is make sure you don''t have any Index, IndexWriter, IndexReader or Searcher open on the index. These are the classes that lock the index. If this is already the case you have a stray lock file which you can just delete. Look in your index for .lck files. They can be deleted. Also note that if this probably means that some of the data that appeared to be added correctly to the index may not have been. The index should still be valid otherwise. Dave
On Tue, 2006-08-01 at 21:08 +0900, David Balmain wrote:> > The only thing that I can see that might have changed behaviour is that > > I added some untokenized fields to sort by. > > > > Any ideas what it might be? > > This error occurs when there is a lock file open in the index. The > first thing to do is make sure you don''t have any Index, IndexWriter, > IndexReader or Searcher open on the index. These are the classes that > lock the index.The only thing accessing the index is acts_as_ferret and that should release it when ruby finishes, right? I''m not running it on a webserver, it''s running in a script that does the same as the rails script/console. Could there be some race condition that makes it sometimes not finish?> If this is already the case you have a stray lock file > which you can just delete. Look in your index for .lck files. They can > be deleted. Also note that if this probably means that some of the > data that appeared to be added correctly to the index may not have > been. The index should still be valid otherwise.Have to try to understand what''s happening then. It seems to be constantly reproducible. Pedro.
On 8/1/06, Pedro C?rte-Real <Pedro.CorteReal at iantt.pt> wrote:> On Tue, 2006-08-01 at 21:08 +0900, David Balmain wrote: > > > The only thing that I can see that might have changed behaviour is that > > > I added some untokenized fields to sort by. > > > > > > Any ideas what it might be? > > > > This error occurs when there is a lock file open in the index. The > > first thing to do is make sure you don''t have any Index, IndexWriter, > > IndexReader or Searcher open on the index. These are the classes that > > lock the index. > > The only thing accessing the index is acts_as_ferret and that should > release it when ruby finishes, right? I''m not running it on a webserver, > it''s running in a script that does the same as the rails script/console. > Could there be some race condition that makes it sometimes not finish?No, I don''t think so. The only thing I can think of is it could be a reference counting error in the Ruby-Ferret bindings.> > If this is already the case you have a stray lock file > > which you can just delete. Look in your index for .lck files. They can > > be deleted. Also note that if this probably means that some of the > > data that appeared to be added correctly to the index may not have > > been. The index should still be valid otherwise. > > Have to try to understand what''s happening then. It seems to be > constantly reproducible.If you can distill the error to a smaller unit test or something that I can can reproduce here that would be great. It shouldn''t take me long to find the problem. Cheers, Dave
On Tue, 2006-08-01 at 22:22 +0900, David Balmain wrote:> > Have to try to understand what''s happening then. It seems to be > > constantly reproducible. > > If you can distill the error to a smaller unit test or something that > I can can reproduce here that would be great. It shouldn''t take me > long to find the problem.I''m trying to do that. The last time glibc threw a double free corruption warning. I''m trying to reproduce it more exactly. Pedro.
On Tue, 2006-08-01 at 15:02 +0100, Pedro C?rte-Real wrote:> I''m trying to do that. The last time glibc threw a double free > corruption warning. I''m trying to reproduce it more exactly.Still haven''t been able to do it completely but here''s the exact error: *** glibc detected *** double free or corruption (!prev): 0x0847e5b8 *** Don''t know if it''s possible to go from that hex value to the code line. Probably not. Pedro.
On Tue, 2006-08-01 at 17:06 +0100, Pedro C?rte-Real wrote:> On Tue, 2006-08-01 at 15:02 +0100, Pedro C?rte-Real wrote: > > I''m trying to do that. The last time glibc threw a double free > > corruption warning. I''m trying to reproduce it more exactly. > > Still haven''t been able to do it completely but here''s the exact error: > > *** glibc detected *** double free or corruption (!prev): 0x0847e5b8 *** > > Don''t know if it''s possible to go from that hex value to the code line. > Probably not.Here''s a backtrace: #0 0xffffe410 in __kernel_vsyscall () #1 0xb7cf19a1 in raise () from /lib/tls/i686/cmov/libc.so.6 #2 0xb7cf32b9 in abort () from /lib/tls/i686/cmov/libc.so.6 #3 0xb7d2587a in __fsetlocking () from /lib/tls/i686/cmov/libc.so.6 #4 0xb7d2bfd4 in malloc_usable_size () from /lib/tls/i686/cmov/libc.so.6 #5 0xb7d2c34a in free () from /lib/tls/i686/cmov/libc.so.6 #6 0xb73e8cae in tb_destroy (tb=0x0) at term.c:91 #7 0xb73e8ffc in te_destroy (te=0x49) at term.c:214 #8 0xb73e9345 in ste_close (te=0x87444c0) at term.c:326 #9 0xb73e1b2b in smi_destroy (smi=0x8a7b5f8) at index_rw.c:1984 #10 0xb73e3661 in sm_merge_term_infos (sm=0x8a66588) at index_rw.c:2282 #11 0xb73e3827 in sm_merge_terms (sm=0x8a66588) at index_rw.c:2314 #12 0xb73e3fc2 in sm_merge (sm=0x8a66588) at index_rw.c:2384 #13 0xb73e406c in iw_merge_segments_with_max (iw=0x8396400, min_segment=18, max_segment=20) at index_rw.c:829 #14 0xb73e42a7 in iw_merge_segments (iw=0x6, min_segment=0) at index_rw.c:860 #15 0xb73e4311 in iw_flush_ram_segments (iw=0x8396400) at index_rw.c:918 #16 0xb73e4376 in iw_close (iw=0x8396400) at index_rw.c:965 #17 0xb73f6c2b in index_add_doc (self=0x8ad5278, doc=0x86e63d8) at ind.c:218 #18 0xb73d4a09 in frt_ind_add_doc (argc=1, argv=0x0, self=3075201096) at r_search.c:1900 #19 0xb7e7e5d3 in rb_iterator_p () from /usr/lib/libruby1.8.so.1.8 #20 0xb7e89089 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8 #21 0xb7e89aef in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8 #22 0xb7e8675b in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8 I seem to be able to consistently generate this. What else should I do? Pedro.
On 8/2/06, Pedro C?rte-Real <Pedro.CorteReal at iantt.pt> wrote:> On Tue, 2006-08-01 at 17:06 +0100, Pedro C?rte-Real wrote: > > On Tue, 2006-08-01 at 15:02 +0100, Pedro C?rte-Real wrote: > > > I''m trying to do that. The last time glibc threw a double free > > > corruption warning. I''m trying to reproduce it more exactly. > > > > Still haven''t been able to do it completely but here''s the exact error: > > > > *** glibc detected *** double free or corruption (!prev): 0x0847e5b8 *** > > > > Don''t know if it''s possible to go from that hex value to the code line. > > Probably not. > > Here''s a backtrace: > > #0 0xffffe410 in __kernel_vsyscall () > #1 0xb7cf19a1 in raise () from /lib/tls/i686/cmov/libc.so.6 > #2 0xb7cf32b9 in abort () from /lib/tls/i686/cmov/libc.so.6 > #3 0xb7d2587a in __fsetlocking () from /lib/tls/i686/cmov/libc.so.6 > #4 0xb7d2bfd4 in malloc_usable_size () from /lib/tls/i686/cmov/libc.so.6 > #5 0xb7d2c34a in free () from /lib/tls/i686/cmov/libc.so.6 > #6 0xb73e8cae in tb_destroy (tb=0x0) at term.c:91 > #7 0xb73e8ffc in te_destroy (te=0x49) at term.c:214 > #8 0xb73e9345 in ste_close (te=0x87444c0) at term.c:326 > #9 0xb73e1b2b in smi_destroy (smi=0x8a7b5f8) at index_rw.c:1984 > #10 0xb73e3661 in sm_merge_term_infos (sm=0x8a66588) at index_rw.c:2282 > #11 0xb73e3827 in sm_merge_terms (sm=0x8a66588) at index_rw.c:2314 > #12 0xb73e3fc2 in sm_merge (sm=0x8a66588) at index_rw.c:2384 > #13 0xb73e406c in iw_merge_segments_with_max (iw=0x8396400, min_segment=18, > max_segment=20) at index_rw.c:829 > #14 0xb73e42a7 in iw_merge_segments (iw=0x6, min_segment=0) at index_rw.c:860 > #15 0xb73e4311 in iw_flush_ram_segments (iw=0x8396400) at index_rw.c:918 > #16 0xb73e4376 in iw_close (iw=0x8396400) at index_rw.c:965 > #17 0xb73f6c2b in index_add_doc (self=0x8ad5278, doc=0x86e63d8) at ind.c:218 > #18 0xb73d4a09 in frt_ind_add_doc (argc=1, argv=0x0, self=3075201096) > at r_search.c:1900 > #19 0xb7e7e5d3 in rb_iterator_p () from /usr/lib/libruby1.8.so.1.8 > #20 0xb7e89089 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8 > #21 0xb7e89aef in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8 > #22 0xb7e8675b in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8 > > I seem to be able to consistently generate this. What else should I do?This would usually be enough but there seems to be some weird stuff happening there. For example, argv = NULL on line #18 which doesn''t seem possible to me. On line #14 iw gets a strange value but then it gets restored on line #13. Anyway, I think the error is occurring higher up on the stack so maybe it doesn''t matter. I''ll keep trying to work out what is going on but it''d be a lot easier if you could give me something I could run here to reproduce the error. Cheers, Dave
On Wed, 2006-08-02 at 08:41 +0900, David Balmain wrote:> This would usually be enough but there seems to be some weird stuff > happening there. For example, argv = NULL on line #18 which doesn''t > seem possible to me. On line #14 iw gets a strange value but then it > gets restored on line #13. Anyway, I think the error is occurring > higher up on the stack so maybe it doesn''t matter. I''ll keep trying to > work out what is going on but it''d be a lot easier if you could give > me something I could run here to reproduce the error.I have a standalone test case that blows up predictably every time. Since it involves some internal data I have to clear it with the powers that be. Shouldn''t be a problem. As soon as that''s done I''ll mail it privately to you. Shouldn''t take too long. Meanwhile here''s a different backtrace: #0 0xffffe410 in __kernel_vsyscall () #1 0xb7d469a1 in raise () from /lib/tls/i686/cmov/libc.so.6 #2 0xb7d482b9 in abort () from /lib/tls/i686/cmov/libc.so.6 #3 0xb7d7a87a in __fsetlocking () from /lib/tls/i686/cmov/libc.so.6 #4 0xb7d80fd4 in malloc_usable_size () from /lib/tls/i686/cmov/libc.so.6 #5 0xb7d8134a in free () from /lib/tls/i686/cmov/libc.so.6 #6 0xb7b8dcae in tb_destroy (tb=0x0) at term.c:91 #7 0xb7b8dffc in te_destroy (te=0x49) at term.c:214 #8 0xb7b8e345 in ste_close (te=0x821b978) at term.c:326 #9 0xb7b86b2b in smi_destroy (smi=0x823e670) at index_rw.c:1984 #10 0xb7b88661 in sm_merge_term_infos (sm=0x80d2ba0) at index_rw.c:2282 #11 0xb7b88827 in sm_merge_terms (sm=0x80d2ba0) at index_rw.c:2314 #12 0xb7b88fc2 in sm_merge (sm=0x80d2ba0) at index_rw.c:2384 #13 0xb7b8906c in iw_merge_segments_with_max (iw=0x80a8450, min_segment=12, max_segment=17) at index_rw.c:829 #14 0xb7b892a7 in iw_merge_segments (iw=0x6, min_segment=0) at index_rw.c:860 #15 0xb7b89311 in iw_flush_ram_segments (iw=0x80a8450) at index_rw.c:918 #16 0xb7b89376 in iw_close (iw=0x80a8450) at index_rw.c:965 #17 0xb7b9ab93 in index_destroy (self=0x80fd6c0) at ind.c:90 #18 0xb7b7679b in frt_ind_free (p=0x80fd6c0) at r_search.c:1677 #19 0xb7ef08de in rb_gc_call_finalizer_at_exit () from /usr/lib/libruby1.8.so.1.8 #20 0xb7ed28be in is_ruby_native_thread () from /usr/lib/libruby1.8.so.1.8 #21 0xb7eea0d3 in ruby_cleanup () from /usr/lib/libruby1.8.so.1.8 #22 0xb7eea1b3 in ruby_stop () from /usr/lib/libruby1.8.so.1.8 #23 0xb7eea84e in ruby_run () from /usr/lib/libruby1.8.so.1.8 #24 0x080485dc in main () Looks much cleaner since it''s actually a full one while the other was full of ruby''s thread handling that I truncated. Probably because it was running inside a full rails environment. This time I ripped everything out and it''s just ferret calls. Pedro.