Sergei Serdyuk
2006-Jun-20 15:34 UTC
[Ferret-talk] Any fast way to update non-indexed fields?
Hi,

From looking at the Ruby sources, it seems that every update method deletes and reinserts the document. That makes sense if indexed fields have changed, but what if they have not? Updates would be a lot faster if the index did not have to be modified twice for nothing. Is there a quick way to do this?

--
Sergei Serdyuk
Red Leaf Software LLC
web: http://redleafsoft.com

--
Posted via http://www.ruby-forum.com/.
Sergei Serdyuk wrote:
> Hi,
>
> From looking at Ruby sources it seems that every update method deletes
> and reinserts documents. It makes sense if indexed fields are changed
> but what if it is not the case? It would speed up update a lot indexes
> did not have to be updated twice for nothing. Any quick way to do it?

I'm not an expert with Lucene, but I believe that's how Lucene indexes work: there is no update, only create and delete.
David Balmain
2006-Jun-21 01:30 UTC
[Ferret-talk] Any fast way to update non-indexed fields?
On 6/21/06, ryan king <ryan at theryanking.com> wrote:
> Sergei Serdyuk wrote:
> > Hi,
> >
> > From looking at Ruby sources it seems that every update method deletes
> > and reinserts documents. It makes sense if indexed fields are changed
> > but what if it is not the case? It would speed up update a lot indexes
> > did not have to be updated twice for nothing. Any quick way to do it?
>
> I'm not an expert with Lucene, but I believe that's how Lucene indexes
> work - there is no update, only create and delete.

It is in fact the way Lucene works. The main problem with the update method in Ferret is that for each update it needs to open an IndexReader to read and delete the old doc, then close it and open an IndexWriter to add the new doc. In the version of Ferret I'm working on now you'll be able to do updates directly on the IndexWriter, so it should be a lot faster.

As for just updating the stored, unindexed fields, I'll have to think about it. It'll add a bit of complexity to the merge process, which I'm not too keen on. But it is certainly possible.

Sergei, what type of field is it that you need to update? And to everyone else on the list, is this a common action? That is, do you often need to update non-indexed fields?

Cheers,
Dave
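For readers following the thread, here is a minimal sketch of the delete-and-reinsert behaviour described above. It uses a toy in-memory index, not Ferret's or Lucene's API: the class ToyIndex and all of its methods are hypothetical and exist only for illustration. It shows why an update rewrites the postings even when only a stored field changed.

```ruby
# Toy in-memory index. NOT Ferret's API; every name here is hypothetical.
class ToyIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = [] } # term => doc numbers
    @stored   = {}                                        # doc number => stored fields
    @next_doc = 0
  end

  # Add a document: index the terms of the text and keep the stored fields.
  def add(indexed_text, stored_fields)
    doc = @next_doc
    @next_doc += 1
    indexed_text.downcase.split.uniq.each { |term| @postings[term] << doc }
    @stored[doc] = stored_fields
    doc
  end

  # Delete a document from every posting list and from the stored fields.
  def delete(doc)
    @postings.each_value { |docs| docs.delete(doc) }
    @stored.delete(doc)
  end

  # "Update" is delete + add: the document gets a new number and all of its
  # postings are rewritten, even if only a stored field changed.
  def update(doc, indexed_text, stored_fields)
    delete(doc)
    add(indexed_text, stored_fields)
  end

  def search(term)
    @postings.fetch(term.downcase, [])
  end

  def stored(doc)
    @stored[doc]
  end
end

index   = ToyIndex.new
doc     = index.add("Red Leaf Software", { price: 10 })
new_doc = index.update(doc, "Red Leaf Software", { price: 12 })
```

In this toy model only the stored price changed, yet the update still removed and re-added every posting for the document. That is the redundant work the original question asks about.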
Sergei Serdyuk
2006-Jun-22 14:35 UTC
[Ferret-talk] Any fast way to update non-indexed fields?
They are stored, non-indexed fields. In my case I wanted to have some stock data in a searchable index. This is not top priority, as I can really have a second index or a database and do lookups by :id.

If I were to wish for something in the coming Ferret, I'd wish for "stability". I am getting seg faults every other time I do this:

    def self.internal_field_values(fieldname)
      term_enum = @@reader.terms_from(Ferret::Index::Term.new(fieldname, ""))
      out = []
      while term_enum.term and (term_enum.term.field == fieldname) # seg faults here
        out << term_enum.term.text
        break unless term_enum.next?
      end
      out
    end

> As for just updating the stored-unindexed fields, I'll have to think
> about it. It'll add a bit of complexity to the merge process which I'm
> not to keen on. But it is certainly possible. Sergei, what type of
> field is it that you need to update? And to everyone else on the list,
> is this a common action? That is, do you often need to update
> non-indexed fields?
>
> Cheers,
> Dave
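The workaround mentioned in this message, keeping the frequently changing data outside the index and looking it up by :id, could look roughly like the sketch below. It is an assumption-laden illustration: StockStore stands in for a database table or a second index, attach_stock is a hypothetical helper, and the hit ids are hard-coded rather than coming from a real Ferret search.

```ruby
# Hypothetical side store for volatile data, keyed by document id.
# In practice this would be a database table or a second index.
class StockStore
  def initialize
    @rows = {}
  end

  # Overwrite the data for one id; the search index is never touched.
  def put(id, data)
    @rows[id] = data
  end

  def get(id)
    @rows[id]
  end
end

# Combine search hits (ids only) with the current data from the side store.
def attach_stock(hit_ids, store)
  hit_ids.map { |id| { id: id, stock: store.get(id) } }
end

store = StockStore.new
store.put(42, { quantity: 7 })
store.put(42, { quantity: 6 }) # cheap overwrite, no delete and reinsert

hit_ids = [42, 99]             # stand-in for ids returned by an index search
results = attach_stock(hit_ids, store)
```

The trade-off is one extra lookup per hit at query time in exchange for updates that never touch the index.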
Sergei Serdyuk
2006-Jun-22 14:45 UTC
[Ferret-talk] Any fast way to update non-indexed fields?
Reported as a bug: http://ferret.davebalmain.com/trac/ticket/69

> If I were to wish for something in coming Ferret, I'd wish "stability".
> I am getting seg_faults every other time I am doing this:
Marvin Humphrey
2006-Jun-22 23:27 UTC
[Ferret-talk] Any fast way to update non-indexed fields?
[resending... for some reason, this didn't go through this morning...]

On Jun 22, 2006, at 7:45 AM, Sergei Serdyuk wrote:

>> If I were to wish for something in coming Ferret, I'd wish
>> "stability".
>> I am getting seg_faults every other time I am doing this:

Dave, I see you've done some work with Valgrind, but I'm not sure how much. To catch errors and memory leaks with KinoSearch, I wrote up a simple script that runs the whole test suite under Valgrind. The test suite takes around 15 minutes to run that way instead of 9 seconds (on the one box where I have Valgrind available), so I only run it rarely -- always when preparing a release, and sometimes when debugging new or refactored C code. Some of the code in KinoSearch's test suite doesn't even produce output; it's just there to exercise an area where there might be memory problems.

Do you have something like that going on with Ferret? It's been extremely helpful for me. I don't think I've seen a single segfault bug report since KinoSearch was released, though I have missed a couple of memory leaks because the Valgrind output can be a little hard to interpret (there are a few harmless items in Perl that look like memory leaks to Valgrind, which makes real leaks harder to spot).

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
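A script of the kind described above might look something like this sketch. It is not Marvin's script (KinoSearch is a Perl project and his script is not shown in the thread). The helper names valgrind_command and run_suite are hypothetical, and running each test file through the ruby interpreter is an assumption. The only Valgrind options used are --leak-check=full and --error-exitcode.

```ruby
# Options passed to every Valgrind run.
VALGRIND_OPTS = ["--leak-check=full", "--error-exitcode=1"].freeze

# Build the argument list for running one test file under Valgrind.
# The interpreter defaults to "ruby"; that default is an assumption.
def valgrind_command(test_file, interpreter = "ruby")
  ["valgrind", *VALGRIND_OPTS, interpreter, test_file]
end

# Run each test file under Valgrind and return the files that failed.
# Kernel#system returns false on a non-zero exit status and nil when the
# command cannot be started, so both cases count as failures here.
def run_suite(test_files)
  test_files.reject { |file| system(*valgrind_command(file)) }
end

# Only run when test files are given on the command line.
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  failed = run_suite(ARGV)
  failed.each { |file| warn "Valgrind reported problems in #{file}" }
  exit(failed.empty? ? 0 : 1)
end
```

Invoked with a list of test files as arguments, it runs them one at a time, which is slow (as noted above) but makes it easy to see which test triggered a report.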
David Balmain
2006-Jun-28 00:05 UTC
[Ferret-talk] Any fast way to update non-indexed fields?
On 6/23/06, Marvin Humphrey <marvin at rectangular.com> wrote:
> [resending... for some reason, this didn't go through this morning...]
>
> On Jun 22, 2006, at 7:45 AM, Sergei Serdyuk wrote:
>
> >> If I were to wish for something in coming Ferret, I'd wish
> >> "stability".
> >> I am getting seg_faults every other time I am doing this:
>
> Dave, I see you've done some work with Valgrind, but I'm not sure how
> much. To catch errors and memory leaks with KinoSearch, I wrote up a
> simple script that runs the whole test suite under Valgrind. The
> test suite takes around 15 minutes to run that way instead of 9
> seconds (on the one box where I have Valgrind available), so I only
> run it rarely -- always when preparing a release, and sometimes when
> debugging new or refactored C code. Some of the code in KinoSearch's
> test suite doesn't even produce output; it's just there to exercise
> an area where there might be memory problems.
>
> Do you have something like that going on with Ferret? It's been
> extremely helpful for me. I don't think I've seen a single segfault
> bug report since KinoSearch was released, though I have missed a
> couple memory leaks because the Valgrind output can be a little hard
> to interpret (there are a few harmless items in Perl that look like
> memory leaks to Valgrind, which makes real leaks harder to spot).
>

Hi Marvin,

I do use Valgrind. In fact, the reason I have been so quiet on the list lately is that I've been working really hard on cleaning up the code in Ferret so that I can release a more stable version. The tool I need to make more use of is gcov. The problem is that some areas of the code just aren't getting exercised enough.

Cheers,
Dave