thr3ads.net - Ferret talk - [Ferret-talk] Ferret progress update [Feb 2007]

David Balmain

2007-Feb-22 05:05 UTC

[Ferret-talk] Ferret progress update

Hi folks,

Just thought I better let you all know that I'm still working on the
next release of Ferret. I've been working the last 7 days doing
nothing but Ferret development. The last iteration generated a diff of
almost 5000 lines so there are some pretty major changes. Most people
won't notice these changes, however, as the API remains unchanged. But
if you were having problems with FileNotFound errors or other types of
segmentation faults the next version should fix most of them.

I'm now going to go through the mailing list and the Trac bug reports
to fix any other small problems lying around before I release the
next version. Coming soon...

-- 
Dave Balmain
http://www.davebalmain.com/

Jens Kraemer

2007-Feb-22 09:17 UTC

On Thu, Feb 22, 2007 at 04:05:05PM +1100, David Balmain wrote:
> Hi folks,
> 
> Just thought I better let you all know that I'm still working on the
> next release of Ferret. I've been working the last 7 days doing
> nothing but Ferret development. The last iteration generated a diff of
> almost 5000 lines so there are some pretty major changes. Most people
> won't notice these changes however as the API remains unchanged. But
> if you were having problems with FileNotFound errors or other types of
> segmentation faults the next version should fix most of them.

You rock :-)


cheers,
Jens

-- 
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de
 
Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

John Leach

2007-Feb-22 11:08 UTC

Thanks Dave!  Looking forward to it.

Can you tell us a bit more about what led to the segfault errors cropping
up?  Have they been in the 0.10 branch all along? 0.9 too? Or did some
new work break something?

Maybe it will help others debug problems in future.

John.
--
http://johnleach.co.uk


David Balmain

2007-Feb-23 06:11 UTC

On 2/22/07, John Leach <john at johnleach.co.uk> wrote:
> Thanks Dave!  Looking forward to it.
>
> Can you tell us a bit more about what led to the segfault errors cropping
> up?  Have they been in the 0.10 branch all along? 0.9 too? Or did some
> new work break something?
>
> Maybe it will help others debug problems in future.

Well, the main problem I fixed was due to an error introduced in 0.10.
I wasn't locking the commit log in all the places I should have. This
actually would have been very easy to fix if someone had supplied a
repeatable test case. In the end though I decided to implement lock-less
commits, a new feature that has recently been added to Lucene. The
main advantages of this are that you can open IndexReaders when an
IndexWriter is committing and you can open multiple IndexReaders at a
time without them interrupting each other. It also makes it much
easier to recover after a crash. If your system crashes in the middle
of a commit then Ferret will be able to open the previously committed
version of the index.
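
For anyone unfamiliar with the idea: lock-less commits in Lucene rest on
write-once commit files. Each commit writes a new segments_N file (N is a
generation number in base 36) and never modifies an older one, so a reader
simply opens the newest generation it can find. Below is a minimal sketch
of the reader side in C under that assumption. The helper
latest_commit_generation is hypothetical, not part of Ferret's or Lucene's
API, and whether Ferret uses exactly this file naming is also an assumption.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the highest generation among file names of
 * the form "segments_<base-36 number>", or -1 if no such file exists. */
long latest_commit_generation(const char *const *names, size_t count)
{
    static const char prefix[] = "segments_";
    const size_t prefix_len = sizeof prefix - 1;
    long best = -1;

    for (size_t i = 0; i < count; i++) {
        const char *name = names[i];
        if (strncmp(name, prefix, prefix_len) != 0)
            continue;                       /* not a commit file */

        const char *digits = name + prefix_len;
        if (!isalnum((unsigned char)*digits))
            continue;                       /* empty suffix, sign or space */

        char *end;
        long gen = strtol(digits, &end, 36);
        if (*end != '\0')
            continue;                       /* trailing junk such as ".tmp" */

        if (gen > best)
            best = gen;
    }
    return best;
}
```

Because older generations are never rewritten, a reader that finds the
newest file unreadable (say, after a crash in the middle of a commit) could
retry with the next lower generation, which is why recovery becomes simple.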

As for the segfaults, I think I finally found the problem today. To
improve the performance of Ferret's bindings I was adding objects to
Ruby's Array directly instead of using the rb_ary_push method. Some of
these arrays are quite large so using rb_ary_push was a lot of
overhead which I didn't think was really necessary ... but I didn't
quite get it right. For example, I had:

    rterms = rb_ary_new2(term_cnt);
    rts = RARRAY(rterms)->ptr;
    RARRAY(rterms)->len = term_cnt;
    for (i = 0; i < term_cnt; i++) {
        rts[i] = frt_get_tv_term(&terms[i]);
    }

So, in this example, the number of terms in a field can be very large
and we save a lot of time[1] by setting the C array directly rather
than using rb_ary_push. The problem occurs when the garbage collector
gets called in the middle of filling the array. It will try to mark
all of the objects contained by the array, but the array isn't filled
yet so many of its elements haven't been set. What I should have
done was increment the array length as I went.

    rterms = rb_ary_new2(term_cnt);
    rts = RARRAY(rterms)->ptr;
    for (i = 0; i < term_cnt; i++) {
        rts[i] = frt_get_tv_term(&terms[i]);
        RARRAY(rterms)->len++;
    }

This is a touch slower than the original code but it now works so that's
all that matters. You may be thinking I could have just set the length
after the loop.

    rterms = rb_ary_new2(term_cnt);
    rts = RARRAY(rterms)->ptr;
    for (i = 0; i < term_cnt; i++) {
        rts[i] = frt_get_tv_term(&terms[i]);
    }
    RARRAY(rterms)->len = term_cnt;

But the problem here is that the elements that have been added to the
array won't actually get marked by the garbage collector because the
array's length is still 0, so they could incorrectly be collected, thus
also causing a segfault. One alternative method that will work would be
to use rb_mem_clear():

    rterms = rb_ary_new2(term_cnt);
    rts = RARRAY(rterms)->ptr;
    rb_mem_clear(rts, term_cnt);  /* initialize all elements to nil */
    RARRAY(rterms)->len = term_cnt;
    for (i = 0; i < term_cnt; i++) {
        rts[i] = frt_get_tv_term(&terms[i]);
    }

This makes sure all elements are set to nil before they are set to the
term vector so they are therefore safe from the garbage collector.
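
The four variants above can be compared with a small self-contained model.
This is only a sketch: toy_obj, toy_array, toy_gc_mark and simulate_fill
are hypothetical stand-ins, not Ruby's real structures or API. The
"collector" just walks ptr[0..len) as the mark phase is described above,
NULL plays the role of nil, and a poison address stands for uninitialised
memory.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins, NOT Ruby's real structures. */
typedef struct { int marked; } toy_obj;
typedef struct { size_t len; toy_obj **ptr; } toy_array;

static toy_obj toy_poison;  /* its address stands for an uninitialised slot */

enum strategy { LEN_FIRST, LEN_INCREMENTAL, LEN_LAST, NIL_THEN_LEN };

typedef struct {
    size_t bad_slots;      /* uninitialised slots the marker walked into */
    size_t unmarked_live;  /* stored objects the marker never reached    */
} gc_report;

/* Simulated mark phase: visit ptr[0..len), skipping NULL ("nil"). */
static size_t toy_gc_mark(const toy_array *a)
{
    size_t bad = 0;
    for (size_t i = 0; i < a->len; i++) {
        if (a->ptr[i] == &toy_poison)
            bad++;                       /* a real collector would crash */
        else if (a->ptr[i] != NULL)
            a->ptr[i]->marked = 1;
    }
    return bad;
}

/* Fill an n-element array using strategy s and run one simulated GC just
 * before element gc_at is stored (requires gc_at < n). */
gc_report simulate_fill(enum strategy s, size_t n, size_t gc_at)
{
    gc_report r = {0, 0};
    if (n == 0)
        return r;

    toy_obj *objs = calloc(n, sizeof *objs);
    toy_array a = {0, malloc(n * sizeof *a.ptr)};
    if (objs == NULL || a.ptr == NULL)
        abort();

    for (size_t i = 0; i < n; i++)
        a.ptr[i] = &toy_poison;          /* model uninitialised memory */

    if (s == NIL_THEN_LEN)
        for (size_t i = 0; i < n; i++)
            a.ptr[i] = NULL;             /* like clearing to nil first */
    if (s == LEN_FIRST || s == NIL_THEN_LEN)
        a.len = n;

    for (size_t i = 0; i < n; i++) {
        if (i == gc_at) {                /* GC fires while creating element i */
            r.bad_slots = toy_gc_mark(&a);
            for (size_t j = 0; j < i; j++)
                if (!objs[j].marked)
                    r.unmarked_live++;   /* would be swept while still in use */
        }
        a.ptr[i] = &objs[i];
        if (s == LEN_INCREMENTAL)
            a.len++;
    }
    if (s == LEN_LAST)
        a.len = n;

    free(a.ptr);
    free(objs);
    return r;
}
```

With n = 8 and the collection happening just before element 3 is stored,
LEN_FIRST reports 5 bad slots, LEN_LAST reports 3 unmarked live objects,
and LEN_INCREMENTAL and NIL_THEN_LEN report zero for both.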

Anyway, sorry for the long and boring post. I guess the point is to
think about how the garbage collector works when developing Ruby
bindings.

Cheers,
Dave

[1] How much faster? About 20% faster according to a simple benchmark
I just ran. Was it worth the segfaults? Of course not but in a library
like this you take the optimizations where you can get them.

-- 
Dave Balmain
http://www.davebalmain.com/

John Leach

2007-Feb-23 11:28 UTC

Hi Dave,

Interesting stuff.  Apparently you can tell the GC not to mess with your
stuff using rb_gc_register_address (and rb_gc_unregister_address when/if
you're done).

Looking at gc.c, all it does is add the pointer to the GC's list of
things that are being used, so it won't free it.

An example from the Ruby source (showing it being used before object
creation), ext/iconv/iconv.c:

    rb_gc_register_address(&charset_map);
    charset_map = rb_hash_new();
    rb_define_singleton_method(rb_cIconv, "charset_map", charset_map_get, 0);

I guess you can register before filling the array, set the length, then
unregister.  Not sure if this actually locks all the values in the array
though :/  If not, perhaps you could overwrite the mark function for the
array and restore it afterwards, heh.  Perhaps not worth the fiddling.

I'm no Ruby extension expert though, so beware :)

John.
--
http://johnleach.co.uk

