Craig Jolicoeur
2007-Aug-08 20:16 UTC
[Ferret-talk] issues with index for table with over 18 million records
I have a MySQL table with over 18 million records in it. We are indexing about 10 fields in this table with Ferret.

I am having problems with the initial build of the index. I created a rake task to run the "Model.rebuild_index" command in the background. That process ran fine for about 2.5 days before it suddenly stopped. The log/ferret_index.log file says it got to about 28% before ending. I'm not sure whether the process died because of something on my server or because of something related to Ferret.

It appears that it will take close to 10 days for the full index to be built with rebuild_index. Is this normal for a table of this size? Also, is there a way to resume from where the index left off instead of having to rebuild the entire index from scratch? I got about 28% of the way through, so I would rather not waste another 2.5 days rebuilding that part while trying to get the full index 100% built.

Also, is there a way I can rebuild the index non-destructively, since it didn't complete? Meaning, can I rebuild it without overwriting what is already there, so the existing index stays searchable while the rebuild takes place, and then move the new index over the old one? I'm not running Ferret as a DRb server, so I don't know if I can.

Finally, is there a faster or better way that I can or should be building the index? Will I have an issue with the index file sizes with a DB this size?

--
Posted via http://www.ruby-forum.com/.
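One way to make a long build resumable, sketched here purely as an illustration (the checkpoint scheme, `id_batches` helper, and `to_doc`/`save_checkpoint` names are hypothetical, not part of acts_as_ferret), is to walk the table in primary-key ranges and record the last id indexed, so a crashed build can restart from the checkpoint rather than from scratch:

```ruby
# Sketch: split the remaining id space into batches so an interrupted
# index build can resume from a recorded checkpoint. Helper name and
# checkpoint scheme are illustrative, not acts_as_ferret API.
def id_batches(last_indexed_id, max_id, batch_size)
  batches = []
  start = last_indexed_id + 1
  while start <= max_id
    stop = [start + batch_size - 1, max_id].min
    batches << (start..stop)
    start = stop + 1
  end
  batches
end

# Resuming at id 5,000,000 of 18,000,000 in 10,000-row batches
# might then look like (hypothetical model/index methods):
#
# id_batches(5_000_000, 18_000_000, 10_000).each do |range|
#   Model.find(:all, :conditions => ["id BETWEEN ? AND ?",
#                                    range.first, range.last]).each do |rec|
#     index << rec.to_doc        # hypothetical document conversion
#   end
#   save_checkpoint(range.last)  # hypothetical checkpoint write
# end
```

Batching also keeps memory flat, since only one slice of the table is instantiated at a time.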
Erik Morton
2007-Aug-08 20:50 UTC
[Ferret-talk] issues with index for table with over 18 million records
We have a 1 million record index that is about 6GB in size. We build it in parallel without AAF, so it's hard to comment on the speed of your index build. However, I will say that I did need to manually patch Ferret to better handle large indexes. Here is the diff:

--- /usr/local/lib/ruby/gems/1.8/gems/ferret-0.11.4/ext/index.c
+++ index.c
@@ -1375,7 +1375,7 @@
     lazy_doc = lazy_doc_new(stored_cnt, fdt_in);
     for (i = 0; i < stored_cnt; i++) {
-        int start = 0, end, data_cnt;
+        off_t start = 0, end, data_cnt;
         field_num = is_read_vint(fdt_in);
         fi = fr->fis->fields[field_num];
         data_cnt = is_read_vint(fdt_in);
@@ -1449,7 +1449,7 @@
     if (store_offsets) {
         int num_positions = tv->offset_cnt = is_read_vint(fdt_in);
         Offset *offsets = tv->offsets = ALLOC_N(Offset, num_positions);
-        int offset = 0;
+        off_t offset = 0;
         for (i = 0; i < num_positions; i++) {
             offsets[i].start = offset += is_read_vint(fdt_in);
             offsets[i].end = offset += is_read_vint(fdt_in);
@@ -1683,8 +1683,8 @@
     int last_end = 0;
     os_write_vint(fdt_out, offset_count); /* write shared prefix length */
     for (i = 0; i < offset_count; i++) {
-        int start = offsets[i].start;
-        int end = offsets[i].end;
+        off_t start = offsets[i].start;
+        off_t end = offsets[i].end;
         os_write_vint(fdt_out, start - last_end);
         os_write_vint(fdt_out, end - start);
         last_end = end;
@@ -4799,7 +4799,7 @@
  *
  ************************************************************************
  ****/
-Offset *offset_new(int start, int end)
+Offset *offset_new(off_t start, off_t end)
 {
     Offset *offset = ALLOC(Offset);
     offset->start = start;

On Aug 8, 2007, at 4:16 PM, Craig Jolicoeur wrote:
> [...]
_______________________________________________
Ferret-talk mailing list
Ferret-talk at rubyforge.org
http://rubyforge.org/mailman/listinfo/ferret-talk
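The reason the int-to-off_t change above matters: a 32-bit signed int tops out at 2**31 - 1 (roughly 2 GB), so any byte offset past that boundary wraps negative. A quick Ruby illustration of the wraparound (using pack/unpack to reinterpret a value as a 32-bit signed int):

```ruby
# A 32-bit signed int holds at most 2**31 - 1 (about 2 GB). Storing a
# file offset just past that boundary in an int wraps it negative,
# which is the kind of corruption the off_t patch avoids.
INT32_MAX = 2**31 - 1

def as_int32(n)
  [n].pack('l').unpack('l').first  # reinterpret n as 32-bit signed
end

puts as_int32(INT32_MAX)      # => 2147483647  (still fine)
puts as_int32(INT32_MAX + 1)  # => -2147483648 (wrapped negative)
```

On platforms with large-file support, off_t is 64 bits, so offsets well past 2 GB remain representable.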
Craig Jolicoeur
2007-Aug-08 20:53 UTC
[Ferret-talk] issues with index for table with over 18 million records
Erik Morton wrote:
> We have a 1 million record index that is about 6GB in size. We build
> it in parallel w/out AAF so it's hard to comment on the speed of your
> index build. However I will say that I did need to manually patch
> Ferret to better handle large indexes.

Erik,

What issues did you find that caused you to patch the Ferret code?

Also, you say you build the index in parallel without AAF; how do you do that? I'm not sure I follow how to do that, so if you can explain, I'd appreciate it.
Erik Morton
2007-Aug-08 21:16 UTC
[Ferret-talk] issues with index for table with over 18 million records
We had to patch it because we were getting seemingly random errors while searching a 2GB+ index. This is the Trac ticket: http://ferret.davebalmain.com/trac/ticket/215. The patch I included changes some ints to off_t's, which solved the problem. As far as I know, this patch was never applied to the trunk.

We build our index using a modified version of RDig. We basically run up to 80 EC2 servers in parallel to create 80 separate indexes, which we later combine into a single index. You could follow a similar route and still have AAF manage the index after it is built. You'd need to make sure that the documents created by RDig (or whatever you use) have the same fields that AAF expects.

Erik

On Aug 8, 2007, at 4:53 PM, Craig Jolicoeur wrote:
> [...]