Hello,

Ferret created a 4.5 GB index file.

$ 4534029210 2007-02-26 12:46 _el.cfs

The index was created without problems, and searching it works fine. However, whenever I try to load the contents of an indexed document whose document number is 621108 or above, I get an error:

irb(main):080:0> searcher[621108].load
IOError: IO Error occured at <except.c>:79 in xraise
Error occured in fs_store.c:289 - fsi_seek_i
seeking pos -1206037603: <Invalid argument>

As you can see, it is seeking to a negative position. I ran strace on this, with the following results:

_llseek(3, 18446744072766697140, 0xbfc555e0, SEEK_SET) = -1 EINVAL (Invalid argument)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(2, "./service.cgi:40:in `[]\'", 24./service.cgi:40:in `[]') = 24
write(2, ": ", 2: ) = 2
write(2, "IO Error occured at <except.c>:7"..., 43IO Error occured at <except.c>:79 in xraise) = 43
write(2, " (", 2 () = 2
write(2, "IOError", 7IOError) = 7
write(2, ")\n", 2)
) = 2
write(2, "Error occured in fs_store.c:289 "..., 90Error occured in fs_store.c:289 - fsi_seek_i
seeking pos -942854476: <Invalid argument>

The offset 18446744072766697140 passed to lseek() is larger than the maximum value of a long, which is probably why lseek fails with this error.

How can I fix this?

--
Posted via http://www.ruby-forum.com/.
On 2/26/07, Jeffrey Gelens <jgelens at gmail.com> wrote:
> Hello,
>
> Ferret created a 4.5 GB index file.
> $ 4534029210 2007-02-26 12:46 _el.cfs
>
> The creation of the index went smoothly. Searching through this index
> also works fine. However whenever I try to get the contents of an
> indexed document I get an error when the document number is above
> 621108:
>
> irb(main):080:0> searcher[621108].load
> IOError: IO Error occured at <except.c>:79 in xraise
> Error occured in fs_store.c:289 - fsi_seek_i
> seeking pos -1206037603: <Invalid argument>
>
> As you can see it is seeking on a negative position. I did a strace on
> this with the following results:
>
> _llseek(3, 18446744072766697140, 0xbfc555e0, SEEK_SET) = -1 EINVAL
> (Invalid argument)
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> write(2, "./service.cgi:40:in `[]\'", 24./service.cgi:40:in `[]') = 24
> write(2, ": ", 2: ) = 2
> write(2, "IO Error occured at <except.c>:7"..., 43IO Error occured at
> <except.c>:79 in xraise) = 43
> write(2, " (", 2 () = 2
> write(2, "IOError", 7IOError) = 7
> write(2, ")\n", 2)
> ) = 2
> write(2, "Error occured in fs_store.c:289 "..., 90Error occured in
> fs_store.c:289 - fsi_seek_i
> seeking pos -942854476: <Invalid argument>
>
> The lseek() on 18446744072766697140 is over the maximum of long. That's
> why lseek is probably giving this error.
>
> How can I fix this?

Actually 18446744072766697140 is too big for even a 64-bit long (or a long long on 32-bit systems), so I'd love to know where that number is coming from. It is obviously a bug somewhere else. Unfortunately it would be impractical for you to send me the index. If it is possible to give me access to your server I should be able to sort this out though. Otherwise, I'll look into it, but I can't promise anything.

Dave

--
Dave Balmain
http://www.davebalmain.com/
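On where that number might come from: the arithmetic below is only a sketch built on the values quoted in the thread, and the conclusion is an assumption, not something confirmed by the Ferret source. It reinterprets the unsigned offset that strace printed as a signed 64-bit value, then shows what unsigned 32-bit offset each negative position would correspond to if a file position had wrapped around in a signed 32-bit integer. The helper names as_signed64 and as_unsigned32 are made up for illustration.

```ruby
# Values copied from the error output and strace log quoted in the thread.
STRACE_OFFSET = 18_446_744_072_766_697_140 # argument passed to _llseek
IRB_SEEK_POS  = -1_206_037_603             # "seeking pos" from the irb session
CGI_SEEK_POS  = -942_854_476               # "seeking pos" from service.cgi
INDEX_SIZE    = 4_534_029_210              # size of _el.cfs in bytes

# Hypothetical helper: reinterpret an unsigned 64-bit value as signed 64-bit.
def as_signed64(value)
  value >= 2**63 ? value - 2**64 : value
end

# Hypothetical helper: the unsigned 32-bit value that shares its bit pattern
# with the given signed 32-bit value.
def as_unsigned32(value)
  value % 2**32
end

signed = as_signed64(STRACE_OFFSET)
puts "strace offset as signed 64-bit: #{signed}"

[IRB_SEEK_POS, CGI_SEEK_POS].each do |pos|
  unwrapped = as_unsigned32(pos)
  puts "#{pos} -> #{unwrapped} " \
       "(past 2 GiB: #{unwrapped > 2**31 - 1}, inside file: #{unwrapped < INDEX_SIZE})"
end
```

The strace offset, read as a signed 64-bit number, is exactly the -942854476 reported in the error message, so the huge value is a small negative number that was sign-extended. Both negative positions map to offsets between 2 GiB and the end of the 4.5 GB file, which is consistent with (but does not prove) a file position being truncated to a signed 32-bit integer somewhere along the way.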
David Balmain wrote:
> Actually 18446744072766697140 is too big for even a 64bit long (or a
> long long on 32bit systems) so I'd love to know where that number is
> coming from. It is obviously a bug somewhere else. Unfortunately it
> would be impractical for you to send me the index. If it is possible
> to give me access to your server I should be able to sort this out
> though. Otherwise, I'll look into it, but I can't promise anything.
>
> Dave

I can't give you access to the server, as it's a company server, sorry. Is it possible that the index somehow got corrupted? At the moment I am recreating the index, which takes several days. I'll report my findings when it's done.
On 2/27/07, Jeffrey Gelens <jgelens at gmail.com> wrote:
> David Balmain wrote:
> > Actually 18446744072766697140 is too big for even a 64bit long (or a
> > long long on 32bit systems) so I'd love to know where that number is
> > coming from. It is obviously a bug somewhere else. Unfortunately it
> > would be impractical for you to send me the index. If it is possible
> > to give me access to your server I should be able to sort this out
> > though. Otherwise, I'll look into it, but I can't promise anything.
> >
> > Dave
>
> I can't give access to the server as its a company server, sorry.
> Is there a possibility that the index somehow got corrupted? At the
> moment I am recreating the index, which takes several days. I'll report
> on the findings when it's done.

It could be a corrupt index, but I doubt it. I think it is more likely a bug somewhere else, although I have built indexes of this size before without problems. If you could give me an idea of what type of data you are putting in the index, I could try to rebuild a similar index here to diagnose the problem, i.e.:

- how many documents,
- how many fields,
- what the field settings are (e.g. stored, untokenized, term_vectors),
- how large the fields are on average,
- what sort of data they hold (e.g. numbers, dates, English text, code),
- which analyzer you are using.

This should give me enough information to build a very similar index here and hopefully reproduce the problem.

Cheers,
Dave

PS: send it to me privately if you prefer
I recreated the index with this option

:max_merge_docs => 100000

and it seems to work great.
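For anyone wanting to try the same workaround, here is a minimal sketch of how that option might be passed when opening an index. It assumes the ferret gem is installed and that Ferret::Index::Index.new forwards :max_merge_docs to the underlying index writer; the open_index helper and the path argument are hypothetical and do not come from the thread.

```ruby
# Writer option as reported in the thread: cap the number of documents
# that may be merged into a single segment.
WRITER_OPTIONS = { :max_merge_docs => 100_000 }.freeze

# Hypothetical helper: open (or create) an index at the given path using
# the capped merge size. The ferret gem is only loaded when this is called.
def open_index(path)
  require 'ferret'
  Ferret::Index::Index.new(WRITER_OPTIONS.merge(:path => path))
end
```

Capping the merge size presumably keeps every segment file small enough that no seek reaches the offsets that failed above. That would make it a way of avoiding the large file, not a fix for the underlying seek problem.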
Hi Jeffrey,

That's great to hear. If you have a chance, could you try copying the index (cp -r), then opening the copy and optimizing it? Then let me know whether you still get the same problem as before. I understand if this is too much trouble; 5 GB is a lot of data to be playing around with.

Cheers,
Dave

On 3/5/07, Jeffrey Gelens <jgelens at gmail.com> wrote:
> I recreated the index with this option :max_merge_docs => 100000 and it
> seems to work great.
After optimization the exact same problem occurs.

Greetings,
Jeffrey Gelens

David Balmain wrote:
> Hi Jeffrey,
>
> That's great to hear. If you have a chance, could you try copying the
> index (cp -r) and then opening the copy and optimizing it. Then let me
> know if you are still getting the same problem you were getting
> before. I understand if this is too much trouble. 5Gb is a lot of data
> to be playing around with.
>
> Cheers,
> Dave
On 3/6/07, Jeffrey Gelens <jgelens at gmail.com> wrote:
> After optimization the exact same problem occurs.

Thanks Jeffrey,

I'll keep looking into this. I'm glad your index works for the moment though.

Cheers,
Dave