thr3ads.net - Ferret talk - [Ferret-talk] Problem with large index file [Feb 2007]

Jeffrey Gelens

2007-Feb-26 06:18 UTC

[Ferret-talk] Problem with large index file

Hello,

Ferret created a 4.5 GB index file:
$ 4534029210 2007-02-26 12:46 _el.cfs

The creation of the index went smoothly. Searching through this index
also works fine. However whenever I try to get the contents of an
indexed document I get an error when the document number is above
621108:

irb(main):080:0> searcher[621108].load
IOError: IO Error occured at <except.c>:79 in xraise
Error occured in fs_store.c:289 - fsi_seek_i
        seeking pos -1206037603: <Invalid argument>

As you can see, it is seeking to a negative position. I ran strace on it,
with the following results:

_llseek(3, 18446744072766697140, 0xbfc555e0, SEEK_SET) = -1 EINVAL (Invalid argument)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(2, "./service.cgi:40:in `[]'", 24./service.cgi:40:in `[]') = 24
write(2, ": ", 2: )                       = 2
write(2, "IO Error occured at <except.c>:7"..., 43IO Error occured at <except.c>:79 in xraise) = 43
write(2, " (", 2 ()                       = 2
write(2, "IOError", 7IOError)                  = 7
write(2, ")\n", 2)
)                      = 2
write(2, "Error occured in fs_store.c:289 "..., 90Error occured in fs_store.c:289 - fsi_seek_i
        seeking pos -942854476: <Invalid argument>

The offset passed to _llseek(), 18446744072766697140, is over the maximum
value of a long. That's probably why lseek is giving this error.

How can I fix this?

-- 
Posted via http://www.ruby-forum.com/.
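The strace offset can be decoded with a little arithmetic. Assuming strace is printing the 64-bit offset as an unsigned number, 18446744072766697140 is indeed larger than the largest signed 64-bit value (2**63 - 1), but it is exactly 2**64 - 942854476. Read as a signed two's-complement integer it is -942854476, the same position that fs_store.c reports in the error message. A minimal Ruby sketch of that reinterpretation (STRACE_OFFSET and to_signed64 are illustrative names, not part of Ferret):

```ruby
# Offset printed by strace for _llseek; assumed to be a 64-bit value shown unsigned.
STRACE_OFFSET = 18_446_744_072_766_697_140

# Reinterpret an unsigned 64-bit integer as signed (two's complement).
def to_signed64(u)
  u >= 2**63 ? u - 2**64 : u   # sign bit set => subtract 2**64
end

puts STRACE_OFFSET > 2**63 - 1              # true: too big for a signed 64-bit long
puts to_signed64(STRACE_OFFSET)             # -942854476
# Same result via pack/unpack: "Q" = unsigned 64-bit, "q" = signed 64-bit.
puts [STRACE_OFFSET].pack("Q").unpack1("q") # -942854476
```

In other words, the huge number is most likely just a negative offset seen through an unsigned lens; the real question is where the negative value comes from.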

David Balmain

2007-Feb-26 14:31 UTC

On 2/26/07, Jeffrey Gelens <jgelens at gmail.com> wrote:
> How can I fix this?
Actually 18446744072766697140 is too big for even a 64-bit long (or a
long long on 32-bit systems), so I'd love to know where that number is
coming from. It is obviously a bug somewhere else. Unfortunately, it
would be impractical for you to send me the index. If you could give
me access to your server, I should be able to sort this out.
Otherwise, I'll look into it, but I can't promise anything.

Dave

-- 
Dave Balmain
http://www.davebalmain.com/
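Where the negative positions come from is not established in this thread. One hypothesis consistent with the numbers (an assumption, not confirmed by Ferret's source) is that a file offset larger than 2**31 - 1 was held in a signed 32-bit integer and wrapped around. Adding 2**32 back to both reported positions, -1206037603 and -942854476, gives 3088929693 and 3352112820: both above 2 GiB and both inside the 4534029210-byte _el.cfs file. The Ruby sketch below only checks that arithmetic; FILE_SIZE, INT32_MAX and unwrap32 are hypothetical names chosen for illustration.

```ruby
# Hypothesis check only: assumes the bad positions are file offsets that were
# truncated to a signed 32-bit integer somewhere (not confirmed in the thread).
FILE_SIZE = 4_534_029_210   # size of _el.cfs from the ls output in the first post
INT32_MAX = 2**31 - 1

# Hypothetical helper: the offset in [2**31, 2**32) that wraps to the negative
# value `pos` when stored as a signed 32-bit integer.
def unwrap32(pos)
  pos + 2**32
end

[-1_206_037_603, -942_854_476].each do |pos|
  offset = unwrap32(pos)
  # Re-truncate via pack: "L" packs unsigned 32-bit, "l" unpacks signed 32-bit.
  rewrapped = [offset].pack("L").unpack1("l")
  puts format("%d <- %d  above INT32_MAX: %s  inside file: %s  round-trips: %s",
              pos, offset, offset > INT32_MAX, offset < FILE_SIZE, rewrapped == pos)
end
```

If this guess is right, it would also fit the later observation in this thread that capping segment size with :max_merge_docs avoided the error while optimizing (which merges segments back together) brought it back, but that remains speculation.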

Jeffrey Gelens

2007-Feb-27 01:42 UTC

David Balmain wrote:
> If it is possible to give me access to your server I should be able to
> sort this out though.
I can't give access to the server, as it's a company server, sorry.
Is there a possibility that the index somehow got corrupted? At the
moment I am recreating the index, which takes several days. I'll report
on the findings when it's done.

David Balmain

2007-Feb-27 03:47 UTC

On 2/27/07, Jeffrey Gelens <jgelens at gmail.com> wrote:
> Is there a possibility that the index somehow got corrupted?
It could be a corrupt index, but I doubt it. I think it is more likely
a bug somewhere else, though I have built indexes of this size before
without problems. If you could give me an idea of what type of data
you are putting in the index, I could try to build a similar index here
and diagnose the problem: how many documents, how many fields, what the
field settings are (e.g. stored, untokenized, term_vectors, etc.), how
large the fields are on average, what sort of data they hold (e.g.
numbers, dates, English text, code), and which analyzer you are using.
That should give me enough information to build a very similar index
here and hopefully reproduce the problem.

Cheers,
Dave

PS: Send it to me privately if you prefer.

Jeffrey Gelens

2007-Mar-05 07:52 UTC

I recreated the index with the option :max_merge_docs => 100000, and it
seems to work great.

David Balmain

2007-Mar-06 03:15 UTC

Hi Jeffrey,

That's great to hear. If you have a chance, could you try copying the
index (cp -r), then opening the copy and optimizing it? Then let me
know whether you still get the same problem you were getting before.
I understand if this is too much trouble; 5 GB is a lot of data to be
playing around with.

Cheers,
Dave

On 3/5/07, Jeffrey Gelens <jgelens at gmail.com> wrote:
> I recreated the index with this option :max_merge_docs => 100000 and it
> seems to work great.

Jeffrey Gelens

2007-Mar-06 05:51 UTC

After optimization, the exact same problem occurs.

Greetings,
Jeffrey Gelens

David Balmain

2007-Mar-06 11:00 UTC

On 3/6/07, Jeffrey Gelens <jgelens at gmail.com> wrote:
> After optimization the exact same problem occurs.

Thanks Jeffrey, I'll keep looking into this. I'm glad your index works
for the moment, though.

Cheers,
Dave
