Hi,

For the last couple of days I've been trying to index some txt files. Once they are indexed, I have the habit of checking the contents of the Ferret index with Luke. But every time I tried to open the index I got a 'read past EOF' error. I managed to narrow it down to the way Ferret handles non-ASCII characters. I have one txt file with the content 'a o b c' and one with '? ? ? ?'. If I index the first one I can read the index perfectly; however, when I index the second one I get the EOF error. The error occurs with the standard and whitespace analyzers. The stop analyzer just ignores these characters. How can I solve this, so that Ferret handles these 'special' characters correctly? Thanks.

Kind regards,

Nick

--
Posted via http://www.ruby-forum.com/.
Hi Nick,

Sorry, but this is due to an incompatibility between the Ferret and Lucene indexes. It's complicated, but basically Ferret counts string lengths in bytes while Lucene sometimes uses the number of characters. I do plan to fix this in the future, but it could be a month or two. Hope you can wait that long.

Cheers,
Dave

On 1/27/06, Nick Snels <nick.snels at gmail.com> wrote:
> [...] every time I tried to open the index I got a 'read past EOF' error. I
> managed to get it down to the way Ferret handles non-ascii characters. [...]
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
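To make the bytes-versus-characters point concrete, here is a minimal Ruby sketch of how such a mismatch can end in a 'read past EOF'. It is an illustration only, not Ferret's or Lucene's actual code or file format: the one-byte length prefix and the names write_string, read_string_as_chars and round_trip are all hypothetical, and the accented characters are stand-ins because the original characters in the post were lost.

```ruby
require "stringio"

# Hypothetical writer: prefixes the string with its length in BYTES.
def write_string(io, str)
  bytes = str.encode("UTF-8").b
  io.write([bytes.bytesize].pack("C")) # one-byte prefix, a simplification
  io.write(bytes)
end

# Hypothetical reader: treats the same prefix as a count of CHARACTERS
# and decodes that many UTF-8 characters from the stream.
def read_string_as_chars(io)
  count = io.read(1).unpack1("C")
  chars = []
  count.times do
    first = io.read(1)
    raise EOFError, "read past EOF" if first.nil?
    lead = first.unpack1("C")
    # Number of continuation bytes implied by the UTF-8 lead byte.
    extra = if lead < 0x80 then 0 elsif lead < 0xE0 then 1 elsif lead < 0xF0 then 2 else 3 end
    rest = extra.zero? ? "".b : io.read(extra)
    raise EOFError, "read past EOF" if rest.nil? || rest.bytesize < extra
    chars << (first + rest).force_encoding("UTF-8")
  end
  chars.join
end

# Write with one convention, read back with the other.
def round_trip(str)
  io = StringIO.new(String.new)
  write_string(io, str)
  io.rewind
  read_string_as_chars(io)
end

["a o b c", "\u00E9 \u00F6 \u00FC \u00E7"].each do |text|
  begin
    puts "#{text.length} chars, #{text.bytesize} bytes: read back #{round_trip(text).inspect}"
  rescue EOFError => e
    puts "#{text.length} chars, #{text.bytesize} bytes: #{e.message}"
  end
end
```

For pure ASCII text the two counts agree, so the round trip succeeds. As soon as a character takes more than one byte, the byte count is larger than the character count, and a reader that expects characters keeps reading after the data has run out.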
Hi David,

Good to hear that it will be fixed in the near future. For me personally it doesn't matter if it takes a month or two. I have tons of other stuff to add before my project is finished. Will this be around the same time that cFerret is ready for prime time?

Kind regards,

Nick
On 1/27/06, Nick Snels <nick.snels at gmail.com> wrote:
> Will this be around the same period that cFerret will be ready for
> prime time?

Hopefully cFerret will be finished before then. I just have to finish implementing span queries and threading, and then I'll be ready to start adding the Ruby bindings. The fix to make the indexes of Ferret and Lucene compatible will hopefully involve a patch to Lucene rather than a fix to Ferret, but I may have difficulty getting it accepted. I realize index compatibility with Lucene is a showstopper for many people, so it's definitely high priority and I'll get it done one way or another.

Dave