steven shingler
2006-May-15 16:08 UTC
[Ferret-talk] Ferret not able to read a Lucene Index?
Hi all,

Having problems trying to get Ferret to read an index generated by Lucene. Am I right in thinking Ferret should be able to read a Lucene generated index no problem? Using the code snippets detailed in http://www.ruby-forum.com/topic/64099#new

Any advice gratefully received.

Many Thanks,
Steven

-- Posted via http://www.ruby-forum.com/.
On May 15, 2006, at 12:08 PM, steven shingler wrote:
> Am I right in thinking Ferret should be able to read a Lucene
> generated index no problem?

That would be nice, but it is not currently the case because of Java's wacky "modified" UTF-8 serialization. I've seen that plain ol' ASCII text indexes will be compatible, but once you put in some higher order characters things go askew.

Erik
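For readers unfamiliar with the "modified" UTF-8 mentioned above, here is a rough Ruby sketch of how it differs from standard UTF-8. The helper names `modified_utf8` and `hex_dump` are made up for illustration and are not part of Ferret or Lucene. The sketch follows the rules Java documents for `DataOutput.writeUTF` (U+0000 written as two bytes, characters above the Basic Multilingual Plane written as two 3-byte surrogate halves), minus the length header that `writeUTF` also emits. Whether this difference is what breaks any particular index is a separate question.

```ruby
# Hypothetical helper, for illustration only: encode a Ruby string the way
# Java's "modified UTF-8" does (DataOutput.writeUTF rules, no length header).
def modified_utf8(str)
  # Java strings are sequences of UTF-16 code units, so start from those.
  units = str.encode("UTF-16BE").unpack("n*")
  bytes = units.flat_map do |u|
    if u >= 0x0001 && u <= 0x007F
      [u] # plain ASCII, one byte
    elsif u <= 0x07FF
      # Two bytes. U+0000 lands here too, giving C0 80 instead of 00.
      [0xC0 | (u >> 6), 0x80 | (u & 0x3F)]
    else
      # Three bytes, applied to each surrogate half separately.
      [0xE0 | (u >> 12), 0x80 | ((u >> 6) & 0x3F), 0x80 | (u & 0x3F)]
    end
  end
  bytes.pack("C*")
end

# Format a byte string as space-separated hex pairs.
def hex_dump(bytes)
  bytes.unpack("C*").map { |b| format("%02X", b) }.join(" ")
end

samples = {
  "ASCII"     => "ferret",
  "null byte" => "a\u0000b",
  "above BMP" => "\u{1D160}" # MUSICAL SYMBOL EIGHTH NOTE
}

samples.each do |label, text|
  puts "#{label}:"
  puts "  standard UTF-8: #{hex_dump(text.encode('UTF-8'))}"
  puts "  modified UTF-8: #{hex_dump(modified_utf8(text))}"
end
```

For pure ASCII the two encodings come out byte-for-byte identical, which fits the observation that plain ASCII indexes are compatible.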
steven shingler
2006-May-16 09:55 UTC
[Ferret-talk] Ferret not able to read a Lucene Index?
Hi Erik,

Thanks for getting back to me.

Ahh yes, I see what you mean - if I "Lucene-Index" only plain text files, Ferret can search that index fine (it seems).

However, what I'm trying to do is index PDFs, using PDFBox to create the Lucene documents - but Ferret isn't at all pleased when I try to search:

NoMethodError: You have a nil object when you didn't expect it!
The error occured while evaluating nil.name
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/term_buffer.rb:31:in `read'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/segment_term_enum.rb:90:in `next?'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/segment_term_enum.rb:118:in `scan_to'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/term_infos_io.rb:285:in `scan_for_term_info'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/term_infos_io.rb:163:in `get_term_info'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/segment_reader.rb:176:in `doc_freq'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/multi_reader.rb:169:in `doc_freq'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/multi_reader.rb:169:in `each'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/multi_reader.rb:169:in `doc_freq'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/index_searcher.rb:47:in `doc_freq'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/term_query.rb:13:in `initialize'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/term_query.rb:99:in `new'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/term_query.rb:99:in `create_weight'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/boolean_query.rb:113:in `initialize'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/boolean_query.rb:112:in `each'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/boolean_query.rb:112:in `initialize'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/boolean_query.rb:209:in `new'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/boolean_query.rb:209:in `create_weight'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/query.rb:51:in `weight'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/search/index_searcher.rb:107:in `search'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/index.rb:660:in `do_search'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/index.rb:331:in `search_each'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/index.rb:330:in `synchronize'
    c:/ruby/lib/ruby/site_ruby/1.8/ferret/index/index.rb:330:in `search_each'
    ./lib/ferret_client.rb:34:in `search_index'
    test/functional/ferret_client_test.rb:12:in `test_search_index'

This is a shame, as I thought I was onto a winner with the Lucene/Ferret combo - especially with PDFBox able to create Lucene Docs so easily.

This may not actually relate to your point of higher order chars...?

Does anyone have any experience of indexing PDFs in Lucene (using PDFBox) and searching with Ferret? Or of course creating Ferret Index Docs from PDF files in Ruby?

Any ideas or advice gratefully received.

Thanks,
Steven
Hi Steven,

First of all, would you mind providing a little more info on your environment: OS, version of Ferret, version of Ruby, etc.?

Second, you might be interested in the FerretFinder utility as well as RDig. You'll find links to both at the bottom of the HowTo section on the Ferret Trac: http://ferret.davebalmain.com/trac/wiki/HowTos . Both of these tools seem to use pdftotext to extract content from PDFs, but they might be of help to you anyway.

Regards,
Jan Prill
steven shingler
2006-May-16 10:07 UTC
[Ferret-talk] Ferret not able to read a Lucene Index?
Hi Jan,

Right - sorry.

I'm on Windows XP (Pro); Ferret 0.9.1 (pure Ruby); Ruby 1.8.2.

I'll look into those links now.

Many Thanks
Steven
Hey Steven,

Do you have a Linux box available too? It would be interesting to know whether the problem persists with Ferret 0.9.3. If you have any scripts and test data for your PDFs, I could also check this out for you on Linux with Ferret 0.9.3 and Ruby 1.8.4.

Regards,
Jan
steven shingler
2006-May-16 10:54 UTC
[Ferret-talk] Ferret not able to read a Lucene Index?
Hi Jan,

Yes, I've got an Ubuntu box I can try it on - just updated to Ferret 0.9.3 and Ruby 1.8.4 on it. Will have a look now and report back.

Many thanks for your help.
S~

p.s. the ferret_helper finder utils look v interesting
On 5/16/06, Erik Hatcher <erik at ehatchersolutions.com> wrote:
> On May 15, 2006, at 12:08 PM, steven shingler wrote:
> > Am I right in thinking Ferret should be able to read a Lucene
> > generated index no problem?
>
> That would be nice, but it is not currently the case because of
> Java's wacky "modified" UTF-8 serialization. I've seen that plain
> ol' ASCII text indexes will be compatible, but once you put in some
> higher order characters things go askew.

Hey guys,

What Erik said is exactly correct. Marvin Humphrey (author of KinoSearch, a Perl port of Lucene) has submitted a patch to Lucene so that non-Java ports of Lucene will be able to read Lucene indexes. At the moment it slows Lucene down by about 25% (I think??), so I'm going to be working with him to improve the performance of the patch so that it can one day be included in Lucene. Don't hold your breath though. It's going to take us a while to get it in there.

For now, I'd recommend using pdftotext as Jan already mentioned. I'm not sure what is available on Windows, but I'm sure it would be trivial to write your own pdftotext using Java's PDFBox and then call it from Ruby.

Cheers,
Dave
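The suggestion of shelling out to a text extractor from Ruby could look roughly like the sketch below. It assumes the `pdftotext` command-line tool (from Xpdf or Poppler) is installed and on the PATH; the helper names `pdftotext_command` and `extract_pdf_text` are invented here. A PDFBox-based extractor could be swapped in by changing the command. Adding the resulting text to a Ferret index is left out, since the exact API depends on the Ferret version.

```ruby
require "open3"

# Build the argument list for pdftotext. "-enc UTF-8" asks for UTF-8 output,
# and the trailing "-" sends the text to stdout instead of a .txt file.
def pdftotext_command(pdf_path)
  ["pdftotext", "-enc", "UTF-8", pdf_path, "-"]
end

# Run the extractor and return the text, raising if the tool reports failure.
# The runner is injectable so the function can be exercised without pdftotext.
def extract_pdf_text(pdf_path, runner: Open3.method(:capture3))
  stdout, stderr, status = runner.call(*pdftotext_command(pdf_path))
  raise "pdftotext failed for #{pdf_path}: #{stderr}" unless status.success?
  stdout
end

# Usage: ruby extract.rb some.pdf
if __FILE__ == $PROGRAM_NAME && ARGV.any?
  puts extract_pdf_text(ARGV.first)
end
```

Passing the arguments as an array (rather than one shell string) avoids quoting problems with file names that contain spaces.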
Marvin Humphrey
2006-May-16 16:51 UTC
[Ferret-talk] Ferret not able to read a Lucene Index?
On May 16, 2006, at 7:53 AM, David Balmain wrote:
> On 5/16/06, Erik Hatcher <erik at ehatchersolutions.com> wrote:
>> On May 15, 2006, at 12:08 PM, steven shingler wrote:
>>> Am I right in thinking Ferret should be able to read a Lucene
>>> generated index no problem?
>>
>> That would be nice, but it is not currently the case because of
>> Java's wacky "modified" UTF-8 serialization. I've seen that plain
>> ol' ASCII text indexes will be compatible, but once you put in some
>> higher order characters things go askew.
>
> Hey guys,
>
> What Erik said is exactly correct. Marvin Humphrey, (author of
> KinoSearch, a Perl port of Lucene) has submitted a patch to Lucene so
> that non-java ports of Lucene will be able to read Lucene indexes. It
> currently slows Lucene down by about 25% at the moment (I think??)

Around 20% for indexing according to my benchmarker. I don't have a benchmark for searching.

Modified UTF-8 is not so much the problem for performance of my patch, nor is it actually causing the index incompatibility in this case.

Modified UTF-8 is problematic for a couple of other reasons. When text contains either null bytes or Unicode code points above the Basic Multilingual Plane (values 2^16 and up, such as U+1D160 "MUSICAL SYMBOL EIGHTH NOTE"), KinoSearch and Ferret, if they write legal UTF-8, would write indexes which would cause Lucene to crash from time to time with a baffling "read past EOF" error. Therefore, to be Lucene-compatible they'd have to pre-scan all text to detect those conditions, which would impose a performance burden and require some crufty auxiliary code to turn the legal UTF-8 into Modified UTF-8.

Also, non-shortest-form UTF-8 presents a theoretical security risk, and Perl is set up to issue a warning whenever a scalar which is marked as UTF-8 isn't shortest-form. That condition would occur whenever Modified UTF-8 containing null bytes or code points above the BMP was read in -- thus requiring that all incoming text be pre-scanned as well.

Those are rare conditions, but it isn't realistic to just say "KinoSearch|Ferret doesn't support null bytes or characters above the BMP", because a lot of times the source text that goes into an index isn't under the full control of the indexing/search app's author.

To be fair to Java and Lucene, they are paying a price for early commitment to the Unicode standard. Lucene's UTF-8 encoding/decoding hasn't been touched since Doug Cutting wrote it in 1998, when non-shortest-form UTF-8 was still legal and Unicode was still 16-bit. You could argue that the Unicode consortium pulled the rug out from under its early champions by changing the spec so that existing implementations were no longer compliant.

The performance problems of my patch and the crashing are actually tied to the Lucene File Format's definition of a String. A String in Lucene is the length of the string in Java chars, followed by the character data translated to Modified UTF-8. A String in KinoSearch, and if I am not mistaken in Ferret as well, is the length of the character data in bytes, followed by the character data. Those two definitions of String result in identical indexes so long as your text is pure ASCII, but as Erik noted, when you add higher order characters to the mix, problems arise. You end up reading either too few bytes or too many, the stream gets out of sync, and whammo: 'Read past EOF'.

My patch modifies Lucene to use bytecounts as the prefix to its Strings. Unfortunately, there are encoding/decoding inefficiencies associated with the new way of doing things. Under Lucene's current definition of a string you allocate an array of Java char then read characters into it one by one. With the new patch, you don't know how many chars you need, so you might have to re-allocate several times. There are ways to address that inefficiency, but they'd take a while to explain.

> Don't hold your
> breath though. It's going to take us a while to get it in there.

Yeah. Modifying Lucene so that it can read both the old index format and the new without suffering a performance degradation in either case is going to be non-trivial. I'm sympathetic to the notion that it may not be worth it and that Lucene should declare its file format private. There are a lot of issues in play.

No KinoSearch user has yet complained about Lucene/KinoSearch file-format compatibility. The only thing I miss is Luke -- which is significant, because Luke is really handy.

How many users here care about Lucene compatibility, and why?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
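The mismatch between the two String definitions described above can be illustrated with a small Ruby sketch. It is a simplification made up for this note: a real Lucene index writes the length as a variable-length integer and uses Modified UTF-8, whereas here a single length byte and standard UTF-8 are used, and all function names are invented. The point is only to show how a char-count prefix, read as if it were a byte count, pulls the stream out of sync as soon as a non-ASCII character appears.

```ruby
# Writer A (Lucene-style in this sketch): prefix = number of UTF-16 code units.
def write_char_prefixed(str)
  char_count = str.encode("UTF-16BE").bytesize / 2
  [char_count].pack("C") + str.encode("UTF-8").b
end

# Writer B (as described for KinoSearch/Ferret): prefix = number of bytes.
def write_byte_prefixed(str)
  data = str.encode("UTF-8").b
  [data.bytesize].pack("C") + data
end

# Reader that trusts the prefix to be a byte count.
def read_byte_prefixed(stream)
  strings = []
  pos = 0
  while pos < stream.bytesize
    len = stream.getbyte(pos)
    chunk = stream.byteslice(pos + 1, len)
    # Fewer bytes left than the prefix promised: the stream is out of sync.
    raise EOFError, "read past EOF" if chunk.nil? || chunk.bytesize < len
    strings << chunk.force_encoding("UTF-8")
    pos += 1 + len
  end
  strings
end

terms = ["caf\u00E9", "ferret"]

byte_stream = terms.map { |t| write_byte_prefixed(t) }.join
char_stream = terms.map { |t| write_char_prefixed(t) }.join

# Byte-prefixed data round-trips cleanly.
p read_byte_prefixed(byte_stream)

# Char-prefixed data: the first term is cut one byte short, the leftover
# byte is then misread as a length, and the reader runs off the end.
begin
  p read_byte_prefixed(char_stream)
rescue EOFError => e
  puts "char-prefixed stream: #{e.message}"
end
```

With ASCII-only terms both writers produce identical bytes, which is why such indexes appear compatible.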
On May 16, 2006, at 12:51 PM, Marvin Humphrey wrote:
> How many users here care about Lucene compatibility, and why?

Personally I'm putting my eggs into the Solr basket - http://incubator.apache.org/solr

Solr has a ton of benefits over using raw Lucene, with its caching and configurable handling of putting new searchers online, etc. It's got plenty of room for improvement, and those improvements are in progress. I am integrating Solr into a Ruby on Rails front-end as we speak, so far crudely through a rough HTTP API, but abstracting that communication layer behind a nice Rubyish DSL would be quite cool.

I used to really really want Lucene index compatibility at the file format layer along with a really fast Ruby implementation. At this point I've changed my mind and Solr is my recommended basis for search integration into non-Java (and even Java perhaps) applications.

I just wanted to toss out my thoughts since I've been mostly silent on the Ferret/KinoSearch issues. I still daydream of GCJ'd Java Lucene being the basis for cross-language integration, using PyLucene as a great example. They achieve 100% index compatibility with Java Lucene because it *is* Java Lucene.

I'm still extremely pleased to see folks like Dave and Marvin digging deep into Ruby and Perl integration and starting to work together. Very promising no matter how this ends up. I'm optimistic we'll have Lucene in Ruby one of these days in a compatible and incredibly performant way!

Erik
I don't care about the fact that Ferret isn't able to read a Lucene index. The only problem is that when the Ferret index isn't compatible with Lucene, as is the case right now (damn EOF errors), you are not able to use Luke to take a quick peek inside the index. So a port of Luke to access Ferret would be great.

Ferret should be fast, have the power of Lucene searches and be easy to access from Ruby, as it is right now. If you are going to use Lucene, go all the way and stick to Java. The only problem with Ferret is that the C version isn't available on Windows (for testing purposes) yet, but that is being worked on.

GCJ and SWIG sound great, but setting them up is a real pain in the ass: great for techies, but horrible for all the others.

Solr looks like a promising project; the only problem I have with it is that you need Tomcat and a JVM. This adds two more variables to your configuration that you have to control. Great if you know Java, but I'm programming in Ruby so I don't have to program in Java or .NET, or whatever. So I prefer a Ruby-only environment for its simplicity.

So Luke is a definite plus as a debugging tool.

Kind regards,
Nick
On May 16, 2006, at 3:30 PM, Nick Snels wrote:
> Solr looks a promising project, only problem I have with it is that you
> need Tomcat and a JVM. This adds two more variables to your
> configuration you have to control. Great if you know Java, but I'm
> programming in Ruby so I don't have to program in Java or .NET, or
> whatever. So I prefer a Ruby only environment for its simplicity.

A fair and expected critique of using Solr in a Ruby environment. Every language enjoys a bit of lock-in, and programmers obviously would prefer to work with native APIs.

It is true you need a JVM to run Solr, but it doesn't have to be Tomcat. I use Jetty. Firing up Solr in my Rails environment only required that I customize its schema.xml and solrconfig.xml files and run "java -jar start.jar". And voila, it's up and running. So while it does add an entirely new moving piece, I view it as something akin to adding a database. As long as there is a good way to communicate with it natively (a Ruby/Solr API would be well received, methinks), Solr isn't any more overhead for a project's deployment than adding a database server; actually less.

Erik
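As a rough idea of what a thin Ruby wrapper around Solr's HTTP interface might look like, here is a sketch using only the standard library. The host, port and `/solr/select` path are assumptions about a local setup and may differ in a given installation; the `SolrQuery` class is hypothetical and is not an existing Ruby/Solr API. It only builds the request URL and returns the raw response, leaving parsing out.

```ruby
require "uri"
require "net/http"

# Hypothetical wrapper, not an existing library.
class SolrQuery
  # base_url is an assumption about where the select handler is reachable.
  def initialize(base_url = "http://localhost:8983/solr/select")
    @base_url = base_url
  end

  # Build the GET URL for a query string; extra params are passed through.
  def url_for(query, params = {})
    uri = URI.parse(@base_url)
    uri.query = URI.encode_www_form({ "q" => query }.merge(params))
    uri
  end

  # Perform the request and return the raw response body (needs a running server).
  def search(query, params = {})
    Net::HTTP.get(url_for(query, params))
  end
end

puts SolrQuery.new.url_for("title:ferret", "rows" => 10)
```

Keeping URL construction separate from the network call makes the wrapper easy to test without a running Solr instance.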
Marvin Humphrey
2006-May-16 21:27 UTC
[Ferret-talk] Ferret not able to read a Lucene Index?
On May 16, 2006, at 12:30 PM, Nick Snels wrote:
> I don't care about the fact that Ferret isn't able to read a Lucene
> index. The only problem is that when the Ferret index isn't compatible
> with Lucene as is the case right now (damn EOF errors), you are not
> able to use Luke to take a quick peek inside the index. So a port of
> Luke to access Ferret would be great.

You know what... I think using Luke powered by a version of Lucene with my patch applied would allow it to read Ferret indexes.

I don't have time to check this out right now. And ironically, I've made further mods to KinoSearch's file format, so it wouldn't make Luke available to KinoSearch users unless I change it back. hahaha. ":o

The patch was prepared against subversion, but it might work against 1.9.1. If it doesn't, it would be trivial to finish it and package it up. Maybe we can convince the Lucene folks to distribute it through their channels... or I can put it up at my site. Maybe Luke's author would be amenable to distributing it from his site, but I dunno about that - people might blame him rather than me or Balmain when stuff fails to work.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
On 5/17/06, Marvin Humphrey <marvin at rectangular.com> wrote:
> How many users here care about Lucene compatibility, and why?

Great question. Who does care, and why? Performance used to be a very good reason but that doesn't apply anymore. Is it Java's libraries? Java does have PDFBox for example. Unfortunately Ruby doesn't yet have an equivalent but there are ways around this. The only good reason I can think of is the lack of a Luke port. Anyone care to enlighten us?

Cheers,
Dave
Hey Marvin,

Is there a link to the patch in this thread already? With Google I've found http://issues.apache.org/jira/browse/LUCENE-510?page=comments#action_12378519 as well as the links at the bottom of http://www.archivum.info/java-dev@lucene.apache.org/2005-09/msg00025.html . Is there anything else? I'll definitely try this out, but wanted to make sure this is the latest development...

Regards,
Jan
Hi Dave,

IMHO there are two things:

1. Those little marketing and management issues that often have no valid reason but make a big difference:

Programmer / Freelancer: Let's use Ruby. We'll even be able to build a superfast search interface to all your great marketing docs with Ferret, Rails and Ruby.
Manager: I think we've got this, it's implemented by something called bluezeneeee.
P/F: Yes, we might even use its indexes and perform searches with the old system while we are changing...
M: Changing what?
P/F: The system, to Ruby, Ferret...
M: WTF?

For these conversations it would help to keep the changes as far in the background as possible...

2. Tools around Lucene

I think people will now give Marvin's patch and Luke a try, but Luke is not the only thing. Thanks to Erik for bringing up Solr. I think it's a little bit of the old Java 90%/10% thingy. For 90% of webapps all the Java, Spring, Hibernate stuff is damn complex and you'll be faster with Ruby. But the 10 or less percent, often the big-money stuff of Fortune companies, of banks etc., made their management decision for either J2EE or .NET. And for these projects the programming teams often need distributed and high-volume things, see CNET and Solr. I heard about Solr for the first time on this thread and wonder a little how it fits together with Nutch / Hadoop for the distributed things, but I will do some googling on this myself.

I think there is definitely a need - also in the Ruby world - for search engines and crawlers. And Nutch has some nifty features compared to RDig. Discussions about the interchangeability between Nutch and Ferret show that people are interested in using Lucene tools with a Ruby, Rails and Ferret front end. I've for example tried to work with Ferret on a Nutch index, and luckily Ferret didn't choke on the index because there were no non-ASCII UTF-8 chars in there. So I could extract url, segment and docno, but then there came this nfs / Hadoop thing to extract content and summaries as well, and I gave up.

There also seems to be interest in and need for distributed search architectures, as the p2p efforts of Hyper Estraier as well as nfs / Hadoop and Solr (rsync?) show...

Regards,
Jan
steven shingler
2006-May-17 10:15 UTC
[Ferret-talk] Ferret not able to read a Lucene Index?
I agree with Jan's 'real-world' scenario - it is the reason I started this thread in the first place... :)

...not so much because of management pressures, but I see merit in being able to create indexes in either Java or Ruby, then use Rails to present a query interface. It keeps one's options open - particularly with PDFBox and POI in the Java space. That said, I'm looking into both routes, the pdftotext/ferret_helper tools and applying Marvin's patch - so perhaps both paths can remain open.

Thanks to all though, for contributing to this very interesting thread! :)

Cheers
Steven