Is there any plugin that can search PDF documents? For example, users should be able to search for keywords in the PDF contents.

--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
Good morning,

On 3-Jun-08, at 1:25 AM, ripan wrote:

> is there any plugin which could search in PDF documents. For example,
> user should be able to search for keywords in the PDF contents.

Someone submitted a patch for acts_as_solr to index documents; read the Google group for this project.

J
> is there any plugin which could search in PDF documents.

Maybe you can try rpdf2txt (http://raa.ruby-lang.org/project/rpdf2txt/), or JRuby on Rails (JRoR) with one of the many Java PDF libraries. I'm not aware of a Rails plugin.
> Someone submitted a patch for acts_as_solr to index documents - read
> the google group for this project

I didn't think Solr would do this, since it provides indexing and querying but not parsing of rich formats. However, there seems to be a patch that extracts text (but not metadata) from rich documents into Solr: http://wiki.apache.org/solr/UpdateRichDocuments. The Solr committers are reluctant to adopt that patch, though, and would rather build a bridge from Tika (http://incubator.apache.org/tika/) to Solr, even if that is further down the road.

I did find the patch to acts_as_solr here: http://www.nabble.com/Rich-Document-support-for-solr-ruby-and-acts_as_solr-p17161561.html

But since this patch relies on the uncommitted Solr patch, I wouldn't count on it being viable in the long term. A less tenuous solution may be to extract the text from a PDF with some other library (perhaps rpdf2txt or PDFBox) and index it using the standard acts_as_solr.

- Mark.
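The "extract first, then index" approach can be sketched roughly as follows. This is only an illustration under stated assumptions: it swaps in poppler's `pdftotext` command-line tool (assumed to be installed) in place of rpdf2txt or PDFBox, and it uses a tiny in-memory keyword index as a hypothetical stand-in for Solr. `extract_pdf_text` and `KeywordIndex` are illustrative names, not part of acts_as_solr or any plugin.

```ruby
require "open3"
require "set"

# Hypothetical helper: shells out to poppler's `pdftotext` (assumed installed).
# Passing "-" as the output file makes pdftotext write the text to stdout.
def extract_pdf_text(path)
  text, status = Open3.capture2("pdftotext", path, "-")
  raise "pdftotext failed for #{path}" unless status.success?
  text
end

# Hypothetical in-memory inverted index standing in for Solr: maps each
# lowercased word to the set of document ids whose text contains it.
class KeywordIndex
  def initialize
    @postings = Hash.new { |hash, word| hash[word] = Set.new }
  end

  # Tokenize the extracted text and record which document each word came from.
  def add(doc_id, text)
    text.downcase.scan(/[[:alnum:]]+/).each { |word| @postings[word] << doc_id }
  end

  # Return the sorted ids of documents containing every keyword (AND search).
  def search(query)
    words = query.downcase.scan(/[[:alnum:]]+/)
    return [] if words.empty?
    words.map { |word| @postings.fetch(word, Set.new) }.reduce(:&).to_a.sort
  end
end
```

Usage would look like `index = KeywordIndex.new`, then `index.add("manual.pdf", extract_pdf_text("manual.pdf"))` and `index.search("invoice total")`. In a Rails app the extracted text would instead be stored on a model and indexed with the standard acts_as_solr; the in-memory class only shows that text extraction and indexing are separate steps, which is what lets a plain indexer handle PDFs.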