Hi people,

I just joined the group and I want to ask about a problem I have. I'm still learning Ruby on Rails, and now I have a task to parse Microsoft Word documents and store the content in a database.

Do you have any suggestions on how to do it?

FYI, I develop in a Unix environment, so I don't have a chance to use win32ole, CMIIW.

I have also searched the internet about this, but all I found is that I need to use JRuby and combine it with Apache POI, or else I need to use win32ole. As far as I know, to use JRuby I need to create the Rails project with JRuby as well, but unfortunately I already created the project with plain Ruby.

So I don't know what to do anymore. Does anybody have a clue?

Regards,

Hafiz Badrie Lubis

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.

On Mar 16, 2011, at 12:51 PM, Hafiz Badrie Lubis wrote:

> But all I found that I need to use JRuby and combine it with Apache POI
> or else I need to use win32ole.

You can run POI as a separate process and then grab its output.

--
Scott Ribe
scott_ribe-ZCQMRMivIIdUL8GK/JU1Wg@public.gmane.org
http://www.elevated-dev.com/
(303) 722-0567 voice

On Mar 16, 2011, at 2:51 PM, Hafiz Badrie Lubis wrote:

> So, I don't know what to do anymore. Does anybody have clue?

I did a project in PHP quite a few years ago, and I used some venerable Unix CLI converters to do this. I stored the files as is, then used these converters to rip out their text and stored that in the database for searching. They aren't perfect, but they do a good enough job for search results.

    $translators = array(
        'pdf' => '/usr/local/bin/pdftotext ./pdf/%s.pdf -',
        'ppt' => '/usr/local/bin/catppt -d ascii ./ppt/%s.ppt',
        'xls' => '/usr/local/bin/xls2csv -d ascii ./xls/%s.xls',
        'doc' => '/usr/local/bin/catdoc -d ascii ./doc/%s.doc'
    );
    // These translators all write to stdout, which means that
    // shell_exec() will return their text value.

Walter
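
The same idea carries over to a plain Ruby (MRI) Rails app. What follows is only a minimal sketch, not Walter's code: it assumes the converters are installed and on the PATH (the PHP version above used absolute /usr/local/bin paths), and the names CONVERTERS, converter_command and extract_text are made up for illustration. Commands are built as argument arrays so a file name is never interpolated into a shell string.

```ruby
require "open3"

# Hypothetical mapping from file extension to a converter command.
# The tools and flags mirror the PHP array above; each writes text to stdout.
CONVERTERS = {
  "pdf" => ->(path) { ["pdftotext", path, "-"] },
  "ppt" => ->(path) { ["catppt", "-d", "ascii", path] },
  "xls" => ->(path) { ["xls2csv", "-d", "ascii", path] },
  "doc" => ->(path) { ["catdoc", "-d", "ascii", path] }
}.freeze

# Builds the argument array for the converter matching the file's extension.
def converter_command(path)
  ext = File.extname(path).delete_prefix(".").downcase
  builder = CONVERTERS.fetch(ext) do
    raise ArgumentError, "no converter for .#{ext}"
  end
  builder.call(path)
end

# Runs the converter and returns whatever it printed on stdout.
# Raises if the tool exits with a non-zero status.
def extract_text(path)
  stdout, status = Open3.capture2(*converter_command(path))
  raise "conversion failed for #{path}" unless status.success?
  stdout
end
```

With catdoc installed, something like extract_text("uploads/report.doc") (a placeholder path) should return the document's plain text, which can then be saved to a text column for searching.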

1. Convert .doc to .pdf with PyODConverter
   http://www.artofsolving.com/opensource/pyodconverter
2. Convert .pdf to .tiff with ImageMagick
3. Process .tiff through Tesseract OCR and get .txt

On Wed, Mar 16, 2011 at 9:51 PM, Hafiz Badrie Lubis <hafiz.b.lubis-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> I'm still learning Ruby on Rails and now I have a task to parse
> Microsoft Word and store the content into database.

Can you show me how to do it? Do you have a reference? To make a collaboration between a rails project with JRuby codes.

On Mar 17, 3:32 am, Scott Ribe <scott_r...-oLpdKesOropn0q4rP9l2pw@public.gmane.org> wrote:

> You can run poi as a separate process and then grab its output.

On Mar 16, 2011, at 8:06 PM, Hafiz Badrie Lubis wrote:

> To make a collaboration between a rails project with JRuby codes.

It has nothing whatsoever to do with JRuby. You can run Java apps from Ruby exactly like any other command-line process. I don't know if POI is just a library, or has a full app utility as well. If it's just a lib, you'd have to write the program, probably a half-dozen lines of Java.

--
Scott Ribe
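
As a rough sketch of what "run it as a separate process and grab its output" could look like from plain Ruby: Open3 from the standard library starts the process and captures what it prints. DocToText is a hypothetical small Java class you would write around POI yourself, and the lib/* classpath is a placeholder for wherever the POI jars live; neither comes from this thread.

```ruby
require "open3"

# Runs an external command (given as separate arguments, so no shell is
# involved) and returns its stdout. Raises if the exit status is non-zero.
def run_and_capture(*command)
  stdout, stderr, status = Open3.capture3(*command)
  raise "#{command.first} failed: #{stderr}" unless status.success?
  stdout
end

# Builds the java invocation. Both the class name DocToText and the
# default classpath are assumptions made for illustration only.
def poi_command(doc_path, classpath: "lib/*:.")
  ["java", "-cp", classpath, "DocToText", doc_path]
end
```

Assuming such a Java class exists and prints the document text to stdout, run_and_capture(*poi_command("report.doc")) would hand that text back to the Rails app as a String, with no JRuby involved.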

On Mar 16, 2011, at 5:10 PM, Vladimir Rybas wrote:

> 1. Convert .doc to .pdf with PyODConverter
>    http://www.artofsolving.com/opensource/pyodconverter
>
> 2. Convert .pdf to .tiff with ImageMagick
>
> 3. Process .tiff through Tesseract OCR and get .txt

Wow, talk about a long slow way to potentially lose text flow and introduce errors...

--
Scott Ribe

Ok thank you, Scott. I'll try your advice.

On Mar 17, 11:25 am, Scott Ribe <scott_r...-oLpdKesOropn0q4rP9l2pw@public.gmane.org> wrote:

> You can run Java apps from Ruby exactly like any other command-line process.

I'm coming to this late and I've partially deleted the thread, so I may be way off base...

An old plugin might be of help:

https://github.com/kete/convert_attachment_to

It makes use of existing command line converter utility programs.

Cheers,
Walter

On Mar 17, 2011, at 5:54 PM, Hafiz Badrie Lubis <hafiz.b.lubis-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> Ok thank you, Scott.
> I'll try your advice.