I've never been able to find a reliable, open source solution to this
problem. If anyone knows of one, I'd really like to hear about it as
well.
Here are some options that I know of:
If you just have a few PDFs, you can save them as HTML from Acrobat (not
Reader), or with Adobe's online conversion tool at:
http://www.adobe.com/products/acrobat/access_onlinetools.html
So if a commercial, non-Ruby solution is OK for you, Adobe obviously can do
what you want and the appropriate capabilities to convert many documents are
probably available in their server products. Or you might be able to get at
what you want through the Acrobat SDK.
There is a commercial product called PDFLib (http://www.pdflib.org). It
works with almost every major programming language, including Ruby, and has
a ton of features. No direct conversion to HTML, but you can extract text
with PDFLib TET and then mark it up with Ruby.
The only totally open option I know of is PDFBox (http://www.pdfbox.org).
It's a Java library of PDF functions, including the ability to extract text
similar to PDFLib TET, but again you're on your own to mark it up as HTML.
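
For the "mark it up yourself" step, here is a rough sketch in Ruby. It
assumes you already have the extracted text as a plain string (from TET,
PDFBox, or anything else) and that blank lines separate paragraphs, which
may not hold for every PDF. The method name text_to_html is made up for
this example; it is not part of any of the libraries above.

```ruby
require 'cgi'

# Hypothetical helper (not from PDFLib or PDFBox): wraps plain text in
# minimal HTML. Assumes blank lines mark paragraph boundaries.
def text_to_html(text)
  # Split on blank lines, trim each chunk, and drop empty chunks.
  paragraphs = text.split(/\n\s*\n/).map(&:strip).reject(&:empty?)

  paragraphs.map { |para|
    # Escape &, <, > and quotes, then join wrapped lines with a space.
    escaped = CGI.escapeHTML(para).gsub(/\s*\n\s*/, ' ')
    "<p>#{escaped}</p>"
  }.join("\n")
end
```

You would call it with something like
text_to_html(File.read("extracted.txt")), where extracted.txt is a
placeholder for whatever file your extractor writes. Headings, tables, and
columns are lost in plain text extraction, so anything beyond paragraphs
needs heuristics of your own.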
HTH,
Jeff
On 7/29/06, Bryan Duxbury <bryan.duxbury@gmail.com> wrote:
>
> Does anyone know of a good package that can convert a PDF into HTML?
> Cross-platform compatible is a plus, but I can live with Linux-only if
> it comes to that.