Yanagisawa, Koji
2008-Oct-15 02:13 UTC
[CentOS] Extract text from Microsoft PowerPoint files
Hello CentOS people,

I'm wondering if there are command tools like antiword and docx2txt for Microsoft PowerPoint files (.ppt and .pptx). The idea is to extract text from PowerPoint files. Sorry this isn't exactly about CentOS, but I'd really like it if Yum has something. I tried xlhtml, but it hasn't been updated in a while and isn't exactly wanting to work on CentOS 5.

Thank you,
On Tue, Oct 14, 2008 at 10:13:55PM -0400, Yanagisawa, Koji wrote:
> Hello CentOS people,
>
> I'm wondering if there are command tools like antiword and docx2txt for
> Microsoft PowerPoint files (.ppt and .pptx). The idea is to extract
> text from PowerPoint files. Sorry this isn't exactly about CentOS, but
> I'd really like it if Yum has something. I tried xlhtml, but it hasn't
> been updated in a while and isn't exactly wanting to work on CentOS 5.

Not QUITE what you're asking for, but OOo (OpenOffice.Org) reads and presents PowerPoint files quite nicely...

--
Fred Smith -- fredex at fcshome.stoneham.ma.us
The Lord detests the way of the wicked but he loves those who pursue righteousness. Proverbs 15:9 (niv)
Yanagisawa, Koji wrote:
> Hello CentOS people,
>
> I'm wondering if there are command tools like antiword and docx2txt for
> Microsoft PowerPoint files (.ppt and .pptx). The idea is to extract
> text from PowerPoint files. Sorry this isn't exactly about CentOS, but
> I'd really like it if Yum has something. I tried xlhtml, but it hasn't
> been updated in a while and isn't exactly wanting to work on CentOS 5.

man strings

I used to use it to read Word docs a while back; it works for simple docs.

nate
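For anyone who wants the `strings` idea in script form, here is a minimal Python sketch of the same technique: scan the raw bytes and keep runs of printable characters. It is only an approximation of what `strings` does, not a PowerPoint parser. The function name `printable_runs` and the `min_len` parameter are made up for this illustration. The second pattern is there on the assumption that a legacy .ppt file may hold some of its text as UTF-16LE (each ASCII character followed by a zero byte), which a plain 8-bit scan would miss.

```python
import re


def printable_runs(data, min_len=4):
    """Return runs of printable ASCII found in raw bytes.

    Looks for plain 8-bit runs first, then for the same characters
    stored as UTF-16LE (printable byte followed by a zero byte).
    This is a strings-like heuristic, not a file-format parser.
    """
    # Runs of at least min_len printable ASCII bytes (space through tilde).
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # The same characters, each followed by a NUL byte (UTF-16LE).
    utf16_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)

    runs = [m.group().decode("ascii") for m in ascii_pat.finditer(data)]
    runs += [m.group().decode("utf-16-le") for m in utf16_pat.finditer(data)]
    return runs
```

To try it, read the file in binary mode and print the result, for example `print("\n".join(printable_runs(open("talk.ppt", "rb").read())))`, where `talk.ppt` is a placeholder file name. Expect noise (font names, internal record labels) mixed in with the slide text, just as with `strings`.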
> I'm wondering if there are command tools like antiword and docx2txt for
> Microsoft PowerPoint files (.ppt and .pptx). The idea is to extract
> text from PowerPoint files. Sorry this isn't exactly about CentOS, but
> I'd really like it if Yum has something. I tried xlhtml, but it hasn't
> been updated in a while and isn't exactly wanting to work on CentOS 5.

JohnStanley writes:

If you're pretty slick at Python, I know for a fact there is a Python RTF (rich text format) library for extracting RTF. So if you look hard enough, there is probably a similar one for PowerPoint on the net that someone has written. Google even has an RTF library for Python.

As a side note, .NET offers Office tools that do the very thing you want.
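On the Python route, the .pptx half of the question may not need a third-party library at all: a .pptx file is a ZIP archive of XML parts, and the slide text sits in `<a:t>` elements inside `ppt/slides/slideN.xml`. The sketch below uses only the standard library. The function name `pptx_text` is hypothetical, and the sketch makes a simplifying assumption: it orders slides by the number in the file name, which usually but not necessarily matches the order shown in the presentation. It does not handle legacy .ppt files, speaker notes, or embedded charts.

```python
import re
import xml.etree.ElementTree as ET
import zipfile

# DrawingML namespace; text runs on a slide are stored in <a:t> elements.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def pptx_text(path):
    """Return one list of text runs per slide of a .pptx file.

    `path` may be a file name or a binary file-like object.
    Slides are ordered by the number in their part name (an
    assumption; the real order is defined in presentation.xml).
    """
    slides = []
    with zipfile.ZipFile(path) as zf:
        # Keep only the slide parts, skipping relationship files etc.
        names = [n for n in zf.namelist()
                 if re.fullmatch(r"ppt/slides/slide\d+\.xml", n)]
        # Sort numerically so slide10 comes after slide2.
        names.sort(key=lambda n: int(re.search(r"\d+", n).group()))
        for name in names:
            root = ET.fromstring(zf.read(name))
            slides.append([t.text for t in root.iter(A_NS + "t") if t.text])
    return slides
```

A typical use would be to loop over the result and print each slide's runs, for example `for i, runs in enumerate(pptx_text("deck.pptx"), 1): print(i, " ".join(runs))`, with `deck.pptx` standing in for a real file.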