Hi guys,

I'm trying to parse .msg files.

I found a patch on trac, but it looks like it uses a program called outlook2txt, which I can't find anywhere.

The other thought was to pipe the file through the strings utility and then use the HTML parser. I still get a little bit of junk left over, though.

Does anyone know of a better way?
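For illustration, the "pipe it through strings" idea can be approximated in Python. This is only a rough sketch, not what omega or the trac patch does: the function name `extract_strings` and its parameters are made up for this example. The UTF-16LE option is there on the assumption that many .msg files store their text as UTF-16LE, which a plain ASCII scan would miss and which may account for some of the leftover junk.

```python
import re


def extract_strings(data: bytes, min_len: int = 4, utf16: bool = False) -> list[str]:
    """Return runs of printable ASCII characters, roughly like strings(1).

    With utf16=True, look for the same characters stored as UTF-16LE
    (each printable byte followed by a NUL byte) instead.
    """
    if utf16:
        # Printable ASCII encoded as UTF-16LE: byte, then 0x00.
        pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
        codec = "utf-16-le"
    else:
        pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
        codec = "ascii"
    return [m.group().decode(codec) for m in pattern.finditer(data)]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python extract_strings.py message.msg
    with open(sys.argv[1], "rb") as fh:
        raw = fh.read()
    for text in extract_strings(raw) + extract_strings(raw, utf16=True):
        print(text)
```

The output would still need cleaning up before indexing, since property names and other structural strings in the file come through as well.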
On Sat, Sep 13, 2008 at 01:53:26AM +0930, Frank J Bruzzaniti wrote:
> I'm trying to parse .msg files.
>
> I found a patch on trac but it looks like it uses a program called
> outlook2txt which I can't find anywhere.
>
> The other thought was to pipe the file through the utility strings and
> then use the html parser. I do still get a little bit of junk left over
> tho.
>
> Anyone else know of a better way?

If you have access to a Windows machine with Outlook, you can use python + COM to programmatically access the Outlook object model. It's a bit fiddly, and there are bits that aren't exposed (although there's another plugin that is supposed to fix that, I never got it to work). It was sufficient for me to export several years of emails to mbox format a while back.

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org
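A minimal sketch of the python + COM route described above might look like the following. It is not James's actual script. It assumes the pywin32 package (`win32com`) is installed, that Outlook is present on the machine, and that the Outlook version provides `Namespace.OpenSharedItem` (Outlook 2007 or later, as far as I know). The helper names `read_msg_via_outlook` and `append_to_mbox` are invented for this example; the mbox part uses only the standard library.

```python
import mailbox
from email.message import EmailMessage


def read_msg_via_outlook(msg_path):
    """Read one .msg file through Outlook's COM object model (Windows only)."""
    # Assumption: pywin32 is installed; imported here so the rest of the
    # module still loads on machines without it.
    import win32com.client

    outlook = win32com.client.Dispatch("Outlook.Application")
    namespace = outlook.GetNamespace("MAPI")
    item = namespace.OpenSharedItem(msg_path)  # pass an absolute path
    return {
        "sender": item.SenderEmailAddress,
        "subject": item.Subject,
        "body": item.Body,
    }


def append_to_mbox(mbox_path, fields):
    """Append one message, given as a dict of fields, to an mbox file."""
    msg = EmailMessage()
    msg["From"] = fields["sender"]
    msg["Subject"] = fields["subject"]
    msg.set_content(fields["body"])

    box = mailbox.mbox(mbox_path)  # created if it does not exist
    try:
        box.add(msg)
        box.flush()
    finally:
        box.close()
```

On a Windows machine this could be used as `append_to_mbox("export.mbox", read_msg_via_outlook(r"C:\mail\example.msg"))`, where both paths are placeholders. Attachments, recipients and dates are left out to keep the sketch short.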
On Sat, Sep 13, 2008 at 01:53:26AM +0930, Frank J Bruzzaniti wrote:
> I'm trying to parse .msg files.
>
> I found a patch on trac but it looks like it uses a program called
> outlook2txt which I can't find anywhere.

Do you mean this patch?

http://trac.xapian.org/ticket/285

If so, it's "outlook2text" (with an "e"), and the script is included in the patch. It seems to use msgconvert to do the bulk of the work, and the patch also includes a copy of that.

Cheers,
Olly
2008/9/15 Frank J Bruzzaniti <frank.bruzzaniti at gmail.com>:
> I patched 1.0.7 but while running ./configure I got the error message:
>
> ./configure: line 16443: syntax error near unexpected token `unzip,extract'
> ./configure: line 16443: `XO_OMEGA_WITH(unzip,extract .zip archives)'
>
> Have you seen this before?
> Thanks

It should be

XO_OMEGA_WITH([unzip],[extract .zip archives])

in configure.ac; then run

$ autoreconf

> P.S. Have you patched 1.0.8 yet with your enhancements?

No, not enough time yet, and the same goes for the next 6 weeks.