Hi,

I would like to add "text/x-mail" support to omega. I'm using mhonarc to export mail to HTML, and I'm using the HTML parser to index the mail content (largely inspired by the handling of the "application/vnd.ms-outlook" format).

The problem is that files attached to the mail are not indexed at all. I don't think it's possible for the "index_file" function to index two files as one document.

I can't find an easy solution to my problem. I think I must split this function to separate document creation from file indexing.

Any other suggestions?

Regards,
-- 
Emmanuel Garette
Free software engineer
Cadoles (http://www.cadoles.com)
EOLE, Gaspacho and free software experts
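To make the proposed split concrete, here is a minimal sketch of separating document creation from file indexing, so that several files can feed one document. It is not omega code: `PendingDocument`, `create_document`, `index_file_into` and `finish_document` are hypothetical names invented for illustration, and text extraction is assumed to have happened already.

```python
from dataclasses import dataclass, field


@dataclass
class PendingDocument:
    """Hypothetical stand-in for a document under construction (not an omega type)."""
    url: str
    text_parts: list = field(default_factory=list)


def create_document(url):
    # Step 1: create the document once, keyed by the mail's path or URL.
    return PendingDocument(url=url)


def index_file_into(doc, extracted_text):
    # Step 2: append the text extracted from one file.
    # This can be called once for the mail body and once per attachment.
    if extracted_text:
        doc.text_parts.append(extracted_text)


def finish_document(doc):
    # Step 3: join everything into the text that would be handed to the indexer.
    return "\n".join(doc.text_parts)
```

With this shape, the mail body and each attachment would each go through `index_file_into` on the same `PendingDocument`, and the document would only be added to the database after `finish_document`.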
On Sat, Dec 13, 2014 at 08:32:58PM +0100, Emmanuel Garette wrote:
> I would like to add "text/x-mail" support to omega. I'm using mhonarc to
> export mail to HTML format and I'm using the HTML parser to index mail
> content (largely inspired by the "application/vnd.ms-outlook" format).
>
> The problem is that files attached to the mail are not indexed at all.
> I think it's not possible in the "index_file" function to index 2 files as
> one document.
>
> I can't find an easy solution to my problem. I think I must split this
> function to separate document creation and file indexing.

I've done some work on indexing attachments and files inside archives (like tar and zip files), but I haven't merged it yet as it's not entirely satisfactory in various ways, most of which require some refactoring of omindex to address.

The approach I took to attachments was to index them as separate documents - if I follow you correctly, you seem to be trying to treat them as part of a single document. Is there a particular reason why you are taking that approach?

I don't think my code is anywhere public currently, but I can rebase it onto current master and put it on a git branch if it's potentially useful to others in its current form.

Cheers,
    Olly
On 15/12/2014 at 23:22, Olly Betts wrote:
> I've done some work on indexing attachments and files inside archives
> (like tar and zip files), but I haven't merged it yet as it's not
> entirely satisfactory in various ways, most of which require some
> refactoring of omindex to address.
>
> The approach I took to attachments was to index them as separate
> documents - if I follow you correctly, you seem to be trying to treat
> them as part of a single document. Is there a particular reason why
> you are taking that approach?
>
> I don't think my code is anywhere public currently, but I can rebase
> it onto current master and put it on a git branch if it's potentially
> useful to others in its current form.

In my opinion, one file is one document. But maybe I'm wrong.

The problem is that we cannot construct the path term (prefixed by U) in this case. How should we deal with the path if an email can generate more than one document? Something like "U/path/to/mail|Attached.pdf"? Or could we add a new prefix?

I'm interested in your work on indexing archives, to understand how you expect to build the path.

Regards,
-- 
Emmanuel Garette
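The naming idea raised above can be sketched as follows, assuming attachments are indexed as separate documents and each one gets its own U-prefixed term derived from the mail's path. The "|" separator is only the suggestion from this message, not an existing omega convention, and `terms_for_mail` is a hypothetical helper name.

```python
def terms_for_mail(mail_path, attachment_names, sep="|"):
    """Return one U-prefixed term for the mail, then one per attachment.

    Assumption: `sep` never occurs in the mail path or an attachment name,
    so each term maps back to exactly one (mail, attachment) pair.
    """
    for part in [mail_path] + list(attachment_names):
        if sep in part:
            raise ValueError("separator %r found in %r" % (sep, part))
    # The mail itself keeps the plain path term.
    terms = ["U" + mail_path]
    # Each attachment gets the mail's path, the separator, then its name.
    for name in attachment_names:
        terms.append("U" + mail_path + sep + name)
    return terms


print(terms_for_mail("/path/to/mail", ["Attached.pdf"]))
# prints ['U/path/to/mail', 'U/path/to/mail|Attached.pdf']
```

One consequence of this scheme is that all documents generated from a mail share the prefix "U/path/to/mail", so they can be recognised as belonging together. Whether a separate term prefix would work better is left open here, as in the question above.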