Hi, everyone.

I'm a third-year student in Computer Science. I have a few projects (school-related) on Bitbucket <https://bitbucket.org/philipchung/philipchungtech>.

I've been looking at the project-ideas list and I'm interested in making Omega use libraries instead of external programs.

Right now I'm trying to get Olly's patch that was linked there to apply to the current master. From that point I would see if I could generalize this to other types of extraction.

I was thinking of different executables for each type of extraction. Is this a good way to go, or is there a better way to go about it?

Thanks for your time.

Philip Chung
On Mon, Mar 07, 2016 at 03:31:34PM -0800, Philip Chung wrote:
> I've been looking at the project-ideas list and I'm interested in making
> Omega use libraries instead of external programs.
>
> Right now I'm trying to get Olly's patch that was linked there to apply
> to the current master. From that point I would see if I could generalize
> this to other types of extraction.
>
> I was thinking of different executables for each type of extraction. Is
> this a good way to go, or is there a better way to go about it?

Hi, Philip.

At the moment we use a different executable for each type, and we'll want to continue doing so. The project is more about preferring libraries, so that we don't have to invoke an external program for common file formats, which should improve indexing speed.

I'm not sure how you propose generalising use of a library for extraction; how would a user configure omindex to know how to call the relevant library functions?

J

--
James Aylett, occasional trouble-maker
xapian.org
On 03/09/2016 09:06 AM, James Aylett wrote:
> I'm not sure how you propose generalising use of a library for
> extraction; how would a user configure omindex to know how to call the
> relevant library functions?

Sorry, I don't think I made myself clear.

From what I can gather, Olly's patch introduces a new executable, "omindex_wv", that is responsible for the processing. The justification was that the conversion happens in a subprocess, which shields Omega from any crashes in the conversion code.

I was thinking of generalizing this approach to other types of "worker" processes. My question was: should we introduce more executables like "omindex_wv" (say, "omindex_poppler", "omindex_wps", etc.), one for each type of conversion?

Now that I think about it, I'm not sure this has any advantage over the current system. Or am I just misunderstanding?

Philip
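For illustration, the crash-shielding idea described above (run the library-based conversion in a subprocess so a crash only kills the worker) can be sketched with plain POSIX fork/pipe/waitpid. This is a minimal sketch under stated assumptions, not the code from Olly's patch or omindex_wv: `run_in_worker`, `WorkerStatus` and `WorkerResult` are hypothetical names, and the `extract` callable stands in for whatever library call would actually do the conversion.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <functional>
#include <string>

// Hypothetical outcome of running one extraction in a worker process.
enum class WorkerStatus { Ok, Failed, Crashed };

struct WorkerResult {
    WorkerStatus status;
    std::string text;  // extracted text; only meaningful when status == Ok
};

// Run `extract` (a stand-in for a library-based converter) in a forked
// child. The child sends the extracted text back over a pipe; if the
// library crashes, only the child dies and the parent reports Crashed.
WorkerResult run_in_worker(const std::function<bool(std::string&)>& extract) {
    int fds[2];
    if (pipe(fds) == -1) return {WorkerStatus::Failed, ""};

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return {WorkerStatus::Failed, ""};
    }

    if (pid == 0) {
        // Child: run the extractor, then write its output to the pipe.
        close(fds[0]);
        std::string out;
        bool ok = extract(out);
        const char* p = out.data();
        size_t left = out.size();
        while (ok && left > 0) {
            ssize_t n = write(fds[1], p, left);
            if (n == -1) {
                if (errno == EINTR) continue;
                ok = false;
                break;
            }
            p += n;
            left -= static_cast<size_t>(n);
        }
        close(fds[1]);
        _exit(ok ? 0 : 1);  // _exit skips atexit handlers inherited from the parent
    }

    // Parent: read until EOF first (avoids deadlock on large output),
    // then reap the child and inspect how it ended.
    close(fds[1]);
    std::string text;
    char buf[4096];
    for (;;) {
        ssize_t n = read(fds[0], buf, sizeof buf);
        if (n > 0) {
            text.append(buf, static_cast<size_t>(n));
            continue;
        }
        if (n == -1 && errno == EINTR) continue;
        break;  // EOF or read error
    }
    close(fds[0]);

    int status = 0;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR) return {WorkerStatus::Failed, ""};
    }
    if (WIFSIGNALED(status)) return {WorkerStatus::Crashed, ""};
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return {WorkerStatus::Ok, text};
    return {WorkerStatus::Failed, ""};
}
```

Whether each format gets its own executable (like omindex_wv) or a single worker forks per document is a separate choice; the sketch only shows the isolation mechanism. Note that forking a multithreaded process restricts what the child may safely call, so this pattern is simplest in a single-threaded indexer.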