Hi,

I like xapian/omega a lot, so much so that my company will use it internally. I haven't used any binding (perl, php) yet, and did most of the additional logic in simple php wrappers that call the cgi and display the highlighted result, iframe'd from the query template. Plus some cool layout as in
http://www2.lib.ncsu.edu/catalog/?view=brief&Ntt=civil+war&Ntk=Keyword&N=0&Nty=1

Problems:
* omindex support for zip, rar, outlook msg and excel xls.

I hacked a preliminary custom filter for xls and msg into omindex.cc,
http://www.fileformat.info/format/outlookmsg/
and added zip/rar support by decompressing into a root+"/tmp/"+file dir, indexing there and removing root+"/tmp/"+file afterwards.

Is this a good idea? Or should I prefer hacking scriptindex, which I will need sooner or later to support meta fields?

BTW: I'll provide official cygwin packages soon. cygwin has much better external filter support than mingw or msvc. The packages have been at my setup site for some time, but are not officially ITP'd yet.

--
Reini Urban
http://phpwiki.org/ http://murbreak.at/ http://spacemovie.mur.at/ http://helsinki.at/
On Mon, Aug 07, 2006 at 03:46:08PM +0200, Reini Urban wrote:
> Problems:
> * omindex support for zip,rar,outlook msg and excel xls.
>
> I hacked a preliminary custom filter for xls and msg into the omindex.cc,
> http://www.fileformat.info/format/outlookmsg/
> And added zip/rar support by decompressing into a root+"/tmp/"+file dir,
> indexing there and removing the root+"/tmp/"+file afterwards.
>
> Is this a good idea?
> Or should I prefer hacking scriptindex which I will need sooner or
> later to support meta fields.

You won't want to hack scriptindex for this; you'll want to change the way you generate the scriptindex data files.

However, the plan (ages ago) for omindex was to allow you to specify MIME -> generator mappings, which still isn't a bad idea. More recently there has been some discussion about whether the generator mechanism should perhaps be related to the way scriptindex works, to save some code and provide flexibility beyond what we currently provide for, say, PDF. Unfortunately this means nothing has actually been done.

If you take the plunge now to scriptindex, you'll probably make your life easier in the short to medium term, and considerably easier in the long term. If you then want to provide indexing scripts, or recipes for indexing from archives etc., then shove them up on the wiki and they'll be available for everyone (and may make their way into the official manual, if you give permission, assuming that at some point I or someone else gets round to writing one :-). You won't have to modify scriptindex at all unless you're doing something pretty unusual.

We can't accept code to support RAR into any of the core Xapian packages, because of patent restrictions. (At least, that's my understanding; IANAL.)

James

--
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james@tartarus.org                               uncertaintydivision.org
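
As a rough illustration of "changing the way you generate the scriptindex data files", here is a minimal Python sketch that reads the members of a zip archive in memory and emits one record per member, with no temporary directory. It assumes the scriptindex input layout of `name=value` fields, continuation lines starting with `=`, and records separated by a blank line. The field names `url` and `text` and the helpers `dump_field` and `zip_to_records` are hypothetical; a matching index script would still have to say what to do with each field, and real members would need a proper text filter rather than a plain UTF-8 decode.

```python
import io
import zipfile


def dump_field(name, value):
    """Format one field; lines after the first are prefixed with '='."""
    lines = value.splitlines() or [""]
    return "\n".join([f"{name}={lines[0]}"] + ["=" + line for line in lines[1:]])


def zip_to_records(zip_bytes, archive_name):
    """Return scriptindex-style records, one per file inside the archive.

    Assumption: every member is plain text. Field names are hypothetical.
    """
    records = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directories carry no text to index
            text = zf.read(info).decode("utf-8", errors="replace")
            records.append("\n".join([
                dump_field("url", f"{archive_name}#{info.filename}"),
                dump_field("text", text),
            ]))
    # Records are separated by a blank line.
    return "\n\n".join(records) + "\n"
```

The returned string could be written to a file and passed to scriptindex together with an index script that names the same fields.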
On Mon, Aug 07, 2006 at 03:46:08PM +0200, Reini Urban wrote:
> BTW: I'll provide official cygwin packages soon. cygwin has much
> better external filter support than mingw or msvc.
> They are already at my setup site for some time, but not officially ITP'd
> yet.

Let me know when they're officially available and I'll add a link to the download page.

Cheers,
Olly
On 8/25/06, Reini Urban <rurban@x-ray.at> wrote:
> I did further work on omega, esp. configure.ac and the optional
> libtextcat integration, but it's not ready yet.
> textcat crashes.
>

IIRC I had to limit the amount of text fed into textcat to avoid crashes. Maybe you are running into the same issue. If you want to have a look at an example, my code is here:
http://svn.berlios.de/wsvn/pinot/trunk/Index/LanguageDetector.cpp?op=file&rev=0&sc=0

Fabrice
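
The idea of limiting the amount of text handed to a language detector can be sketched as below. This is not taken from the linked LanguageDetector.cpp and does not call libtextcat; the helper name `clip_utf8` and the 1000-byte default are assumptions chosen for illustration. The only point shown is cutting the input at a byte limit without leaving a broken UTF-8 sequence at the end.

```python
def clip_utf8(text, max_bytes=1000):
    """Return a prefix of text whose UTF-8 encoding fits in max_bytes.

    The default limit is an arbitrary assumption, not a known safe value.
    """
    raw = text.encode("utf-8")
    if len(raw) <= max_bytes:
        return text
    # A cut may land inside a multi-byte character; 'ignore' drops the
    # incomplete trailing bytes so the result is still valid text.
    return raw[:max_bytes].decode("utf-8", errors="ignore")
```

The clipped string would then be passed to whatever detection call is in use, in place of the full document text.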