thr3ads.net - Xapian discuss - [Xapian-discuss] Omindex python

Srijon Biswas

2009-May-21 18:49 UTC

[Xapian-discuss] Omindex python - first version.

Hi all.

Here is a basic version of omindex in Python that works (at least for the
limited amount that I tested). Standard caveats apply.
Please let me know if this proves useful to you, and about any problems or
possible improvements.

This needs BeautifulSoup for the HTML parsing. I am sure there are
better/faster alternatives (ElementTree?), but I have not really tried
them out.
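
For anyone curious what the HTML step involves, here is a minimal sketch of
pulling the title and visible text out of a page before indexing. It is not
the code from xapian_omindex.py: it swaps BeautifulSoup for the standard
library's html.parser so it runs without extra packages, and the names
TextExtractor and extract_text are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the <title> and visible text, skipping script/style content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.body_parts = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore JavaScript and CSS
        if self._in_title:
            self.title_parts.append(data)
        else:
            self.body_parts.append(data)


def extract_text(html):
    """Return (title, body_text) with runs of whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    # Joining chunks with a space keeps words in adjacent elements apart,
    # at the cost of splitting a word that is broken by an inline tag.
    body = " ".join(" ".join(parser.body_parts).split())
    return title, body
```

The returned strings are what one would then hand to the indexer; how
omindex itself weights the title against the body is not covered here.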

Working:
 - HTML parsing and indexing.
 - text parsing and indexing.
 - PDF.
 - basic support for extension, even for scriptindex-style extensions
   (read the code to see how).
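
One common way to make this kind of per-format support extensible is a
registry mapping file extensions to extractor functions. The sketch below
only illustrates that idea; it is an assumption, not a description of how
xapian_omindex.py is actually structured, and EXTRACTORS, register and
extract are hypothetical names.

```python
import os

# Hypothetical registry: file extension -> function returning text to index.
EXTRACTORS = {}


def register(*extensions):
    """Decorator that registers an extractor for one or more extensions."""
    def wrap(func):
        for ext in extensions:
            EXTRACTORS[ext.lower()] = func
        return func
    return wrap


@register(".txt", ".text")
def extract_plain(path):
    """Plain text needs no conversion; just read it."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return fh.read()


def extract(path):
    """Return text for path, or None if no extractor handles its extension."""
    ext = os.path.splitext(path)[1].lower()
    handler = EXTRACTORS.get(ext)
    return handler(path) if handler else None
```

Supporting a new format then means writing one function and decorating it
with @register(".ext"); files with an unknown extension are skipped.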

There will likely be edge cases that don't work and that I have not tested,
but basic indexing matches omindex.

To run it:
1) xapian_omindex.py compare db1 db2
 - compares two dbs (for testing) to see where they differ.

2) xapian_omindex.py omindex <omindex options>
 - generates the index as omindex does.
 - NOT supported:
   - the -M option.
   - indexing a subdirectory (i.e. omindex --db x --url y dir1 subdir2)
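
To give a rough idea of what a compare step can check, here is a sketch that
diffs the term statistics of two databases. It is an assumption about one
reasonable approach, not the logic of the compare command above.
compare_term_stats and term_stats are hypothetical names; term_stats needs
the Xapian Python bindings, while the comparison itself works on plain
dictionaries.

```python
def compare_term_stats(left, right):
    """Compare two {term: termfreq} mappings.

    Returns (only_left, only_right, freq_diff), where freq_diff maps each
    shared term whose frequency differs to (left_freq, right_freq).
    """
    only_left = sorted(set(left) - set(right))
    only_right = sorted(set(right) - set(left))
    freq_diff = {
        term: (left[term], right[term])
        for term in left.keys() & right.keys()
        if left[term] != right[term]
    }
    return only_left, only_right, freq_diff


def term_stats(db_path):
    """Read {term: termfreq} from a Xapian database."""
    import xapian  # imported lazily so the comparison has no dependency

    db = xapian.Database(db_path)
    return {item.term: item.termfreq for item in db.allterms()}
```

Usage would look like compare_term_stats(term_stats("db1"),
term_stats("db2")). Three empty results mean the two term lists agree; a
fuller comparison would also look at document data and values.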

Srijon.

James Aylett

2009-May-21 21:48 UTC

On Thu, May 21, 2009 at 07:49:24PM +0100, Srijon Biswas wrote:
> A basic version of omindex in python that works (atleast the limited amount
> that I tested). Standard caveats apply.
Srijon--the list won't accept many attachments, so you're better off
putting this up as a tgz somewhere, or on something like github if
you've got the license figured out.

J

-- 
  James Aylett

  talktorex.co.uk - xapian.org - uncertaintydivision.org
