Hi all.

A basic version of omindex in Python that works (at least for the limited amount that I tested). Standard caveats apply. Please let me know if this proves useful to you, and about any problems/improvements.

This needs BeautifulSoup for the HTML parsing. I am sure there are better/faster alternatives (ElementTree??), but I have not really tried them out.

Working:
- html parsing/indexing.
- text parsing and indexing.
- pdf
- has basic support to be extended, even for scriptindex kind of extension (read code for how).

There will likely be edge conditions that don't work and that I have not tested, but basic indexing matches omindex.

To run it:
1) xapian_omindex.py compare db1 db2
   - compares two dbs (for testing) to see where they differ.
2) xapian_omindex.py omindex <omindex options>
   - generates the index as omindex.
   - NOT supported: -M option.
   - support for a subdirectory (ie. omindex --db x --url y dir1 subdir2)

Srijon.
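For readers wondering what the HTML step mentioned above might look like without an extra dependency, here is a minimal sketch. It is not the code from xapian_omindex.py: it swaps BeautifulSoup for the Python 3 standard library's html.parser, and the names TextExtractor and extract_text are hypothetical, chosen only for this illustration. It pulls out the page title and the visible text, skipping script and style content, which is roughly the text an indexer would want to feed to a term generator.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the <title> and the visible text of an HTML page.

    Hypothetical helper for illustration; not part of xapian_omindex.py.
    """

    SKIP = {"script", "style"}  # content of these tags is never indexed

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.text_parts = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # inside <script> or <style>
        if self._in_title:
            self.title_parts.append(data)
        else:
            self.text_parts.append(data)


def extract_text(html):
    """Return (title, body_text) with runs of whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    # Join fragments with a space so text split by inline tags stays separated.
    body = " ".join(" ".join(parser.text_parts).split())
    return title, body


if __name__ == "__main__":
    page = (
        "<html><head><title>Hello  World</title>"
        "<style>p{color:red}</style></head>"
        "<body><p>Some <b>bold</b> text.</p>"
        "<script>var x = 1;</script></body></html>"
    )
    print(extract_text(page))
```

Joining fragments with a space means a word broken up by inline markup would be indexed as two terms; that trade-off is an assumption of this sketch, not a statement about how omindex behaves.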
On Thu, May 21, 2009 at 07:49:24PM +0100, Srijon Biswas wrote:
> A basic version of omindex in python that works (atleast the limited amount
> that I tested). Standard caveats apply.

Srijon -- the list won't accept many attachments, so you're better off putting this up as a tgz somewhere, or on something like github if you've got the license figured out.

J

-- 
James Aylett
talktorex.co.uk - xapian.org - uncertaintydivision.org