On 5/18/14 10:24 AM, Aarsh Shah wrote:
> Hello,
>
> I have added an entry to my journal containing a link to a sample XML
> file and my ideas on how to index the entire imdb movie database.:-
>
> http://trac.xapian.org/wiki/GSoC2014/Performance%20and%20Optimisation/Journal
>
> Please do let me know what you think.
>
I'm not sure if you're looking for input from non-mentors, so I
apologize in advance if this reply is inappropriate.
The plan looks sane to me.
I offer this as an example of prior art along the same lines:
https://github.com/karpet/libswish3/blob/master/src/xapian/swish_xapian.cpp
It uses the libxml2 library (the same library Python's lxml bindings
use) to parse XML files, and builds Xapian indexes from the parsed
content. Perhaps it can serve as a baseline for your own efforts.
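
For what it's worth, the same parse-then-index shape can be sketched in
a few lines of Python. This is only a rough illustration, not what
swish_xapian.cpp does: the XML layout, the field names, and the
parse_movies/index_movies helpers are all made up here, and I use the
standard library's ElementTree rather than lxml to keep it
self-contained. The indexing half assumes the Xapian Python bindings
are installed.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real IMDb-derived XML layout may differ.
SAMPLE = """<movies>
  <movie id="1"><title>Alien</title><year>1979</year>
    <plot>A crew meets a hostile creature.</plot></movie>
  <movie id="2"><title>Heat</title><year>1995</year>
    <plot>A detective pursues a thief.</plot></movie>
</movies>"""


def parse_movies(xml_text):
    """Turn each <movie> element into a plain dict of string fields."""
    root = ET.fromstring(xml_text)
    records = []
    for movie in root.iter("movie"):
        records.append({
            "id": movie.get("id"),
            "title": movie.findtext("title", default="").strip(),
            "year": movie.findtext("year", default="").strip(),
            "plot": movie.findtext("plot", default="").strip(),
        })
    return records


def index_movies(db_path, records):
    """Index the parsed records into a Xapian database at db_path."""
    # Imported here so the parsing half works without the bindings.
    import xapian

    db = xapian.WritableDatabase(db_path, xapian.DB_CREATE_OR_OPEN)
    termgen = xapian.TermGenerator()
    termgen.set_stemmer(xapian.Stem("en"))
    for rec in records:
        doc = xapian.Document()
        termgen.set_document(doc)
        # Title is indexed under the conventional "S" prefix and also
        # as unprefixed text so general searches match it.
        termgen.index_text(rec["title"], 1, "S")
        termgen.index_text(rec["title"])
        termgen.increase_termpos()
        termgen.index_text(rec["plot"])
        # Keep the whole record as document data for display.
        doc.set_data(json.dumps(rec))
        # A unique "Q" term lets re-runs replace instead of duplicate.
        idterm = "Q" + rec["id"]
        doc.add_boolean_term(idterm)
        db.replace_document(idterm, doc)
    db.close()


records = parse_movies(SAMPLE)
```

Calling index_movies("moviedb", records) would then write the two
sample documents; for the full dataset you would want to stream the
file (e.g. with ET.iterparse) rather than load it in one go.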
--
Peter Karman . http://peknet.com/ . peter at peknet.com