John Pye
2006-May-16 08:15 UTC
[Xapian-discuss] quick-and-dirty web search for a bunch of PDFs?
Hi all,

I'm a newbie with Xapian. I have just a simple goal of creating a quick-and-dirty web-based fulltext search for a bunch of PDF files that I've collected from various conference CDs.

Is this a use-case that Omega covers, or do I need to use Xapian directly? Where can I find some documentation about the capabilities of Omega?

Assuming Omega doesn't do this, would it be reasonably straightforward to write something using the Python bindings? Has anyone done a HOWTO for this pretty basic use-case? I presume I will need to provide a PDF-to-text filter of some sort, e.g. poppler/xpdf or similar?

I'd really appreciate any suggestions you can give on getting started here. Xapian looks really nice and I'm looking forward to using it.

Cheers
JP

--
John Pye
School of Mechanical and Manufacturing Engineering
The University of New South Wales
Sydney NSW 2052 Australia
t +61 2 9385 5127
f +61 2 9663 1222
mailto:john.pye_AT_student_DOT_unsw.edu.au
http://pye.dyndns.org/
Olly Betts
2006-May-16 09:31 UTC
[Xapian-discuss] quick-and-dirty web search for a bunch of PDFs?
On Tue, May 16, 2006 at 05:16:55PM +1000, John Pye wrote:
> I'm a newbie with Xapian. I have just a simple goal of creating a
> quick-and-dirty web-based fulltext search for a bunch of PDF files that
> I've collected from various conference CDs.
>
> Is this a use-case that Omega covers, or do I need to use Xapian
> directly?

Omega's "omindex" indexer will index PDF files out of the box (just make sure you have pdftotext installed).

> Where can I find some documentation about the capabilities of
> Omega?

In the docs subdirectory of the omega sources! They get installed in /usr/local/share/doc/omega by default, or probably something like /usr/share/doc/xapian-omega if you installed from a package.

Cheers,
Olly
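
For anyone following along, here is a rough sketch of driving omindex from a small Python script. It only checks that the omindex and pdftotext executables are on the PATH and then builds and runs the indexing command. The function names and the example paths are hypothetical and not part of Omega; the `--db` and `--url` options are the ones described in the omindex documentation, but check them against the docs for your installed version.

```python
import shutil
import subprocess


def build_omindex_command(db_path, url_prefix, pdf_dir):
    """Return the omindex argument list (hypothetical helper, not part of Omega).

    db_path    -- directory of the Xapian database to create or update
    url_prefix -- URL under which the web server exposes pdf_dir
    pdf_dir    -- directory tree containing the PDF files
    """
    return ["omindex", "--db", db_path, "--url", url_prefix, pdf_dir]


def missing_tools(tools=("omindex", "pdftotext")):
    """List the executables from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def index_pdfs(db_path, url_prefix, pdf_dir):
    """Run omindex over pdf_dir, failing early if a required tool is absent."""
    missing = missing_tools()
    if missing:
        raise RuntimeError("missing executables: " + ", ".join(missing))
    # check=True raises CalledProcessError if omindex exits with an error.
    subprocess.run(build_omindex_command(db_path, url_prefix, pdf_dir), check=True)


if __name__ == "__main__":
    # Example paths are assumptions; adjust them to your own setup.
    index_pdfs("/var/lib/omega/data/default", "/pdfs", "/srv/www/pdfs")
```

Once the database exists, the Omega CGI can be pointed at it for the web search side; the docs directory mentioned above covers that setup.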