Oliver Flimm
2008-Aug-01 09:48 UTC
[Xapian-discuss] FLAG_WILDCARD, add_database and performance
Hi,

I recently started to combine several (around 140) separate databases
for a single search request with add_database. I use the Xapian Perl
bindings. Additionally I use a match decider to implement facets.

Everything works fine unless I use a wildcard in my search, e.g.
'java program*'. The enquire object looks like this:

    my $enq = $dbh->enquire($qp->parse_query($querystring,
        Search::Xapian::FLAG_WILDCARD|Search::Xapian::FLAG_LOVEHATE|Search::Xapian::FLAG_BOOLEAN));

Using a wildcard in a sequential search results in search times around
0.00x to 0.x seconds for each database, but the same search request
using a combined database handle takes around 200 seconds...

You can test it on our public test system in the simple search form:

http://kug5.ub.uni-koeln.de/portal/opac?view=kug

Is there a way to improve request times for the combined search using
wildcards?

Regards,

Oliver

-- 
Universitaet zu Koeln :: Universitaets- und Stadtbibliothek
IT-Dienste :: Abteilung Universitaetsgesamtkatalog
Universitaetsstr. 33 :: D-50931 Koeln
Tel.: +49 221 470-3330 :: Fax: +49 221 470-5166
flimm at ub.uni-koeln.de :: www.ub.uni-koeln.de
Olly Betts
2008-Aug-04 01:01 UTC
[Xapian-discuss] FLAG_WILDCARD, add_database and performance
On Fri, Aug 01, 2008 at 11:48:57AM +0200, Oliver Flimm wrote:
> I recently started to combine several (around 140) separate databases
> for a single search request with add_database. I use the xapian perl
> bindings. Additionally I use a match decider to implement facets.

Xapian version? Platform?

> Using a wildcard in a sequential search results in search times around
> 0.00x to 0.x seconds for each database, but the same search request
> using a combined database handle takes around 200 seconds...

A more comparable test would be against the 140 databases merged into
one.

But it sounds like something is O(n*n) in the number of databases, which
shouldn't be necessary as far as I can see. If it's easy to test, see if
100 databases takes about 100 seconds, and 70 about 50 seconds.

> Is there a way to improve request times for the combined search using
> wildcards?

Could you profile to find where the time is spent? Some tips are here:

http://trac.xapian.org/wiki/ProfilingXapian

Cheers,
    Olly
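
The scaling check suggested in this reply can be worked through with a little arithmetic. The following is only a rough sketch, not part of the original thread: it assumes the search time grows as a pure power of the number of databases and uses the reported figure of about 200 seconds for 140 databases as the single reference point. The function name `predicted_seconds` and its parameters are made up for illustration.

```python
def predicted_seconds(n_dbs, ref_dbs=140, ref_seconds=200.0, exponent=2):
    """Scale the reference timing to n_dbs, assuming time ~ n_dbs ** exponent.

    ref_dbs and ref_seconds come from the figures reported in the thread
    (about 200 seconds for 140 databases); the pure power-law model is an
    assumption made for this sketch.
    """
    return ref_seconds * (n_dbs / ref_dbs) ** exponent


if __name__ == "__main__":
    for n in (140, 100, 70):
        quadratic = predicted_seconds(n)             # O(n*n) behaviour
        linear = predicted_seconds(n, exponent=1)    # O(n) behaviour, for contrast
        print(f"{n:>3} databases: ~{quadratic:.0f} s if quadratic, "
              f"~{linear:.0f} s if linear")
```

Under these assumptions, quadratic behaviour predicts roughly 102 seconds for 100 databases and 50 seconds for 70, which matches the round numbers given in the reply. Linear behaviour would instead predict roughly 143 and 100 seconds, so timing those two subsets should be enough to tell the two cases apart.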