oscaruser@programmer.net
2006-Aug-07 04:47 UTC
[Xapian-discuss] Search with symbols causes search time to hemorrhage
Folks,

Searching for terms that contain non-alphanumeric symbols causes long delays before search results appear. I am searching 5 M pages (~76 GB) of shopping site web data for things like "Men's Levi's Low Rise Boot Cut 527 Jeans - Downtown", which contains the symbols "'" and "-". The Xapian DB is on a fast SCSI RAID 0, dual Xeon configuration, but I still see long search times, e.g. "Search took 166.606332 seconds". If I replace these symbols with spaces, the search times are good (subsecond). However, if there are any unusual symbols in the search string, the search takes a very long time. Is there anything I can do so that I can still search using the special symbols, but with a shorter response time?

Thanks,
-OSC
James Aylett
2006-Aug-07 09:22 UTC
[Xapian-discuss] Search with symbols causes search time to hemorrhage
On Sun, Aug 06, 2006 at 07:46:34PM -0800, oscaruser@programmer.net wrote:
> Searching for terms like with non-alpha numerical symbols causes
> great delays before search results appears. I am searching 5 M pages
> (~76 GB) of shopping site web data for things like "Men's Levi's Low
> Rise Boot Cut 527 Jeans - Downtown", which has symbols " ' ",
> "-". The xapian DB is on a fast SCSI RAID 0, dual Xeon
> configuration, but still I see long search times e.g. "Search took
> 166.606332 seconds". If I remove these symbols and replace them with
> space, the search times are good (subsecond). However if there are
> any weird symbols in the search string, then it takes a very long
> time. Is there anything that I can do about this, so that I still am
> searching using the special symbols, but the result time is reduced?

Can you modify your omega template so that it spits out what the parsed query is both with and without the special characters?

James

--
James Aylett                                                  xapian.org
james@tartarus.org                               uncertaintydivision.org
oscaruser@programmer.net
2006-Aug-07 22:21 UTC
[Xapian-discuss] Search with symbols causes search time to hemorrhage
Interesting ... it looks like the first form of the query was split into "PHRASE 2" subqueries at the symbols, whereas the second form is purely an OR of the search keywords. The second form is what I want to achieve, but with the special symbols. Should I enclose every part in quote marks to achieve this?

Thanks,
OSC

Query : Men's Levi's Low Rise Boot Cut 527 Jeans - Downtown
Query Description : Xapian::Query(((Rmen:(pos=1) PHRASE 2 s:(pos=2)) OR (Rlevi:(pos=3) PHRASE 2 s:(pos=4)) OR Rlow:(pos=5) OR Rrise:(pos=6) OR Rboot:(pos=7) OR Rcut:(pos=8) OR 527:(pos=9) OR Rjeans:(pos=10) OR Rdowntown:(pos=11)))

Query : Men s Levi s Low Rise Boot Cut 527 Jeans Downtown
Query Description : Xapian::Query((Rmen:(pos=1) OR s:(pos=2) OR Rlevi:(pos=3) OR s:(pos=4) OR Rlow:(pos=5) OR Rrise:(pos=6) OR Rboot:(pos=7) OR Rcut:(pos=8) OR 527:(pos=9) OR Rjeans:(pos=10) OR Rdowntown:(pos=11)))
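The difference between the two query descriptions above can be illustrated with a toy model. This is only a sketch of the grouping the output suggests, not Xapian's actual parser: the function names `toy_parse` and `describe` are made up for illustration, the "R" term prefixes and position numbers are omitted, and only ASCII letters and digits are treated as word characters.

```python
import re


def toy_parse(query):
    """Group words the way the query descriptions above suggest.

    Words joined by a symbol with no whitespace between them (as in
    "Men's") end up in one group; whitespace-separated words stay in
    groups of their own.
    """
    groups = []
    for chunk in query.split():
        words = [w.lower() for w in re.findall(r"[A-Za-z0-9]+", chunk)]
        if words:  # a bare "-" contains no words and is dropped
            groups.append(words)
    return groups


def describe(groups):
    """Render groups as a simplified OR-of-phrases description."""
    parts = []
    for words in groups:
        if len(words) > 1:
            # Multi-word groups become a phrase subquery.
            parts.append("(" + f" PHRASE {len(words)} ".join(words) + ")")
        else:
            parts.append(words[0])
    return " OR ".join(parts)


if __name__ == "__main__":
    print(describe(toy_parse("Men's Levi's Low Rise Boot Cut 527 Jeans - Downtown")))
    print(describe(toy_parse("Men s Levi s Low Rise Boot Cut 527 Jeans Downtown")))
```

In this model the first string yields two phrase groups ("men"/"s" and "levi"/"s") while the second yields only single-word groups, which mirrors the PHRASE 2 versus plain OR structure in the real output.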
Olly Betts
2006-Aug-25 15:48 UTC
[Xapian-discuss] Search with symbols causes search time to hemorrhage
On Sun, Aug 06, 2006 at 07:46:34PM -0800, oscaruser@programmer.net wrote:
> Searching for terms like with non-alpha numerical symbols causes great
> delays before search results appears. I am searching 5 M pages (~76
> GB) of shopping site web data for things like "Men's Levi's
> Low Rise Boot Cut 527 Jeans - Downtown", which has symbols " ' ", "-".

Currently << Men's >> is indexed as << Men >> followed by << s >>, and at query time we generate a phrase query. This isn't ideal since, as you've noticed, this sometimes gives a very slow search. This is bug#22:

http://www.xapian.org/cgi-bin/bugzilla/show_bug.cgi?id=22

It's likely I'll be looking at it in the near future.

Just for the record, the hyphen is irrelevant in this case. If you'd written << Jeans-Downtown >> you'd get a phrase search, but not with whitespace around the hyphen.

Cheers,
Olly
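Until the bug is addressed, the workaround already mentioned in this thread (replacing the symbols with spaces before the query is parsed) could be applied in front of the search form. A minimal sketch, assuming the query string can be preprocessed before it reaches Omega; `strip_symbols` is a hypothetical helper, not part of Xapian or Omega, and it only treats ASCII letters and digits as word characters.

```python
import re


def strip_symbols(query):
    """Replace each run of non-alphanumeric characters with one space.

    Assumption: losing the phrase constraint is acceptable, so "Men's"
    is searched as the two independent words "Men" and "s".
    """
    return re.sub(r"[^0-9A-Za-z]+", " ", query).strip()


if __name__ == "__main__":
    print(strip_symbols("Men's Levi's Low Rise Boot Cut 527 Jeans - Downtown"))
```

Note that this changes what matches: the words no longer have to appear next to each other, which is the price of avoiding the phrase query.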