David Levy
2006-Feb-14 23:03 UTC
[Xapian-discuss] Optimization and Load balancing with Xapian
Hi (it's me again, sorry :))

I have been experiencing bad response times with Xapian/Omega over the last few days. My database has more than 700k records and uses about 3 GB of disk space. Maybe my requests or my templates are not optimized, or maybe it's a hardware (disk speed) issue. The weird thing is that the search time reported in the response is often under a second, while Omega actually takes more than one second (sometimes several seconds) to deliver the response. Has anyone had this kind of issue? Maybe my Apache (1.x) is not configured well for Omega?

FYI, I am using PHP5 (so the bindings are not available) and calling Omega with HTTP GET requests that return XML documents between two servers.

To solve this issue, I have been thinking about load balancing Xapian. I could not find any information about that on the Internet. Has anyone done it yet? How?

I found Crossroads (http://public.e-tunity.com/crossroads/crossroads.html) tonight; I think it could help:

"Crossroads is a load balance and fail over utility for TCP based services. It is a daemon program running in user space, and features extensive configurability, polling of back ends using 'wakeup calls', detailed status reporting, 'hooks' for special actions when backend calls fail, and much more. Crossroads is service-independent: it is usable for HTTP(S), SSH, SMTP, DNS, etc."

Thanks in advance!

--
David LEVY {selenium}
Website ~ http://www.davidlevy.org
Wishlist Zlio ~ http://david.zlio.com/wishlist
Blog ~ http://selenium.blogspot.com
David Levy
2006-Feb-15 13:29 UTC
[Xapian-discuss] Re: Optimization and Load balancing with Xapian
I've done some tests this morning and it seems that some of this slowness is due to sorting.

Indeed, Omega requests with and without sorting do not take the same calculation time at all: < 1 s without sorting and sometimes > 30 s with sorting. These 30 seconds happen with result sets of around 500+ matches. How is that possible? Sorting should not be that time consuming, I guess. I think I am missing something in the Omega CGI parameters, the Omega template, or the scriptindex configuration.

Can you help, please? :)

For information, the field I use for sorting is a numeric field containing integers between ~1 and 100.

Regards,
David
Olly Betts
2006-Feb-16 15:10 UTC
[Xapian-discuss] Optimization and Load balancing with Xapian
On Wed, Feb 15, 2006 at 01:03:09AM +0200, David Levy wrote:
> I have been experiencing bad response times with Xapian/Omega over the
> last few days. My database has more than 700k records and uses about
> 3 GB of disk space. Maybe my requests or my templates are not optimized,
> or maybe it's a hardware (disk speed) issue. The weird thing is that the
> search time reported in the response is often under a second, while
> Omega actually takes more than one second (sometimes several seconds)
> to deliver the response.

The time reported by "$time" includes the match, but because of how Omega works it doesn't include the time to calculate top terms (if you're using $topterms), and also doesn't include the time to display the matches. If you're actually displaying a lot of matches, that can be quite considerable. So one thing to check is that $topterms isn't being used.

> To solve this issue, I have been thinking about load balancing Xapian.
> I could not find any information about that on the Internet. Has anyone
> done it yet? How?

I've not done it myself. The simple approach is just to put several boxes in the DNS and they'll be used in a round-robin fashion.

> I've done some tests this morning and it seems that some of this
> slowness is due to sorting.
>
> Indeed, Omega requests with and without sorting do not take the same
> calculation time at all: < 1 s without sorting and sometimes > 30 s
> with sorting. These 30 seconds happen with result sets of around 500+
> matches. How is that possible? Sorting should not be that time
> consuming, I guess.

It's not the actual sorting which takes the extra time. The issue is that for a multi-term query, relevance ranking can terminate early in many cases (often when we reach the end of the matches for any of the terms). But if results are sorted on a value, we need to consider every result which matches the query.

Cheers,
Olly
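
To illustrate Olly's last point, here is a toy model in Python. It is only a sketch and not Xapian's actual matcher: the function names, the docid-to-weight dictionaries standing in for posting lists, and the stopping rule (stop once the k-th best score so far is at least the best score any remaining document could reach) are all assumptions made for the example. It shows why a two-term OR query ranked by relevance may stop soon after one term runs out of matches, while sorting on a stored value has no such bound and must look at every matching document.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical stand-in for a posting list: docid -> weight the term contributes.
Postings = Dict[int, float]


def top_k_by_relevance(a: Postings, b: Postings, k: int) -> Tuple[List[int], int]:
    """Top-k docids for the query (a OR b), plus how many documents were examined.

    Assumes both posting lists are non-empty and are walked in docid order.
    """
    max_a, max_b = max(a.values()), max(b.values())
    last_a, last_b = max(a), max(b)
    heap: List[Tuple[float, int]] = []  # min-heap of (score, docid), size <= k
    examined = 0
    for docid in sorted(set(a) | set(b)):
        # Best score any document from here on could still reach: a term can
        # only contribute while we have not passed its last matching docid.
        bound = (max_a if docid <= last_a else 0.0) + (max_b if docid <= last_b else 0.0)
        if len(heap) == k and heap[0][0] >= bound:
            break  # nothing later can beat the current k-th best score
        examined += 1
        score = a.get(docid, 0.0) + b.get(docid, 0.0)
        if len(heap) < k:
            heapq.heappush(heap, (score, docid))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, docid))
    ranked = sorted(heap, key=lambda item: (-item[0], item[1]))
    return [docid for _, docid in ranked], examined


def top_k_by_value(a: Postings, b: Postings, values: Dict[int, int], k: int) -> Tuple[List[int], int]:
    """Top-k docids by stored value; the term weights give no bound, so scan all."""
    candidates = sorted(set(a) | set(b))
    top = heapq.nlargest(k, candidates, key=lambda docid: values[docid])
    return top, len(candidates)


# Made-up data: a rare, heavily weighted term and a common, lightly weighted one.
rare = {d: float(d) for d in range(1, 51)}          # matches docs 1..50
common = {d: 1.0 for d in range(1, 1001)}           # matches docs 1..1000
sort_value = {d: d % 100 + 1 for d in range(1, 1001)}  # integers 1..100

_, seen_relevance = top_k_by_relevance(rare, common, 10)
_, seen_value = top_k_by_value(rare, common, sort_value, 10)
print("examined when ranking by relevance:", seen_relevance)
print("examined when sorting by value:", seen_value)
```

With this made-up data, the relevance run stops right after the rare term's 50 matches are used up, whereas the value-sorted run has to visit all 1000 matching documents before it can know which ten have the highest value.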