Splitting terms into separate B-Trees. I would like to have a feature in Xapian that would allow me to tell Xapian to split B-Tree terms into 2 or more B-Trees that I could place on different servers and then based on what term Xapian is searching master server will issue search to particular worker server that holds the term. Master server will received info from worker server which documents IDs contain the term then master server retrieves document from local database and return search result to client. Example: Master holds database with documents Server0 - B-tree with bitmaps for all terms starting a-m are stored on ServerA - B-tree with bitmaps for all terms starting n-z are stored on ServerB If I am looking for term: xapian, Master Server0 will issue search to ServerB, result is then send back Master Server0 then Master Server0 retrieves documents details from local database and will send to client. -- Kevin Duraj http://myhealthcare.com
Alexander Lind
2007-May-22 22:35 UTC
[Xapian-discuss] Splitting terms into separate B-Trees.
Kevin Duraj wrote:> Splitting terms into separate B-Trees. > > I would like to have a feature in Xapian that would allow me to tell > Xapian to split B-Tree terms into 2 or more B-Trees that I could place > on different servers and then based on what term Xapian is searching > master server will issue search to particular worker server that holds > the term. Master server will received info from worker server which > documents IDs contain the term then master server retrieves document > from local database and return search result to client. > > Example: > > Master holds database with documents Server0 > - B-tree with bitmaps for all terms starting a-m are stored on ServerA > - B-tree with bitmaps for all terms starting n-z are stored on ServerB > > If I am looking for term: xapian, Master Server0 will issue search to > ServerB, result is then send back Master Server0 then Master Server0 > retrieves documents details from local database and will send to > client. >Is it really a better approach to achieve load balancing this way, rather than have identical indexes on all worker-machines, and just round-robin out requests to them? Or if you want to get fancy, use some sort of loadmonitor report to decide which machine should carry out the search? Seems like it could get pretty hairy to have the trees split like you suggest. What if you want to add a third server? What if most people happen to be searching on things beginning with [a-m]? Alec