Bradley
2008-Nov-21 02:29 UTC
[Xapian-discuss] Multiple databases vs Single large database
Hi,

I've decided to use Xapian because the files table in my MySQL database is going to grow very large, and it seems MySQL isn't good at full-text searching. I'm doing this with the PHP wrapper, by the way.

The way my system is set up, each user has their own set of files, and a search will be for a specific user's files (based on file name, title, description). At some point, though, we may decide we want to search the files of a list of users, or of all users.

I was planning on having a Xapian database for each user's files. Would it be better this way (multiple databases), or to have one large database for all users' files, as I'm doing with MySQL? I'm thinking mainly with regard to performance, but feel free to add other thoughts.

Thanks,
Bradley
Bradley wrote:
> Hi
> I've decided to use xapian because my files table in my mysql database is going
> to grow very large, and it seems mysql isn't good at full text searching. I'm
> doing this with the php wrapper by the way.
>
> The way my system is set out, each user has their own set of files, and when
> doing a search it is going to be for a specific user's file (based on file
> name, title, description). Although at some point we may decide we want
> functionality to search for files for a list of users or all users.
>
> I was planning on having a xapian database for each user's files. Would it be
> better this way (multiple databases), or to have one large database for all
> users' files, as I'm doing with mysql. I'm thinking mainly with regard to
> performance, feel free to add other thoughts.
>
> Thanks
> Bradley

If I were doing it, I'd do it your way. Searching a single user's DB will most likely be faster. Once you allow your users to search multiple DBs, you can evaluate performance and see whether merging them makes sense. Consider:

1. Are searches of multiple DBs fast enough?
2. How often are multiple DBs searched?

If you need to merge them, there is a utility, xapian-compact (http://xapian.org/docs/admin_notes.html#merging-databases), that will do it for you with a minimum of effort.

You didn't ask, but here are a few things to consider.

1. Xapian searches will not be looking at real-time data. It takes a finite amount of time to add new entries. The larger the database, the longer it will take to index new entries.

1.1. Be sure to have something in your MySQL database that either says "this row has been added to Xapian" or records a last-changed timestamp. Periodically add new entries to the Xapian DB by comparing timestamps or by selecting on the "is_added" field.

2. Consider ping-ponging two Xapian DBs when updating. I use the following logic. I have two directories with Xapian DBs, A and B.

    if A is older than B
        copy contents of B into A
    else
        copy contents of A into B
    add new entries to the copy
    if the copy is A
        rm C
        ln -s A C
    if the copy is B
        rm C
        ln -s B C

where C is the database that I am using to search.

Jim.
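
The ping-pong logic above could be sketched in Python roughly as follows. This is only an illustration under some assumptions, not Jim's actual script: the function names, the `add_new_entries` callback (standing in for whatever indexes the pending rows into Xapian) and the use of file modification times to decide which copy is older are all hypothetical. It also swaps the link by renaming a temporary symlink over C rather than running `rm C` followed by `ln -s`, so that C never disappears for a moment; this assumes a POSIX filesystem.

```python
import os
import shutil


def newest_mtime(path):
    """Return the latest modification time of the directory or any file under it."""
    times = [os.path.getmtime(path)]
    for root, _dirs, files in os.walk(path):
        times.extend(os.path.getmtime(os.path.join(root, name)) for name in files)
    return max(times)


def ping_pong_update(dir_a, dir_b, link_c, add_new_entries):
    """Refresh the older of two database directories and point link_c at it.

    add_new_entries is a hypothetical callback that receives the path of the
    refreshed copy and indexes the pending entries into it.
    """
    # The older directory is the one that gets overwritten.
    if newest_mtime(dir_a) < newest_mtime(dir_b):
        source, target = dir_b, dir_a
    else:
        source, target = dir_a, dir_b

    # Replace the older copy with the contents of the newer one.
    shutil.rmtree(target)
    shutil.copytree(source, target)

    # Add the new entries to the copy while searches still use the other one.
    add_new_entries(target)

    # Repoint C: create the link under a temporary name, then rename it over C.
    tmp_link = link_c + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    os.replace(tmp_link, link_c)
    return target
```

Searches would always open C, so they keep reading the previous copy until the new one is fully updated. The copy step assumes nothing is writing to the source directory while it runs.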