Per Jessen
2010-Apr-16 09:18 UTC
[Xapian-discuss] best practices - combining sql database and xapian, size of database?
Newbie-alert: I'm just getting started on a new project involving a full text search requirement, and my initial investigation points to xapian being the way to go.

Two questions:

- eventually I'll most likely be indexing towards 50 million documents - is this reasonable to expect or attempt with xapian?

- each of my documents comes with a set of attributes. These are easily stored and indexed in a sql database, but I'm not quite sure how I would combine a sql database lookup with a xapian query? AFAICT, xapian also has a mechanism for associating attributes with a document, might that be the right approach?

thanks

/Per Jessen, Zürich
Peter Karman
2010-Apr-16 13:05 UTC
[Xapian-discuss] best practices - combining sql database and xapian, size of database?
Per Jessen wrote on 04/16/2010 04:18 AM:
> Newbie-alert: I'm just getting started on a new project involving a
> full text search requirement, and my initial investigation points to
> xapian being the way to go.
>
> Two questions:
>
> - eventually I'll most likely be indexing towards 50 million
> documents - is this reasonable to expect or attempt with xapian?

Yes.

> - each of my documents comes with a set of attributes. These are easily
> stored and indexed in a sql database, but I'm not quite sure how I
> would combine a sql database lookup with a xapian query? AFAICT,
> xapian also has a mechanism for associating attributes with a document,
> might that be the right approach?

I typically store attributes I want to be able to sort on or collapse on as a Xapian value[0]. Values are not what you search for, but are attributes associated with a document that you can sort by, fetch, etc. I usually store my db primary key as a term[1] because I know it is unique and I want to be able to search for it.

If you want one example of prior art that implements the above, you can look at the swish_xapian code[2] (part of Swish3). The assumption in that code is that you have serialized each db record into an XML doc (which allows for joins, etc), and created a config file that calls out each field/column as a MetaName and/or PropertyName. MetaNames are terms in a context (in a field) so you can limit a search to a specific field. PropertyNames are stored values. A field can be both (as with a date, for example).

[0] http://xapian.org/docs/apidoc/html/classXapian_1_1Document.html#f7babb1a6368b95dd327f60b433016ac
[1] http://xapian.org/docs/apidoc/html/classXapian_1_1Document.html#28eb5f092a2efc25969f5c64b019c79c
[2] http://dev.swish-e.org/browser/libswish3/trunk/src/xapian/swish_xapian.cpp

-- 
Peter Karman . http://peknet.com/ . peter at peknet.com
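The value/term split described above could look roughly like the following with the Xapian Python bindings. This is only a sketch under assumptions that are not in the thread: the record is a dict with hypothetical keys `id`, `title`, `date` and `body`, the slot numbers are arbitrary, and the "Q" prefix follows the usual Xapian convention for unique-ID terms. The mapping is kept separate from the Xapian calls so it can be checked without Xapian installed.

```python
# Hypothetical value slot numbers; any small non-negative integers will do.
SLOT_TITLE = 0
SLOT_DATE = 1


def plan_document(record):
    """Map one database row (a dict) to a unique ID term and a slot -> value table."""
    # The primary key becomes a term so the document can be found (and replaced) by it.
    id_term = "Q" + str(record["id"])
    # Attributes to sort on, collapse on or fetch back are stored as values.
    values = {
        SLOT_TITLE: record["title"],
        SLOT_DATE: record["date"],  # e.g. "20100416", which sorts correctly as a string
    }
    return id_term, values


def index_record(db, record):
    """Add or update one record in a xapian.WritableDatabase."""
    import xapian  # assumption: the Xapian Python bindings are installed

    id_term, values = plan_document(record)
    doc = xapian.Document()
    for slot, value in values.items():
        doc.add_value(slot, value)
    doc.add_term(id_term)

    # Index the free text so it is searchable.
    term_generator = xapian.TermGenerator()
    term_generator.set_document(doc)
    term_generator.index_text(record["body"])

    # Replaces any existing document carrying the same ID term, else adds a new one.
    db.replace_document(id_term, doc)
```

Keeping the primary key as a term means a row changed in the SQL database can be re-indexed by calling `index_record` again with the same `id`.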
Olly Betts
2010-Apr-29 05:37 UTC
[Xapian-discuss] best practices - combining sql database and xapian, size of database?
On Fri, Apr 16, 2010 at 11:18:00AM +0200, Per Jessen wrote:
> - each of my documents comes with a set of attributes. These are easily
> stored and indexed in a sql database, but I'm not quite sure how I
> would combine a sql database lookup with a xapian query? AFAICT,
> xapian also has a mechanism for associating attributes with a document,
> might that be the right approach?

If you use Xapian 1.1.x (or shortly 1.2.x) another option is to dynamically combine the results of an SQL query and a Xapian query by subclassing Xapian::PostingSource:

http://trac.xapian.org/browser/trunk/xapian-core/docs/postingsource.rst

Cheers,
Olly
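As a rough illustration of the iteration logic such a subclass needs, here is a sketch in Python. It is written as a plain class so that it runs without Xapian installed; in real use it would presumably derive from `xapian.PostingSource` and call the base constructor. The class name and the SQL schema are hypothetical, and the sketch assumes the SQL query returns Xapian document IDs (for example because the table stores them in a column), all of which exist in the Xapian database.

```python
import bisect
import sqlite3


class SqlDocidSource:
    """Feeds the docids selected by an SQL query to the matcher in ascending order."""

    def __init__(self, conn: sqlite3.Connection, sql: str, params=()):
        self.conn = conn
        self.sql = sql
        self.params = params
        self.docids = []
        self.pos = -1

    def init(self, db):
        # Called before matching starts: run the query and sort the docids,
        # since a posting source must return them in increasing order.
        rows = self.conn.execute(self.sql, self.params).fetchall()
        self.docids = sorted({row[0] for row in rows})
        self.pos = -1  # positioned before the first entry

    # Bounds on the number of matching documents; exact here under the
    # assumption that every docid from SQL exists in the Xapian database.
    def get_termfreq_min(self):
        return len(self.docids)

    def get_termfreq_est(self):
        return len(self.docids)

    def get_termfreq_max(self):
        return len(self.docids)

    def next(self, min_weight):
        self.pos += 1

    def skip_to(self, docid, min_weight):
        # Move forward to the first docid >= the requested one, never backwards.
        self.pos = max(self.pos, bisect.bisect_left(self.docids, docid))

    def at_end(self):
        return self.pos >= len(self.docids)

    def get_docid(self):
        return self.docids[self.pos]
```

With the real base class, the source could then be wrapped in a `xapian.Query` and combined with the text query, for instance using `xapian.Query.OP_FILTER`, so that only documents selected by both the SQL query and the text search are returned.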