Bill Hendrickson
2011-May-12 18:18 UTC
[Xapian-discuss] Xapian support for huge data sets?
Hello,

I'm currently using another open source search engine/indexer and am having performance issues, which brought me to learn about Xapian. We have approximately 350 million docs/10TB data that doubles every 3 years. The data mostly consists of Oracle DB records, webpage-ish files (HTML/XML, etc.) and office-type docs (doc, pdf, etc.). There are anywhere from 2 to 4 dozen users on the system at any one time. The indexing server has upwards of 28GB memory, but even then, it gets extremely taxed, and will only get worse.

In the opinion of this list, would Xapian be able to handle this kind of load, or should I evaluate more "enterprise"-like solutions (GSA, etc.)?

Thanks.
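Bill's growth figure can be turned into a rough capacity projection. The sketch below assumes smooth exponential growth that doubles exactly every 3 years (the message gives only the doubling period, not the shape of the curve), starting from the stated 350 million docs / 10TB. The function name `projected` is hypothetical.

```python
def projected(base: float, years: float, doubling_period: float = 3.0) -> float:
    """Size after `years`, assuming exponential growth that doubles every `doubling_period` years."""
    return base * 2 ** (years / doubling_period)


# Extrapolate both document count and raw data volume a few doubling periods ahead.
for years in (0, 3, 6, 9):
    docs = projected(350e6, years)  # starting point: ~350 million documents
    tb = projected(10, years)       # starting point: ~10TB of source data
    print(f"year {years}: ~{docs / 1e6:,.0f}M docs, ~{tb:,.0f} TB")
```

Under that assumption the collection reaches roughly 1.4 billion documents and 40TB within six years, which is the scale any candidate engine would need to be sized for.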
Hi Bill,

Yes, Xapian can handle such large indexes very easily if you know what to do. You are in the right place, and you are even in my Xapian index. Welcome. :-)

http://find1friend.com/search?q=Bill+Hendrickson

Cheers,
Kevin Thomas Duraj
http://myhealthcare.com
Xapian was originally written to power the Webtop web search engine, which indexed around 500 million pages on a farm of around 30 servers back in 1999 or so. We've built 100-million-page indexes for clients. You shouldn't have any trouble indexing your content given sufficient hardware arranged in the right way, though a single server is probably not enough!

Cheers
Charlie
www.flax.co.uk
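One common way to spread an index over several servers is to partition documents into independent shards by hashing a document ID. The sketch below is only an illustration of that general technique; the hashing scheme, the function names `shard_for` and `docs_per_shard`, and the shard count are hypothetical, not something prescribed in this thread. Xapian can search several databases together as if they were one (e.g. via `Database.add_database`), so each shard could in principle be built as a separate database and combined at query time.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Route a document to a shard by hashing its ID (hypothetical routing scheme)."""
    # A stable hash (unlike Python's built-in hash()) gives the same shard on every run/machine.
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def docs_per_shard(total_docs: int, num_shards: int) -> int:
    """Approximate documents per shard if the hash spreads them evenly (ceiling division)."""
    return -(-total_docs // num_shards)


print(shard_for("oracle-record-12345", 16))
print(docs_per_shard(350_000_000, 16))
```

With a hypothetical 16 shards, 350 million documents come to about 21.9 million per shard; adding shards as the data doubles keeps each one at a manageable size, at the cost of re-routing documents when the shard count changes.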