Sun Jianhong-a18232
2007-Dec-07 23:48 UTC
[Xapian-discuss] Xapian or Clucene for mobile device
Hi all,

We are investigating an open-source search engine for a mobile device. Both Xapian and CLucene are open-source C++ search engines. From a performance perspective, which one is better suited to a mobile device? Do you have any performance data comparing Xapian and CLucene?

For a mobile device, we care about RAM consumption, search and indexing performance, library size, etc. Is Xapian easy to tailor for a mobile device?

Thank you very much!

Regards,
Sun Jianhong
On Fri, Oct 19, 2007 at 04:40:27PM +0800, Sun Jianhong-a18232 wrote:
> For mobile device, we do care about RAM consumption, search and index
> performance, library size, etc. Can Xapian be easy to tailor for mobile
> device?

To reduce code size, you can disable the backends you don't want, and compile to favour small code size without debugging symbols - something like:

    ./configure --disable-backend-inmemory --disable-backend-quartz --disable-backend-remote CXXFLAGS=-Os

That's assuming you don't want the inmemory or remote backends. You will only want quartz if you're migrating a system from Xapian < 1.0.

We've not assumed huge memory or fast processors - generally, techniques which scale well will work well for smaller collections on small machines as well as for big collections on bigger machines.

I've not done testing on small devices myself, but the OLPC (http://laptop.org/) uses Xapian to search its datastore, and has a fairly modest hardware spec.

Feedback on use on smaller devices is certainly welcome.

Cheers,
Olly
Sun Jianhong,

I have done a lot of research comparing the performance and search quality of MySQL 5 Full-Text, MS SQL 2005 Full-Text, Lucene and Xapian. All my performance and quality measurements showed Xapian to be the fastest at both indexing and searching. To my surprise, I found Lucene to be the slowest search engine with the poorest-quality results, despite having the largest community compared to MySQL 5 Full-Text, MS SQL 2005 and Xapian. WHY? (Let's investigate more closely.)

Performance:
- Lucene uses the compound file format by default. Xapian and the others use B-trees by default. Building and searching the compound file format takes more time than building a B-tree. Therefore indexing and searching with Lucene, CLucene, etc. is many times slower than indexing and searching the same amount of data with Xapian.

Quality of searches:
- Lucene uses the Levenshtein distance between two strings, whereas Xapian uses BM25, which matches documents according to their relevance to a given search query and returns much better results than the Levenshtein distance algorithm.

I do not want to bore you with more information and statistics, but you can continue the research in case I missed something. Cheers!

__________________________________
Kevin Duraj
http://UncensoredWebSearch.com

On Oct 19, 2007 12:40 AM, Sun Jianhong-a18232 <a18232@motorola.com> wrote:
> Hi, All,
>
> Now we are investigating a open search engine for mobile device. Both
> Xapian and Clucene are c++ open search engine. From the performance
> perspective, which one is better for mobile device? Do you have some
> performance data between Xapian and Clucene?
>
> For mobile device, we do care about RAM consumption, search and index
> performance, library size, etc. Can Xapian be easy to tailor for mobile
> device?
>
> Thank you very much !
>
> Regards,
> Sun Jianhong
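
For readers unfamiliar with the two measures named in Kevin's reply above, the following is a minimal Python sketch of what each one computes. It is not taken from Xapian or Lucene: the function names `levenshtein` and `bm25_scores` are hypothetical, the BM25 formula is the common textbook form with assumed defaults k1=1.2 and b=0.75 (Xapian's own BM25Weight takes further parameters), and documents are assumed to be pre-tokenised lists of terms.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Textbook BM25 score of each tokenised document for the query terms.

    k1 and b are assumed illustrative defaults, not values taken from
    Xapian or Lucene.
    """
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # term absent from this document
            # Rarer terms get a higher weight (the +1.0 keeps it positive).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates, and long documents are penalised.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

The first function compares two strings character by character, so it is a measure of spelling similarity. The second ranks whole documents against a query using term frequency, document frequency and document length, so it is a measure of relevance.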