Olly Betts
2015-Jan-20 01:43 UTC
[Xapian-discuss] Question on "single writer, multiple reader"
On Sun, Jan 18, 2015 at 04:25:29PM +0000, James Aylett wrote:
> That's exactly how it's supposed to work. "Eventually" (once the
> writer gets sufficiently far ahead of the reader), the reader will get
> a DatabaseModifiedError and will have to re-open the database, but
> until then it's up to it when it does so. You may wish to do it every
> N requests, or every K seconds, or only when you have to handle
> DatabaseModifiedError; it's up to you.
>
> We have a note that some more detailed documentation around this would
> be helpful. For now, the following should be useful:
> <https://getting-started-with-xapian.readthedocs.org/en/latest/concepts/indexing/databases.html?highlight=databasemodifiederror#concurrent-access>.

I've just improved this with a note that reopen() is a cheap no-op when
there isn't a newer revision:

https://github.com/jaylett/xapian-docsprint/commit/41bb7a1da61d22e0047a83176386da4db1ee9f15

Cheers,
    Olly
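
The "only when you have to handle DatabaseModifiedError" strategy quoted above can be sketched roughly as follows. This is a minimal illustration, not code from the thread or from Xapian: `ModifiedError` and `SnapshotReader` are hypothetical stand-ins so the sketch compiles without the library, and `with_reopen` is an invented helper name. With the real library, the reader would presumably be a `Xapian::Database`, the caught exception `Xapian::DatabaseModifiedError`, and the operation a call such as `Xapian::Enquire::get_mset()`.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for Xapian::DatabaseModifiedError.
struct ModifiedError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for a reader-side database handle. It works on a
// snapshot of the index and throws once that snapshot is no longer usable.
class SnapshotReader {
  public:
    explicit SnapshotReader(const std::vector<std::string>* live)
        : live_(live), snapshot_(*live) {}

    // Mimics reopen(): move the snapshot to the latest revision.
    void reopen() {
        snapshot_ = *live_;
        stale_ = false;
        ++reopens_;
    }

    // Simulation hook: pretend the writer has got too far ahead of us.
    void mark_stale() { stale_ = true; }

    // A read operation that fails if the snapshot has been invalidated.
    std::size_t count() const {
        if (stale_) throw ModifiedError("revision no longer available");
        return snapshot_.size();
    }

    int reopens() const { return reopens_; }

  private:
    const std::vector<std::string>* live_;
    std::vector<std::string> snapshot_;
    bool stale_ = false;
    int reopens_ = 0;
};

// Run op(reader). If the reader reports that it is out of date, reopen it
// and try again, giving up (by rethrowing) after max_attempts tries.
template <typename Reader, typename Op>
auto with_reopen(Reader& reader, Op op, int max_attempts = 3)
    -> decltype(op(reader)) {
    assert(max_attempts > 0);
    for (int attempt = 1;; ++attempt) {
        try {
            return op(reader);
        } catch (const ModifiedError&) {
            if (attempt >= max_attempts) throw;
            reader.reopen();
        }
    }
}
```

The other two strategies (every N requests, or every K seconds) would simply call `reopen()` on a schedule instead of in the catch block. Since reopen() is described above as a cheap no-op when there is no newer revision, calling it before every query is also a reasonable choice.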
Gang Chen
2015-Jan-22 03:45 UTC
[Xapian-discuss] Question on "single writer, multiple reader"
Hi, J, Olly,

Thanks for the replies! I've been using 'reopen()' in my search process, and as expected, the new documents can be retrieved now!

As I dug deeper into Xapian, I found another problem when using the *slot* feature with "single writer and multiple readers". After several days of trial and error, I think it might be a bug in the Chert backend.

So, here is my observation:

I used Xapian-1.2.19 and the default Chert backend. I wanted to index some movie metadata, e.g. the title and the premiere time. I stored them in a document with the 'doc.add_value()' method.

    newdocument.add_posting("title_0", 1, 1);
    newdocument.add_posting("time_0", 1, 1);
    newdocument.add_value(1, "title_0"); // slot1, title_0
    newdocument.add_value(2, "time_0");  // slot2, time_0

In the search process, I used 'doc.get_value()' to get the values in the slots.

    for (Xapian::MSetIterator i = matches.begin(); i != matches.end(); ++i) {
        Xapian::Document doc = i.get_document();
        cout << "Document ID " << *i << "\t" << i.get_percent() << "%" << endl;
        cout << "[" << doc.get_value(1) << "]" << endl;
        cout << "[" << doc.get_value(2) << "]" << endl;
    }

While the search process was alive, I added some more movie data to the database. The first few new documents were fine, but once 1,000 or more documents had been added to the database (with a commit every 10,000 documents), the search process crashed with a seg fault. However, after I restarted the search process, the new documents could be retrieved. Btw, in the search process, 'reopen()' was performed before each query.

I tried changing from 'add_value()' and 'get_value()' to 'set_data()' and 'get_data()', and both searching and incremental indexing were successful. The 'data' value and the 'slot' value are both attached to a document, but different behaviors were observed. So I guess there might be something wrong with the slot value?

I also tried explicitly using *Flint* as the backend. Surprisingly, there was no seg fault, and everything was successful.

Could it be something wrong with the slot value processing in the Chert backend?

Best wishes,
Gang
Olly Betts
2015-Jan-22 04:56 UTC
[Xapian-discuss] Question on "single writer, multiple reader"
On Thu, Jan 22, 2015 at 11:45:37AM +0800, Gang Chen wrote:
> Could it be something wrong with the slot value processing in the
> Chert backend?

Sounds like it. Can you provide a complete example which reproduces this?

Cheers,
    Olly