Fernando Nemec
2006-Nov-10 18:37 UTC
[Xapian-discuss] Problems with positions and replace_document
Hi all,

I'm currently testing Xapian to see whether the library fits my needs.

While testing I ran into a small problem.

Suppose my database has a document A with two terms, "foo" and "bar". "foo" has position 1 and "bar" has position 2.

A phrase search for "foo bar" matches A. So far, so good.

Then I put document A through this process (suppose 1 is A's doc_id):

    // get the document from the db, add a value, put it back
    Xapian::Document doc = db.get_document( 1 ) ;
    doc.add_value( 0 , "fnord");
    db.replace_document( 1 , doc ) ;

I expected to get the very same document back, but with a new value. Unfortunately, my expectations were wrong: if I repeat the very same phrase search for "foo bar", document A no longer matches.

Is this behavior by design, the consequence of a bug, or am I messing something up?

By the way, I wrote a small program to reproduce this behavior. If needed, I can send it on request.

Thanks,

Nemec

-- 
[]s
Fernando Nemec
fernando.nemec@folha.com.br
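The symptom described above (a phrase search that stops matching after a replace) is what one would expect if the stored positions were lost while the terms themselves survived. The following is only a toy model in plain Python, not Xapian's implementation and not a confirmed diagnosis of this report; the `Index` layout and the `phrase_match` and `term_match` helpers are hypothetical names made up for the illustration.

```python
from typing import Dict, List

# Toy positional index (hypothetical layout): docid -> term -> positions.
Index = Dict[int, Dict[str, List[int]]]


def phrase_match(index: Index, docid: int, terms: List[str]) -> bool:
    """Return True if `terms` occur at consecutive positions in the document."""
    postings = index.get(docid, {})
    if not terms or any(t not in postings for t in terms):
        return False
    # Try each position of the first term as the start of the phrase.
    return any(
        all(start + offset in postings[t] for offset, t in enumerate(terms))
        for start in postings[terms[0]]
    )


def term_match(index: Index, docid: int, terms: List[str]) -> bool:
    """Return True if every term is present, ignoring positions."""
    postings = index.get(docid, {})
    return all(t in postings for t in terms)


# Document 1 as described in the message: "foo" at 1, "bar" at 2.
index: Index = {1: {"foo": [1], "bar": [2]}}
before = phrase_match(index, 1, ["foo", "bar"])

# Assumed failure mode: the replace keeps the terms but drops positions.
index[1] = {term: [] for term in index[1]}
after_phrase = phrase_match(index, 1, ["foo", "bar"])
after_terms = term_match(index, 1, ["foo", "bar"])

print(before, after_phrase, after_terms)
```

In this model a plain term query would still find the document after the faulty replace, while the phrase query would not, which is one way to tell the two situations apart when inspecting a real database.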
Olly Betts
2006-Nov-10 19:06 UTC
[Xapian-discuss] Problems with positions and replace_document
On Fri, Nov 10, 2006 at 04:36:31PM -0200, Fernando Nemec wrote:
> Well, I'm expecting to have the very same document, but with a new
> value. Unfortunately, my expectations were wrong.

That sounds very much like a bug. If you want to find out exactly what's happened, you can use "delve" to examine the document, its terms, and any positional information.

> By the way, I wrote a small code to reproduce this behavior. If
> needed, I can send by request.

That would be very useful.

Cheers,
Olly
Fernando Nemec
2006-Nov-10 19:10 UTC
[Xapian-discuss] Problems with positions and replace_document
Hi Rafael,

thanks for your reply.

Here you go. I've attached my code to this message.

By the way, I'm using gcc 4.1.1 on Fedora Core 4. The flags I'm using are at the top of the program, as is the path to the db. Also, I'm using a locally compiled xapian-core 0.9.6 and the flint backend.

Thanks,

Nemec

Friday, November 10, 2006, 4:54:28 PM, you wrote:

> On 11/10/06, Fernando Nemec <fernando.nemec@folha.com.br> wrote:
>>
>> Hi all,
>>
>> I'm currently testing Xapian so I can see if the library fits my
>> needs.
>>
>> While testing I got a small problem.
>>
>> Suppose my database has a document A with two terms "foo" and "bar".
>> "foo" has position 1 and "bar" has position 2.
>>
>> Doing a phrase search for "foo bar", A get matched. So far, so good.
>>
>> Then I apply document A to this process (suppose 1 is A's doc_id):
>>
>> // get document from db add a value put it back
>> Xapian::Document doc = db.get_document( 1 ) ;
>> doc.add_value( 0 , "fnord");
>> db.replace_document( 1 , doc ) ;
>>
>> Well, I'm expecting to have the very same document, but with a new
>> value. Unfortunately, my expectations were wrong. If I try to do the
>> very same phrase search for "foo bar" document A doesn't get matched.
>>
>> Is this behavior by design, consequence of a bug or I'm messing up
>> something?
>>
>> By the way, I wrote a small code to reproduce this behavior. If
>> needed, I can send by request.
>>
>> Thanks,
>>
>> Nemec

> Hi Fernando, you can send attachments to this list, send to us see the code.
> I'll try reproduce the problem using python (more simple ;)

> Thanks.

-- 
[]s
Fernando Nemec
fernando.nemec@folha.com.br
http://www.folha.com.br/
Fernando Nemec
2006-Nov-10 20:34 UTC
[Xapian-discuss] Problems with positions and replace_document
Hi Rafael,

thanks for your help.

That's strange, because I did almost the same thing in C++ but got different results. I sent the list a link to the code I wrote. I'd appreciate it if you took a look.

Since your Python code works, it's likely I messed something up in my own code.

Thanks,

Nemec

Friday, November 10, 2006, 5:07:13 PM, you wrote:

> Fernando, look this fast test:

> sdm@sdm-desktop:~/db$ python
> Python 2.4.4c1 (#2, Oct 11 2006, 21:51:02)
> [GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import xapian
> >>> db = xapian.WritableDatabase('.', xapian.DB_CREATE_OR_OPEN)
> >>> doc = xapian.Document
> >>> doc = xapian.Document()
> >>> doc.add_posting('foo', 1)
> >>> doc.add_posting('bar', 2)
> >>> db.add_document(doc)
> 1
> >>>

> EXIT PYTHON

> sdm@sdm-desktop:~/db$ python
> Python 2.4.4c1 (#2, Oct 11 2006, 21:51:02)
> [GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import xapian
> >>> a = xapian.open('.')
> >>> db = xapian.open('.')
> >>> db.get_doccount()
> 1
> >>> db.get_document(1)
> >>> doc = db.get_document(1)
> >>> doc.add_value(0,"fnord")
> >>> db = xapian.WritableDatabase('.', xapian.DB_CREATE_OR_OPEN)
> >>> db.replace_document(1, doc)
> >>>

> EXIT PYTHON AGAIN

> sdm@sdm-desktop:~/db$ python
> Python 2.4.4c1 (#2, Oct 11 2006, 21:51:02)
> [GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import xapian
> >>> db = xapian.open('.')
> >>> doc = db.get_document(1)
> >>> for i in doc.termlist():
> ...     print i
> ...
> ['bar', 1, 0, <xapian.PositionIter instance at 0xb7b67b8c>]
> ['foo', 1, 0, <xapian.PositionIter instance at 0xb7b67c2c>]
> >>> enq = xapian.Enquire(db)
> >>> qp = xapian.QueryParser()
> >>> enq.set_query(qp.parse_query('"foo bar"'))
> >>> m = enq.get_mset(0,1)
> >>> for i in m:
> ...     print i
> ...
> [1, 0.30830135965451672, 0, 100, <xapian.Document; proxy of <Swig Object of
> type 'Xapian::Document *' at 0x81a7118> >]

> As you can see, the Xapian overwrote the document keeping the postlist
> terms, and the search for "foo bar" (Phrase search) returns the right
> document.

> I'm using Ubuntu 6.10, python2.4 xapian-0.9.6 (just to test, 0.9.9 is the
> better release xD -remote backend historic-)

-- 
[]s
Fernando Nemec
fernando.nemec@folha.com.br
http://www.folha.com.br/
Fernando Nemec
2006-Nov-10 20:47 UTC
[Xapian-discuss] Problems with positions and replace_document
Hi Rafael,

Very strange. I get the same result as you if I index just one document:

fnemec@repulse:~/spiffy_search# ./test -i
1
fnemec@repulse:~/spiffy_search# ./test -s
Xapian::Query((brown PHRASE 2 fox))
1
fnemec@repulse:~/spiffy_search# ./test -r
done
fnemec@repulse:~/spiffy_search# ./test -s
Xapian::Query((brown PHRASE 2 fox))
1

With two documents in the db, however, document 1 no longer matches after the replace:

fnemec@repulse:~/spiffy_search# ./test -i
1
fnemec@repulse:~/spiffy_search# ./test -i
2
fnemec@repulse:~/spiffy_search# ./test -s
Xapian::Query((brown PHRASE 2 fox))
1
2
fnemec@repulse:~/spiffy_search# ./test -r
done
fnemec@repulse:~/spiffy_search# ./test -s
Xapian::Query((brown PHRASE 2 fox))
2

Can you check whether you get the same results?

Nemec

Friday, November 10, 2006, 6:34:21 PM, you wrote:

> On 11/10/06, Fernando Nemec <fernando.nemec@folha.com.br> wrote:
>>
>> Olly,
>>
>> > If not, the list is set to filter some binary attachments I think (as
>> > this rejects a lot of spam). Try making sure it's attached as text.
>> > Or perhaps it's simplest to open a bug report and attach it to that.
>>
>> yes, the list is filtering attachments. No problem. I publish my code
>> at this address:
>>
>> http://mxzypkt.folha.com.br/2006/11/10/xapian/replace.cpp

> Look my shell:

> sdm@sdm-desktop:~/Desktop$ ./test -i
> 1
> sdm@sdm-desktop:~/Desktop$ ./test -s
> Xapian::Query((brown PHRASE 2 fox))
> 1
> sdm@sdm-desktop:~/Desktop$ ./test -r
> done
> sdm@sdm-desktop:~/Desktop$ ./test -s
> Xapian::Query((brown PHRASE 2 fox))
> 1
> sdm@sdm-desktop:~/Desktop$

> looks fine to me, nothing wrong, to compile I run the command in the
> comments of the code (and changed the database dir to work here ;)

-- 
[]s
Fernando Nemec
fernando.nemec@folha.com.br
http://www.folha.com.br/