I'm creating a unique ID for every document. I have about 3500 documents so
far, and seem to have run into a problem while testing. Here's what I did to
"discover" my issue.

The first term I add to a document is of the form (Python):

    import sha, random, time

    uid = sha.new(str(random.random()) + str(time.time())).hexdigest()
    doc.add_term("Q" + uid)

which is basically a random float plus a unix timestamp as a float, run
through SHA-1. I used .add_term() for this, and I can ensure that every key
is unique and is actually being added to the document.

So I came up with some code like this to list all terms:

-- listterms.py --
#!/usr/bin/env python
import xapian

xapdb = xapian.Database("..")

it = xapdb.allterms_begin()
end = xapdb.allterms_end()
while not it == end:
    print it.get_term()
    it.next()
-- listterms.py --

Using a little bash loop, I then requested each document from my server:

    ./misc/listterms.py | grep ^Q | cut -c2- | while read id; do
        curl -s "http://localhost:8080/bin/read?id=$id" | grep -n ^ERROR
    done

-- the /bin/read (hacked down for brevity) --
import sys, cgi
import xapian

sys.stderr = sys.stdout
FieldStore = cgi.FieldStorage()

print "Content-Type: text/html"
print

xapdb = xapian.Database("..")

enquire = xapian.Enquire(xapdb)
stemmer = xapian.Stem("english")

qp = xapian.QueryParser()
# i do have other prefixes but only Q is important to my example
qp.set_prefix("id", "Q")

id = FieldStore.getvalue("id", "")
q = "id:" + id

query = qp.parse_query(q)

enquire.set_query(query)
matches = enquire.get_mset(0, 1)

if matches.get_matches_upper_bound() == 0:
    print "ERROR: Oops, unable to find a message %s" % (id)
    sys.exit(0)

match = iter(matches).next()

print "ID %i %i%% [%s]" % \
    (match[xapian.MSET_DID], match[xapian.MSET_PERCENT],
     match[xapian.MSET_DOCUMENT].get_data())
-- the /bin/read --

I tried calling flush, thinking it could be a Python side effect: after
every 50 documents, after every record, and not at all. I ended up with
pretty much the same results each time. There are about 200 unique IDs
that are not found. How can this be?

TIA

Sig
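P.S. For reference, the flush variants I tried were essentially this
(trimmed down; "records" and add_document() here stand in for my real
indexing loop, and xapdb is the xapian.WritableDatabase the indexer
writes to):

    count = 0
    for record in records:
        add_document(record)   # builds the doc and calls doc.add_term("Q" + uid)
        count += 1
        if count % 50 == 0:    # variant: flush every 50 documents
            xapdb.flush()
        # xapdb.flush()        # variant: flush after every record
    xapdb.flush()              # final flush in every variant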
On Fri, Mar 11, 2005 at 10:39:53AM -0500, Sig Lange wrote:

> -- the /bin/read (hacked down for brevity) --
> import sys, cgi
> import xapian
>
> sys.stderr = sys.stdout
> FieldStore = cgi.FieldStorage()
>
> print "Content-Type: text/html"
> print
>
> xapdb = xapian.Database("..")
>
> enquire = xapian.Enquire(xapdb)
> stemmer = xapian.Stem("english")
>
> qp = xapian.QueryParser()
> # i do have other prefixes but only Q is important to my example
> qp.set_prefix("id", "Q")
>
> id = FieldStore.getvalue("id", "")
> q = "id:" + id
>
> query = qp.parse_query(q)
>
> enquire.set_query(query)
> matches = enquire.get_mset(0, 1)
>
> if matches.get_matches_upper_bound() == 0:
>     print "ERROR: Oops, unable to find a message %s" % (id)
>     sys.exit(0)
>
> match = iter(matches).next()
>
> print "ID %i %i%% [%s]" % \
>     (match[xapian.MSET_DID], match[xapian.MSET_PERCENT],
>      match[xapian.MSET_DOCUMENT].get_data())
> -- the /bin/read --

How about, instead of your QueryParser bit:

----------------------------------------------------------------------
id = FieldStore.getvalue("id", "")
query = xapian.Query(xapian.Query.OP_OR, ["Q%s" % str(id)])
enquire.set_query(query)
----------------------------------------------------------------------

since you really don't need to bother with the query parser for this kind
of work.

I'm pretty sure that if a document never got into the database, its Q-term
won't have done either. So if you're getting the id terms to look for out
of the Xapian database, you should be able to find the documents as well.

If you're worried there's an inconsistency between the different tables,
you could try getting the posting list for each Q-term you find, printing
its docid, and then interrogating the database directly for those
documents; there's a sketch of this below my sig.

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                    xapian.org
  james@tartarus.org                                 uncertaintydivision.org
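P.S. Untested, but the posting-list check would be something along these
lines. I'm assuming the bindings expose get_docid() on the posting
iterator the same way they expose get_term() on the term iterator;
get_document() will throw if a docid in a posting list turns out to be
dangling:

----------------------------------------------------------------------
import xapian

xapdb = xapian.Database("..")

it = xapdb.allterms_begin()
end = xapdb.allterms_end()
while not it == end:
    term = it.get_term()
    if term.startswith("Q"):
        # walk the posting list for this Q-term...
        p = xapdb.postlist_begin(term)
        pend = xapdb.postlist_end(term)
        while not p == pend:
            docid = p.get_docid()
            # ...and fetch each document directly by docid
            doc = xapdb.get_document(docid)
            print term, docid, len(doc.get_data())
            p.next()
    it.next()
----------------------------------------------------------------------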