Jim Lynch
2012-Jan-15 13:55 UTC
[Xapian-discuss] I'm trying to relate what I know about Omega/Scriptindex with the actual data
James, thanks for the explanations. I misread the notes.

As an exercise, I'm trying to convert an existing project that currently uses Scriptindex and Omega to direct Xapian API calls. I did a (I think) complete dump of a document with

    delve -r 565 -d database

and I see things like

    subject='A typical subject'

with a corresponding set of terms like

    Sa Stypical Ssubject

which is what I expect. However, I have two "fields", unixdate and summary, which I've specified as

    unixdate : field date=unix
    summary : field

in my index file. They are displayed in the delve output as

    summary=Do you remember what was wrong with the bearings?
    unixdate=1181883741

I don't see a set of terms that would correspond to either of these. Yes, the words (terms) are there, but with no prefixes to indicate how they are related to the field names. I assume there is some magic and/or delve isn't dumping everything.

The purpose of this investigation is to figure out how to add something to the document, storing this info. Looking at the Document API, I only see how to add data, terms and values. None of those three appears to be an option either.

Can someone enlighten me?

Thanks,
Jim.
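To make the link between the index script and delve's term list concrete, here is a rough Python sketch. It is not how scriptindex is implemented: `prefixed_terms` is a crude stand-in for Xapian's TermGenerator (no stemming, no positions, simplified tokenizing), and `unix_date_terms` rests on the assumption that scriptindex's `date=unix` action adds boolean date terms of the form Dyyyymmdd, Myyyymm and Yyyyy. The time zone scriptindex uses is also an assumption (UTC here). Both function names are hypothetical.

```python
import re
from datetime import datetime, timezone


def prefixed_terms(text: str, prefix: str) -> list[str]:
    """Simplified stand-in for indexing text with a term prefix:
    split into word characters, lowercase, and prepend the prefix.
    Real Xapian term generation also handles stemming, positions and
    Unicode word rules, all of which this sketch ignores."""
    words = re.findall(r"\w+", text.lower())
    return [prefix + word for word in words]


def unix_date_terms(timestamp: int) -> list[str]:
    """Hypothetical reconstruction of day/month/year boolean terms for a
    Unix timestamp (D = day, M = month, Y = year).
    Assumption: the timestamp is interpreted as UTC; scriptindex may
    use the local time zone instead."""
    dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return [dt.strftime("D%Y%m%d"), dt.strftime("M%Y%m"), dt.strftime("Y%Y")]


print(prefixed_terms("A typical subject", "S"))  # ['Sa', 'Stypical', 'Ssubject']
print(unix_date_terms(1181883741))               # ['D20070615', 'M200706', 'Y2007']
```

If the assumption about `date=unix` holds, terms like these (rather than anything carrying the field name) would be the ones worth looking for in delve's term list for this document.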
James Aylett
2012-Jan-15 14:16 UTC
On 15 Jan 2012, at 13:55, Jim Lynch wrote:

> Which is what I expect, however I have two "fields" unixdate and summary which I've specified as
>
> unixdate : field date=unix
> summary : field
>
> In my index file. They are displayed in the delve output as
>
> summary=Do you remember what was wrong with the bearings?
> unixdate=1181883741

That looks right. AIUI, without a keyword of `index` in scriptindex, no probabilistic terms will be generated. (The scriptindex documentation is a tiny bit confusing here, I think.)

> I don't see a set of terms that would correspond to either of these. Yes, the words (terms) are there but no prefixes to indicate how they are related to the field names. I assume there is some magic and/or delve isn't dumping everything.

I'm guessing that the terms are there because they're in other fields.

> The purpose of this investigation is to figure out how to add something to the document, storing this info. In looking at the Document api, I only see how to add data, terms and values. None of those three appear to be options either.

Xapian doesn't directly support what omega calls fields; it provides a blob of document data that you can use how you wish. Omega uses an encoding mechanism to turn this into a basic key-value store for fields, but you could also drop a JSON document in there, for instance. You need to decide what makes sense for your application.

If you want to read and write omega-style fields, that's not terribly difficult; the format is basically lines of key=value (I can't remember offhand whether there's a way of escaping newlines in the values).

J

--
James Aylett
talktorex.co.uk - xapian.org - devfort.com - spacelog.org
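A minimal sketch of reading and writing omega-style fields, assuming the simple one-key=value-per-line format described above and one value per key. The function names are hypothetical. Since it's unclear whether omega has an escape for newlines, the encoder simply refuses values containing one rather than guessing. In an indexer using the Python bindings, the encoded string would be stored with `Document.set_data()` and read back with `Document.get_data()` (which returns bytes under Python 3, so decode before parsing). The JSON alternative James mentions is shown at the end.

```python
import json


def encode_fields(fields: dict[str, str]) -> str:
    """Serialise fields as omega-style "key=value" lines.
    Assumption: no escaping scheme exists, so keys containing '=' or
    newlines, and values containing newlines, are rejected."""
    lines = []
    for key, value in fields.items():
        if "=" in key or "\n" in key or "\n" in value:
            raise ValueError(f"cannot encode field {key!r} safely")
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def decode_fields(data: str) -> dict[str, str]:
    """Parse "key=value" lines back into a dict. Splits on the first '='
    only, so values may themselves contain '='. Lines without '=' are
    skipped; if a key repeats, the last occurrence wins."""
    fields = {}
    for line in data.split("\n"):
        key, sep, value = line.partition("=")
        if sep:
            fields[key] = value
    return fields


doc = {"summary": "Do you remember what was wrong with the bearings?",
       "unixdate": "1181883741"}

omega_style = encode_fields(doc)
print(omega_style, end="")  # same two lines as the delve output above
assert decode_fields(omega_style) == doc

# Alternative: keep a JSON document in the data blob instead.
as_json = json.dumps(doc)
assert json.loads(as_json) == doc
```

The JSON route avoids the newline question entirely, at the cost of not being readable by Omega's own templates.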
James Aylett
2012-Jan-15 16:49 UTC
On 15 Jan 2012, at 15:05, Jim Lynch <jim at fayettedigital.com> wrote:

> So scriptindex does a set_doc but delve doesn't show the data placed by set_doc as data. I'm guessing delve is in the omega family and interprets what's in the data but doesn't dump it in a raw format. That makes sense.

Close -- delve is dumping the raw format, it just happens to be human readable.

J
Shane Spencer
2012-Jan-15 21:02 UTC
I spent around a full day in the xapian source code making sure I knew all the ins and outs.. it was a much better resource than the online documentation. I'm mostly proficient with the python-xapian bindings at this point now because of it :)

- Shane