Hi,

I'm new to Xapian and wanted to know if it has a specific feature. I want to be able to check the relation between two terms on a page based on how close they are together on the page. I want to use a combination of n-gram based labeling and the "slop" feature found in Elasticsearch. Does Xapian have this/a similar feature? I haven't been able to find any programs that have features similar to the "slop" feature on Elasticsearch yet.

Thanks!
-- 
Gaby Goldberg
Data Analysis and Marketing Intern
p: 805.452.5413 w: carpe.io e: gaby.goldberg at rivdata.com
On 20 Jun 2018, at 20:39, Gaby Goldberg <gaby.goldberg at rivdata.com> wrote:

> I'm new to Xapian and wanted to know if it has a specific feature. I want
> to be able to check the relation between two terms on a page based on how
> close they are together on the page. I want to use a combination of n-gram
> based labeling and the "slop" feature found in Elasticsearch. Does Xapian
> have this/a similar feature? I haven't been able to find any programs that
> have features similar to the "slop" feature on Elasticsearch yet.

Hi Gaby, you're probably looking for the window parameter of the NEAR positional operator. I realise as I write this that it isn't terribly well-documented in the API, but there are hints here:

https://xapian.org/docs/apidoc/html/classXapian_1_1Query.html#adb287c496f72327d1c1411fac0570ea9

I've added some notes to our missing documentation list [1] that we need to work on this!

[1] https://trac.xapian.org/wiki/MissingDocumentation

J

-- 
James Aylett
devfort.com, spacelog.org, tartarus.org/james/
Please keep replies on the mailing list: more people can help (and benefit) that way :)

So OP_NEAR looks for its terms close to each other (hence "near"). The window is how far apart they can be. Probably the easiest way to play with this is using the NEAR syntax in the query parser.

So if you had a plain text document:

    I am walking, always walking.

and indexed it in a very simple fashion (in Python):

    import xapian

    # Create the database if it doesn't exist yet, otherwise open it.
    db = xapian.WritableDatabase("testdb", xapian.DB_CREATE_OR_OPEN)

    doc = xapian.Document()
    tg = xapian.TermGenerator()
    tg.set_document(doc)
    # index_text() records word positions, which NEAR needs.
    tg.index_text("I am walking, always walking.")
    db.add_document(doc)
    db.commit()

Then you can run NEAR queries:

    import xapian

    db = xapian.Database("testdb")
    qp = xapian.QueryParser()
    qp.set_database(db)

    def query(text):
        # Parse the query string and print the docid of each match.
        enq = xapian.Enquire(db)
        q = qp.parse_query(text)
        enq.set_query(q)
        for match in enq.get_mset(0, 10):
            print(match.docid)

    query("I NEAR/1 walking")  # prints nothing
    query("I NEAR/2 walking")  # prints 1

There's no document in the database where "I" is adjacent to "walking". However, there is one where it's within two ("I am walking...").

Likewise:

    query("I NEAR/2 always")        # nothing
    query("am NEAR/2 always")       # prints 1
    query("walking NEAR/2 always")  # prints 1 again

Hope that helps a little!

J

> On 20 Jun 2018, at 21:23, Gaby Goldberg <gaby.goldberg at rivdata.com> wrote:
>
> I'm a bit confused on how the operator works. Does it find the distance between the two terms?

-- 
James Aylett
devfort.com, spacelog.org, tartarus.org/james/
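
On the question of whether NEAR "finds the distance" between two terms, a small pure-Python sketch may help show the idea behind the results above. This is not Xapian's implementation and does not use the xapian module. It assumes words are numbered from position 1 in the order they appear, and that NEAR/n matches when some occurrence of one term is within n positions of some occurrence of the other. The helper names positions() and near() are made up for illustration.

```python
import re
from collections import defaultdict


def positions(text):
    """Map each lowercased word to the list of its 1-based positions."""
    index = defaultdict(list)
    for pos, word in enumerate(re.findall(r"\w+", text.lower()), start=1):
        index[word].append(pos)
    return index


def near(index, a, b, n):
    """True if some occurrence of a is within n positions of some occurrence of b."""
    return any(
        abs(pa - pb) <= n
        for pa in index.get(a.lower(), [])
        for pb in index.get(b.lower(), [])
    )


# Same sample text as in the thread: i=1, am=2, walking=3 and 5, always=4.
index = positions("I am walking, always walking.")

print(near(index, "I", "walking", 1))       # False
print(near(index, "I", "walking", 2))       # True
print(near(index, "I", "always", 2))        # False
print(near(index, "am", "always", 2))       # True
print(near(index, "walking", "always", 2))  # True
```

In this sketch the check is a yes/no test against a threshold rather than a reported distance, which mirrors how the NEAR/n queries above either return the document or don't.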