thr3ads.net - similar to: "Get term from document by position"

Displaying 20 results from an estimated 90000 matches similar to: "Get term from document by position"

2015 Jul 23

Get term from document by position

Hello. Is there any FAST way to get a term from a Xapian document by its position, something like std::string term = Xapian::Document::GetTermByPosition(int position)? Below I have described the task that I am trying to solve, in case somebody is interested. When displaying search results, I would like to
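As far as I know, Xapian keeps positions per term (each term in a document has its own position list), so a lookup by position has to go through an inverted map. Below is a minimal sketch of that inversion in plain Python, assuming the term-to-positions data has already been read from the document. The `invert_positions` name and the sample data are hypothetical, and no Xapian call is shown.

```python
def invert_positions(term_positions):
    """Build a position -> [terms] lookup from a term -> [positions] mapping."""
    by_position = {}
    for term, positions in term_positions.items():
        for pos in positions:
            # Keep a list per position: nothing guarantees one term per position.
            by_position.setdefault(pos, []).append(term)
    return by_position


# Hypothetical sample standing in for what a document's term list would yield.
term_positions = {
    "get": [1, 7],
    "term": [2],
    "document": [4],
    "position": [6],
}

lookup = invert_positions(term_positions)
print(lookup.get(2))  # terms recorded at position 2
print(lookup.get(3))  # None: no term was recorded at position 3
```

Building the map costs one pass over every position in the document, so it pays off only when many positions of the same document are looked up.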

Get term from document by position

2015 Jul 26

> Snippet highlighting is something that was worked on for a GSoC project a few years ago, and is mentioned in our FAQ: <http://trac.xapian.org/wiki/FAQ/Snippets>. It's not available in the 1.2 series, but as I understand it should work out of the box in 1.3.3.
I tried it; this approach returns snippets that have nothing to do with the search string. Moreover, it takes too

Get term from document by position

2015 Jul 26

> Can you file a bug with some example outputs that are unrelated to the search string?
Here is the example (see attachment). This example does the following: 1) First, it indexes text from the "text.txt" file (see attachment) (actually, this is the text of the following book: "Abbas, Lichtman. Basic immunology"). 2) Next, it searches for the "extracellular

Get term from document by position

2015 Jul 26

mple (see attachment). > > Attachments get stripped out by the mailing list, so I've made a private gist of the two files here: <https://gist.github.com/jaylett/ce8455b37e2b84422346>. > > Actually, when I run it I get 0 matches, which would explain why you're just getting the start of the document. However if I adjust things (match the stemming strategy for TermGenerator to

Given a document, how do you get its ID? (perl bindings)

2016 May 09

I am writing an indexer that will crawl our web site. Following the recommendation here: https://trac.xapian.org/wiki/FAQ/UniqueIds I'm using the URL as the unique ID for each document. I see how to get a document from the xapian database if I know its URL, but what I need is also to be able to find out the URL from the document. Does this mean I need to store the URL in a value in

Fetching document content by Q term in Python

2007 Feb 09

Hello, I'd like to be able to retrieve the index's stored copy of the document text and tried the following:
    terms = self.db.allterms()
    terms.skip_to('Q' + uri.encode('utf-8'))
    term = terms.next()
    doc = self.db.get_document(term[1])
    print doc.get_data()
I just wildly guessed that [1] was the docid, but of course it isn't. So the question is, how do I

get the title from the document

2012 Nov 03

Dear all, I am working on a very simple project, in which I wanna get the title from the document. For instance, this is what I have done so far.
///////////// code for building the index file
    # Load content
    content = open(filePath).read()
    # Prepare document
    document = xapian.Document()
    document.set_data(content)
    # Store fileName

How term distance impacts the weight?

2011 Aug 01

Hey, I have been using Xapian for more than a month; it is very nice. When I looked at the weights, I saw that each term is associated with a position in the document. I wonder how position is used in a query, and how it affects the weight of a search. Could anyone shed light on this? Am I right that position is more useful for Oriental languages like Chinese, Japanese and Korean than for Western languages, because

Get a list of all terms in an indexed corpus

2010 Oct 08

Hello, I have a corpus that I have indexed with xapian/xappy and I would now like to generate a corpus-specific list of stopwords. (This is a technical corpus, so a typical stopword list wouldn't be helpful.) My first thought was to ask the xapian database for a list of terms followed by their frequency. My intuition is that I could probably bring together a list of stopwords by examining
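One way to turn a list of terms and frequencies into candidate stopwords is to keep the terms that occur in a large share of the documents. The sketch below is plain Python and makes several assumptions that are not in the post: the `(term, document_frequency)` pairs have already been exported from the index, the `candidate_stopwords` name is hypothetical, and the 50% threshold is arbitrary.

```python
def candidate_stopwords(term_freqs, doc_count, min_fraction=0.5):
    """Return terms found in at least min_fraction of documents, most frequent first.

    term_freqs is an iterable of (term, document_frequency) pairs.
    """
    cutoff = doc_count * min_fraction
    picked = [(term, freq) for term, freq in term_freqs if freq >= cutoff]
    # Sort by descending frequency, then alphabetically for a stable order.
    picked.sort(key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in picked]


# Hypothetical frequencies for a 100-document technical corpus.
sample = [("the", 98), ("antigen", 12), ("of", 95), ("cell", 60)]
stopwords = candidate_stopwords(sample, doc_count=100)
print(stopwords)
```

In a technical corpus a domain word such as "cell" may pass the threshold as well, so the output is best treated as a list to review by hand rather than a finished stopword list.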

Ranking and term proximity

2011 Sep 04

Hi, I was reading an article recently about how Google ranks results (among many other things of course) based on the proximity of the search terms in the source documents. In addition, the position of the search terms in the search query string itself is also taken into consideration when determining how important each term is. Does Xapian do something similar - at least for the first part?

Set Term Frequency for a Query

2011 Mar 07

Hello, I have a problem when trying to define a query and setting for each term its "term frequency" with the classical constructor Xapian::Query<http://xapian.org/docs/apidoc/html/classXapian_1_1Query.html#f396e213df0d8bcffa473a75ebf228d6>(const std::string &tname_,

Term-Flags

2007 Dec 29

Hi, is it necessary to set the flag below on the TermGenerator if I want the "Did you mean ..." spelling corrections? Xapian::TermGenerator::flags::FLAG_SPELLING Thank you very much, Markus

Is there a 64 character term size limit? In Ruby bindings?

2010 Jun 07

I've just found some items in my Xapian database which aren't being indexed, when the terms are quite long. Example term: Frotherham_doncaster_and_south_humber_mental_health_nhs_foundation_trust It represents that the Freedom of Information request was made to a particular public body. It results in pages like this not correctly showing results:

Question from a new user of xapian: query term weight

2010 Apr 02

Hi all, I've been a Lucene user for the past year, but lately, with most of my project moving to Python, I really love Xapian's clean python binding. I can't seem to see how to boost a query term using Xapian's query syntax. In Lucene, there is "hello^4 world^.2" to boost "hello" and suppress "world". However, digging through Xapian's

xapian.InvalidArgumentError: Term too long (> 245)?

2011 Jul 28

xapian.InvalidArgumentError: Term too long (> 245): XTEXT... What is 245 here? 245 characters, 245 bytes, 245 words, 245 unique words, or 245 characters in one word? Does it include spaces? Ashish
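My understanding, which the excerpt itself does not confirm, is that the limit applies to the byte length of a single term, prefix included. Under that assumption, a pre-check before indexing could look like the sketch below. `MAX_TERM_BYTES` and `term_fits` are hypothetical names, and the terms are assumed to be stored as UTF-8.

```python
MAX_TERM_BYTES = 245  # figure taken from the error message; assumed to count bytes


def term_fits(term, limit=MAX_TERM_BYTES):
    """Return True if the UTF-8 encoding of term is at most limit bytes long."""
    return len(term.encode("utf-8")) <= limit


print(term_fits("XTEXT" + "a" * 240))  # 245 bytes in total
print(term_fits("XTEXT" + "a" * 241))  # 246 bytes in total
print(term_fits("é" * 200))            # 200 characters, but 400 bytes
```

Counting bytes rather than characters matters for non-ASCII text, where one character can take several bytes.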

Term prefixes (was: Xapian Feedback)

2005 Jan 14

I wrote:
> I think it's a bug. Or at least QueryParser uses a rather delicate rule
> for when to add a ":" between the prefix and the term, which scriptindex
> doesn't implement. The rule is undocumented (except in the code) so
> it's arguable who is correct.
I've been looking at this some more. We need some way to distinguish the term prefix from the term

Internal error: Message without type term

2023 Jul 04

On Mon, Jul 03, 2023 at 02:26:03PM +0200, David Bremner wrote:
> "Peter P." <peterparker at fastmail.com> writes:
>
> > I ran xapian-check on ~/.notmuch/xapian and include its messages
> > below at the end of this mail. Everyone please forgive me for
> > pasting 1121 there. :)
>
> H'mm. It doesn't look familiar to me, but I will check with

No position.{DB,baseA,baseB}

2010 Aug 16

I've just noticed that new indexes no longer have position.{DB,baseA,baseB} files, all previous indexes (I roll indexes every week using xapian-compact) have the position files. The index seems to work but it is returning some odd results, for example if I run a query with the phrase "machine learning" it mostly returns documents containing "machine learning" but it also

Error while compacting: Bad position key

2018 Jul 12

Mike Hommey <mh at glandium.org> writes:
> Hi,
>
> When running `notmuch compact` today, it stopped with the following
> output:
>
> Compacting database...
> compacting table postlist
> Reduced by 25% 648656K (2498904K -> 1850248K)
> compacting table docdata
> Reduced by 15% 24K (152K -> 128K)
> compacting table termlist
> Reduced by

What is the best way to represent a category hierarchy using term prefixes in Xapian?

2011 Nov 06

Assume I have the following example hierarchy:
US
>Michigan
>>Detroit
>>Grand Rapids
>>Lansing
>Minnesota
>>Grand Rapids
>>Minneapolis
>>St Paul
>Ohio
>>Columbus
>>Grand Rapids
>>Sandusky
I see two ways that I could index a "Grand Rapids, Michigan" document with prefixed terms: XFIRSTLEVELus XSECONDLEVELmichigan
