Displaying 20 results from an estimated 90000 matches similar to: "Get term from document by position"
2015 Jul 23
1
Get term from document by position
Hello. Is there any FAST way to get a term from a Xapian document by its position, something like
std::string term = Xapian::Document::GetTermByPosition(int position) ?
Below I have described the task I am trying to solve, in case anybody is interested.
============================================================================
When displaying search results, I would like to
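As far as I know, a Xapian document has no call that maps a position back to a term, because positions are stored per term. You can build the reverse map in one pass instead. This is a minimal sketch that assumes the (term, positions) pairs have already been read out, for example by iterating `db.termlist(docid)` and calling `db.positionlist(docid, term)` in the Python bindings. `build_position_index` and `terms_between` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_position_index(
    term_positions: Iterable[Tuple[str, Iterable[int]]]
) -> Dict[int, List[str]]:
    """Invert (term, positions) pairs into a position -> terms map.

    More than one term may share a position, so each entry is a list.
    """
    index: Dict[int, List[str]] = defaultdict(list)
    for term, positions in term_positions:
        for pos in positions:
            index[pos].append(term)
    return dict(index)


def terms_between(index: Dict[int, List[str]], start: int, end: int) -> List[str]:
    """Return the first term stored at each position in [start, end], in order."""
    return [index[pos][0] for pos in range(start, end + 1) if pos in index]


# Made-up data: "the quick brown fox" indexed at positions 1-4.
sample = [("the", [1]), ("quick", [2]), ("brown", [3]), ("fox", [4])]
idx = build_position_index(sample)
print(idx[3])                    # ['brown']
print(terms_between(idx, 2, 4))  # ['quick', 'brown', 'fox']
```

Building the map costs one pass over the document's positional data. It pays off when you look up several positions in the same document, as snippet display does. The terms are the indexed forms (lowercased, possibly prefixed), so joining them gives only an approximation of the original text.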
2015 Jul 26
1
Get term from document by position
> Snippet highlighting is something that was worked on for a GSoC project a
> few years ago, and is mentioned in our FAQ: <http://trac.xapian.org/wiki/FAQ/Snippets>.
> It's not available in the 1.2 series, but as I understand it should work out of the
> box in 1.3.3.
I tried it; this approach returns snippets that have nothing to do with the search string. Moreover, it takes too
2015 Jul 26
1
Get term from document by position
> Can you file a bug with some example outputs that are unrelated to the search string?
Here is the example (see attachment).
This example does the following:
1) First, it indexes text from the "text.txt" file (see attachment); this is actually the text of the book "Abbas, Lichtman. Basic immunology".
2) Next, it searches for the "extracellular
2015 Jul 26
1
Get term from document by position
example (see attachment).
>
> Attachments get stripped out by the mailing list, so I've made a private gist of the two files here: <https://gist.github.com/jaylett/ce8455b37e2b84422346>.
>
> Actually, when I run it I get 0 matches, which would explain why you're just getting the start of the document. However if I adjust things (match the stemming strategy for TermGenerator to
2016 May 09
1
Given a document, how do you get its ID? (perl bindings)
I am writing an indexer that will crawl our web site. Following the
recommendation here:
https://trac.xapian.org/wiki/FAQ/UniqueIds
I'm using the URL as the unique ID for each document. I see how to get a
document from the Xapian database if I know its URL, but I also need to be
able to find out the URL from the document. Does this mean I
need to store the URL in a value in
2007 Feb 09
1
Fetching document content by Q term in Python
Hello,
I'd like to be able to retrieve the index's stored copy of the document
text and tried the following:
terms = self.db.allterms()
terms.skip_to('Q' + uri.encode('utf-8'))
term = terms.next()
doc = self.db.get_document(term[1])
print doc.get_data()
I just wildly guessed that [1] was the docid, but of course it isn't. So the
question is, how do I
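An `allterms()` item does not give you a document ID. The usual way to get from a unique "Q" term to its document is to walk that term's posting list. This is a minimal sketch that assumes the Python bindings' `db.postlist(term)`, whose items have a `docid` attribute, and `db.get_document(docid).get_data()`. The `_StandIn*` classes are hypothetical placeholders. They exist only so the function can be tried without a Xapian install.

```python
from typing import Optional


def get_data_by_unique_term(db, uri: str, prefix: str = "Q") -> Optional[bytes]:
    """Return the stored data of the document whose unique term is prefix+uri.

    `db` is expected to behave like xapian.Database: db.postlist(term)
    yields items with a .docid attribute, and db.get_document(docid)
    returns a document with get_data().
    """
    term = prefix + uri
    for posting in db.postlist(term):
        # A unique-ID term should index at most one document.
        return db.get_document(posting.docid).get_data()
    return None  # no document carries this term


# Hypothetical in-memory stand-ins, only so the sketch runs without Xapian.
class _StandInPosting:
    def __init__(self, docid: int) -> None:
        self.docid = docid


class _StandInDocument:
    def __init__(self, data: bytes) -> None:
        self._data = data

    def get_data(self) -> bytes:
        return self._data


class _StandInDatabase:
    def __init__(self) -> None:
        self._docs = {}      # docid -> _StandInDocument
        self._postings = {}  # term -> list of docids

    def add(self, docid: int, term: str, data: bytes) -> None:
        self._docs[docid] = _StandInDocument(data)
        self._postings.setdefault(term, []).append(docid)

    def postlist(self, term: str):
        return (_StandInPosting(d) for d in self._postings.get(term, []))

    def get_document(self, docid: int) -> _StandInDocument:
        return self._docs[docid]


db = _StandInDatabase()
db.add(7, "Qhttp://example.com/a", b"stored text of page a")
print(get_data_by_unique_term(db, "http://example.com/a"))  # b'stored text of page a'
```

With a real database, you would pass the opened `xapian.Database` in place of the stand-in.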
2012 Nov 03
1
get the title from the document
Dear all,
I am working on a very simple project in which I want to get the title from the document.
For instance, this is what I have done so far.
///////////// code for building the index file
        # Load content
        content = open(filePath).read()
        # Prepare document
        document = xapian.Document()
        document.set_data(content)
        # Store fileName
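Xapian does not know which part of the stored data is a title. One option is to store the title as a separate field at indexing time. This is a minimal sketch that assumes two things: the title is the first non-empty line of the file (a hypothetical rule), and the document data holds a JSON object. You would pass the string from `make_document_data` to `document.set_data(...)` and give whatever `document.get_data()` returns to `read_title`. All three helper names are hypothetical. A value slot (`document.add_value(slot, title)`) would be an alternative if the title is also needed for sorting.

```python
import json
from typing import Union


def extract_title(content: str) -> str:
    """Hypothetical rule: the first non-empty line is the title."""
    for line in content.splitlines():
        if line.strip():
            return line.strip()
    return ""


def make_document_data(content: str, file_name: str) -> str:
    """Bundle title, file name and text into one JSON string for set_data()."""
    return json.dumps({
        "title": extract_title(content),
        "file_name": file_name,
        "content": content,
    })


def read_title(data: Union[str, bytes]) -> str:
    """Recover the title from what get_data() returned (bytes in Python 3)."""
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    return json.loads(data)["title"]


data = make_document_data("\nMy Report\nBody text here.\n", "report.txt")
print(read_title(data))                  # My Report
print(read_title(data.encode("utf-8")))  # My Report
```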
2011 Aug 01
1
How term distance impacts the weight?
Hey,
I have been using Xapian for more than a month, and it is very nice.
When I looked at the weights, I saw that each term is associated with
positions in the document.
I wonder how positions are used in a query. How do they affect the search weight?
Could anyone shed light on this?
Am I right that positions are more useful for East Asian languages like
Chinese, Japanese and Korean than for Western languages,
because
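To my knowledge, Xapian's default BM25 weighting does not use positions at all. Positional data is what the phrase and proximity operators (`OP_PHRASE`, `OP_NEAR`) check to decide whether a document matches, whatever the language. Below is a simplified sketch of the exact-phrase test on plain position sets. `contains_phrase` is a hypothetical helper, and Xapian's own implementation differs in detail.

```python
from typing import Sequence, Set


def contains_phrase(positions_per_term: Sequence[Set[int]]) -> bool:
    """True if query term i occurs at position p + i for some start position p.

    positions_per_term[i] holds the positions of the i-th query term in one
    document (e.g. as read with db.positionlist(docid, term)).
    """
    if not positions_per_term:
        return False
    first, rest = positions_per_term[0], positions_per_term[1:]
    return any(
        all(p + offset in later for offset, later in enumerate(rest, start=1))
        for p in first
    )


# "machine learning": machine at 3 and 10, learning at 11 -> phrase at 10-11.
print(contains_phrase([{3, 10}, {11}]))  # True
print(contains_phrase([{3, 10}, {12}]))  # False
```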
2010 Oct 08
1
Get a list of all terms in an indexed corpus
Hello,
I have a corpus that I have indexed with xapian/xappy and I would now
like to generate a corpus-specific list of stopwords. (This is a
technical corpus, so a typical stopword list wouldn't be helpful.)
My first thought was to ask the xapian database for a list of terms
followed by their frequency. My intuition is that I could probably bring
together a list of stopwords by examining
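One way to follow this intuition is to rank terms by the share of documents they occur in. This is a minimal sketch that assumes (term, termfreq) pairs taken from the Python bindings' `db.allterms()` (for example `[(t.term, t.termfreq) for t in db.allterms()]`) and a total from `db.get_doccount()`. The 0.5 cut-off and the helper name are arbitrary choices for illustration.

```python
from typing import Iterable, List, Tuple


def stopword_candidates(
    term_freqs: Iterable[Tuple[str, int]],
    doc_count: int,
    min_fraction: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return (term, fraction of documents containing it), most common first.

    Only terms found in at least min_fraction of all documents are kept.
    Terms starting with an uppercase letter are skipped, on the assumption
    that they carry a Xapian-style prefix (e.g. "Z" for stemmed forms).
    """
    if doc_count <= 0:
        return []
    scored = [
        (term, freq / doc_count)
        for term, freq in term_freqs
        if term and not term[0].isupper() and freq / doc_count >= min_fraction
    ]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


sample = [("the", 98), ("of", 95), ("cytokine", 40), ("Zthe", 98), ("cell", 70)]
print(stopword_candidates(sample, 100))
# [('the', 0.98), ('of', 0.95), ('cell', 0.7)]
```

In a technical corpus, frequent domain words such as "cell" above can show up as candidates, so the list still needs a manual review.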
2011 Sep 04
5
Ranking and term proximity
Hi,
I was reading an article recently about how Google ranks results
(among many other things of course) based on the proximity of the
search terms in the source documents. In addition, the position of
the search terms in the search query string itself is also taken into
consideration when determining how important each term is.
Does Xapian do something similar - at least for the first part?
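As far as I know, Xapian does not expose a proximity score of this kind. A common building block for a proximity signal in your own re-ranking is the smallest span of positions that contains every query term. This is a generic sketch of that technique, not something Xapian provides. `min_cover_span` is a hypothetical name, and the position sets would come from the document's positional data.

```python
from collections import Counter
from typing import Optional, Sequence, Set


def min_cover_span(positions_per_term: Sequence[Set[int]]) -> Optional[int]:
    """Length (in positions) of the shortest window containing every term.

    positions_per_term[i] holds the positions of query term i in one document.
    Returns None if some term does not occur. Smaller means closer together.
    """
    if not positions_per_term or any(not p for p in positions_per_term):
        return None
    # All (position, term index) events, in position order.
    events = sorted(
        (pos, i) for i, positions in enumerate(positions_per_term) for pos in positions
    )
    needed = len(positions_per_term)
    counts: Counter = Counter()
    covered = 0
    best: Optional[int] = None
    left = 0
    for right_pos, right_term in events:
        if counts[right_term] == 0:
            covered += 1
        counts[right_term] += 1
        # Shrink from the left while every term is still inside the window.
        while covered == needed:
            left_pos, left_term = events[left]
            span = right_pos - left_pos + 1
            if best is None or span < best:
                best = span
            counts[left_term] -= 1
            if counts[left_term] == 0:
                covered -= 1
            left += 1
    return best


print(min_cover_span([{1, 20}, {5, 21}]))  # 2 (positions 20 and 21)
print(min_cover_span([{1}, set()]))        # None
```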
2011 Mar 07
1
Set Term Frequency for a Query
Hello,
I have a problem when trying to define a query and set, for each term, its
"term frequency" with the classical constructor
(<http://xapian.org/docs/apidoc/html/classXapian_1_1Query.html#f396e213df0d8bcffa473a75ebf228d6>)
Xapian::Query(const std::string &tname_,
2007 Dec 29
3
Term-Flags
Hi,
Is it necessary to set the flag below on the TermGenerator,
if I want the "Did you mean ..." spelling corrections?
Xapian::TermGenerator::flags::FLAG_SPELLING
Thank you very much
Markus
2010 Jun 07
2
Is there a 64 character term size limit? In Ruby bindings?
I've just found some items in my Xapian database which aren't being
indexed, when the terms are quite long.
Example term:
Frotherham_doncaster_and_south_humber_mental_health_nhs_foundation_trust
It represents that the Freedom of Information request was made to a
particular public body. It results in pages like this not correctly
showing results:
2010 Apr 02
1
Question from a new user of xapian: query term weight
Hi all,
I've been a Lucene user for the past year, but lately, with most of my
project moving to Python, I really love Xapian's clean Python bindings.
I can't seem to see how to boost a query term using Xapian's query
syntax. In Lucene, there is "hello^4 world^.2" to boost "hello" and
suppress "world". However, digging through Xapian's
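I'm not aware of a Lucene-style `^` boost in Xapian's QueryParser syntax. On the query side, the tool for this is `Query.OP_SCALE_WEIGHT`, which multiplies a subquery's weight by a factor. This is a minimal sketch that assumes you parse the boosts yourself. `parse_boosts` and `build_query` are hypothetical helpers. `build_query` assumes the Python bindings accept `xapian.Query(xapian.Query.OP_SCALE_WEIGHT, subquery, factor)` and `xapian.Query(xapian.Query.OP_OR, list_of_queries)`. The terms must already be in the form used at index time. Nothing here stems them; the code only lowercases them.

```python
from typing import List, Tuple


def parse_boosts(text: str, default: float = 1.0) -> List[Tuple[str, float]]:
    """Split 'hello^4 world^.2' into [('hello', 4.0), ('world', 0.2)]."""
    pairs: List[Tuple[str, float]] = []
    for token in text.split():
        term, sep, factor = token.partition("^")
        if not term:
            continue
        weight = float(factor) if sep and factor else default
        if weight < 0:
            raise ValueError(f"negative boost for {term!r}")
        pairs.append((term.lower(), weight))
    return pairs


def build_query(pairs: List[Tuple[str, float]]):
    """Combine boosted terms with OR (needs the xapian Python bindings)."""
    import xapian  # imported lazily so parse_boosts works without Xapian

    subqueries = [
        xapian.Query(xapian.Query.OP_SCALE_WEIGHT, xapian.Query(term), factor)
        for term, factor in pairs
    ]
    return xapian.Query(xapian.Query.OP_OR, subqueries)


print(parse_boosts("hello^4 world^.2"))  # [('hello', 4.0), ('world', 0.2)]
```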
2011 Jul 28
0
xapian.InvalidArgumentError: Term too long (> 245)?
xapian.InvalidArgumentError: Term too long (> 245): XTEXT...
What is 245 here? Is it 245 characters, 245 bytes, 245 words, 245 unique
words, or 245 characters in one word?
Does it include spaces?
Ashish
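The error text shows the term starting with its `XTEXT` prefix. That suggests a whole field's text may have been added as one term rather than split into words (for example with `TermGenerator.index_text`). My understanding is that the 245 limit counts bytes of the UTF-8 encoded term, prefix included, with a space being just another byte. Treat that as an assumption to check against the documentation for your backend. The small sketch below checks a term before it is added. `MAX_TERM_BYTES`, `term_byte_length`, `fits_term_limit` and `truncate_term` are hypothetical names.

```python
MAX_TERM_BYTES = 245  # assumption: the limit quoted in the error message, in bytes


def term_byte_length(term: str) -> int:
    """Length of the term as UTF-8 bytes, which the limit is assumed to count."""
    return len(term.encode("utf-8"))


def fits_term_limit(term: str, limit: int = MAX_TERM_BYTES) -> bool:
    return term_byte_length(term) <= limit


def truncate_term(term: str, limit: int = MAX_TERM_BYTES) -> str:
    """Cut the term to at most `limit` bytes without splitting a UTF-8 character."""
    encoded = term.encode("utf-8")[:limit]
    return encoded.decode("utf-8", errors="ignore")


print(term_byte_length("XTEXTcafe"))         # 9
print(term_byte_length("XTEXTcafé"))         # 10 ('é' takes two bytes)
print(fits_term_limit("XTEXT" + "a" * 240))  # True (245 bytes)
print(fits_term_limit("XTEXT" + "a" * 241))  # False (246 bytes)
```

Truncating changes what the term matches. For searchable text, indexing individual words is usually the better fix.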
2005 Jan 14
0
Term prefixes (was: Xapian Feedback)
I wrote:
> I think it's a bug. Or at least QueryParser uses a rather delicate rule
> for when to add a ":" between the prefix and the term, which scriptindex
> doesn't implement. The rule is undocumented (except in the code) so
> it's arguable who is correct.
I've been looking at this some more.
We need some way to distinguish the term prefix from the term
2023 Jul 04
1
Internal error: Message without type term
On Mon, Jul 03, 2023 at 02:26:03PM +0200, David Bremner wrote:
> "Peter P." <peterparker at fastmail.com> writes:
>
> > I ran xapian-check on ~/.notmuch/xapian and include its messages
> > below at the end of this mail. Everyone please forgive me for
> > pasting 1121 there. :)
>
> H'mm. It doesn't look familiar to me, but I will check with
2010 Aug 16
1
No position.{DB,baseA,baseB}
I've just noticed that new indexes no longer have
position.{DB,baseA,baseB} files, all previous indexes (I roll indexes
every week using xapian-compact) have the position files. The index
seems to work but it is returning some odd results, for example if I run
a query with the phrase "machine learning" it mostly returns documents
containing "machine learning" but it also
2018 Jul 12
1
Error while compacting: Bad position key
Mike Hommey <mh at glandium.org> writes:
> Hi,
>
> When running `notmuch compact` today, it stopped with the following
> output:
>
> Compacting database...
> compacting table postlist
> Reduced by 25% 648656K (2498904K -> 1850248K)
> compacting table docdata
> Reduced by 15% 24K (152K -> 128K)
> compacting table termlist
> Reduced by
2011 Nov 06
2
What is the best way to represent a category hierarchy using term prefixes in Xapian?
Assume I have the following example hierarchy:
US
>Michigan
>>Detroit
>>Grand Rapids
>>Lansing
>Minnesota
>>Grand Rapids
>>Minneapolis
>>St Paul
>Ohio
>>Columbus
>>Grand Rapids
>>Sandusky
I see two ways that I could index a "Grand Rapids, Michigan" document with
prefixed terms:
XFIRSTLEVELus
XSECONDLEVELmichigan