Displaying 20 results from an estimated 100 matches similar to: "Bug in TermIterator::skip_to() ?"
2007 Feb 09
1
Fetching document content by Q term in Python
Hello,
I'd like to be able to retrieve the index's stored copy of the document
text and tried the following:
terms = self.db.allterms()
terms.skip_to('Q' + uri.encode('utf-8'))
term = terms.next()
doc = self.db.get_document(term[1])
print doc.get_data()
I just wildly guessed that [1] was the docid, but of course it isn't. So the
question is, how do I
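As a side note on the skip_to() call in the snippet above: in Xapian, skip_to() positions the iterator on the requested term or, if that term is absent, on the next term after it, so the caller has to check what it actually landed on. The following is a minimal pure-Python sketch of that behaviour using bisect. It does not call Xapian; the term list and the helper name skip_to are made up for illustration.

```python
import bisect

# Hypothetical, already-sorted term list standing in for allterms().
terms = [
    "Qhttp://example.org/a",
    "Qhttp://example.org/b",
    "XFOO",
    "apple",
]

def skip_to(sorted_terms, target):
    """Return the first term >= target, or None if there is no such term.

    This mimics the positioning rule of skip_to(); it is not Xapian's code.
    """
    i = bisect.bisect_left(sorted_terms, target)
    return sorted_terms[i] if i < len(sorted_terms) else None

wanted = "Qhttp://example.org/aa"
found = skip_to(terms, wanted)
# The iterator may land on a later term, so compare before using it.
if found != wanted:
    print("term not present; landed on", found)
```

The equality check after the skip is the important part: without it, a missing unique-ID term would silently be replaced by whichever term sorts next.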
2012 Sep 19
1
java-swig TermIterator
Hello,
Been using Xapian and the Java bindings for years, all was working
great, and I all of a sudden decided to upgrade to the latest 1.2.12 and
use the new java-swig bindings instead of the old hand-crafted JNI which
I think have been deprecated now.
I'm struggling with the new design of the TermIterator. More
specifically, I can't tell when I've reached the end of the list of
2012 Jul 09
1
Question about Document and TermIterator.get_termfreq()
Hi,
While porting the unit tests from perl for the node binding I noticed a
test failed.
I basically create a document, add a few terms, add the document to a
database and then call doc->termlist_begin().get_termfreq(). This throws
"Can't get term frequency from a document termlist which is not associated
with a database."
What I think this means is that I can not call
2010 Jan 16
1
PHP XapianTermIterator/XapianPositionIterator usage
Hello again,
/thanks to Peter for previous response.
I've been digging around trying to find sample usage of
XapianTermIterator/XapianPositionIterator in PHP. The idea is to code up a
test case in PHP to perform snippet extraction (with a possible view to
coding a pecl extension in C). I found a C++ sample, but that wasn't much
help.
I must be dense this morning though, since I
2009 Feb 12
1
problem when using xapian's static libs in windows
I have downloaded the source (1.10) from the internet
and built it into a lib.
Then I created a project as the help doc said.
I am using VC2005 (VC8).
The source in my test project is as follows (copied from the help doc):
#include <xapian.h>
#include <iostream>
using namespace std;
int main(int argc, char **argv)
{
// Simplest possible options parsing: we just require three or more
2006 Jan 30
1
More than one Index?
Morning All,
I use scriptindex to build my database and the PHP bindings to pull it
all out.
Is it possible to have more than one index but select what the bindings
search on?
So at the moment I index property addresses, I would also like to index
property descriptions for more advanced searching but only as an
optional extra...probably in an extra search box.
Also I would like to analyse the
2005 Oct 18
1
Re: [Xapian-commits] 6355: trunk/xapian-applications/omega/ trunk/xapian-applications/omega/docs/
On Fri, Jul 29, 2005 at 10:08:13AM +0100, james wrote:
> SVN root: svn://svn.xapian.org/xapian
> Changes by: james
> Revision: 6355
> Date: 2005-07-29 10:08:13 +0100 (Fri, 29 Jul 2005)
>
> Log message (6 lines):
> omindex.cc: add --preserve-nonduplicates / -p option to not delete any
> documents that aren't updated, in replace duplicates mode
2007 Apr 06
3
Count frequency of term in a specific document?
Is there any way to count the frequency of specific term in one
document?
I can't find any method... Do you?
--
Posted via http://www.ruby-forum.com/.
2007 Apr 03
2
How can I count frequency of terms in a document?
Hi, there.
I need some help.
Is there a way to count frequencies of terms in a document in Ferret?
I know that Ferret has the IndexReader#terms_docs_for method, which counts
across all documents.
I need to count frequencies of terms in a specific document.
Is there some way?
--
Posted via http://www.ruby-forum.com/.
2013 Oct 30
2
Lucene 3.6.2 backend for xapian (#25)
[Replying to xapian-devel, as I think a wider audience would be useful]
On Mon, Oct 21, 2013 at 11:24:51PM +0800, jiangwen jiang wrote:
> Yes, it's less efficient. A Lucene database has multiple segments, and each
> segment can be treated as an independent database. The same term may exist in >=
> 1 segments.
Sorry for taking a while to respond - I've been both busy and mulling
this
2009 Jan 27
1
Segmentation fault in MSetIterator get_weight
Hi,
I'm using Xapian with C# and Mono and I'm getting a segfault in get_weight.
When I print the index variable, the value is clearly too high.
I think something is writing over it. Do you have any idea how I could
trace the origin of the segmentation fault?
Thanks,
--
Yann
2016 May 09
1
Given a document, how do you get its ID? (perl bindings)
I am writing an indexer that will crawl our web site. Following the
recommendation here:
https://trac.xapian.org/wiki/FAQ/UniqueIds
I'm using the URL as the unique ID for each document. I see how to get a
document from the xapian database if I know its URL, but what I need is
also to be able to find out the URL from the document. Does this mean I
need to store the URL in a value in
2014 Mar 06
2
Regarding GSOC 2014
Sir,
I am a 4th-year undergraduate student pursuing my BTech in CSE at IIIT
Hyderabad, India.
I am interested in applying to Xapian for GSoC 2014. I have gone through
this year's ideas page and am interested in applying for the 'posting list encoding
improvements' project.
I am good at C/C++ and Python, which is one of the requirements. I have gone
through the Information Retrieval and
2004 Aug 23
1
postlist chunking
Postlists are split up into chunks, so that skip_to can avoid reading
all the postlist.
Currently the chunk threshold is 2048, but this is checked before adding
an entry, so the postlist chunk can actually grow a little larger.
Something like 2060 at most. Unfortunately this isn't a good threshold
with the default blocksize (8192 bytes).
Internally the B-tree splits up items with a large
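The overshoot described above (a limit that is tested before an entry is appended) can be sketched in a few lines of Python. This is only an illustration of the check-before-append pattern, not Xapian's actual chunking code: the 13-byte entry size, the `>=` comparison, and the function name are assumptions.

```python
CHUNK_THRESHOLD = 2048  # threshold quoted in the post

def append_entry(chunk_sizes, entry_size):
    """Add an entry, opening a new chunk only if the current chunk has
    already reached the threshold (the check runs before the append)."""
    if not chunk_sizes or chunk_sizes[-1] >= CHUNK_THRESHOLD:
        chunk_sizes.append(0)
    chunk_sizes[-1] += entry_size
    return chunk_sizes

chunk_sizes = []
for _ in range(400):
    append_entry(chunk_sizes, 13)  # 13 bytes per entry is a made-up size

print(chunk_sizes)
```

Because the last entry is appended while the chunk is still just under the limit, a full chunk can exceed the threshold by up to one entry's size minus one byte.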
2007 Apr 28
6
Determine how many documents a term occurs in
Is there a fast way to determine how many documents a term occurs in,
besides iterating through every document with TermDocEnum?
--
Best regards,
Stian Grytøyr
2013 Jan 17
1
FASTER Search
I am suffering from slow search performance with Xapian.
I am using Xapian to index about 150,000,000 documents.
It is implemented in C++.
Searching is not that fast.
E.g. searching for a query that includes about 20 terms takes 2 seconds on average.
For searching, I followed such steps:
1. construct a QueryParser for certain string
2. parse the query to get a Xapian::Query
2006 Jun 03
2
Initial patch for ExternalPostList
Hi Everybody,
Here is the first version of my patch for an ExternalPostList; it
should apply cleanly to 0.9.5 and 0.9.6.
You can use it by first implementing an ExternalPostingSource, then
creating a new Query object, passing a reference to an instance of your
implementation to the constructor (see query.h). The
ExternalPostingSource implementation is reference counted, so when
its no
2007 Nov 08
1
Perl make test fails on threads in rhel5
Hi all,
I've tried building RPMs for RHEL5 and hit this problem in Search::Xapian:
make test fails on test 37:
ok 34 - check PositionIterator
ok 35 - create TermIterator
ok 36 - check TermIterator
dubious
Test returned status 0 (wstat 11, 0xb)
DIED. FAILED tests 37-65
Failed 29/65 tests, 55.38% okay
$ xapian-config --version
xapian-config - xapian-core 1.0.4
$ cat
2023 Aug 27
1
DatabaseModifiedError while iterating on mset
On Wed, Aug 23, 2023 at 01:53:27PM +0000, Eric Wong wrote:
> I'm already retrying the ->get_mset operations; but now I'm
> wondering where I'd hit DatabaseModifiedErrors while inside a
> Xapian::MSetIterator loop.
>
> I assume ->get_document is a place where it gets thrown;
> but once a document is retrieved, can iterating through
> terms in one document
2023 Dec 01
1
termlist_begin ordering in older versions
Hey, I noticed commit 145503bbe4a5bf702cd13cb2e592111e8d7ca89a
(Reimplement Database and WritableDatabase, 2017-10-05) added
the phrase:
"The terms are returned ascending string order (by byte value)"
for termlist_begin. Is that also true for the 1.4 (or even 1.2) series?
Also, is allterms_begin also the same w.r.t. ordering?
Thanks.
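For readers unsure what "ascending string order (by byte value)" implies, here is a small Python sketch. It does not touch Xapian and says nothing about which versions guarantee the ordering; it only shows what byte-value ordering looks like, assuming terms are UTF-8 encoded. The sample terms are made up.

```python
# Hypothetical sample terms; not taken from any real database.
terms = ["apple", "Zebra", "XFOO", "Qhttp://example.org/", "banana", "éclair"]

def byte_order(term_list):
    """Sort terms by the byte values of their UTF-8 encoding."""
    return sorted(term_list, key=lambda t: t.encode("utf-8"))

ordered = byte_order(terms)
for term in ordered:
    print(term)
```

Under this ordering, upper-case prefixed terms such as Q... and X... come before lower-case terms, and non-ASCII terms come last, because their UTF-8 lead bytes are larger than any ASCII byte.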