similar to: Fetching document content by Q term in Python

Displaying 20 results from an estimated 1000 matches similar to: "Fetching document content by Q term in Python"

2013 Jun 19
2
Compact databases and removing stale records at the same time
On Wed, Jun 19, 2013, at 03:49 PM, Olly Betts wrote: > On Wed, Jun 19, 2013 at 01:29:16PM +1000, Bron Gondwana wrote: > > The advantage of compact - it runs approximately 8 times as fast (we > > are CPU limited in each case - writing to tmpfs first, then rsyncing > > to the destination) and it takes approximately 75% of the space of a > > fresh database with maximum
2005 Feb 25
2
Bug in TermIterator::skip_to() ?
Hi all, I've been toying with xapian (mostly using the Python bindings) and I think I've hit a bug in the TermIterator::skip_to() method (or maybe in QuartzAllTermsList::skip_to()). I've attached a c++ source file that demonstrates the issue. In short, if you have a WritableDatabase, ask for the all-terms TermIterator with db.allterms_begin(), and then skip_to() a word that is itself
2012 Jan 20
3
get_docid???
my $mset = $enq->get_mset($nstart,$nrecords); for(my $mit=$mset->begin(); $mit != $mset->end();$mit++) { my $doc = $mit->get_document(); my $dat = $doc->get_data(); my $id = $doc->get_docid(); } [Fri Jan 20 10:35:06 2012] newmail.cgi: Can't locate auto/Search/Xapian/Document/get_docid.al in @INC (@INC contains: /etc/perl
2010 Jun 24
1
Quickest way to retrieve data for a large match set?
We're using the Perl binding to access Xapian in a simple search of image metadata (title and keywords). Due to the specification for the search engine, by default we have to sort the results using a function of the search rank, age (well, newness) and popularity (rated by sales of the image). As a result, we have to fetch the complete result set and then calculate a new ranking based on
2013 Jun 19
2
Compact databases and removing stale records at the same time
I'm trying to compact (or at least merge) multiple databases, while stripping search records which are no longer required. Backstory: I've inherited the Cyrus IMAPd xapian-based search code from Greg Banks when he left Opera. One of the unfinished parts was removing expunged emails from the search database. We moved from having a single search database to supporting multiple
2014 Mar 06
2
Regarding GSOC 2014
Sir, I am a 4th yr undergraduate student pursuing my BTech in CSE at IIIT Hyderbad, India. I am interested in applying for Xapian in Gsoc 2014. I had gone through this year's idea page and interested in applying for 'posting list encoding improvements' project. I am good at C/C++,python; which is one of the requirement. I had done gone through the information Retrieval and
2014 Apr 13
2
Adding an external library to Xapian
My code is not on Github. I am using the tarball as of now. The following it the error that occurred: http://pastebin.com/cVJrjUZX On Sun, Apr 13, 2014 at 8:16 PM, James Aylett <james-xapian at tartarus.org>wrote: > On 13 Apr 2014, at 15:37, Pallavi Gudipati <pallavigudipati at gmail.com> > wrote: > > > A linker error is encountered even after following the above
2013 Mar 08
2
Gsoc-2013
Hi, I am Chinmay Naik, an undergraduate in Computer Science at Bangalore Institute of Technology, Bangalore. I am an experienced programmer and good with C,C++,Python,Java,OpenGL and would love to participate in Gsoc-13. >From the ideas listed, i am interested to work on the project "posting list encoding improvements". I am a newbie to Xapian but would like to get involved and get a
2018 Jan 03
2
Storing the documents text: data record or value ?
Hi, Following the Recoll snippets generation performance problem caused by the new positions list storage scheme in Xapian 1.4, I am experimenting with generating snippets from the complete document text stored in the index. This increases the index size much less than I would have expected (around 10-15% apparently with my home directory data), which is good news obviously. I have tried
2015 Mar 11
2
stub-file and get_doccount
Hello, i switched from one big index to a stub file with many indexes and running into a problem. i have a tool to fetch a random document via: get_doccount random id up to get_doccount get_document with that id after changing to stub file this failes. Is there a nice way to get a random document from a stub file? ?MfG? Felix Ostmann
2004 Aug 23
1
postlist chunking
Postlists are split up into chunks, so that skip_to can avoid reading all the postlist. Currently the chunk threshold is 2048, but this is checked before adding an entry, so the postlist chunk can actually grow a little larger. Something like 2060 at most. Unfortunately this isn't a good threshold with the default blocksize (8192 bytes). Internally the B-tree splits up items with a large
2023 Aug 19
1
does Xapian::Enquire hold an MVCC revision?
Olly Betts <olly at survex.com> wrote: > On Fri, Aug 18, 2023 at 10:41:52AM +0000, Eric Wong wrote: > > Olly Betts <olly at survex.com> wrote: > > > While the match is running, get_mset(2000, 1000) needs to track > > > 3000 entries so this won't reduce your heap usage (at least not > > > peak usage). > > > > > > Is the heap
2013 Mar 04
2
Need Beginner Guide for Matcher Optimisations Project
Hi, While searching for a project which matches my interest andskill level, I found this project named Matcher Optimization. This project is really challenging and excting from my view point and I would like to be a part of this project. Optimization techniques metioned in the reference links provided will take some time for me to have a good understanding about them. But I am trying to get my
2016 Jan 08
2
Strange index consistency issue
Hi, A Recoll user is reporting an index corruption problem. In general, index corruption happens from time to time with Recoll, because of crashes, reboots, misc Recoll bugs, etc. The strange thing here is that xapian-check does not seem to detect anything. In a nutshell, some document numbers seem to point to a data blackhole: the docids are returned when searching for the file/doc unique
2013 Mar 05
1
Remote database & local database, and adding new weight found vtable error
Hello, guys. Q1. now I have load all the docid and its document data into a dictionary for faster loading data instead of calling Xapian::MSetIterator i; i.get_document().get_data(); but I was happened to discover that the dictionaries got by such two method were different: both methods use DB1, DB2 method 1: Xapian::Database db = Xapian::Database(the path of DB1); Xapian::Database db2 =
2011 May 30
1
How to check docid
I have a bit of code (Python) to delete a number of documents: for f in Flist: xapian_store.delete_document(f.pri_key) in which I am using a unique primary key from an SQL database as the docid for the Xapian database. The problem I have is that some of the documents may not have been created - so I get an error. Now I could just ignore the error (try-recover), but what would be the
2005 Jul 20
1
docid type redifine
Hello all. I need to redefine a docid type (and all dependent types) like this: typedef unsigned long long docid; I think it would be enough to edit "include/xapian/types.h", but it isn't so. 1) I've added : string om_tostring(unsigned long long val) { CONVERT_TO_STRING("%llu") } in common/utils.{h,cc} 2) In include/enquire.h (line 438) I've found the
2010 Feb 18
2
xapian.DocNotFoundError: regression?
Hello, I've installed xapian-core 1.1.3 and xapian-bindings 1.1.4 from the tarballs announced by Olly the other day. With these versions, Enquire.get_mset() seems to consistently be raising xapian.DocNotFoundError. I've attached a small test case which reproduces this. The same test case works fine with 1.0.16 (not the latest 1.0.x, but it's what I had installed). Program output
2014 Apr 13
2
Adding an external library to Xapian
We are using the --enable-maintainer-mode and will move to git soon. The diff file is attached. *Siddhant Mutha* Undergraduate Student Department of Computer Science and Engineering IIT Madras Chennai http://www.siddhantmutha.com/ <http:/www.siddhantmutha.com/> On Sun, Apr 13, 2014 at 8:26 PM, James Aylett <james-xapian at tartarus.org>wrote: > On 13 Apr 2014, at 15:48, Pallavi
2016 May 09
1
Given a document, how do you get its ID? (perl bindings)
I am writing an indexer that will crawl our web site. Following the recommendation here: https://trac.xapian.org/wiki/FAQ/UniqueIds I'm using the URL as the unique ID for each document. I see how to get a document from the xapian database if I know its URL, but what I need is also to be able to find out the URL from the document. Does this mean I need to store the URL in a value in