similar to: Search::Xapian add_database'd search results are odd?

2013 Jun 19
2
Compact databases and removing stale records at the same time
On Wed, Jun 19, 2013, at 03:49 PM, Olly Betts wrote: > On Wed, Jun 19, 2013 at 01:29:16PM +1000, Bron Gondwana wrote: > > The advantage of compact - it runs approximately 8 times as fast (we > > are CPU limited in each case - writing to tmpfs first, then rsyncing > > to the destination) and it takes approximately 75% of the space of a > > fresh database with maximum
2012 Mar 31
1
Project: Posting list encoding improvements
Hi Xapianers: My name is Weixian Zhou, a Computer Science student at the University at Buffalo, State University of New York. I am interested in the posting list encoding improvements project and in weighting schemes, and I have some questions about them. 1) After reading the comments in brass_postlist.cc, I am still not very clear about the detailed structure of the posting list. If you can provide some simple
2010 Oct 22
1
overlapping docids when searching on multiple databases?
Just a quick question - it seems to me that it's entirely possible to get overlapping docids when searching on multiple databases? For instance: open database1, add database2 to database1, search db1+db2. If docid 10 exists in both databases, is there any way of telling which database to retrieve the document from? /Per Jessen, Zürich
2023 May 03
1
manual flushing thresholds for deletes?
On Wed, May 03, 2023 at 12:38:15PM +0000, Eric Wong wrote: > Olly Betts <olly at survex.com> wrote: > > This will also effectively ignore boolean terms, assuming you're giving > > them wdf of 0 (because $3 here is the collection frequency, which is > > sum(wdf(term)) over all documents). > > Should boolean terms be ignored when estimating flushing >
2014 Jan 21
2
seg fault on search
I have written a very simple function to return the match count based on the simplesearch.cc code. It fails with a seg fault. The relevant code is: -------------------- int ftQuery(char* qs, const char* dbname,char* results, int msize) { long docid; char* op; char fullDB[1024]; string queryString;
2013 Mar 26
1
Xapian wiki: typo in docid to sub-db translation?
On the Xapian wiki page: http://trac.xapian.org/wiki/FAQ/MultiDatabaseDocumentID It seems to me that: subdatabase_number = docid_combined % number_of_databases; Should read: subdatabase_number = (docid_combined - 1) % number_of_databases; Otherwise I'm seriously confused ... Cheers, jf
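The correction proposed in the post above can be checked with a small sketch. It assumes that docids start at 1, that sub-databases are numbered from 0, and that combined docids are interleaved across the sub-databases. The function names are hypothetical, and the formula for the docid inside the sub-database is an assumption that does not appear in the post itself.

```python
def split_combined_docid(docid_combined, number_of_databases):
    """Map a combined docid to (subdatabase_number, docid_in_subdatabase).

    Assumes 1-based docids interleaved across 0-based sub-databases.
    """
    if docid_combined < 1:
        raise ValueError("docids are assumed to start at 1")
    if number_of_databases < 1:
        raise ValueError("need at least one database")
    # The formula suggested in the post: subtract 1 before taking the modulus.
    subdatabase_number = (docid_combined - 1) % number_of_databases
    # Assumed counterpart (not in the post): position within that sub-database.
    docid_in_subdatabase = (docid_combined - 1) // number_of_databases + 1
    return subdatabase_number, docid_in_subdatabase


def combine_docid(subdatabase_number, docid_in_subdatabase, number_of_databases):
    """Inverse mapping under the same assumptions."""
    return (docid_in_subdatabase - 1) * number_of_databases + subdatabase_number + 1


if __name__ == "__main__":
    # With two databases, combined docids 1..4 alternate between them.
    for docid in range(1, 5):
        print(docid, split_combined_docid(docid, 2))
```

Under these assumptions, the uncorrected `docid_combined % number_of_databases` would assign combined docid 1 to sub-database 1 rather than sub-database 0, which is the discrepancy the poster points out.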
2013 Mar 05
1
Remote database & local database, and adding new weight found vtable error
Hello, guys. Q1. I now load every docid and its document data into a dictionary for faster access, instead of calling Xapian::MSetIterator i; i.get_document().get_data(); but I happened to discover that the dictionaries produced by the following two methods were different. Both methods use DB1 and DB2. Method 1: Xapian::Database db = Xapian::Database(the path of DB1); Xapian::Database db2 =
2023 May 03
1
manual flushing thresholds for deletes?
Olly Betts <olly at survex.com> wrote: > On Mon, Mar 27, 2023 at 11:22:09AM +0000, Eric Wong wrote: > > Olly Betts <olly at survex.com> wrote: > > > 10 seems too long. You want the mean word length weighted by frequency > > > of occurrence. For English that's typically around 5 characters, which > > > is 5 bytes. If we go for +1 that's:
2017 Jun 05
2
Logging the click data
Hi James, > ID: some identifier for each query > QUERY: text of the query (when the query is run) > URLs: every URL displayed (or alternatively, the Xapian docid — this > might be easier) > OFFSET: otherwise you'll have difficulty coping with result pages other > than the first page (when this happens, the query ID should probably > remain the same, and when you aggregate
2014 May 10
2
some trouble when devising skiplist
Hi, I have run into some trouble, which I describe in my journal: http://trac.xapian.org/wiki/GSoC2014/Posting%20list%20encoding%20improvements/Journal#May10 The corresponding code is in my git repository. Would you be able to give me some help? ------------------ Shangtong Zhang, Second Year Undergraduate, School of Computer Science, Fudan University, China.
2017 Dec 18
2
How to get the serialise score returned in Xapian::KeyMaker->operator().
On Sat, Dec 16, 2017 at 10:11:40PM +0000, Olly Betts wrote: > Unfortunately the sort key isn't currently exposed via the public API. > It's available internally and it seems like it ought to be accessible > but there's no accessor method for it - I can add one but that won't > help for existing releases. I've added MSetIterator::get_sort_key() to master in
2018 Jan 03
2
Storing the documents text: data record or value ?
Hi, Following the Recoll snippets generation performance problem caused by the new positions list storage scheme in Xapian 1.4, I am experimenting with generating snippets from the complete document text stored in the index. This increases the index size much less than I would have expected (around 10-15% apparently with my home directory data), which is good news obviously. I have tried
2023 Mar 27
1
manual flushing thresholds for deletes?
On Mon, Mar 27, 2023 at 11:22:09AM +0000, Eric Wong wrote: > Olly Betts <olly at survex.com> wrote: > > 10 seems too long. You want the mean word length weighted by frequency > > of occurrence. For English that's typically around 5 characters, which > > is 5 bytes. If we go for +1 that's: > > Actually, 10 may be too short in my case since there's a
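The "mean word length weighted by frequency of occurrence" mentioned in the quoted reply can be illustrated with a short sketch. The function name and the sample word list are hypothetical, and lengths are measured in UTF-8 bytes on the assumption that bytes are the unit of interest here.

```python
from collections import Counter


def weighted_mean_word_length(words):
    """Mean word length in UTF-8 bytes, weighted by how often each word occurs."""
    counts = Counter(words)
    total_occurrences = sum(counts.values())
    if total_occurrences == 0:
        raise ValueError("need at least one word")
    # Each distinct word contributes its byte length once per occurrence.
    total_bytes = sum(len(word.encode("utf-8")) * n for word, n in counts.items())
    return total_bytes / total_occurrences


if __name__ == "__main__":
    sample = ["the", "the", "the", "database"]
    print(weighted_mean_word_length(sample))
```

For the sample above the weighted mean is (3 * 3 + 8) / 4 = 4.25 bytes, whereas averaging over the two distinct words would give 5.5. Frequent short words pull the weighted figure down, which is why the weighted mean is the relevant one for estimating sizes from occurrence counts.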
2005 Jul 20
1
docid type redefine
Hello all. I need to redefine a docid type (and all dependent types) like this: typedef unsigned long long docid; I think it would be enough to edit "include/xapian/types.h", but it isn't so. 1) I've added : string om_tostring(unsigned long long val) { CONVERT_TO_STRING("%llu") } in common/utils.{h,cc} 2) In include/enquire.h (line 438) I've found the
2011 Aug 09
3
what is the fastest way to fetch results which are sorted by timestamp ?
What is the fastest way to fetch results sorted by timestamp? I want to use Xapian as my search engine: I call add_boolean_term(something) and add_value(0, sortable_serialise(get_timestamp())) on a doc, then search with enquire.set_weighting_scheme(xapian.BoolWeight()) and enquire.set_sort_by_value(0, True) to ensure that the results are sorted by timestamp. This method is OK, but
2012 Mar 09
3
128 bit Document IDs (Please don't hurt me)
I apologize for what may be a sore subject. 4 billion documents is a heck of a lot. 64 bit vs 32 bit would be an incredibly large database with an average document and term size. Why 128 bit? Simply for address space. Mapping a UUID (128 bit) or MongoDB ObjectID (96 bit) directly into the Xapian document space removes the need for referencing one or the other from one or both. I see a common
2007 Jul 24
1
Xapian::DocNotFoundError on replace_document? (Called from Search::Xapian)
Hello, I'm using Xapian 1.0.2 (flint) and matching Search::Xapian. I'm getting: terminate called after throwing an instance of 'Xapian::DocNotFoundError', which dumps core. at first it was after adding my 2nd document (to an empty db, although I don't know if that has any bearing) to the database with a replace_document() call. I shifted the first document off the
2007 Feb 09
1
Fetching document content by Q term in Python
Hello, I'd like to be able to retrieve the index's stored copy of the document text and tried the following: terms = self.db.allterms() terms.skip_to('Q' + uri.encode('utf-8')) term = terms.next() doc = self.db.get_document(term[1]) print doc.get_data() I just wildly guessed that [1] was the docid, but of course it isn't. So the question is, how do I
2010 Apr 26
8
[LLVMdev] Proposal for a new LLVM concurrency memory model
Hi all, Chandler, Owen, and I have written up a proposal for a new memory model and atomic intrinsics in LLVM, which will make it possible to support Java and the upcoming C++0x standard. The proposed changes to the LangRef are at <http://docs.google.com/View?docID=ddb4mhxz_22dz5g98dd&revision=_latest>, and a rationale for some of the more surprising changes is at
2014 Apr 13
2
Adding an external library to Xapian
My code is not on GitHub. I am using the tarball as of now. The following is the error that occurred: http://pastebin.com/cVJrjUZX On Sun, Apr 13, 2014 at 8:16 PM, James Aylett <james-xapian at tartarus.org> wrote: > On 13 Apr 2014, at 15:37, Pallavi Gudipati <pallavigudipati at gmail.com> > wrote: > > > A linker error is encountered even after following the above