Displaying 20 results from an estimated 1000 matches similar to: "Search::Xapian add_database'd search results are odd?"
2013 Jun 19
2
Compact databases and removing stale records at the same time
On Wed, Jun 19, 2013, at 03:49 PM, Olly Betts wrote:
> On Wed, Jun 19, 2013 at 01:29:16PM +1000, Bron Gondwana wrote:
> > The advantage of compact - it runs approximately 8 times as fast (we
> > are CPU limited in each case - writing to tmpfs first, then rsyncing
> > to the destination) and it takes approximately 75% of the space of a
> > fresh database with maximum
2012 Mar 31
1
Project: Posting list encoding improvements
Hi Xapianers:
My name is Weixian Zhou, a Computer Science student at the University at Buffalo,
State University of New York. I am interested in the posting list encoding
improvements and weighting schemes projects, and I have some questions
about them.
1) After reading the comments in brass_postlist.cc, I am still not very clear
about the detailed structure of the postings list. If you can provide some
simple
2010 Oct 22
1
overlapping docids when searching on multiple databases?
Just a quick question - it seems to me that it's entirely possible to
get overlapping docids when searching on multiple databases? For
instance:
open database1
add database2 to database1
search db1+db2
If docid 10 exists in both databases, is there any way of telling which
database to retrieve the document from?
/Per Jessen, Zürich
2023 May 03
1
manual flushing thresholds for deletes?
On Wed, May 03, 2023 at 12:38:15PM +0000, Eric Wong wrote:
> Olly Betts <olly at survex.com> wrote:
> > This will also effectively ignore boolean terms, assuming you're giving
> > them wdf of 0 (because $3 here is the collection frequency, which is
> > sum(wdf(term)) over all documents).
>
> Should boolean terms be ignored when estimating flushing
>
2014 Jan 21
2
seg fault on search
I have written a very simple function to return the match count based on the simplesearch.cc code. It fails with a seg fault. The relevant code is:
--------------------
int ftQuery(char* qs, const char* dbname,char* results, int msize) {
long docid;
char* op;
char fullDB[1024];
string queryString;
2013 Mar 26
1
Xapian wiki: typo in docid to sub-db translation?
On the Xapian wiki page:
http://trac.xapian.org/wiki/FAQ/MultiDatabaseDocumentID
It seems to me that:
subdatabase_number = docid_combined % number_of_databases;
Should read:
subdatabase_number = (docid_combined - 1) % number_of_databases;
Otherwise I'm seriously confused ...
Cheers,
jf
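A minimal sketch of the arithmetic in question, assuming the combined database interleaves the docids of its sub-databases and that both docids and the combined numbering start at 1 while sub-database numbers start at 0. The function names are hypothetical and are not part of the Xapian API; the point is only to show why the "- 1" matters.

```python
def combine_docid(sub_docid, subdatabase_number, number_of_databases):
    """Map a docid in one sub-database to the interleaved combined docid.

    Assumes docids start at 1 and subdatabase_number starts at 0.
    """
    return (sub_docid - 1) * number_of_databases + subdatabase_number + 1


def split_docid(docid_combined, number_of_databases):
    """Recover (subdatabase_number, sub_docid) from a combined docid."""
    subdatabase_number = (docid_combined - 1) % number_of_databases
    sub_docid = (docid_combined - 1) // number_of_databases + 1
    return subdatabase_number, sub_docid


if __name__ == "__main__":
    # Docid 10 exists in both of two databases; each gets its own combined docid.
    for subdb in (0, 1):
        combined = combine_docid(10, subdb, 2)
        print(subdb, combined, split_docid(combined, 2))
```

Under these assumptions the two functions are inverses of each other, whereas taking docid_combined % number_of_databases without subtracting 1 would assign the document to the wrong sub-database.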
2013 Mar 05
1
Remote database & local database, and adding new weight found vtable error
Hello, guys.
Q1.
I have now loaded all the docids and their document data into a dictionary for
faster data loading, instead of calling
Xapian::MSetIterator i;
i.get_document().get_data();
but I happened to discover that the dictionaries obtained by the two methods
were different:
both methods use DB1, DB2
method 1:
Xapian::Database db = Xapian::Database(the path of DB1);
Xapian::Database db2 =
2023 May 03
1
manual flushing thresholds for deletes?
Olly Betts <olly at survex.com> wrote:
> On Mon, Mar 27, 2023 at 11:22:09AM +0000, Eric Wong wrote:
> > Olly Betts <olly at survex.com> wrote:
> > > 10 seems too long. You want the mean word length weighted by frequency
> > > of occurrence. For English that's typically around 5 characters, which
> > > is 5 bytes. If we go for +1 that's:
2017 Jun 05
2
Logging the click data
Hi James,
> ID: some identifier for each query
> QUERY: text of the query (when the query is run)
> URLs: every URL displayed (or alternatively, the Xapian docid — this
> might be easier)
> OFFSET: otherwise you'll have difficulty coping with result pages other
> than the first page (when this happens, the query ID should probably
> remain the same, and when you aggregate
2014 May 10
2
some trouble when devising skiplist
Hi,
I have run into some trouble, which I describe in my journal:
http://trac.xapian.org/wiki/GSoC2014/Posting%20list%20encoding%20improvements/Journal#May10
The corresponding code is in my git.
Would you like to give me some help?
------------------
Shangtong Zhang,Second Year Undergraduate,
School of Computer Science,
Fudan University, China.
2017 Dec 18
2
How to get the serialise score returned in Xapian::KeyMaker->operator().
On Sat, Dec 16, 2017 at 10:11:40PM +0000, Olly Betts wrote:
> Unfortunately the sort key isn't currently exposed via the public API.
> It's available internally and it seems like it ought to be accessible
> but there's no accessor method for it - I can add one but that won't
> help for existing releases.
I've added MSetIterator::get_sort_key() to master in
2018 Jan 03
2
Storing the documents text: data record or value ?
Hi,
Following the Recoll snippets generation performance problem caused by the
new positions list storage scheme in Xapian 1.4, I am experimenting with
generating snippets from the complete document text stored in the index.
This increases the index size much less than I would have expected (around
10-15% apparently with my home directory data), which is good news
obviously.
I have tried
2023 Mar 27
1
manual flushing thresholds for deletes?
On Mon, Mar 27, 2023 at 11:22:09AM +0000, Eric Wong wrote:
> Olly Betts <olly at survex.com> wrote:
> > 10 seems too long. You want the mean word length weighted by frequency
> > of occurrence. For English that's typically around 5 characters, which
> > is 5 bytes. If we go for +1 that's:
>
> Actually, 10 may be too short in my case since there's a
2005 Jul 20
1
docid type redifine
Hello all.
I need to redefine a docid type (and all dependent types) like this: typedef unsigned long long docid;
I think it would be enough to edit "include/xapian/types.h", but it isn't so.
1) I've added :
string
om_tostring(unsigned long long val)
{
CONVERT_TO_STRING("%llu")
}
in common/utils.{h,cc}
2) In include/enquire.h (line 438) I've found the
2011 Aug 09
3
what is the fastest way to fetch results which are sorted by timestamp ?
what is the fastest way to fetch results which are sorted by timestamp ?
I want to use Xapian as my search engine, using add_boolean_term(something) and add_value(0, sortable_serialise(get_timestamp())) on a doc.
I search with enquire.set_weighting_scheme(xapian.BoolWeight()) and enquire.set_sort_by_value(0, True) to ensure that the results are sorted by the timestamp.
This method is OK, but
2012 Mar 09
3
128 bit Document IDs (Please don't hurt me)
I apologize for what may be a sore subject. 4 billion documents is a
heck of a lot. 64 bit vs 32 bit would be an incredibly large database
with an average document and term size. Why 128 bit? Simply for
address space.
Mapping a UUID (128 bit) or MongoDB ObjectID (96 bit) directly into
the Xapian document space removes the need for referencing one or the
other from one or both. I see a common
2007 Jul 24
1
Xapian::DocNotFoundError on replace_document? (Called from Search::Xapian)
Hello,
I'm using Xapian 1.0.2 (flint) and matching Search::Xapian.
I'm getting:
terminate called after throwing an instance of
'Xapian::DocNotFoundError', which dumps core.
at first it was after adding my 2nd document (to an empty db, although
I don't know if that has any bearing) to the database with a
replace_document() call.
I shifted the first document off the
2007 Feb 09
1
Fetching document content by Q term in Python
Hello,
I'd like to be able to retrieve the index's stored copy of the document
text and tried the following:
terms = self.db.allterms()
terms.skip_to('Q' + uri.encode('utf-8'))
term = terms.next()
doc = self.db.get_document(term[1])
print doc.get_data()
I just wildly guessed that [1] was the docid, but of course it isn't. So the
question is, how do I
2010 Apr 26
8
[LLVMdev] Proposal for a new LLVM concurrency memory model
Hi all,
Chandler, Owen, and I have written up a proposal for a new memory
model and atomic intrinsics in LLVM, which will make it possible to
support Java and the upcoming C++0x standard. The proposed changes to
the LangRef are at
<http://docs.google.com/View?docID=ddb4mhxz_22dz5g98dd&revision=_latest>,
and a rationale for some of the more surprising changes is at
2014 Apr 13
2
Adding an external library to Xapian
My code is not on GitHub. I am using the tarball as of now. The following
is the error that occurred:
http://pastebin.com/cVJrjUZX
On Sun, Apr 13, 2014 at 8:16 PM, James Aylett <james-xapian at tartarus.org> wrote:
> On 13 Apr 2014, at 15:37, Pallavi Gudipati <pallavigudipati at gmail.com>
> wrote:
>
> > A linker error is encountered even after following the above