similar to: Storing the documents text: data record or value ?

Displaying 20 results from an estimated 20000 matches similar to: "Storing the documents text: data record or value ?"

2018 Jan 04
0
Storing the documents text: data record or value ?
On Wed, Jan 03, 2018 at 04:18:18PM +0100, Jean-Francois Dockes wrote: > Seen from the outside, it would appear to make sense to use values, so that > code which needs to access the data record but not the full document text > does not pay a performance penalty. > > I am wondering if there are other arguments for using either method ? I wouldn't recommend using a value to store
2013 Jun 19
2
Compact databases and removing stale records at the same time
On Wed, Jun 19, 2013, at 03:49 PM, Olly Betts wrote: > On Wed, Jun 19, 2013 at 01:29:16PM +1000, Bron Gondwana wrote: > > The advantage of compact - it runs approximately 8 times as fast (we > > are CPU limited in each case - writing to tmpfs first, then rsyncing > > to the destination) and it takes approximately 75% of the space of a > > fresh database with maximum
2009 Jul 15
2
XAPIAN_FLUSH_THRESHOLD
I'm playing around with a machine that has 2 GB of memory. Indexing about 5GB of data average of 2MB per document. The documents are plain text. I notice the omindex's memory fott print get's biger an bigger then the machine starts to swap and it all slows down to a crawl. In regards to export XAPIAN_FLUSH_THRESHOLD I know the default is 10000 Am I right in saying that for my setup
2007 Feb 09
1
Fetching document content by Q term in Python
Hello, I'd like to be able to retrieve the indexes stored copy of the document text and tried the following: terms = self.db.allterms() terms.skip_to('Q' + uri.encode('utf-8')) term = terms.next() doc = self.db.get_document(term[1]) print doc.get_data() I just wildly guessed that [1] was the docid, but of course it isn't. So the question is, how do I
2013 Mar 08
2
Gsoc-2013
Hi, I am Chinmay Naik, an undergraduate in Computer Science at Bangalore Institute of Technology, Bangalore. I am an experienced programmer and good with C,C++,Python,Java,OpenGL and would love to participate in Gsoc-13. >From the ideas listed, i am interested to work on the project "posting list encoding improvements". I am a newbie to Xapian but would like to get involved and get a
2017 Jun 05
2
Logging the click data
Hi James, > ID: some identifier for each query > QUERY: text of the query (when the query is run) > URLs: every URL displayed (or alternatively, the Xapian docid — this > might be easier) > OFFSET: otherwise you'll have difficulty coping with result pages other > than the first page (when this happens, the query ID should probably > remain the same, and when you aggregate
2004 Dec 21
1
Search::Xapian add_database'd search results are odd?
Sorry if this is the wrong forum to discuss Search::Xapian issues -- this just seems like the best place.. Anyways, I've been testing out using $db->add_database() when searching, and it seems like the docids I'm getting out of it are incorrect, almost as though they're "double" what they should be (numerically)... the docids that exist should be around 950,000 and
2016 Apr 12
2
Xapian 1.3.5 snapshot performance and index size
Olly Betts writes: > On Mon, Apr 11, 2016 at 09:54:36AM +0200, Jean-Francois Dockes wrote: > > The question which remains for me is if I should run xapian-compact > > after an initial indexing operation. I guess that this depends on the > > amount of expected updates and that there is no easy answer ? > > I think it's not obvious whether it's a good plan
2013 Jun 19
2
Compact databases and removing stale records at the same time
I'm trying to compact (or at least merge) multiple databases, while stripping search records which are no longer required. Backstory: I've inherited the Cyrus IMAPd xapian-based search code from Greg Banks when he left Opera. One of the unfinished parts was removing expunged emails from the search database. We moved from having a single search database to supporting multiple
2014 Apr 13
2
Adding an external library to Xapian
My code is not on Github. I am using the tarball as of now. The following it the error that occurred: http://pastebin.com/cVJrjUZX On Sun, Apr 13, 2014 at 8:16 PM, James Aylett <james-xapian at tartarus.org>wrote: > On 13 Apr 2014, at 15:37, Pallavi Gudipati <pallavigudipati at gmail.com> > wrote: > > > A linker error is encountered even after following the above
2016 Apr 11
2
Xapian 1.3.5 snapshot performance and index size
Olly Betts writes: > On Sun, Apr 10, 2016 at 04:47:01PM +0200, Jean-Francois Dockes wrote: > > Some might notice the 50% index size increase. Excessive index size is > > already one relatively rare, but recurring complaint. Except if I did > > something wrong: I'm actually quite surprised by it. > > Did you try compacting the resulting databases? > >
2011 Aug 09
3
what is the fastest way to fetch results which are sorted by timestamp ?
what is the fastest way to fetch results which are sorted by timestamp ? i want to use xapian as my search engine , use add_boolean_term(something) and add_value(0,sortable_serialise(get_timestamp())) to a doc. search through enquire.set_weighting_scheme(xapian.BoolWeight()) and enquire.set_sort_by_value(0,True) to ensure that the results are sorted by the timestamp. This method is ok , but
2014 Mar 06
2
Regarding GSOC 2014
Sir, I am a 4th yr undergraduate student pursuing my BTech in CSE at IIIT Hyderbad, India. I am interested in applying for Xapian in Gsoc 2014. I had gone through this year's idea page and interested in applying for 'posting list encoding improvements' project. I am good at C/C++,python; which is one of the requirement. I had done gone through the information Retrieval and
2012 Mar 31
1
Project: Posting list encoding improvements
Hi Xapianers: My name is Weixian Zhou, Computer Science student of University at Buffalo, State University of New York. I am interested in the project of posting list encoding improvements and weighting schemes. I have some questions toward them. 1) After read the comments in brass_postlist.cc, I am still not very clear about the detailed structure of postings list. If you can provide some simple
2010 Feb 18
2
xapian.DocNotFoundError: regression?
Hello, I've installed xapian-core 1.1.3 and xapian-bindings 1.1.4 from the tarballs announced by Olly the other day. With these versions, Enquire.get_mset() seems to consistently be raising xapian.DocNotFoundError. I've attached a small test case which reproduces this. The same test case works fine with 1.0.16 (not the latest 1.0.x, but it's what I had installed). Program output
2014 May 10
2
some trouble when devising skiplist
Hi, I was confronted with some trouble, I describe the trouble in my journal http://trac.xapian.org/wiki/GSoC2014/Posting%20list%20encoding%20improvements/Journal#May10 And corresponding code is in my git. Would you like to give me some help? ------------------ Shangtong Zhang,Second Year Undergraduate, School of Computer Science, Fudan University, China. -------------- next part
2010 Oct 22
1
overlapping docids when searching on multiple databases?
Just a quick question - it seems to me that it's entirely possible to get overlapping docids when searching on multiple databases? For instance: open database1 add database2 to database1 search db1+db2 if docid 10 exists in both databases, is there any way of telling which which database to retrieve the document from? /Per Jessen, Z?rich
2023 May 03
1
manual flushing thresholds for deletes?
On Wed, May 03, 2023 at 12:38:15PM +0000, Eric Wong wrote: > Olly Betts <olly at survex.com> wrote: > > This will also effectively ignore boolean terms, assuming you're giving > > them wdf of 0 (because $3 here is the collection frequency, which is > > sum(wdf(term)) over all documents). > > Should boolean terms be ignored when estimating flushing >
2023 Aug 19
1
does Xapian::Enquire hold an MVCC revision?
Olly Betts <olly at survex.com> wrote: > On Fri, Aug 18, 2023 at 10:41:52AM +0000, Eric Wong wrote: > > Olly Betts <olly at survex.com> wrote: > > > While the match is running, get_mset(2000, 1000) needs to track > > > 3000 entries so this won't reduce your heap usage (at least not > > > peak usage). > > > > > > Is the heap
2017 Dec 18
2
How to get the serialise score returned in Xapian::KeyMaker->operator().
On Sat, Dec 16, 2017 at 10:11:40PM +0000, Olly Betts wrote: > Unfortunately the sort key isn't currently exposed via the public API. > It's available internally and it seems like it ought to be accessible > but there's no accessor method for it - I can add one but that won't > help for existing releases. I've added MSetIterator::get_sort_key() to master in