similar to: Resume indexing

2009 Jun 23
1
Indexing more than 15 billion documents
Hi, Sorry to follow up on an old thread, but I am wondering if there has been any work done on, or interest in, increasing the maximum document id beyond a 32bit limit? Daniel On Mon, Jun 18, 2007 at 04:11:54AM +0100, Olly Betts wrote: > > In particular, there is currently a limit of 4 billion documents in a > > database, due to using a 32 bit type for document IDs, but I don't
2013 Jun 19
2
Broken links on trac release overview page
http://trac.xapian.org/wiki/ReleaseOverview/1.2.15 The links to individual NEWS items are broken: i.e. http://svn.xapian.org/*checkout*/tags/1.2.15/xapian-core/NEWS http://svn.xapian.org/Xapian/tags/1.2.15/xapian-core/NEWS?view=markup might be a better link. Regards, Bron. -- Bron Gondwana brong at fastmail.fm
2017 Apr 20
2
Question about the ticket #743 omindex: delay libmagic checks
Hi, I'm working on ticket #743 omindex: delay libmagic checks <https://trac.xapian.org/ticket/743>. As the ticket's description mentions, calling libmagic is more expensive than calling stat, so we can check the size by calling stat before calling libmagic to get the MIME type. But what about the timestamp check? Since the timestamp check needs to iterate the DB to check if
2012 Mar 09
3
128 bit Document IDs (Please don't hurt me)
I apologize for what may be a sore subject. 4 billion documents is a heck of a lot. 64 bit vs 32 bit would be an incredibly large database with an average document and term size. Why 128 bit? Simply for address space. Mapping a UUID (128 bit) or MongoDB ObjectID (96 bit) directly into the Xapian document space removes the need for referencing one or the other from one or both. I see a common
2008 Aug 21
2
How to speed up indexing ?
I'm new to Xapian & need some help, many thanks if anyone replies. I did a release build from xapian-core-1.0.7 with VS2008 by using Charlie Hull's makefiles. I'm trying to test-index my dataset -- some 200'000 docs, each document being (on average) 50 bytes long and having 6 words. I tried (a) not to use stemmer, (b) commit_transaction() on every 50/100/etc. docs, (c) not
2013 Oct 13
2
trouble with user's right indexing with omega
Hi, I'm using omindex to index files and I want to query with the user/group boolean prefixes (I*, I@... and I#...). That works well for the "other" and "group" permissions, but not in all cases for the "user" permission. Here is an example: assume that we have a user "ftp" who is not in the "users" group. If the file permissions are: -rw-r------ 1 ftp users 13 2013-10-06
2014 Mar 11
2
[GSOC 2014] Indexing INEX dataset
Hi Parth, I've implemented SVMRanker class and also sorted out most of current Letor APIs. Now I'm trying to use INEX dataset to verify my implement. But I stuck in the indexing part. You said in the documentation that we have to add prefix when indexing. Also I notice that you set some metadata in omindex.cc of your version. But the omindex.cc has changed since 2011. I think that's why my result
2014 Nov 30
3
Contributing to Xapian
Hi Olly I will try to work on : http://trac.xapian.org/wiki/GSoCProjectIdeas#Project:LearningtoRank I will be taking a Machine Learning class the next semester and I hope that this project will help me supplement my learning in Machine Learning and also gain a bit of knowledge in IR. If you can give me ideas on how to get around with the code for LTR project, it will be awesome. I can look at
2014 Mar 11
2
[GSOC 2014] Indexing INEX dataset
On Tue, Mar 11, 2014 at 03:20:31PM +0100, Parth Gupta wrote: > > > > On current trunk, we index the title with prefix "S" by default in > > omindex, though with a wdf inc of 5 rather than 1: > > > > indexer.index_text(title, 5, "S"); > > > > So I don't think you need that change to omindex now. > > Yes, but please
2016 May 09
1
Given a document, how do you get its ID? (perl bindings)
I am writing an indexer that will crawl our web site. Following the recommendation here: https://trac.xapian.org/wiki/FAQ/UniqueIds I'm using the URL as the unique ID for each document. I see how to get a document from the xapian database if I know its URL, but what I need is also to be able to find out the URL from the document. Does this mean I need to store the URL in a value in
2014 Dec 01
2
Contributing to Xapian
I'd suggest that a good thing to look at would be functional tests of the metrics and algorithms in Hanxiao Sun's work from this summer. You'll generally need to go either to the original paper, or find an alternative implementation, to build up a series of tests that demonstrate that the implementation is doing what it is supposed to. Xapian-core contains a test framework which it
2014 Mar 17
2
[GSOC 2014] Indexing INEX dataset
Hi Olly, Wouldn't setting the weight of terms in title back to normal (e.g. 5 to 1) by below line, automatically adjust the wdfs and field lengths? indexer.index_text(title, 5, "S"); -> indexer.index_text(title, 1, "S"); if it does not then we should include that part in the patch too. I like to create a patch for xapian-letor for resolving common code of xapian.
2018 Apr 30
5
Need support to build xapian on Windows with Microsoft compiler
Hello, Thank you very much for the quick response. I need only xapian-core. As I wrote, in my case compilation with Visual Studio 2015 is successful; I just have runtime errors, while the same code runs fine on Linux. I'll try the hints from (https://trac.xapian.org/browser/git/xapian-core/INSTALL?rev=RELEASE/1.4#L54) and maybe migrate my project to VS2017 and test it again. If I understand
2016 Jan 10
2
Strange index consistency issue
Olly Betts <olly <at> survex.com> writes: > > You could try: > > delve -t '' ./xapiandb > > That will list the document lengths, so you can see if document 6 is in > that list or not. I am the recoll user mentioned in the first post above. I still have a copy of the (potentially) corrupted index and I did the requested testing. I ran delve -t
2012 Jul 17
1
Can not use custom weight scheme with python binding
Hi, I'm trying to use custom weight with python binding. My test code is like this. class TinkerWeight(xapian.Weight): def __init__(self): pass def name(self): return "Tinker" def serialize(self): return "" def get_sumpart(*args): return 1 def get_maxpart(*args): return 1 def get_sumextra(*args):
2014 Oct 24
2
Contributing to Xapian
Hi All, I am Manu and recently came across the Xapian project. I would like to contribute to Xapian in a way that introduces me to Information Retrieval. I have a basic knowledge of C++. Can you suggest and help me choose a task that would be good for beginners? Thanks a lot. Best Regards, Manu Gupta
2011 Jun 13
2
Xapian 1.2.6 released
I've uploaded Xapian 1.2.6 (including Search::Xapian 1.2.6.0). As usual you can download from: http://xapian.org/download You can read an overview of the release here: http://trac.xapian.org/wiki/ReleaseOverview/1.2.6 The full lists of user-visible changes are linked to from there, and also from the "[news]" links on the download page. As always, if you encounter problems,
2009 Apr 12
2
Indexing speed benchmark - Xapian, Solr
I came across this benchmark between Xapian & Solr: http://www.anur.ag/blog/2009/03/xapian-and-solr/ According to the benchmark, a doc set that took Solr 34 min to index took Xapian 7 hours. Solr's index is also much smaller - 2.5GB to Xapian's 8.9GB. I'm new to Xapian. Just wondering if results like these are typical? Is indexing speed & size a known issue in Xapian? Or is
2014 Mar 11
2
[GSOC 2013] Question about indexing INEX dataset
Hi, I'm trying to use Omega to index INEX dataset for Letor. But omindex told me these xml files are unknown. Olly told me I could tell omindex to handle them as HTML. (Thanks Olly :) ) Is it appropriate? Parth, could you give me some suggestions? Thank you! Jiarong Wei
2010 Jun 19
2
Xapian 1.0.21 released
I've uploaded Xapian 1.0.21 (including Search::Xapian 1.0.21.0), which as usual you can download from: http://xapian.org/download The most notable changes in this release are: Xapian-core API: * Xapian::Stem now recognises "nb" and "nn" as additional codes for the Norwegian stemmer. * Xapian::QueryParser now correctly parses a wildcarded term in between two other