search for: skip_to

Displaying 15 results from an estimated 15 matches for "skip_to".

2005 Feb 25
2
Bug in TermIterator::skip_to() ?
Hi all, I've been toying with Xapian (mostly using the Python bindings) and I think I've hit a bug in the TermIterator::skip_to() method (or maybe in QuartzAllTermsList::skip_to()). I've attached a C++ source file that demonstrates the issue. In short, if you have a WritableDatabase, ask for the all-terms TermIterator with db.allterms_begin(), and then skip_to() a word that is itself a term, the iterator sometimes stay...
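For reference, the behaviour usually expected from skip_to() on a sorted term list is that the iterator ends up on the first term greater than or equal to the target and never moves backwards. A minimal pure-Python sketch of that contract follows; SortedTermCursor and its methods are hypothetical names for illustration and are not part of the Xapian API.

```python
from bisect import bisect_left


class SortedTermCursor:
    """Hypothetical stand-in for an all-terms iterator over a sorted term list."""

    def __init__(self, terms):
        self._terms = sorted(set(terms))
        self._pos = 0

    def at_end(self):
        return self._pos >= len(self._terms)

    def current(self):
        return self._terms[self._pos]

    def skip_to(self, target):
        # Search only from the current position onward, so the cursor
        # can move forward or stay put but never go backwards.
        self._pos = bisect_left(self._terms, target, self._pos)


cursor = SortedTermCursor(["apple", "banana", "cherry"])
cursor.skip_to("banana")       # target is itself a term
on_exact = cursor.current()
cursor.skip_to("blueberry")    # target is absent: land on the next term
on_missing = cursor.current()
```

A test case for the reported problem could compare the real iterator against a reference like this one, term by term.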
2007 Apr 06
3
Count frequency of term in a specific document?
Is there any way to count the frequency of specific term in one document? I can't find any method... Do you? -- Posted via http://www.ruby-forum.com/.
2007 Feb 09
1
Fetching document content by Q term in Python
Hello, I'd like to be able to retrieve the index's stored copy of the document text and tried the following: terms = self.db.allterms() terms.skip_to('Q' + uri.encode('utf-8')) term = terms.next() doc = self.db.get_document(term[1]) print doc.get_data() I just wildly guessed that [1] was the docid, but of course it isn't. So the question is, how do I get a docid out of a term? Or if I'm completely on the wro...
2007 Apr 03
2
How can I count frequency of terms in a document?
Hi, there. I need some help. Is there a way to count frequencies of terms in a document on Ferret? I know that Ferret has IndexReader#terms_docs_for method which counts all documents. I need to count frequencies of terms in a specific document. Some way?? -- Posted via http://www.ruby-forum.com/.
2005 Oct 18
1
Re: [Xapian-commits] 6355: trunk/xapian-applications/omega/ trunk/xapian-applications/omega/docs/
...x, which is why I don't think this is the right approach. Arranging to delete the right documents might not be too hard. All documents for a particular subsite are indexed by the same H and P term combination so we can just check each deletion candidate against those two postlists (hurrah for skip_to!) That should be pretty efficient. The only problem I can see is that if indexroot is specified, we also need to check each remaining deletion candidate against that, which I think means we have to look in the document data for each one. Ick, that's probably going to be slow. Or can anyone...
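The check described in this message (testing each deletion candidate against the H and P postlists with skip_to) might look roughly like the sketch below. It is pure Python with made-up docids; PostList, contains_current and in_both are hypothetical helpers, not Xapian or omega code.

```python
from bisect import bisect_left


class PostList:
    """Hypothetical sorted list of document ids for one term."""

    def __init__(self, docids):
        self._docids = sorted(docids)
        self._pos = 0

    def skip_to(self, docid):
        # Advance to the first entry >= docid, never moving backwards.
        self._pos = bisect_left(self._docids, docid, self._pos)

    def contains_current(self, docid):
        return self._pos < len(self._docids) and self._docids[self._pos] == docid


def in_both(candidates, host_postlist, path_postlist):
    """Return the candidates that appear in both postlists."""
    kept = []
    for docid in sorted(candidates):  # ascending order keeps skip_to forward-only
        host_postlist.skip_to(docid)
        path_postlist.skip_to(docid)
        if host_postlist.contains_current(docid) and path_postlist.contains_current(docid):
            kept.append(docid)
    return kept


h_term_docs = PostList([1, 2, 5, 8, 9])   # made-up docids for the H term
p_term_docs = PostList([2, 3, 5, 9, 12])  # made-up docids for the P term
to_delete = in_both([9, 2, 4, 8], h_term_docs, p_term_docs)
```

Sorting the candidates first means each postlist is walked at most once, which is why this kind of check can be cheap.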
2016 May 09
1
Given a document, how do you get its ID? (perl bindings)
I am writing an indexer that will crawl our web site. Following the recommendation here: https://trac.xapian.org/wiki/FAQ/UniqueIds I'm using the URL as the unique ID for each document. I see how to get a document from the xapian database if I know its URL, but what I need is also to be able to find out the URL from the document. Does this mean I need to store the URL in a value in
2014 Mar 06
2
Regarding GSOC 2014
Sir, I am a 4th-year undergraduate student pursuing my BTech in CSE at IIIT Hyderabad, India. I am interested in applying for Xapian in GSoC 2014. I have gone through this year's ideas page and am interested in applying for the 'posting list encoding improvements' project. I am good at C/C++ and Python, which is one of the requirements. I have gone through the Information Retrieval and
2010 Jan 16
1
PHP XapianTermIterator/XapianPositionIterator usage
Hello again, /thanks to Peter for previous response. I've been digging around trying to find sample usage of XapianTermIterator/XapianPositionIterator in PHP. The idea is to code up a test case in PHP to perform snippet extraction (with a possible view to coding a pecl extension in C). I found a C++ sample, but that wasn't much help. I must be dense this morning though, since I
2004 Aug 23
1
postlist chunking
Postlists are split up into chunks, so that skip_to can avoid reading all the postlist. Currently the chunk threshold is 2048, but this is checked before adding an entry, so the postlist chunk can actually grow a little larger. Something like 2060 at most. Unfortunately this isn't a good threshold with the default blocksize (8192 bytes). Inte...
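To illustrate why chunking helps skip_to, here is a small pure-Python sketch. It assumes an index of each chunk's first docid is available, so skip_to can jump straight to the one chunk that may hold the target. The chunk size of 8 entries is purely illustrative; the threshold discussed in this message is a different measure, and this is not the actual on-disk format.

```python
from bisect import bisect_left, bisect_right


def build_chunks(docids, chunk_size):
    """Split a sorted docid list into fixed-size chunks (size is illustrative)."""
    return [docids[i:i + chunk_size] for i in range(0, len(docids), chunk_size)]


def skip_to(chunks, first_docids, target):
    """Return (first docid >= target or None, number of chunks read)."""
    # Pick the last chunk whose first docid is <= target, using only the index.
    idx = max(bisect_right(first_docids, target) - 1, 0)
    chunks_read = 0
    while idx < len(chunks):
        chunk = chunks[idx]
        chunks_read += 1
        pos = bisect_left(chunk, target)
        if pos < len(chunk):
            return chunk[pos], chunks_read
        idx += 1  # target falls in the gap after this chunk; try the next one
    return None, chunks_read


docids = list(range(0, 100, 3))            # 0, 3, 6, ..., 99
chunks = build_chunks(docids, chunk_size=8)
first_docids = [chunk[0] for chunk in chunks]
found, chunks_read = skip_to(chunks, first_docids, 50)
```

In this sketch at most two chunks are read per call, however long the postlist is. How well the chunk size fits the block size decides how much of that saving survives on disk.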
2007 Apr 28
6
Determine how many documents a term occurs in
Is there a fast way to determine how many documents a term occurs in, besides iterating through every document with TermDocEnum? -- Best regards, Stian Grytøyr
2013 Jan 17
1
FASTER Search
...thod here is the function-time-cost for searching:
samples  %        symbol name
75649    28.0401  ChertPostList::move_forward_in_chunk_to_at_least(unsigned int)
30118    11.1635  Xapian::BM25Weight::get_sumpart(unsigned int, unsigned int) const
21291    7.8917   AndMaybePostList::process_next_or_skip_to(double, Xapian::PostingIterator::Internal*)
17803    6.5989   OrPostList::next(double)
12481    4.6262   AndMaybePostList::get_weight() const
10729    3.9768   OrPostList::get_weight() const
10096    3.7422   AndMaybePostList::next(double)
8743     3.2407   ChertDatabase::get_doclength(unsigned int...
2006 Jun 03
2
Initial patch for ExternalPostList
Hi Everybody, Here is the first version of my patch for an ExternalPostList; it should apply cleanly to 0.9.5 and 0.9.6. You can use it by first implementing an ExternalPostingSource, then creating a new Query object, passing a reference to an instance of your implementation to the constructor; see query.h. The ExternalPostingSource implementation is reference counted, so when it's no
2009 Feb 12
1
problem when using xapian's static libs in windows
...::TermIterator::Internal::get_collection_freq(void)const " (?get_collection_freq@Internal@TermIterator@Xapian@@UBEIXZ) libinmemory.lib(inmemory_database.obj) : error LNK2001: ????????? "public: virtual class Xapian::TermIterator::Internal * __thiscall Xapian::TermIterator::Internal::skip_to(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &)" (?skip_to@Internal@TermIterator@Xapian@@UAEPAV123@ABV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@Z) libinmemory.lib(inmemory_alltermslist....
2020 Aug 23
2
MultiDatabase shard count limitations
...libc-2.28.so [.] __memcpy_ssse3
0.11%  script/public-i  libxapian.so.30.8.0   [.] MergePostList::recalc_maxweight
0.11%  script/public-i  libxapian.so.30.8.0   [.] GlassPostList::move_to_chunk_containing
0.11%  script/public-i  libxapian.so.30.8.0   [.] Glass::ValueChunkReader::skip_to
0.11%  perl             perl                  [.] S_maybe_multiconcat
0.10%  script/public-i  libxapian.so.30.8.0   [.] LocalSubMatch::open_post_list
0.10%  perl             libc-2.28.so          [.] realloc
0.10%  script/public-i  libstdc++.so.6.0.25   [.] std::__cxx11::basic_strin...
2020 Aug 21
2
MultiDatabase shard count limitations
Going back to the "prioritizing aggregated DBs" thread from February 2020, I've got 390 Xapian shards for 130 public inboxes I want to search against(*). There's more on the horizon (we're expecting tens of thousands of public inboxes). After bumping RLIMIT_NOFILE and running ->add_database a bunch, the actual queries seem to be taking ~30s (not good :x). Now I'm