Displaying 15 results from an estimated 15 matches for "skip_to".
2005 Feb 25
2
Bug in TermIterator::skip_to() ?
Hi all,
I've been toying with xapian (mostly using the Python bindings) and I
think I've hit a bug in the TermIterator::skip_to() method (or maybe
in QuartzAllTermsList::skip_to()).
I've attached a c++ source file that demonstrates the issue. In short,
if you have a WritableDatabase, ask for the all-terms TermIterator
with db.allterms_begin(), and then skip_to() a word that is itself a
term, the iterator sometimes stay...
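For readers following the report above, here is a minimal sketch of the behaviour one would normally expect from skip_to() on a sorted term list: the iterator lands on the first term that is not less than the target, so skipping to a word that is itself a term should land exactly on it. SortedTermIterator and its methods are hypothetical stand-ins written in pure Python; this is not Xapian's implementation or API.

```python
from bisect import bisect_left


class SortedTermIterator:
    """Toy stand-in for an all-terms iterator over a sorted term list."""

    def __init__(self, terms):
        self._terms = sorted(set(terms))
        self._pos = 0

    def skip_to(self, target):
        # Search only from the current position onwards, so the iterator
        # never moves backwards.
        self._pos = bisect_left(self._terms, target, self._pos)

    def at_end(self):
        return self._pos >= len(self._terms)

    def current(self):
        if self.at_end():
            raise StopIteration("iterator is past the last term")
        return self._terms[self._pos]


it = SortedTermIterator(["apple", "banana", "cherry", "date"])

it.skip_to("banana")      # target is itself a term
exact = it.current()      # expected to land exactly on it

it.skip_to("c")           # target is not a term
following = it.current()  # expected to land on the next term after it
```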
2007 Apr 06
3
Count frequency of term in a specific document?
Is there any way to count the frequency of specific term in one
document?
I can't find any method... Do you?
--
Posted via http://www.ruby-forum.com/.
2007 Feb 09
1
Fetching document content by Q term in Python
Hello,
I'd like to be able to retrieve the index's stored copy of the document
text and tried the following:
terms = self.db.allterms()
terms.skip_to('Q' + uri.encode('utf-8'))
term = terms.next()
doc = self.db.get_document(term[1])
print doc.get_data()
I just wildly guessed that [1] was the docid, but of course it isn't. So the
question is, how do I get a docid out of a term?
Or if I'm completely on the wro...
2007 Apr 03
2
How can I count frequency of terms in a document?
Hi, there.
I need some help.
Is there a way to count frequencies of terms in a document on Ferret?
I know that Ferret has IndexReader#terms_docs_for method which counts
all documents.
I need to count frequencies of terms in a specific document.
Some way??
--
Posted via http://www.ruby-forum.com/.
2005 Oct 18
1
Re: [Xapian-commits] 6355: trunk/xapian-applications/omega/ trunk/xapian-applications/omega/docs/
...x, which is why I don't think this is the right
approach.
Arranging to delete the right documents might not be too hard. All
documents for a particular subsite are indexed by the same H and P term
combination so we can just check each deletion candidate against those
two postlists (hurrah for skip_to!) That should be pretty efficient.
The only problem I can see is that if indexroot is specified, we also
need to check each remaining deletion candidate against that, which I
think means we have to look in the document data for each one. Ick,
that's probably going to be slow. Or can anyone...
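The idea of checking each deletion candidate against two postlists can be sketched roughly as follows. This is a pure-Python toy under stated assumptions: PostList, contains_current and in_both are hypothetical names, the docids are made up, and the two lists merely stand in for the H-term and P-term postlists mentioned above. It is not the omega indexer's code.

```python
from bisect import bisect_left


class PostList:
    """Toy postlist: a sorted list of docids with a forward-only cursor."""

    def __init__(self, docids):
        self._docids = sorted(docids)
        self._pos = 0

    def skip_to(self, docid):
        # Advance to the first entry >= docid without rescanning earlier ones.
        self._pos = bisect_left(self._docids, docid, self._pos)

    def contains_current(self, docid):
        return self._pos < len(self._docids) and self._docids[self._pos] == docid


def in_both(candidates, h_postlist, p_postlist):
    """Keep only the candidates that appear in both postlists."""
    kept = []
    # Candidates are visited in ascending order so each cursor only moves forward.
    for docid in sorted(candidates):
        h_postlist.skip_to(docid)
        p_postlist.skip_to(docid)
        if h_postlist.contains_current(docid) and p_postlist.contains_current(docid):
            kept.append(docid)
    return kept


h_docs = PostList([1, 2, 3, 5, 8, 13])   # hypothetical docids indexed by the H term
p_docs = PostList([2, 3, 4, 8, 9])       # hypothetical docids indexed by the P term
to_delete = in_both([3, 4, 8, 13, 21], h_docs, p_docs)
```

Because both cursors only ever move forward, each postlist is traversed at most once no matter how many candidates are checked.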
2016 May 09
1
Given a document, how do you get its ID? (perl bindings)
I am writing an indexer that will crawl our web site. Following the
recommendation here:
https://trac.xapian.org/wiki/FAQ/UniqueIds
I'm using the URL as the unique ID for each document. I see how to get a
document from the xapian database if I know its URL, but what I need is
also to be able to find out the URL from the document. Does this mean I
need to store the URL in a value in
2014 Mar 06
2
Regarding GSOC 2014
Sir,
I am a 4th-year undergraduate student pursuing my BTech in CSE at IIIT
Hyderabad, India.
I am interested in applying to Xapian for GSoC 2014. I have gone through
this year's ideas page and am interested in applying for the 'posting list encoding
improvements' project.
I am good at C/C++ and Python, which is one of the requirements. I have gone
through the Information Retrieval and
2010 Jan 16
1
PHP XapianTermIterator/XapianPositionIterator usage
Hello again,
/thanks to Peter for previous response.
I've been digging around trying to find sample usage of
XapianTermIterator/XapianPositionIterator in PHP. The idea is to code up a
test case in PHP to perform snippet extraction (with a possible view to
coding a pecl extension in C). I found a C++ sample, but that wasn't much
help.
I must be dense this morning though, since I
2004 Aug 23
1
postlist chunking
Postlists are split up into chunks, so that skip_to can avoid reading
all the postlist.
Currently the chunk threshold is 2048, but this is checked before adding
an entry, so the postlist chunk can actually grow a little larger.
Something like 2060 at most. Unfortunately this isn't a good threshold
with the default blocksize (8192 bytes).
Inte...
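As a rough illustration of why chunking lets skip_to avoid reading the whole postlist, here is a pure-Python toy. It assumes a small index of each chunk's first docid is kept alongside the chunks; ChunkedPostList, CHUNK_SIZE and chunks_read are hypothetical names, and the toy limits chunks by entry count, which is not necessarily the unit of the 2048 threshold discussed above.

```python
from bisect import bisect_right

CHUNK_SIZE = 4  # hypothetical entries-per-chunk limit, chosen small for the demo


class ChunkedPostList:
    """Toy postlist stored as fixed-size chunks plus an index of first docids."""

    def __init__(self, docids, chunk_size=CHUNK_SIZE):
        docids = sorted(docids)
        self.chunks = [docids[i:i + chunk_size]
                       for i in range(0, len(docids), chunk_size)]
        self.first_docids = [chunk[0] for chunk in self.chunks]
        self.chunks_read = 0  # counts how many chunks skip_to had to open

    def skip_to(self, target):
        """Return the first docid >= target, or None if there is none."""
        if not self.chunks:
            return None
        # Last chunk whose first docid is <= target (or chunk 0 if none is).
        idx = max(bisect_right(self.first_docids, target) - 1, 0)
        # The answer is in that chunk or, if target falls in the gap after
        # its last entry, at the start of the next chunk.
        for chunk in self.chunks[idx:idx + 2]:
            self.chunks_read += 1
            for docid in chunk:
                if docid >= target:
                    return docid
        return None


postlist = ChunkedPostList(range(1, 101, 3))  # docids 1, 4, 7, ..., 100
found = postlist.skip_to(50)
```

In this toy a skip opens at most two chunks regardless of how long the postlist is; earlier chunks are never touched.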
2007 Apr 28
6
Determine how many documents a term occurs in
Is there a fast way to determine how many documents a term occurs in,
besides iterating through every document with TermDocEnum?
--
Best regards,
Stian Grytøyr
2013 Jan 17
1
FASTER Search
...thod
here is the function-time-cost for searching:
samples  %        symbol name
75649    28.0401  ChertPostList::move_forward_in_chunk_to_at_least(unsigned int)
30118    11.1635  Xapian::BM25Weight::get_sumpart(unsigned int, unsigned int) const
21291    7.8917   AndMaybePostList::process_next_or_skip_to(double, Xapian::PostingIterator::Internal*)
17803    6.5989   OrPostList::next(double)
12481    4.6262   AndMaybePostList::get_weight() const
10729    3.9768   OrPostList::get_weight() const
10096    3.7422   AndMaybePostList::next(double)
8743 3.2407 ChertDatabase::get_doclength(unsigned int...
2006 Jun 03
2
Initial patch for ExternalPostList
Hi Everybody,
Here is the first version of my patch for an ExternalPostList; it
should apply cleanly to 0.9.5 and 0.9.6.
You can use it by first implementing an ExternalPostingSource, then
creating a new Query object, passing a reference to an instance of your
implementation to the constructor; see query.h. The
ExternalPostingSource implementation is reference counted, so when
its no
2009 Feb 12
1
problem when using xapian's static libs in windows
...::TermIterator::Internal::get_collection_freq(void)const " (?get_collection_freq@Internal@TermIterator@Xapian@@UBEIXZ)
libinmemory.lib(inmemory_database.obj) : error LNK2001: unresolved external symbol "public: virtual class Xapian::TermIterator::Internal * __thiscall Xapian::TermIterator::Internal::skip_to(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &)" (?skip_to@Internal@TermIterator@Xapian@@UAEPAV123@ABV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@Z)
libinmemory.lib(inmemory_alltermslist....
2020 Aug 23
2
MultiDatabase shard count limitations
...libc-2.28.so [.] __memcpy_ssse3
0.11% script/public-i libxapian.so.30.8.0 [.] MergePostList::recalc_maxweight
0.11% script/public-i libxapian.so.30.8.0 [.] GlassPostList::move_to_chunk_containing
0.11% script/public-i libxapian.so.30.8.0 [.] Glass::ValueChunkReader::skip_to
0.11% perl perl [.] S_maybe_multiconcat
0.10% script/public-i libxapian.so.30.8.0 [.] LocalSubMatch::open_post_list
0.10% perl libc-2.28.so [.] realloc
0.10% script/public-i libstdc++.so.6.0.25 [.] std::__cxx11::basic_strin...
2020 Aug 21
2
MultiDatabase shard count limitations
Going back to the "prioritizing aggregated DBs" thread from
February 2020, I've got 390 Xapian shards for 130 public inboxes
I want to search against(*). There's more on the horizon (we're
expecting tens of thousands of public inboxes).
After bumping RLIMIT_NOFILE and running ->add_database a bunch,
the actual queries seem to be taking ~30s (not good :x).
Now I'm