Displaying 20 results from an estimated 4000 matches similar to: "manual flushing thresholds for deletes?"
2023 Mar 26
1
manual flushing thresholds for deletes?
On Fri, Mar 24, 2023 at 10:37:41AM +0000, Eric Wong wrote:
> Realizing I had documents of hugely varying sizes (0.5KB..20MB)
> and little RAM, I instead tracked the number of raw bytes in the
> text being indexed and flushed whenever I'd seen a configurable
> byte count. Not the most scientific way, but it seems to work
> well enough on low-end systems.
>
> Now, I'm
2023 Mar 27
1
manual flushing thresholds for deletes?
Olly Betts <olly at survex.com> wrote:
> On Fri, Mar 24, 2023 at 10:37:41AM +0000, Eric Wong wrote:
> > Realizing I had documents of hugely varying sizes (0.5KB..20MB)
> > and little RAM, I instead tracked the number of raw bytes in the
> > text being indexed and flushed whenever I'd seen a configurable
> > byte count. Not the most scientific way, but it seems
2023 May 03
1
manual flushing thresholds for deletes?
On Wed, May 03, 2023 at 12:38:15PM +0000, Eric Wong wrote:
> Olly Betts <olly at survex.com> wrote:
> > This will also effectively ignore boolean terms, assuming you're giving
> > them wdf of 0 (because $3 here is the collection frequency, which is
> > sum(wdf(term)) over all documents).
>
> Should boolean terms be ignored when estimating flushing
>
2023 May 03
1
manual flushing thresholds for deletes?
Olly Betts <olly at survex.com> wrote:
> On Mon, Mar 27, 2023 at 11:22:09AM +0000, Eric Wong wrote:
> > Olly Betts <olly at survex.com> wrote:
> > > 10 seems too long. You want the mean word length weighted by frequency
> > > of occurrence. For English that's typically around 5 characters, which
> > > is 5 bytes. If we go for +1 that's:
2023 Mar 27
1
manual flushing thresholds for deletes?
On Mon, Mar 27, 2023 at 11:22:09AM +0000, Eric Wong wrote:
> Olly Betts <olly at survex.com> wrote:
> > 10 seems too long. You want the mean word length weighted by frequency
> > of occurrence. For English that's typically around 5 characters, which
> > is 5 bytes. If we go for +1 that's:
>
> Actually, 10 may be too short in my case since there's a
2023 Aug 27
1
DatabaseModifiedError while iterating on mset
On Wed, Aug 23, 2023 at 01:53:27PM +0000, Eric Wong wrote:
> I'm already retrying the ->get_mset operations; but now I'm
> wondering where I'd hit DatabaseModifiedErrors while inside a
> Xapian::MSetIterator loop.
>
> I assume ->get_document is a place where it gets thrown;
> but once a document is retrieved, can iterating through
> terms in one document
2015 Jul 26
1
Get term from document by position
> Snippet highlighting is something that was worked on for a GSoC project a
> few years ago, and is mentioned in our FAQ: <http://trac.xapian.org/wiki/FAQ/Snippets>.
> It's not available in the 1.2 series, but as I understand it should work out of the
> box in 1.3.3.
I tried it; this approach returns snippets that have nothing to do with the search string. Moreover, it takes too
2007 Mar 04
5
Getting non-stemmed terms from IndexReader
I need to get a set of terms being indexed using Ferret. I used
IndexReader.terms and it returns a list of TermEnum nicely. The only
problem is that my analyzer includes a stemming filter.
So now, the terms I'm getting back are all stemmed. Is there any way to
get the original unstemmed terms back from the index somehow? Thanks.
--
Posted via http://www.ruby-forum.com/.
2008 Mar 27
2
Proper noun stemming
Hi All
I was wondering if anyone had a solution for the following problem.
I use QueryParser to stem my documents before adding them to a
database. During the stemming process I would like to find a way of
keeping proper nouns that span two or more words together as a phrase.
For example "New York" or "Gordon Brown" or "Prime Minister" get split
up. I see
2017 Jun 14
2
KMeans Clusterer - Going forward
Hello,
I have finished moving the API to PIMPL classes and will fix issues within
the current code over the next week, based on reviews from mentors.
The next step going forward is to start with forming document vectors that
are reduced and more useful. This majorly helps in saving run time (since
time for distance calculation depends on number of terms). Getting the
useful terms within a
2013 Jan 17
1
FASTER Search
I am suffering from slow search performance with Xapian.
I am using Xapian for indexing about 150,000,000 documents.
It was implemented in C++;
The performance of searching was not that fast.
e.g. Searching a query, which includes about 20 terms, needs 2 secs avg.
For searching, I followed such steps:
1. construct a QueryParser for certain string
2. parse the query to get a Xapian::Query
2009 Mar 26
1
ideas on picking stopwords
I'm looking at adding some stopwords to my indexing procedure, and was
wondering if anyone had any good rules of thumb on how to pick which
words to blacklist. It all seems a little... well... vague. Although I
guess it kind of depends on the sort of documents you're wanting to index.
My current idea is to write a little script to output the terms with the
highest frequency in my
2006 Aug 11
3
Proposed changes to omindex
Proposed changes to omindex
Currently Available Items
=========================
1) Have the Q prefix contain the 16 byte MD5 of the full file name used for document lookup during
indexing.
2) Add the document's last modified time to the value table (ID 0). This would allow incremental
indexing based on the timestamp and also sorting by date in omega (SORT=0)
a. Currently I store the timestamp
2023 Aug 23
1
DatabaseModifiedError while iterating on mset
I'm already retrying the ->get_mset operations; but now I'm
wondering where I'd hit DatabaseModifiedErrors while inside a
Xapian::MSetIterator loop.
I assume ->get_document is a place where it gets thrown;
but once a document is retrieved, can iterating through
terms in one document (using TermIterator) also throw DB modified?
I'm dumping multiple terms per-document to a
2010 Jan 16
1
PHP XapianTermIterator/XapianPositionIterator usage
Hello again,
/thanks to Peter for previous response.
I've been digging around trying to find sample usage of
XapianTermIterator/XapianPositionIterator in PHP. The idea is to code up a
test case in PHP to perform snippet extraction (with a possible view to
coding a pecl extension in C). I found a C++ sample, but that wasn't much
help.
I must be dense this morning though, since I
2007 Jun 28
1
TermGenerator and SimpleStopper
Hi,
I'm using SimpleStopper with TermGenerator in a Python indexing
script, in an attempt to keep my index size down (currently 30K per
doc, and I have 200 million docs to index, which I think implies
6TB.) However, unprefixed (positional?) terms are not affected by
the stopper, though Z-prefixed terms are.
I assume this is intentional for phrase queries, but I need to reduce
my
2015 Jul 23
1
Get term from document by position
Hello. Is there any FAST way to get a term from a Xapian document by its position, something like
std::string term = Xapian::Document::GetTermByPosition(int position) ?
Below I have described the task that I am trying to solve, in case somebody is interested.
============================================================================
When displaying search results, I would like to
2011 May 27
1
Does OP_NEAR work with stemming?
Hi All,
I used the OP_NEAR operator for queryparser, and when I searched for "apple store" from my own collection, the query is parsed as "Zappl:(pos=1) NEAR 11 Zstore:(pos=2)" but retrieved nothing. However, if I type in "Apple Store", the query is parsed as Xapian::Query((apple:(pos=1) NEAR 11 store:(pos=2))) and some results are shown. I'm not sure whether
2023 Aug 28
1
DatabaseModifiedError while iterating on mset
Olly Betts <olly at survex.com> wrote:
> On Wed, Aug 23, 2023 at 01:53:27PM +0000, Eric Wong wrote:
> > I'm already retrying the ->get_mset operations; but now I'm
> > wondering where I'd hit DatabaseModifiedErrors while inside a
> > Xapian::MSetIterator loop.
> >
> > I assume ->get_document is a place where it gets thrown;
> > but
2017 Jul 21
0
[RFC PATCH 12/13] clk: parse thermal policies for throttling thresholds
Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
drm/nouveau/include/nvkm/subdev/clk.h | 2 ++
drm/nouveau/nvkm/subdev/clk/base.c | 42 +++++++++++++++++++++++++++++++++++
2 files changed, 44 insertions(+)
diff --git a/drm/nouveau/include/nvkm/subdev/clk.h b/drm/nouveau/include/nvkm/subdev/clk.h
index f35518c3..f5ff1fd9 100644
--- a/drm/nouveau/include/nvkm/subdev/clk.h
+++