thr3ads.net - similar to: "manual flushing thresholds for deletes?"

Displaying 20 results from an estimated 4000 matches similar to: "manual flushing thresholds for deletes?"

2023 Mar 26

manual flushing thresholds for deletes?

On Fri, Mar 24, 2023 at 10:37:41AM +0000, Eric Wong wrote: > Realizing I had documents of hugely varying sizes (0.5KB..20MB) > and little RAM, I instead tracked the number of raw bytes in the > text being indexed and flushed whenever I'd seen a configurable > byte count. Not the most scientific way, but it seems to work > well enough on low-end systems. > > Now, I'm
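
The byte-count approach quoted above can be sketched roughly as follows. This is a minimal illustration and not code from the thread: the class name `ByteThresholdFlusher`, its `threshold` parameter and the `flush` callback are all hypothetical. In real use the callback would presumably commit the writable database; here it is just any callable.

```python
from typing import Callable


class ByteThresholdFlusher:
    """Call `flush` once the raw bytes seen since the last flush reach `threshold`.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self, threshold: int, flush: Callable[[], None]) -> None:
        if threshold <= 0:
            raise ValueError("threshold must be positive")
        self.threshold = threshold
        self.flush = flush
        self.pending = 0  # raw bytes indexed since the last flush

    def add(self, text: bytes) -> bool:
        """Account for one indexed document; return True if a flush was triggered."""
        self.pending += len(text)
        if self.pending >= self.threshold:
            self.flush()
            self.pending = 0
            return True
        return False
```

Counting bytes rather than documents keeps memory use roughly bounded when document sizes vary widely, which is the situation described in the quoted message.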

2023 Mar 27

manual flushing thresholds for deletes?

Olly Betts <olly at survex.com> wrote: > On Fri, Mar 24, 2023 at 10:37:41AM +0000, Eric Wong wrote: > > Realizing I had documents of hugely varying sizes (0.5KB..20MB) > > and little RAM, I instead tracked the number of raw bytes in the > > text being indexed and flushed whenever I'd seen a configurable > > byte count. Not the most scientific way, but it seems

2023 May 03

manual flushing thresholds for deletes?

On Wed, May 03, 2023 at 12:38:15PM +0000, Eric Wong wrote: > Olly Betts <olly at survex.com> wrote: > > This will also effectively ignore boolean terms, assuming you're giving > > them wdf of 0 (because $3 here is the collection frequency, which is > > sum(wdf(term)) over all documents). > > Should boolean terms be ignored when estimating flushing >

2023 May 03

manual flushing thresholds for deletes?

Olly Betts <olly at survex.com> wrote: > On Mon, Mar 27, 2023 at 11:22:09AM +0000, Eric Wong wrote: > > Olly Betts <olly at survex.com> wrote: > > > 10 seems too long. You want the mean word length weighted by frequency > > > of occurrence. For English that's typically around 5 characters, which > > > is 5 bytes. If we go for +1 that's:

2023 Mar 27

manual flushing thresholds for deletes?

On Mon, Mar 27, 2023 at 11:22:09AM +0000, Eric Wong wrote: > Olly Betts <olly at survex.com> wrote: > > 10 seems too long. You want the mean word length weighted by frequency > > of occurrence. For English that's typically around 5 characters, which > > is 5 bytes. If we go for +1 that's: > > Actually, 10 may be too short in my case since there's a

2023 Aug 27

DatabaseModifiedError while iterating on mset

On Wed, Aug 23, 2023 at 01:53:27PM +0000, Eric Wong wrote: > I'm already retrying the ->get_mset operations; but now I'm > wondering where I'd hit DatabaseModifiedErrors while inside a > Xapian::MSetIterator loop. > > I assume ->get_document is a place where it gets thrown; > but once a document is retrieved, can iterating through > terms in one document

2015 Jul 26

Get term from document by position

> Snippet highlighting is something that was worked on for a GSoC project a > few years ago, and is mentioned in our FAQ: <http://trac.xapian.org/wiki/FAQ/Snippets>. > It's not available in the 1.2 series, but as I understand it should work out of the > box in 1.3.3. I tried it; this approach returns snippets that have nothing to do with the search string. Moreover, it takes too

2007 Mar 04

Getting non-stemmed terms from IndexReader

I need to get a set of terms being indexed using Ferret. I used IndexReader.terms and it returns a list of TermEnum nicely. The only problem is that my analyzer includes a stemming filter. So now, the terms I'm getting back are all stemmed. Is there any way to get the original unstemmed terms back from the index somehow? Thanks. -- Posted via http://www.ruby-forum.com/.

2008 Mar 27

Proper noun stemming

Hi All, I was wondering if anyone had a solution for the following problem. I use QueryParser to stem my documents before adding them to a database. During the stemming process I would like to find a way of keeping proper nouns that span two or more words together as a phrase. For example "New York" or "Gordon Brown" or "Prime Minister" get split up. I see

2017 Jun 14

KMeans Clusterer - Going forward

Hello, I have finished moving the API to PIMPL classes and will fix issues within the current code over the next week, based on reviews from mentors. The next step going forward is to start with forming document vectors that are reduced and more useful. This majorly helps in saving run time (since time for distance calculation depends on number of terms). Getting the useful terms within a

2013 Jan 17

FASTER Search

I am suffering from slow search performance on Xapian. I am using Xapian for indexing about 150,000,000 documents. It was implemented in C++. The search performance is not that fast, e.g. searching a query that includes about 20 terms takes 2 secs on average. For searching, I followed these steps: 1. construct a QueryParser for certain string 2. parse the query to get a Xapian::Query

2009 Mar 26

ideas on picking stopwords

I'm looking at adding some stopwords to my indexing procedure, and was wondering if anyone had any good rules of thumb on how to pick which words to blacklist. It all seems a little... well... vague. Although I guess it kind of depends on the sort of documents you're wanting to index. My current idea is to write a little script to output the terms with the highest frequency in my
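
A frequency listing of the kind described above might look something like the following sketch. It counts words in raw text with a simple regular-expression tokenizer rather than reading term statistics from an index, which is an assumption made purely for illustration; the function name `top_terms` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def top_terms(documents: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent lowercase words across all documents."""
    counts: Counter = Counter()
    for text in documents:
        # Crude tokenizer: runs of ASCII letters, digits and apostrophes.
        counts.update(re.findall(r"[a-z0-9']+", text.lower()))
    return counts.most_common(n)
```

The output is only a list of candidates; which of the high-frequency words are safe to treat as stopwords still depends on the documents being indexed.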

2006 Aug 11

Proposed changes to omindex

Proposed changes to omindex Currently Available Items ========================= 1) Have the Q prefix contain the 16 byte MD5 of the full file name used for document lookup during indexing. 2) Add the document's last modified time to the value table (ID 0). This would allow incremental indexing based on the timestamp and also sorting by date in omega (SORT=0) a. Currently I store the timestamp

2023 Aug 23

DatabaseModifiedError while iterating on mset

I'm already retrying the ->get_mset operations; but now I'm wondering where I'd hit DatabaseModifiedErrors while inside a Xapian::MSetIterator loop. I assume ->get_document is a place where it gets thrown; but once a document is retrieved, can iterating through terms in one document (using TermIterator) also throw DB modified? I'm dumping multiple terms per-document to a

2010 Jan 16

PHP XapianTermIterator/XapianPositionIterator usage

Hello again, /thanks to Peter for previous response. I've been digging around trying to find sample usage of XapianTermIterator/XapianPositionIterator in PHP. The idea is to code up a test case in PHP to perform snippet extraction (with a possible view to coding a pecl extension in C). I found a C++ sample, but that wasn't much help. I must be dense this morning though, since I

2007 Jun 28

TermGenerator and SimpleStopper

Hi, I'm using SimpleStopper with TermGenerator in a Python indexing script, in an attempt to keep my index size down (currently 30K per doc, and I have 200 million docs to index, which I think implies 6TB.) However, unprefixed (positional?) terms are not affected by the stopper, though Z-prefixed terms are. I assume this is intentional for phrase queries, but I need to reduce my

2015 Jul 23

Get term from document by position

Hello. Is there any FAST way to get a term from the Xapian document by its position, something like std::string term = Xapian::Document::GetTermByPosition(int position)? Below I have described the task that I am trying to solve, in case somebody is interested. When displaying search results, I would like to

2011 May 27

Does OP_NEAR works with stemming?

Hi All, I used the OP_NEAR operator for queryparser, and when I searched for "apple store" in my own collection, the query was parsed as "Zappl:(pos=1) NEAR 11 Zstore:(pos=2)" but retrieved nothing. However, if I type in "Apple Store", the query is parsed as Xapian::Query((apple:(pos=1) NEAR 11 store:(pos=2))) and some results are shown. I'm not sure whether

2023 Aug 28

DatabaseModifiedError while iterating on mset

Olly Betts <olly at survex.com> wrote: > On Wed, Aug 23, 2023 at 01:53:27PM +0000, Eric Wong wrote: > > I'm already retrying the ->get_mset operations; but now I'm > > wondering where I'd hit DatabaseModifiedErrors while inside a > > Xapian::MSetIterator loop. > > > > I assume ->get_document is a place where it gets thrown; > > but

2017 Jul 21

[RFC PATCH 12/13] clk: parse thermal policies for throttling thresholds

Signed-off-by: Karol Herbst <karolherbst at gmail.com> --- drm/nouveau/include/nvkm/subdev/clk.h | 2 ++ drm/nouveau/nvkm/subdev/clk/base.c | 42 +++++++++++++++++++++++++++++++++++ 2 files changed, 44 insertions(+) diff --git a/drm/nouveau/include/nvkm/subdev/clk.h b/drm/nouveau/include/nvkm/subdev/clk.h index f35518c3..f5ff1fd9 100644 --- a/drm/nouveau/include/nvkm/subdev/clk.h +++
