thr3ads.net - similar to: "Possiible Bug ? indexWriter#doc_count counts deleted docs after #commit"

Displaying 20 results from an estimated 900 matches similar to: "Possiible Bug ? indexWriter#doc_count counts deleted docs after #commit"

Possiible Bug ? indexWriter#doc_count counts deleted docs after #commit

2006 Sep 14

Hi David,
> Deleted documents don't get deleted until commit is called
Ok, but FYI, my experiments show that #commit doesn't affect #doc_count, even across ruby sessions. On a different note, I'd like to request a variation of #add_document which returns the doc_id of the document added, as opposed to self. I'm trying to track down an issue with a large

Possiible Bug ? indexWriter#doc_count counts deleted docs after #commit

2006 Sep 15

> I should also mention the reason I wouldn't want to return the document ID from any IndexWriter method is that the document ID could become invalid when the next document is added (if a segment merge is triggered and deletes exist). At least when using an IndexReader, the document ID is valid for the life of the reader.
Thanks for your detail Dave!

Help with Multiple Readers, 1 Writer scenario

2006 Aug 28

Hi, I'm building a web server application using Ferret [thanks so much Dave], Mongrel and Camping which works fine servicing one request at a time, but serialises searches if more than one request arrives, so I'd like some advice please about the best way to use multiple readers and one writer. Some background ... query requests which in my case are always read only, arrive via

Error with :create => true and existing index

2006 Sep 22

I implemented a "reindex" command which simply creates an IndexWriter with :create => true for a preexisting index. The "reindexing" seems to start out ok, with several thousand docs added, then Ferret throws an exception: IO Error occured: couldn't rename file "index\_0.tmp" to "index\_0.cfs": <File exists> I guess that _0.cfs is held

Help with Multiple Readers, 1 Writer scenario

2006 Nov 22

Some time back in September [sorry to be so slow], Dave wrote:
> When you open an IndexReader on the index it is opened up on that particular version (or state) of the index. So any operations on the IndexReader (like searches) will only show what was in the index at the time you opened it. Any modifications to the index (usually through an IndexWriter) that occur

Determine how many documents a term occurs in

2007 Apr 28

Is there a fast way to determine how many documents a term occurs in, besides iterating through every document with TermDocEnum? -- Best regards, Stian Grytøyr

In memory IndexReader bug?

2006 Jun 14

Hi All, Hope all is going well. I'm having trouble with the following code creating an in-memory index reader - it seems to be attempting to read from a file regardless. Here's the simple code:
    require 'rubygems'
    require 'ferret'
    a = Ferret::Index::Index.new
    r = Ferret::Index::IndexReader.new(nil)
Running the code on my OS X machine

Ferret 0.11.4.win32 indexing speed vs Ferret 0.10.9.win32

2007 Apr 12

Firstly, thanks Dave for all your hard work. Ferret rocks! I am just testing 0.11.4.win32 and it seems to work just fine, however the index creation phase of my app is perhaps 3x slower under 0.11.4 vs 0.10.9. Details follow: System: Windows XP SP2, index on local hard disk, Ruby 1.8.6 Run #1, Ferret 0.10.9 - Reboot - Build index, 35,000 rows added in 297 seconds - Run #2, Ferret 0.11.4 -

Trouble with "updating" a document

2006 Sep 15

Hi, I seem to be having trouble updating a doc, ie, deleting then re-adding to the index. The following script demonstrates my issue - I'm sure I'm missing something obvious, but I can't seem to find the problem. Can someone point out where I am going wrong please ? Regards Neville === require 'rubygems' require 'ferret' p

Index::Index.new vs. Readers and Writers

2006 May 08

Hey gang, A post on the Rails forum a while back had it sound like you pretty much had to use the Index Readers & Writers if you were going to be potentially accessing an index from more than one process. (i.e. multiple dispatch.fcgi's, etc) Is this still the case, or does the main Index class do that black magic behind the scenes? =) I was having trouble implementing the

A few questions about numbers and dates

2006 Sep 28

Hi, I just noticed that Ferret seems to convert every field to a string [ruby code appended for those interested], which has thwarted my attempt to format Dates (to "dd/mm/yyyy") and Floats (to "n.nn") for consumption further down the line based on the class of the field stored. I considered pre-formatting Dates and Floats prior to indexing, which would store the field

Parallel indexing doesn't work?

2008 Jan 09

Hi, I'm trying to get parallelized ferret indexing working for my AAF indices, based on the example in the O'Reilly Ferret shortcut. However, the resulting indices after merging seem to have no actual documents. I went and made minimal changes to the example in the Ferret shortcut pdf, and indeed can't get that to work either. I'd appreciate any help

search speed eclipsed by retrieval speed

2006 Jul 05

Hi all, I've recently started working with Ferret and I'm getting what seems to be slow searches. I have about 10000 documents in the index, with several fields per document, with some fields having an array of several values that are indexed. I am using a RAMDirectory to store the index for searching. When doing testing, I find that searches are reasonable at around .2 to

0.10.2 release with win32 gem

2006 Sep 04

Hey all, I've just released Ferret version 0.10.2. It is mostly just a bug fix release. The only change is that a highlight method has been added to Ferret::Index::Index. Please try it out and let me know what you think. The big news for this release is that there is also a binary win32 gem included. This is the first time I've built a gem like this so please let me know if

Proposal of some radical changes to API

2006 Jun 04

Hey guys, Now that the Lucy[1] project has Apache approval and is about to begin, the onus is no longer on Ferret to strive for Lucene compatibility. (We'll be doing that in Lucy). So I'm starting to think about ways to improve Ferret's API. The first part that needs to be improved is the Document API. It's annoying having to type all the attributes to

lock problems from concurrent processes.

2005 Nov 17

Hi! First, thanks a LOT for ferret. The API and documentation is great. I'm trying to integrate ferret into a RoR app (DamageControl) and have run into a problem with locks. DamageControl consists of two processes that start up and run in parallel. The first one is the webapp (which is just a plain RoR app). The second is a daemon process that runs in the background. The daemon process

Index.optimize

2006 Aug 03

In the documentation, it says that optimize "should only be called when the index will no longer be updated very often, but will be read a lot". Does this mean it actually has a detrimental impact on updates and inserts? In my project there will be many more reads than updates, but there will still be a lot of updates. So should I be calling Optimize once a day or something like that,

What's 'favicon.ico'

2005 Mar 03

I'm seeing the following in the WEBrick console output after every GET: 192.168.0.108 - - [03/Mar/2005:15:35:19 AUS Eastern Daylight Time] "GET /favicon.ico HTTP/1.1" 200 60 - -> /favicon.ico What does /favicon.ico (which doesn't seem to exist in my source) do for Rails? _______________________________________________ Rails mailing list

Creating my own analyzer

2006 Apr 20

I created this analyzer:
    class DescriptionAnalyzer < Ferret::Analysis::Analyzer
      def token_stream(field, string)
        if field == "code"
          return CodeTokenStream.new(string)
        else
          return Ferret::Analysis::Analyzer.new.token_stream(field,string)
        end
      end
    end
and created an IndexWriter with it: Ferret::Index::IndexWriter.new(get_index_path,

Ferret 0.10.2 - Index#search_each() and :num_docs

2006 Sep 05

Hi, I seem to be having trouble getting more than 10 hits from Index#search_each since upgrading to 0.10.2 (ie, this was working in 0.9.4). Maybe a bug, as the #search_each doesn't seem to use the options parameter any more ? Thanks, Neville =========================================== require 'rubygems' require 'ferret' p Ferret::VERSION idx =
