thr3ads.net - Ferret talk - [Ferret-talk] Help with Multiple Readers, 1 Writer scenario [Aug 2006]


Neville Burnell

2006-Aug-28 07:26 UTC

[Ferret-talk] Help with Multiple Readers, 1 Writer scenario

Hi,

I'm building a web server application using Ferret [thanks so much
Dave], Mongrel and Camping which works fine servicing one request at a
time, but serialises searches if more than one request arrives, so I'd
like some advice please about the best way to use multiple readers and
one writer.

Some background ... query requests, which in my case are always read
only, arrive via Mongrel, which allocates a thread for each request.
Should I create a new IndexReader for each request also, or can I use
one IndexReader concurrently?

Index updates on the other hand are coordinated by a special Update
Thread which runs every 10 minutes or so. I'm guessing that the best
approach is to create an IndexWriter for each update run, which can be
closed and discarded at the end of the update run. Or can I close and
reuse a single IndexWriter?

I searched http://ferret.davebalmain.com/api for details on the
MultiReader, but I couldn't find any. If someone could post a
link to point me in the right direction that would be great.

Thanks so much

Neville

David Balmain

2006-Sep-01 10:18 UTC

On 8/28/06, Neville Burnell <Neville.Burnell at bmsoft.com.au> wrote:
> Hi,
>
> I'm building a web server application using Ferret [thanks so much
> Dave], Mongrel and Camping which works fine servicing one request at a
> time, but serialises searches if more than one request arrives, so I'd
> like some advice please about the best way to use multiple readers and
> one writer.
>
> Some background ... query requests which in my case are always read
> only, arrive via Mongrel, which allocates a thread for each request.
> Should I create a new IndexReader for each request also, or can I use
> one IndexReader concurrently?

Creating a new reader per request is not a good idea since creating a
new IndexReader is an expensive operation (although it has been
significantly improved in version 0.10). A lot of data needs to be
read into memory for fast access. In most situations the ideal
solution is to have a single IndexReader per thread. You can have as
many IndexReaders open on an index as your operating system will
allow.

The one situation where you might be better off using a single
IndexReader is when you are relying on caching. Filters and Sorts are
cached per IndexReader and Sorts in particular can take up a fair
chunk of memory so if you have a large index (large as in number of
documents, not size of data) then you may be better off with a single
IndexReader. IndexReader is thread-safe so using it concurrently
should be fine.
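
One way to picture "a single IndexReader per thread" is a lazily created, thread-local object. The sketch below is plain Ruby and does not call Ferret at all: `ReaderStub` and `reader_for_current_thread` are hypothetical names invented for illustration, and in a real application the block would presumably open an actual IndexReader instead of building a stub.

```ruby
require 'thread'

# Hypothetical stand-in for an expensive-to-open reader; not a Ferret class.
ReaderStub = Struct.new(:opened_by)

# Return the calling thread's reader, running the block (the assumed
# "open a reader" factory) only the first time this thread asks.
def reader_for_current_thread
  reader = Thread.current.thread_variable_get(:index_reader)
  unless reader
    reader = yield
    Thread.current.thread_variable_set(:index_reader, reader)
  end
  reader
end

# Three threads each ask for a reader twice; every open is recorded.
opens = Queue.new
readers = Array.new(3) do |i|
  Thread.new do
    2.times.map do
      reader_for_current_thread do
        opens << i
        ReaderStub.new(i)
      end
    end
  end
end.map(&:value)
```

Each thread pays the open cost once and then reuses its own reader, which is the trade-off described above: no sharing between threads, at the price of one set of caches per reader.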

> Index updates on the other hand are coordinated by a special Update
> Thread which runs every 10 minutes or so. I'm guessing that the best
> approach is to create an IndexWriter for each update run, which can be
> closed and discarded at the end of the update run. Or can I close and
> reuse a single IndexWriter?

You can't reuse an IndexWriter after it has been closed. But you can
commit the changes to disk:

    writer.commit()

IndexWriter#optimize will also commit all changes to disk as an
optimal index but depending on the size of your index you may only
want to call optimize once a day if at all. For a small index however,
calling it every ten minutes is definitely possible.

> I searched http://ferret.davebalmain.com/api for details on the
> MultiReader, but I couldn't find any details. If someone could post a
> link to point me in the right direction that would be great.

You can actually pass an array of readers as the first (only)
parameter to IndexReader.new.

    reader = IndexReader.new([reader1, reader2, reader3])

In the current working version of Ferret you can also pass Directory
objects or paths:

    reader = IndexReader.new([dir, dir2, dir3])

    reader = IndexReader.new(["/path/to/index1", "/path/to/index2"])

You'll need to wait for 10.2 for this functionality (and for an update
that includes this info in the API docs).

Cheers,
Dave

Neville Burnell

2006-Sep-04 01:40 UTC

Thanks for your reply Dave,

> The one situation where you might be better off using
> a single IndexReader is when you are relying on caching.
> Filters and Sorts are cached per IndexReader and Sorts
> in particular can take up a fair chunk of memory so if
> you have a large index (large as in number of documents,
> not size of data) then you may be better off with a single
> IndexReader. IndexReader is thread-safe so using it concurrently
> should be fine.

Just to clarify, I'm using Ferret::Index::Index concurrently at the
moment, and I'm not getting concurrent searches via #search_each. I.e.,
if a slow wild-card search arrives first, all subsequent searches wait
until the wild-card search completes.

So I guess #search_each is "synchronised"?

Therefore to have multiple searches on an index concurrently, I really
need an IndexReader per thread and I would need to manage a pool of
reusable IndexReaders?

Any pointers on how other web apps [not using Rails] handle multiple
Ferret readers?

> You can actually pass an array of readers as the first (only)
> parameter to IndexReader.new.
>
>    reader = IndexReader.new([reader1, reader2, reader3])

Interesting ... I had a look, but I don't really understand what this
does? Would you elaborate please :D

Thanks for your help,

Neville

David Balmain

2006-Sep-04 04:05 UTC

On 9/4/06, Neville Burnell <Neville.Burnell at bmsoft.com.au> wrote:
> Thanks for your reply Dave,
>
> > The one situation where you might be better off using
> > a single IndexReader is when you are relying on caching.
> > Filters and Sorts are cached per IndexReader and Sorts
> > in particular can take up a fair chunk of memory so if
> > you have a large index (large as in number of documents,
> > not size of data) then you may be better off with a single
> > IndexReader. IndexReader is thread-safe so using it concurrently
> > should be fine.
>
> Just to clarify, I'm using Ferret::Index::Index concurrently at the
> moment, and I'm not getting concurrent searches via #search_each. IE, if
> a slow wild-card search arrives first, all subsequent searches wait
> until the wild-card search completes.
>
> So I guess #search_each is "synchronised"?

That's correct. Otherwise it would be possible for the document IDs of
the documents to change between the time the search is run and the
time the document is referenced. For the benefit of those who don't
know this, document IDs are not constant. They represent the position
of the document in the index. Think of it like an array. Let's add 5
documents to the index.

    [0,1,2,3,4]

Now let's delete documents 1 and 2:

    [0,3,4]

So document 4 now has a doc_id of 2. If this happened in the middle of
a search you'd have a problem. So instead we synchronize the
Index#search and Index#search_each methods. This isn't the case
for Searcher#search and Searcher#search_each, since the IndexReader
that Searcher uses remains consistent, so you should be able to use
Searcher concurrently.
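
The array analogy above can be played out in a few lines of plain Ruby. This is only a toy model of the explanation, not Ferret code: an Array stands in for the index, and the names `index`, `hit_id`, `new_id` and `fetched` are made up for the illustration.

```ruby
# Toy model only: a plain Array stands in for the index, and a document's
# position in the Array plays the role of its doc_id.
index = %w[doc0 doc1 doc2 doc3 doc4]

# A search runs and remembers the position of its hit.
hit_id = index.index('doc4')

# Before the hit is fetched, documents 1 and 2 are deleted and the
# remaining documents close up the gap.
index = index.reject.with_index { |_doc, id| [1, 2].include?(id) }

new_id  = index.index('doc4')  # where doc4 lives now
fetched = index[hit_id]        # what the stale id points at now
```

The id remembered by the search no longer refers to the same document once the positions have shifted, which is the problem the synchronization in Index#search and Index#search_each is there to prevent.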

> Therefore to have multiple searches on an index concurrently, I really
> need an IndexReader per thread and I would need to manage a pool of
> reusable IndexReaders?

Using Ferret::Index::Index this would be true. But if performance is a
concern you should definitely use a Ferret::Search::Searcher object
instead anyway and you'll be able to use it concurrently.

> Any pointers on how other web apps [not using Rails] handle multiple
> Ferret readers?

Let us know if using the Searcher object isn't adequate.

> > You can actually pass an array of readers as the first (only)
> > parameter to IndexReader.new.
> >
> >    reader = IndexReader.new([reader1, reader2, reader3])
>
> Interesting ... I had a look, but I don't really understand what this
> does? Would you elaborate please :D

A MultiReader object was initially what was used to read and search
multiple indexes at a time. This functionality is now simply handled
by the IndexReader object. There are several uses for this. One was to
store each model in a separate index and you could then offer search
across multiple models using a MultiReader. Another use-case might be
to have multiple indexes to speed up indexing. If for example you are
scraping websites it is a very good idea to have multiple scraping
processes. The best way to do this is to have each process indexing to
its own index. You could then search all indexes at once using a
MultiReader or you could also merge all indexes into a single index.

Hope that makes sense.

Cheers,
Dave

Neville Burnell

2006-Sep-06 05:06 UTC

> Otherwise it would be possible for the document IDs of the
> documents to change between the time the search is run and
> the time the document is referenced.

Well, I started coding to use Searcher#search_each and found myself
recoding most of the infrastructure of Index#search_each (and its
friends) simply to avoid its @dir.synchronize when what you were saying
above started to sink in. I.e., as I understand it, I can have concurrent
searchers if the index is read-only but not if I have a writer.

So while it's possible to have multiple readers, 1 writer, the 1 writer
requirement forces use of synchronized, which means that the readers
must be serialised and not concurrent - is this correct?

Kind Regards

Neville



_______________________________________________
Ferret-talk mailing list
Ferret-talk at rubyforge.org
http://rubyforge.org/mailman/listinfo/ferret-talk

Neville Burnell

2006-Sep-06 06:28 UTC

I've whipped up this script to demonstrate what I'm trying [and failing]
to achieve. The idea is that thread t1 adds docs to the index over time,
while threads t2 and t3 search the same index for the new docs.
Unfortunately the script doesn't work, as t2 and t3 don't find the docs
that t1 has added.

Can anyone point out where I am going wrong? Thanks so much.

Neville
================================
require 'rubygems'
require 'ferret'

p Ferret::VERSION

@dir = Ferret::Store::RAMDirectory.new
@writer = Ferret::Index::IndexWriter.new(:dir => @dir)
@searcher = Ferret::Search::Searcher.new(@dir)
@parser = Ferret::QueryParser.new

@docs = []
@docs << {:id =>  1, :name => 'Fred',          :occupation => 'Toon'}
@docs << {:id =>  2, :name => 'Barney',        :occupation => 'Toon'}
@docs << {:id =>  3, :name => 'Wilma',         :occupation => 'Toon'}
@docs << {:id =>  4, :name => 'Betty',         :occupation => 'Toon'}
@docs << {:id =>  5, :name => 'Pebbles',       :occupation => 'Toon'}

@docs << {:id =>  6, :name => 'Superman',      :occupation => 'Hero'}
@docs << {:id =>  7, :name => 'Batman',        :occupation => 'Hero'}
@docs << {:id =>  8, :name => 'Spiderman',     :occupation => 'Hero'}
@docs << {:id =>  9, :name => 'Green Lantern', :occupation => 'Hero'}
@docs << {:id => 10, :name => 'Dr Strange',    :occupation => 'Hero'}

@docs << {:id => 11, :name => 'Phantom',       :occupation => 'Hero'}

# populate index over time
t1 = Thread.new do
  @docs.each do |doc|
    p "t1: adding #{doc[:id]} to index"
    @writer << doc

    sleep(10)
  end
end

# search for heroes over time
t2 = Thread.new do
  query_txt = 'occupation:hero'
  query = @parser.parse(query_txt)
  while true do
    hits = @searcher.search(query)
    p "t2: searching for #{query_txt} found #{hits.total_hits}"
    break if hits.total_hits == 6

    sleep(5)
  end
end

# search for toons over time
t3 = Thread.new do
  query_txt = 'occupation:toon'
  query = @parser.parse(query_txt)
  while true do
    hits = @searcher.search(query)
    p "t3: searching for #{query_txt} found #{hits.total_hits}"
    break if hits.total_hits == 5

    sleep(5)
  end
end

t1.join; t2.join; t3.join

David Balmain

2006-Sep-06 06:40 UTC

On 9/6/06, Neville Burnell <Neville.Burnell at bmsoft.com.au> wrote:
> > Otherwise it would be possible for the document IDs of the
> > documents to change between the time the search is run and
> > the time the document is referenced.
>
> Well, I started coding to use Searcher#search_each and found myself
> recoding most of the infrastructure of Index#search_each (and its
> friends) simply to avoid its @dir.synchronize when what you were saying
> above started to sink in. Ie, as I understand it, I can have concurrent
> searchers if the index is read-only but not if I have a writer.
>
> So while its possible to have multiple readers, 1 writer, the 1 writer
> requirement forces use of synchronized, which means that the readers
> must be serialised and not concurrent - is this correct?

Close. When you open an IndexReader on the index it is opened up on
that particular version (or state) of the index. So any operations on
the IndexReader (like searches) will only show what was in the index
at the time you opened it. Any modifications to the index (usually
through an IndexWriter) that occur after you open the IndexReader
will not appear in your searches. So to keep searches up to date you
need to close and reopen your IndexReader every time you commit
changes to the index.
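
The "opened on a particular version" behaviour can be mimicked with a small toy model. The sketch below is plain Ruby and is not how Ferret is implemented: `ToyIndex` and `ToyReader` are hypothetical classes invented here, and the only point is that a reader keeps the state it saw when it was created until a new reader is opened.

```ruby
# Toy model only; ToyIndex and ToyReader are hypothetical, not Ferret classes.
class ToyIndex
  attr_reader :committed

  def initialize
    @committed = [].freeze  # the state newly opened readers will see
    @pending   = []         # added but not yet committed
  end

  def <<(doc)
    @pending << doc
    self
  end

  # Publish the pending docs as a new immutable state.
  def commit
    @committed = (@committed + @pending).freeze
    @pending = []
  end
end

class ToyReader
  # Capture the index state as of "open" time. Later commits replace
  # index.committed but never touch the array held here.
  def initialize(index)
    @docs = index.committed
  end

  def count(occupation)
    @docs.count { |doc| doc[:occupation] == occupation }
  end
end

index = ToyIndex.new
old_reader = ToyReader.new(index)   # opened before anything is committed

index << {:id => 6, :name => 'Superman', :occupation => 'Hero'}
index.commit

new_reader = ToyReader.new(index)   # "reopen" after the commit
stale = old_reader.count('Hero')
fresh = new_reader.count('Hero')
```

In this model the reader opened before the commit keeps answering from the old state, and only the reader opened afterwards sees the new document.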

So the writer doesn't force the use of synchronized. Rather it forces
you to decide whether searches need to return the most up to date
results available or if there can be a short delay between changes
being written to the index and changes appearing in the search
results. The Index class makes it as simple as possible to always
search the latest index but there is a performance hit. Most of the
time performance should be fine. The Ferret C core has been highly
optimized and will still beat most other solutions hands down, even
when used in this way.

Now, if I were writing an application where search performance is a
big issue (as it seems to be in your case) then I would start by using
the base classes like IndexReader and IndexWriter (as we've already
discussed). Like I just mentioned you might allow a delay between the
time the index is modified and the time those modifications appear in
search results. This would allow you to update the IndexReader every
minute/hour/day/week without regard to what the IndexWriter is doing.
This solution works well when scraping webpages. Google's
results, for example, aren't always completely up to date with the
pages they index. If one of their results is a dead link it isn't the
end of the world.

If, however, you are indexing data in a database it often isn't this
simple. If you use the previous solution with a database that allows
deletes then you need some way to handle results that reference
objects that have been deleted from the database. Otherwise you will
need some way to synchronize on the index (probably on the
Ferret::Store::Directory like Ferret::Index::Index does) so that no
searches are done while the deletion is committed to the index and the
IndexReaders are updated.

Another solution which I'm going to experiment with is using the index
as your database. You may still keep your original database but store
any data in the index that will be shown back to the user as the
result of a search. That way you don't need to worry about
synchronization with the database.

I don't think I've explained this very clearly here so feel free to
ask me to clarify. I will be endeavoring to write this all down in a
clearer and more comprehensible manner so that everyone can work out the
solution that best fits their needs.

Cheers,
Dave

PS: The ideal solution for me would be an object database with
Ferret-like full-text search built in. I've been thinking about this a
lot lately. It would certainly fit the style of development used in
many Rails apps. That is to say, all access to the database must go
through the model as that is where all the validation is. If you are
developing this way, why bother with the relational database and ORM
solution? A good object database would serve the same purpose and
would be a LOT more performant. Obviously this solution wouldn't be
for everybody though, so enterprise developers feel free to ignore. ;-)

David Balmain

2006-Sep-06 06:43 UTC

On 9/6/06, Neville Burnell <Neville.Burnell at bmsoft.com.au> wrote:
> I've whipped up this script to demonstrate what I'm trying [and failing]
> to achieve. The idea is that thread t1 adds docs to the index over time,
> while threads t2 and t3 search the same index for the new docs.
> Unfortunately the script doesn't work, as t2 and t3 don't find the docs
> that t1 has added.
>
> Can anyone point out where I am going wrong. Thanks so much.

Please let me know if the first paragraph of my previous email doesn't
explain this.

Cheers,
Dave

Neville Burnell

2006-Sep-06 07:07 UTC

Thanks Dave, I think I understand now ... 

FWIW, the following script works now that I have read your responses.
I've posted it here for others to read.

=================
require 'rubygems'
require 'ferret'

p Ferret::VERSION

@dir = Ferret::Store::RAMDirectory.new
@writer = Ferret::Index::IndexWriter.new(:dir => @dir)
@searcher = Ferret::Search::Searcher.new(@dir)
@parser = Ferret::QueryParser.new

@docs = []
@docs << {:id =>  1, :name => 'Fred',          :occupation => 'Toon'}
@docs << {:id =>  2, :name => 'Barney',        :occupation => 'Toon'}
@docs << {:id =>  3, :name => 'Wilma',         :occupation => 'Toon'}
@docs << {:id =>  4, :name => 'Betty',         :occupation => 'Toon'}
@docs << {:id =>  5, :name => 'Pebbles',       :occupation => 'Toon'}

@docs << {:id =>  6, :name => 'Superman',      :occupation => 'Hero'}
@docs << {:id =>  7, :name => 'Batman',        :occupation => 'Hero'}
@docs << {:id =>  8, :name => 'Spiderman',     :occupation => 'Hero'}
@docs << {:id =>  9, :name => 'Green Lantern', :occupation => 'Hero'}
@docs << {:id => 10, :name => 'Dr Strange',    :occupation => 'Hero'}

@docs << {:id => 11, :name => 'Phantom',       :occupation => 'Hero'}

# populate index over time
t1 = Thread.new do
  @docs.each do |doc|
    p "t1: adding #{doc[:id]} to index"
    @writer << doc
    sleep(10)
  end
end

# search for heroes over time
t2 = Thread.new do
  query_txt = 'occupation:hero'
  query = @parser.parse(query_txt)
  while true do
    hits = @searcher.search(query)
    p "t2: searching for #{query_txt} found #{hits.total_hits}"
    break if hits.total_hits == 6

    sleep(5)
  end
end

# search for toons over time
t3 = Thread.new do
  query_txt = 'occupation:toon'
  query = @parser.parse(query_txt)
  while true do
    hits = @searcher.search(query)
    p "t3: searching for #{query_txt} found #{hits.total_hits}"
    break if hits.total_hits == 5

    sleep(5)
  end
end

t1.join; t2.join; t3.join


Neville Burnell

2006-Sep-06 07:16 UTC

Oops ... My cut & paste buffer was old!

The key difference between this script and the old script is that the
writer thread, t1, replaces the searcher after each index update, and
each reader thread, t2 and t3, grabs a new copy of the searcher, which
it uses for the duration of a search.

So the old searchers are GC'd when no longer required.


==================
require 'rubygems'
require 'ferret'

p Ferret::VERSION

@dir = Ferret::Store::RAMDirectory.new
@writer = Ferret::Index::IndexWriter.new(:dir => @dir)
@searcher = Ferret::Search::Searcher.new(@dir)
@parser = Ferret::QueryParser.new

@docs = []
@docs << {:id =>  1, :name => 'Fred',          :occupation => 'Toon'}
@docs << {:id =>  2, :name => 'Barney',        :occupation => 'Toon'}
@docs << {:id =>  3, :name => 'Wilma',         :occupation => 'Toon'}
@docs << {:id =>  4, :name => 'Betty',         :occupation => 'Toon'}
@docs << {:id =>  5, :name => 'Pebbles',       :occupation => 'Toon'}

@docs << {:id =>  6, :name => 'Superman',      :occupation => 'Hero'}
@docs << {:id =>  7, :name => 'Batman',        :occupation => 'Hero'}
@docs << {:id =>  8, :name => 'Spiderman',     :occupation => 'Hero'}
@docs << {:id =>  9, :name => 'Green Lantern', :occupation => 'Hero'}
@docs << {:id => 10, :name => 'Dr Strange',    :occupation => 'Hero'}

@docs << {:id => 11, :name => 'Phantom',       :occupation => 'Hero'}

#@docs.each {|doc| @writer << doc}
#@writer.commit
#@searcher = Ferret::Search::Searcher.new(@dir)

# populate index over time
t1 = Thread.new do
  @docs.each do |doc|
    p "t1: adding #{doc[:id]} to index"
    @writer << doc
    @writer.commit

    # new searcher, opened on the index state just committed
    @searcher = Ferret::Search::Searcher.new(@dir)
    sleep(10)
  end
end

# search for heroes over time
t2 = Thread.new do
  query_txt = 'occupation:hero'
  query = @parser.parse(query_txt)
  while true do
    # grab the current searcher and use it for this whole search
    mysearcher = @searcher
    hits = mysearcher.search(query)
    p "t2: searching for #{query_txt} found #{hits.total_hits}"
    break if hits.total_hits == 6

    sleep(5)
  end
end

# search for toons over time
t3 = Thread.new do
  query_txt = 'occupation:toon'
  query = @parser.parse(query_txt)
  while true do
    # grab the current searcher and use it for this whole search
    mysearcher = @searcher
    hits = mysearcher.search(query)
    p "t3: searching for #{query_txt} found #{hits.total_hits}"
    break if hits.total_hits == 5

    sleep(5)
  end
end

t1.join; t2.join; t3.join

Neville Burnell

2006-Sep-07 03:56 UTC

Thanks for your email Dave,

I've thought about this overnight, and I've got a few questions please.

> When you open an IndexReader on the index it is opened up on
> that particular version (or state) of the index

Would you elaborate on how Ferret manages versions please. For example,
can I have two readers open, one which accesses the old version of the
index, and the second which accesses the latest version?

> So to keep searches up to date you need to close and reopen
> your IndexReader every time you commit changes to the index.

I guess by reopen you mean IndexReader.new ?

I proceeded to replace my Index usage with an IndexReader and Searcher
which are closed and recreated after each IndexWriter pass, and the
result seems to be that searches are still serialised - i.e., a long
running query on thread t1 "blocks" the normally very fast query on
thread t2.

Might I be seeing another point of synchronisation, or am I just
observing a characteristic of ruby threads ?

Kind Regards,

Neville

David Balmain

2006-Sep-07 06:07 UTC

On 9/7/06, Neville Burnell <Neville.Burnell at bmsoft.com.au> wrote:
> Thanks for your email Dave,
>
> I've thought about this overnight, and I've got a few questions please.
>
> > When you open an IndexReader on the index it is opened up on
> > that particular version (or state) of the index
>
> Would you elaborate on how Ferret manages versions please. For example,
> can I have two readers open, one which accesses the old version of the
> index, and the second which accesses the latest version?

When you open an IndexReader it opens all the files that it needs to
read the index and it keeps all of the file handles. Even after the
index is updated and those files are deleted they are not actually
freed by the operating system. If you then open an IndexReader on a
later version it holds file handles to all the files needed for that
version. So the answer is yes, you can have multiple IndexReaders open
on an index at the same time, all reading different versions. Each
version of the index has an internal version number and there is an
IndexReader#latest? method to determine if the version of the index
that you are reading is the current version.

> > So to keep searches up to date you need to close and reopen
> > your IndexReader every time you commit changes to the index.
>
> I guess by reopen you mean IndexReader.new ?

That's correct. Don't forget to close the old IndexReader. The
garbage collector will do this for you but IndexReaders hold a lot of
resources so it's best to close them as soon as you no longer need
them.
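
A rough sketch of the "open a new reader, then close the old one" hand-over follows. It is plain Ruby with stand-ins rather than Ferret objects: `StubReader` and `ReaderHolder` are hypothetical names, and the sketch assumes (without having checked Ferret's behaviour) that a reader should not be closed while another thread is still searching with it, so it leaves the timing of `close` to the caller.

```ruby
# Hypothetical stand-in for a reader; only #close matters for this sketch.
class StubReader
  attr_reader :name

  def initialize(name)
    @name = name
    @closed = false
  end

  def close
    @closed = true
  end

  def closed?
    @closed
  end
end

# Holds the reader that new searches should pick up.
class ReaderHolder
  def initialize(reader)
    @lock = Mutex.new
    @current = reader
  end

  # Search threads call this once per search and keep using what they got.
  def current
    @lock.synchronize { @current }
  end

  # Install a fresh reader and hand back the previous one, so the caller
  # can close it once nothing is using it any more.
  def swap(fresh)
    @lock.synchronize do
      previous = @current
      @current = fresh
      previous
    end
  end
end

holder  = ReaderHolder.new(StubReader.new('v1'))
in_use  = holder.current                      # a search picks up v1
retired = holder.swap(StubReader.new('v2'))   # after a commit, publish v2
retired.close                                 # once v1's searches are done
```

Searches that started before the swap finish on the reader they already hold, while later searches get the new one; the retired reader is closed explicitly instead of waiting for the garbage collector.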

> I proceeded to replace my Index usage with an IndexReader and Searcher
> which are closed and recreated after each IndexWriter pass, and the
> result seems to be that searches are still serialised - ie, a long
> running query on thread t1 "blocks" the normally very fast query on
> thread t2.
>
> Might I be seeing another point of synchronisation, or am I just
> observing a characteristic of ruby threads ?

I think it's probably a symptom of using ruby threads. I don't think
they can swap threads in the middle of a call to a C function. It's
unusual, however, for a search to take long enough to be a problem.
What kind of search is it? If it's a PrefixQuery, FuzzyQuery
or WildcardQuery you'll get much better performance on an optimized
index. If you are making heavy use of any of these queries it is the
one time I'd recommend always keeping the index in an optimized state.

cheers,
Dave

Neville Burnell

2006-Sep-10 23:40 UTC

> It's unusual, however for a search to take long
> enough to be a problem though. What kind of search
> is it?

Actually I'm misleading you. The searches are very fast, i.e., 0.1 sec or
faster on my 30,000 doc index.

By "slow query" I really mean my "#search_each do" which fetches each
doc from the index and appends it to an xml or html response.

This is clearly not a Ferret issue I think.

Thanks for all your help Dave,

Regards

Neville
