thr3ads.net - Ferret talk - [Ferret-talk] search speed eclipsed by retrieval speed [Jul 2006]

Chris

2006-Jul-05 12:51 UTC

[Ferret-talk] search speed eclipsed by retrieval speed

Hi all,

I've recently started working with Ferret and I'm getting what seems to
be slow searches. I have about 10000 documents in the index, with
several fields per document, with some fields having an array of several
values that are indexed.

I am using a RAMDirectory to store the index for searching. When doing 
testing, I find that searches are reasonable at around .2 to .5 seconds 
per search (for simple single-word searches). However, retrieving the
matching documents from the index ends up taking well over 2 to 3
seconds, totally eclipsing the search time, and
making the whole thing quite slow. Am I missing anything here? Will 
reducing the document size greatly affect the retrieval time of the 
documents? Any suggestions for general speed improvement? Thanks!

Below, I have detailed the process I am using to create and search the
index, in case that's useful:

I have created an index that is stored on disk. I'd like to read it back
into memory and use a RAMDirectory to see what speed improvements I can
get by using that.

Here's what I'm doing to create the index:

  ram_dir = Ferret::Store::RAMDirectory.new
  in_mem_index = Ferret::Index::IndexWriter.new(ram_dir, :create => true)

  # ... add stuff to the index

  in_mem_index.optimize
  in_mem_index.close

  index = Ferret::Index::Index.new(:dir => ram_dir)
  index.persist('path/to/index', true)
  index.close

I use a RAMDirectory when initially writing to the index because I am 
writing a lot to the index and I assume writing directly to a 
FSDirectory will be slower.

Later, I am trying to load this index back into memory as a 
RAMDirectory. I am not actually sure how to do this, so I am guessing 
here:

  ram_dir = Ferret::Store::RAMDirectory.new
  index = Ferret::Index::Index.new(:dir => ram_dir, :create => true)
  index.add_indexes(Ferret::Store::FSDirectory.new('path/to/index'))

  results = []
  num_results = index.search_each('search word(s)',
                                  { :first_doc => 0, :num_docs => 50 }) do |doc, score|
    results << index[doc]
  end


Any help would be awesome. Thanks!

- chris

-- 
Posted via http://www.ruby-forum.com/.

David Balmain

2006-Jul-06 03:23 UTC

[Ferret-talk] search speed eclipsed by retrieval speed

On 7/5/06, Chris <chris.smoak at gmail.com> wrote:
> Hi all,
>
> I've recently started working with Ferret and I'm getting what seems to
> be slow searches. I have about 10000 documents in the index, with
> several fields per document, with some fields having an array of several
> values that are indexed.
>
> I am using a RAMDirectory to store the index for searching. When doing
> testing, I find that searches are reasonable at around .2 to .5 seconds
> per search (for simple single-word searches). However, retrieving the
> matching documents from the index ends up taking well over 2 to 3
> seconds, totally eclipsing the search time, and
> making the whole thing quite slow. Am I missing anything here? Will
> reducing the document size greatly affect the retrieval time of the
> documents? Any suggestions for general speed improvement? Thanks!
>
> Below, I have detailed the process I am using to create and search the
> index, in case that's useful:
>
> I have created an index that is stored on disk. I'd like to read it back
> into memory and use a RAMDirectory to see what speed improvements I can
> get by using that.
>
> Here's what I'm doing to create the index:
>
>   ram_dir = Ferret::Store::RAMDirectory.new
>   in_mem_index = Ferret::Index::IndexWriter.new(ram_dir, :create => true)
>
>   # ... add stuff to the index
>
>   in_mem_index.optimize
>   in_mem_index.close
>
>   index = Ferret::Index::Index.new(:dir => ram_dir)
>   index.persist('path/to/index', true)
>   index.close

Hi Chris,

This is currently the fastest way to create small indexes. In the next
version of Ferret it won't make any difference though. Ferret will
automatically try to create as much of the index in memory as
possible. It's up to you to set the amount of memory that you want to
use to create the index. But forget about that for now. I'll try to
answer your question.

> I use a RAMDirectory when initially writing to the index because I am
> writing a lot to the index and I assume writing directly to a
> FSDirectory will be slower.

Yes, but not by a lot.

> Later, I am trying to load this index back into memory as a
> RAMDirectory. I am not actually sure how to do this, so I am guessing
> here:
>
>   ram_dir = Ferret::Store::RAMDirectory.new
>   index = Ferret::Index::Index.new(:dir => ram_dir, :create => true)
>   index.add_indexes(Ferret::Store::FSDirectory.new('path/to/index'))

Better to do it like this:

    ram_dir = Ferret::Store::RAMDirectory.new(
      Ferret::Store::FSDirectory.new("path/to/index"), true)

That reads an FSDirectory directly into a RAMDirectory.

>   results = []
>   num_results = index.search_each('search word(s)',
>                                   { :first_doc => 0, :num_docs => 50 }) do |doc, score|
>     results << index[doc]
>   end
>
>
> Any help would be awesome. Thanks!

This all looks fine. It depends on your exact situation but if you are
indexing data from a database it is usually a better idea to only
store the id in the index. That way, when you load the document from
the index, you are only loading one short string. You can then get any
other data you need from the database. If your documents are large,
Ferret needs to read the whole document into memory. I've added a lazy
loading document to Ferret which will speed things up a lot in the
next version. It still seems very surprising to me that your queries
are taking so long. Are you working on Windows? That would explain
things a little.
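
As a rough illustration of the "store only the id" idea above, here is a
minimal plain-Ruby sketch. It deliberately makes no Ferret calls: TinyIndex,
search_ids and DATABASE are hypothetical stand-ins invented for this example,
with a Hash playing the role of the database. The point is only the shape of
the pattern: the index hands back short ids, and the large records are
fetched from the database afterwards.

```ruby
# Plain-Ruby sketch of the "store only the id" pattern. No Ferret calls are
# used: TinyIndex and DATABASE are hypothetical stand-ins for illustration.

# Stand-in for the database that holds the full (large) records.
DATABASE = {
  "1" => { title: "Ferret basics",   body: "ferret is a search library " * 100 },
  "2" => { title: "Ruby notes",      body: "ruby is a programming language " * 100 },
  "3" => { title: "Ferret and Ruby", body: "using ferret from ruby " * 100 }
}.freeze

# Stand-in for the search index: it indexes the text but stores only the id.
class TinyIndex
  def initialize
    # term => list of ids of the documents containing that term
    @postings = Hash.new { |hash, term| hash[term] = [] }
  end

  # Index the text under the given id; the text itself is not kept.
  def add(id, text)
    text.downcase.scan(/\w+/).uniq.each { |term| @postings[term] << id }
  end

  # Return at most num_docs ids of documents containing the term.
  def search_ids(term, num_docs: 50)
    @postings.fetch(term.downcase, []).first(num_docs)
  end
end

index = TinyIndex.new
DATABASE.each { |id, record| index.add(id, "#{record[:title]} #{record[:body]}") }

# The search returns only short id strings ...
ids = index.search_ids("ferret")

# ... and the full records are loaded from the database afterwards.
results = ids.map { |id| DATABASE.fetch(id) }
puts results.map { |record| record[:title] }.inspect
```

With a real index the same two steps would apply: collect the stored ids
inside the search block, then load those rows from the database in one go
rather than pulling every stored field out of the index.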

Cheers,
Dave
