Neville Burnell
2007-Apr-12 01:06 UTC
[Ferret-talk] Ferret 0.11.4.win32 indexing speed vs Ferret 0.10.9.win32
Firstly, thanks Dave for all your hard work. Ferret rocks!

I am just testing 0.11.4.win32 and it seems to work just fine; however, the index creation phase of my app is perhaps 3x slower under 0.11.4 than under 0.10.9.

Details follow:

System: Windows XP SP2, index on local hard disk, Ruby 1.8.6

Run #1, Ferret 0.10.9
- Reboot
- Build index, 35,000 rows added in 297 seconds

Run #2, Ferret 0.11.4
- Reboot
- Build index, 35,000 rows added in 1044 seconds

Searching both indexes "feels" about the same.

Any comments on whether Ferret 0.11.4 should be much slower for bulk inserts?

Kind regards,
Neville
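A quick sketch of the arithmetic behind the two reported runs, using only the figures given above (35,000 rows in 297 s and in 1044 s). It converts each run to documents per second and computes the ratio, which puts the slowdown nearer 3.5x than 3x:

```ruby
# Throughput and slowdown from the two runs reported above
# (35,000 rows; 297 s on Ferret 0.10.9, 1044 s on Ferret 0.11.4).
ROWS = 35_000
runs = { "0.10.9" => 297.0, "0.11.4" => 1044.0 }

runs.each do |version, seconds|
  printf("Ferret %s: %.1f docs/s\n", version, ROWS / seconds)
end
# Ferret 0.10.9: 117.8 docs/s
# Ferret 0.11.4: 33.5 docs/s

slowdown = runs["0.11.4"] / runs["0.10.9"]
printf("slowdown: %.2fx\n", slowdown) # slowdown: 3.52x
```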
David Balmain
2007-Apr-12 11:03 UTC
[Ferret-talk] Ferret 0.11.4.win32 indexing speed vs Ferret 0.10.9.win32
On 4/12/07, Neville Burnell <Neville.Burnell at bmsoft.com.au> wrote:
> Firstly, thanks Dave for all your hard work. Ferret Rocks!,
>
> I am just testing 0.11.4.win32 and it seems to work just fine, however
> the index creation phase of my app is perhaps 3x slower under 0.11.4 vs
> 0.10.9
>
> Details follow:
>
> System: windows xp sp2, index on local hard disk, Ruby 1.8.6
>
> Run #1, Ferret 0.10.9
> - Reboot
> - Build index, 35,000 rows added in 297 seconds
> -
> Run #2, Ferret 0.11.4
> - Reboot
> - Build index, 35,000 rows added in 1044 seconds

Ouch, that sucks. There is a difference in indexing speed on Linux too, depending a lot on the parameters you use, but bulk indexing is largely unchanged. The differences are due to the changes I've made to make Ferret more stable when indexing and to give Ferret the ability to recover when the index is corrupted. This makes Ferret much slower when opening an index, but the indexing procedure itself hasn't changed.

I haven't really looked at the performance in Windows. A few questions here might allow me to fix this problem. Are you using the Index class or the IndexWriter class? What parameters are you passing to the indexer? I'll see what I can do, but I can't promise anything.

> Searching both indexes "feels" about the same

Searching should be the same, although opening the index for searching will be slower. But this shouldn't be done for every search, so it shouldn't be a problem.

> Any comments on whether Ferret 0.11.4 should be much slower for bulk
> inserts ?

I guess I already answered this. No, it shouldn't be slower for bulk updates.

Actually, looking at your times, it seems like you may not have the optimal settings for indexing, as even 297 seconds seems like a long time to index 35,000 documents, although it depends on the documents and where they are coming from. If you give me a little more information I may be able to help you speed this up.

Cheers,
Dave

--
Dave Balmain
http://www.davebalmain.com/
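Dave's point that opening an index is now slower, but should not happen on every search, amounts to opening a searcher once and reusing it. The sketch below is plain Ruby and does not use Ferret: `FakeSearcher` and `SearcherCache` are hypothetical names, with `FakeSearcher` standing in for a real searcher whose constructor is the expensive step.

```ruby
# Hypothetical stand-in for a searcher whose constructor is expensive
# (e.g. because opening the index involves extra checks). It only counts
# how many times it has been opened.
class FakeSearcher
  class << self
    attr_accessor :open_count
  end
  self.open_count = 0

  def initialize(path)
    self.class.open_count += 1 # one increment per "index open"
    @path = path
  end

  def search(query)
    "results for #{query} in #{@path}"
  end
end

# Lazily opens one searcher per index path and returns the same object
# on every later lookup, so the open cost is paid once.
class SearcherCache
  def initialize(&opener)
    @opener = opener
    @searchers = {}
  end

  def [](path)
    @searchers[path] ||= @opener.call(path)
  end

  # Drop the cached searcher (e.g. after a rebuild) so the next lookup reopens it.
  def invalidate(path)
    @searchers.delete(path)
  end
end

cache = SearcherCache.new { |path| FakeSearcher.new(path) }
3.times { |i| cache["/tmp/idx"].search("q#{i}") }
puts FakeSearcher.open_count # → 1
```

This cache is not thread-safe; a multi-threaded server would need a Mutex around the lookup.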
Neville Burnell
2007-Apr-13 00:32 UTC
[Ferret-talk] Ferret 0.11.4.win32 indexing speed vs Ferret 0.10.9.win32
> I haven't really looked at the performance in Windows. A few questions
> here might allow me to fix this problem. Are you using the Index class
> or the IndexWriter class? What parameters are you passing to the
> indexer? I'll see what I can do but I can't promise anything.

I'm using IndexWriter.add_document(doc).

For the purposes of the timing comparison, I'm using an empty directory and passing :create => true and a :field_infos hash which specifies certain fields that are indexed but not stored, or vice versa.

> it shouldn't be slower for bulk updates.

I hope I haven't misused "bulk".

> Actually, looking at your times, it seems like you may not
> have the optimal settings for indexing as even 297 seconds seems
> like a long time to index 35,000 documents although it depends on the
> documents and where they are coming from. If you give me a little more
> information I may be able to help you speed this up.

Thanks Dave. I'm generating the index from rows in a SQL database, and in general I'm OK with 297 secs for 35,000 docs, but a 3x hit does hurt somewhat, particularly for larger SQL databases.

The logic goes something like this:

    Create new Ferret index
    Connect to SQL DBMS
    For t in table[1..n] do
      Prepare SQL
      For row in resultset do
        IndexWriter.add_document(row)
      End
    End

Each row retrieved from the SQL DBMS is a hash of up to 30 fields, and some fields are longish text (around 3000 chars).

For a baseline, if I comment out the IndexWriter.add_document(row) call, the SQL part of the process only takes around 12 secs, so I think most of the work is done by add_document.

Thanks for your help,
Nev
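The loop Nev describes can be sketched in Ruby with the fetch and index phases timed separately, which is the same comparison as his "comment out add_document" baseline. This is a hypothetical sketch, not Nev's code: `build_index`, `fetch_rows` and `ArrayWriter` are made-up names, the SQL layer is replaced by a callable returning row hashes, and the writer is any object with `add_document(hash)` (with Ferret, an IndexWriter opened once with `:create => true`; Ferret is not needed to run the sketch). It uses current Ruby syntax.

```ruby
require "benchmark"

# Hypothetical sketch of the indexing loop. Assumptions:
# - fetch_rows is a callable returning the row hashes for one table,
#   standing in for preparing the SQL and reading its result set;
# - writer is any object responding to add_document(hash).
# Fetch time and add_document time are accumulated separately.
def build_index(tables, writer, fetch_rows)
  stats = { docs: 0, fetch_seconds: 0.0, index_seconds: 0.0 }
  tables.each do |table|
    rows = nil
    # Materialise the result set so SQL time is not mixed into index time.
    stats[:fetch_seconds] += Benchmark.realtime { rows = fetch_rows.call(table).to_a }
    stats[:index_seconds] += Benchmark.realtime do
      rows.each do |row|
        writer.add_document(row)
        stats[:docs] += 1
      end
    end
  end
  stats
end

# Hypothetical in-memory stand-in for an index writer.
class ArrayWriter
  attr_reader :docs

  def initialize
    @docs = []
  end

  def add_document(doc)
    @docs << doc
  end
end

data = {
  "customers" => [{ id: 1, name: "a" }, { id: 2, name: "b" }],
  "orders"    => [{ id: 10, note: "x" * 3000 }]
}
writer = ArrayWriter.new
stats = build_index(data.keys, writer, ->(table) { data[table] })
puts stats[:docs] # → 3
```

Materialising each table with `to_a` trades memory for a clean split between the two timings; for very large tables you would stream rows instead, and the split would then only be approximate.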