Erik Morton
2007-Aug-05 17:17 UTC
[Ferret-talk] IO Errors on deleting documents with Ferret
I have a large index (~6GB, ~1 million docs) that was built by RDig. I wrote a script that iterates through the index and clears out some duplicate information to try to reduce the size of the index.

    clients.each {|client|
      docs = RDig.searcher.search("+supplier_id:#{client.id}")
      docs.each {|doc|
        data = doc[:data].dup #the contents of the web page
        new_results = {}
        new_results[:client_id] = client.id
        new_results[:data] = data
        index.delete doc[:doc_id]
        index << new_results
      }
    }

I've run a similar script before with no issues. However, today I received the following error after 30 minutes or so:

    /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:726:in `initialize': IO Error occured at <except.c>:93 in xraise (IOError)
    Error occured in index.c:901 - sis_find_segments_file
        Error reading the segment infos. Store listing was

        from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:726:in `ensure_reader_open'
        from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:434:in `delete'
        from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:8:in `synchrolock'
        from /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize'
        from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:8:in `synchrolock'
        from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:428:in `delete'

Despite the error, the index appears not to be corrupted, so I ran the script again for fun.
The following error occurred after approximately 20 minutes:

    /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:723:in `close': IO Error occured at <except.c>:93 in xraise (IOError)
    Error occured in fs_store.c:264 - fs_new_output
        couldn't create OutStream /mnt/apps/search/current/../../shared/indexes/final/_a4kx.prx: <Too many open files>

        from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:723:in `ensure_reader_open'
        from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:434:in `delete'
        from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:8:in `synchrolock'
        from /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize'
        from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:8:in `synchrolock'
        from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:428:in `delete'

Here are the contents of the index directory:

    [root@files]# ls -Al ../../shared/indexes/final/
    total 5628324
    -rw------- 1 initiate initiate 5713121647 Jul 31 14:22 _5d3s.cfs
    -rw------- 1 root     root         115159 Aug  5 12:55 _5d3s_2yyy.del
    -rw------- 1 root     root       22937900 Aug  5 11:28 _7tgc.cfs
    -rw------- 1 root     root          11475 Aug  5 12:55 _7tgc_sx7.del
    -rw------- 1 root     root        2220338 Aug  5 11:38 _820z.cfs
    -rw------- 1 root     root        2311840 Aug  5 11:47 _8alm.cfs
    -rw------- 1 root     root        2261887 Aug  5 11:56 _8j69.cfs
    -rw------- 1 root     root        2089120 Aug  5 12:05 _8rqw.cfs
    -rw------- 1 root     root        2244470 Aug  5 12:14 _90bj.cfs
    -rw------- 1 root     root        2249160 Aug  5 12:22 _98w6.cfs
    -rw------- 1 root     root        2231091 Aug  5 12:31 _9hgt.cfs
    -rw------- 1 root     root        2244881 Aug  5 12:40 _9q1g.cfs
    -rw------- 1 root     root        2273703 Aug  5 12:48 _9ym3.cfs
    -rw------- 1 root     root         235566 Aug  5 12:49 _9zgy.cfs
    -rw------- 1 root     root         220959 Aug  5 12:50 _a0bt.cfs
    -rw------- 1 root     root         229074 Aug  5 12:51 _a16o.cfs
    -rw------- 1 root     root         202310 Aug  5 12:52 _a21j.cfs
    -rw------- 1 root     root         135823 Aug  5 12:53 _a2we.cfs
    -rw------- 1 root     root         132935 Aug  5 12:54 _a3r9.cfs
    -rw------- 1 root     root          14190 Aug  5 12:54 _a3uc.cfs
    -rw------- 1 root     root          13868 Aug  5 12:54 _a3xf.cfs
    -rw------- 1 root     root          13758 Aug  5 12:54 _a40i.cfs
    -rw------- 1 root     root          14912 Aug  5 12:54 _a43l.cfs
    -rw------- 1 root     root          13750 Aug  5 12:54 _a46o.cfs
    -rw------- 1 root     root          14170 Aug  5 12:54 _a49r.cfs
    -rw------- 1 root     root          13764 Aug  5 12:55 _a4cu.cfs
    -rw------- 1 root     root          13719 Aug  5 12:55 _a4fx.cfs
    -rw------- 1 root     root          13115 Aug  5 12:55 _a4j0.cfs
    -rw------- 1 root     root           1826 Aug  5 12:55 _a4jb.cfs
    -rw------- 1 root     root           1935 Aug  5 12:55 _a4jm.cfs
    -rw------- 1 root     root           1739 Aug  5 12:55 _a4jx.cfs
    -rw------- 1 root     root           1865 Aug  5 12:55 _a4k8.cfs
    -rw------- 1 root     root           2072 Aug  5 12:55 _a4kj.cfs
    -rw------- 1 root     root           1733 Aug  5 12:55 _a4ku.cfs
    -rw------- 1 root     root            378 Aug  5 12:55 _a4kv.cfs
    -rw------- 1 root     root            462 Aug  5 12:55 _a4kw.cfs
    -rw------- 1 root     root            128 Aug  5 12:55 _a4kx.fdt
    -rw------- 1 root     root              0 Aug  5 12:55 _a4kx.fdx
    -rw------- 1 root     root              0 Aug  5 12:55 _a4kx.frq
    -rw------- 1 root     root              0 Aug  5 12:55 _a4kx.tfx
    -rw------- 1 root     root              0 Aug  5 12:55 _a4kx.tis
    -rw------- 1 root     root              0 Aug  5 12:55 _a4kx.tix
    -rw------- 1 root     root              0 Aug  5 12:55 ferret-write.lck
    -rw------- 1 initiate initiate         16 Aug  5 12:55 segments
    -rw------- 1 root     root           1142 Aug  5 12:55 segments_isfj

Here's my platform:

    Linux xenU #1 SMP Thu Nov 30 13:48:50 SAST 2006 i686 athlon i386 GNU/Linux

I'm using ruby 1.8.4 and Ferret 0.11.4, which has been hacked to add in better large file support.

Does anyone have any idea what's going on? Many thanks in advance.

Erik
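
The directory listing above shows several dozen small .cfs segment files alongside the one large segment. A minimal sketch of how one might tally the files in an index directory by extension while such a script runs, to watch whether the number of segment files keeps growing. The helper name `count_by_extension` is hypothetical and uses only the Ruby standard library; it does not call Ferret.

```ruby
# Tally the regular files in a directory by extension (".cfs", ".del", ...).
# Files without an extension (e.g. "segments") are grouped under "(none)".
# index_dir is whatever path the caller passes in; nothing here is Ferret-specific.
def count_by_extension(index_dir)
  counts = Hash.new(0)
  Dir.foreach(index_dir) do |name|
    path = File.join(index_dir, name)
    next unless File.file?(path)          # skip ".", ".." and subdirectories
    ext = File.extname(name)
    counts[ext.empty? ? "(none)" : ext] += 1
  end
  counts
end
```

Calling it with the index path from the listing and printing the result every few minutes would show whether the .cfs count keeps climbing between runs.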
Benjamin Krause
2007-Aug-06 07:32 UTC
[Ferret-talk] IO Errors on deleting documents with Ferret
On 2007-08-05, at 7:17 PM, Erik Morton wrote:

> Error occured in index.c:901 - sis_find_segments_file
> Error reading the segment infos. Store listing was
>
> couldn't create OutStream /mnt/apps/search/current/../../
> shared/indexes/final/_a4kx.prx: <Too many open files>

Hey,

Both errors might have the same cause: too many open files. I had similar errors some months ago, raised my open-files limit to 32k, and haven't had an error since.

    rails@omdb.org ~ $ ulimit -n
    32768

Benjamin
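
The `ulimit -n` check can also be made from inside the Ruby process that runs the script, which helps when the script is started from a different shell or user than the one where the limit was raised. The following is a rough sketch, not part of either post: the helper names `open_file_limits` and `open_fd_count` are hypothetical, and the descriptor count assumes a Linux-style /proc filesystem.

```ruby
# Soft and hard limits on open file descriptors for the current process.
# The soft limit is the value that `ulimit -n` reports in a shell.
def open_file_limits
  soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)
  { :soft => soft, :hard => hard }
end

# Number of descriptors currently open in this process, or nil when
# /proc/self/fd is not available (assumption: Linux-style /proc).
# The count includes the descriptor used to read the directory itself.
def open_fd_count
  fd_dir = "/proc/self/fd"
  return nil unless File.directory?(fd_dir)
  Dir.entries(fd_dir).reject { |e| e == "." || e == ".." }.size
end
```

Logging both values every few thousand documents would show whether the open-descriptor count creeps toward the soft limit before the "Too many open files" error appears.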