I'm finding that omindex is consistently ending prematurely when indexing certain files. The last output looks like this:

[Entering directory /compounds/Acetic_acid]
Indexing "/MATLAB/compounds/Acetic_acid/AACID_50T.TXT" as text/plain ... added.
Indexing "/MATLAB/compounds/Acetic_acid/AACID_50T.pdf" as application/pdf ... "pdftotext -enc UTF-8 /home/users/MATLAB/compounds/Acetic_acid/AACID_50T.pdf -" failed - skipping
Indexing "/MATLAB/compounds/Acetic_acid/AACID_25T.TXT" as text/plain ... Killed

Any ideas what might cause this?

--
Chris Purves
Visit my blog: http://chris.northfolk.ca
"Justice doesn't give you grandchildren." - Ben Stone
> Killed

That usually means the process ran out of resources. What does "ulimit -a" say? If this is your box, you can set higher limits. If it is not your box, you'll need to partition your indexing.

Rob
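As a rough way to follow up on the resource question, the sketch below prints the shell's limits, the memory and swap figures, and any out-of-memory kills recorded by the kernel. The function names are made up for illustration, the /proc/meminfo check assumes Linux, and reading the kernel log with dmesg may need root on some systems.

```shell
#!/bin/sh
# Hypothetical helper functions; none of these names come from omindex or Xapian.

# Print the per-process resource limits of the current shell.
show_limits() {
    ulimit -a
}

# Print total/available RAM and swap (Linux only; reads /proc/meminfo).
show_memory() {
    if [ -r /proc/meminfo ]; then
        grep -E '^(MemTotal|MemAvailable|SwapTotal|SwapFree):' /proc/meminfo
    else
        echo "/proc/meminfo is not readable on this system" >&2
        return 1
    fi
}

# Search the kernel log for out-of-memory kills.
# Falls back to a message if nothing matches or the log cannot be read.
show_oom_kills() {
    dmesg 2>/dev/null | grep -i -E 'out of memory|killed process' \
        || echo "no OOM messages found (or kernel log not readable)"
}
```

Running show_limits, show_memory and show_oom_kills right after a failed omindex run should help tell a ulimit restriction apart from the kernel killing the process for lack of memory.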
On 29/12/2012 2:43 PM, Jim Lynch wrote:
> On 12/29/2012 11:35 AM, Chris Purves wrote:
>> I have 2 GB of RAM and 1 GB of swap. The RAM is mostly accessible
>> when I begin (1.7 GB). The problem directory contains about 400
>> sub-directories with very large text files ~ 2.5 MB.
>>
> I just realized that the default (I think) is 10000, so try something
> smaller. Maybe start at 1000 and work up.
> export XAPIAN_FLUSH_THRESHOLD=1000

I have to set it to a ridiculously low value, XAPIAN_FLUSH_THRESHOLD=30; however, once it has completed I can run again at the default value and the already indexed files are skipped, so that should work alright.

--
Chris Purves
Visit my blog: http://chris.northfolk.ca
"The idea is to zap them with lasers and see how they respond." - Dr. Scott Menary
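The two-pass approach described above could be scripted roughly as follows. This is only a sketch: the database path and document root are placeholders (the document root is guessed from the paths in the log output), and INDEXER and index_pass are names invented here so the script can be dry-run with a stub instead of the real omindex.

```shell
#!/bin/sh
# Hypothetical locations; adjust both for your installation.
DB=${DB:-/var/lib/xapian-omega/data/default}
DOCROOT=${DOCROOT:-/home/users}
# Overridable so the function can be tested without omindex installed.
INDEXER=${INDEXER:-omindex}

# Run one indexing pass.
# With an argument, XAPIAN_FLUSH_THRESHOLD is set to it for that run only.
# Without one, the variable is unset so the library default applies.
index_pass() {
    if [ -n "$1" ]; then
        ( XAPIAN_FLUSH_THRESHOLD=$1
          export XAPIAN_FLUSH_THRESHOLD
          "$INDEXER" --db "$DB" --url / "$DOCROOT" )
    else
        ( unset XAPIAN_FLUSH_THRESHOLD
          "$INDEXER" --db "$DB" --url / "$DOCROOT" )
    fi
}
```

A slow first run followed by a normal one would then be "index_pass 30 && index_pass". Each pass runs in a subshell, so the low threshold never leaks into the calling shell or into later runs.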
On 29/12/2012 7:22 PM, Chris Purves wrote:
> On 29/12/2012 2:43 PM, Jim Lynch wrote:
>> On 12/29/2012 11:35 AM, Chris Purves wrote:
>>> I have 2 GB of RAM and 1 GB of swap. The RAM is mostly accessible
>>> when I begin (1.7 GB). The problem directory contains about 400
>>> sub-directories with very large text files ~ 2.5 MB.
>>>
>> I just realized that the default (I think) is 10000, so try something
>> smaller. Maybe start at 1000 and work up.
>> export XAPIAN_FLUSH_THRESHOLD=1000
>
> I have to set it to a ridiculously low value XAPIAN_FLUSH_THRESHOLD=30;
> however, once it has completed I can run again at the default value and
> the already indexed files are skipped, so that should work alright.
>

Thanks everyone for your help. I was able to run a full, albeit slow, initial index with a low flush threshold. Subsequent indexing is fast without having to change the default value.

--
Chris Purves
Visit my blog: http://chris.northfolk.ca
"Humans in space suits make monkeys nervous." - Richard Preston