thr3ads.net - similar to: "Error on optimize leads to corrupt index?"

Displaying 20 results from an estimated 500 matches similar to: "Error on optimize leads to corrupt index?"

2007 Jun 23

End of File Error on index optimize

I was optimizing a 650MB index using Ferret (0.11.3) and I received the following error. I've seen some people have similar issues but I haven't seen any resolutions. The contents of the index directory follow the error. Has anyone seen anything like this and found a resolution? Many thanks. /mnt/apps/search/releases/20070622175637/script/../config/../vendor/

2006 Aug 24

[0.10.0] Random error when big import

In a Rails script (something in the "script" dir with the right require) I added many documents (around 4000) to an index that is globally instantiated (and built if not present) in config/environment.rb. I ran my script three times (deleting my index each time beforehand), and only the third run was successful. That was STRANGE! :) These are the errors:

2007 Aug 05

IO Errors on deleting documents with Ferret

I have a large index (~6GB, ~1 million docs) that was built by RDig. I wrote a script to iterate through the index to clear out some duplicate information to try to reduce the size of the index.

clients.each {|client|
  docs = RDig.searcher.search("+supplier_id:#{client.id}")
  docs.each {|doc|
    data = doc[:data].dup #the contents of the web page
    new_results = {}

2007 Jan 21

[ActsAsFerret] OpenSolaris (TextDrive) indexing issues

Gents, I successfully installed AAF on my TextDrive OpenSolaris Container, but I'm having some issues with indexing. I have a model called Blogs which has AAF enabled. The first time I tried to find_by_contents for a 'word' I know was in the database I got no results. Apparently the index was not ready yet. Then I waited a few hours and checked that the /index

2007 Mar 13

Acts_as_ferret and auto-flush

Hi, I'm using acts_as_ferret with Mongrel and I'm getting locking errors that after a while result in a corrupt database. I know about the problem with different processes writing to the index but I haven't been able to get the DRb server working properly yet. I read on this list that another solution is to set :auto_flush to true but I'm not

2007 Sep 18

basic rdig setup

I'm developing locally on Windows and I have a remote dev box that runs Linux. I'm trying to use RDig just to index using urls, no files. Both use acts_as_ferret for an administrative search that works fine. On the Windows machine, I get no errors, but get no results. On the Linux machine, I get: File Not Found Error occured at <except.c>:93 in xraise Error occured in

2006 Oct 17

Error : End-of-File Error occured at <except.c>

Everything was working fine till last night. This morning I have many errors. I am using acts_as_ferret, last updated around a month ago, on Linux. There are two different types of exceptions. I have over 12 exception emails but these are the two distinct types. First exception: A EOFError occurred in home#event_info: End-of-File Error occured at <except.c>:79 in xraise Error occured in

2006 Jul 25

RDig document processing error

Hi all, I am having problems using RDig. With this rdig config...

cfg.crawler.start_urls = ['http://www.defensetech.org']
cfg.crawler.include_hosts = ['www.defensetech.org']
cfg.index.path = '/my/path/to/index'
cfg.verbose = true

...I get this output:

$ rdig -c config/rdig_config.rb
/usr/local/lib/site_ruby/1.8/ferret/index/term.rb:45:

2007 Jul 29

RDig and AAF playing together

I have a site with two indexes. Index A is created offline by RDig and queried from the web via RDig (specifically, RDig.searcher.search). Index B is managed by AAF with :remote => true. Simple enough. However, I need to query both indexes from RDig. Usually this is ok, as I modified RDig to accept an array of search_paths with an element for index A and index B. However, when Index

2011 Apr 27

Assignments inside lapply

Dear all, I would like to ask you if an assignment can be done inside an lapply statement. For example, I would like to convert a double nested for loop

for (i in c(1:dimx)){
  for (j in c(1:dimy)){
    Powermap[i,j] <- Pr(c(i,j),c(PRX,PRY),f)
  }
}

to something like this:

ij<-expand.grid(i=seq(1:dimx),j=(1:dimy))
unlist(lapply(1:nrow(ij),function(rowId) { return

2007 Feb 10

Adding extra fields to an index (using RDig?)

Hello everyone, I am writing an application which collects a set of web sites and caches them locally for offline viewing. I want to do searches on this collection and associate extra data with each result (e.g. date collected, reason for collection, perhaps a sequence number). All this data exists when the harvesting is done and could be stored in a database. I want to use RDig to index my

2007 Mar 28

Newbie problem on production server

Hi, I just installed ferret for the first time and integrated it with my app. On my dev machine it's fine but on my production server I get this when I call find_by_contents():

Processing LinksController#results (for 24.185.105.59 at 2007-03-28 05:28:36) [POST]
Session ID: 3f2dc7c17147c0e52178ba697a119833
Parameters: {"commit"=>"Search",

2007 Jan 21

could not install in WinXP

Directory of C:\search_app

01/21/2007  19:37    <DIR>          .
01/21/2007  19:37    <DIR>          ..
01/21/2007  19:36           427 008 ferret-0.10.13.gem
01/21/2007  19:07           148 992 rdig-0.3.4.gem
               2 File(s)        576 000 bytes
               2 Dir(s)  45 135 982 592 bytes free

C:\search_app>gem install ferret
Building native extensions. This could

2007 Sep 27

Problem getting "extract" from RDig

Hi All, I need a site-wide search for my current application. By search I mean I have to search the static and the dynamic contents from the database. I have been searching for this on the net for a while and RDig seems to be an apt solution. While using it I have encountered a few problems. I know these might be very basic issues but I have not been able to figure out what is wrong with

2006 Mar 25

RDig - ferret-based website crawler/indexer

Hi! RDig is a small tool to build a Ferret index for the contents of a website or intranet. It contains a simple HTTP crawler and some support for extracting textual content from the fetched pages. I built this to implement a site-wide search for a recent project that combined a Rails application with lots of static html files generated by a CMS. Any feedback is very welcome! Rubyforge

2007 Jan 05

adding one url to rdig index?

Hey there, I'm building a rails site using RDig as a site-wide search. I would like to be able to add just one URL (or possibly a list) to an existing index, so that when certain pages change I can update the index without reindexing the entire site. I looked through the documentation and didn't see an example on how to do this so I am looking for some guidance here :). Is

2007 Jan 23

Someone getting RDig work for Linux?

I got this:

root@linux:~# rdig -c configfile
RDig version 0.3.4
using Ferret 0.10.14
added url file:///home/myaccount/documents/
waiting for threads to finish...
root@linux:~# rdig -c configfile -q "Ruby"
RDig version 0.3.4
using Ferret 0.10.14
executing query >Ruby<
Query:
total results: 0
root@linux:~#

my configfile I changed from config to cfg, because of maybe

2006 Jul 14

RDig config file problem

Hi All, I hope it is OK to post RDig queries on this forum. I'm just trying to get RDig working (Ubuntu 6.06, RDig 0.3.0, ferret 0.9.4, rubyful_soup 1.0.4). Here is my output:

sh:~/rdigtry$ rdig -c config/rdig_config.rb
discovered content extractor class: RDig::ContentExtractors::PdfContentExtractor
discovered content extractor class: RDig::ContentExtractors::WordContentExtractor
discovered

2007 Aug 28

Still getting "too many open files"

We are still having problems with Ferret dying on us regularly with the error message:

>> ferret server error
IO Error occured at <except.c>:93 in xraise
Error occured in fs_store.c:127 - fs_each
doing 'each' in /var/www/web1/oms/current/script/../config/../index/production/band/20070805130005: <Too many open files>
<<

We are running Ferret as a

2006 Jun 02

Indexing fails -- _ntc6.tmp exceeds 2 gigabyte maximum

Ferret 0.9.3, Ruby 1.8.2. NOT storing file contents in the index. Only indexing first 25k of each file. Very large data set (1 million files, 350 GB). Code based on snippet from David Balmain's forum posts. After 6 hours, Ferret bails out with Ruby "exceeds max file size". Cache:

-rw-r--r-- 1 bill bill 2147483647 2006-06-01 22:45 _ntc6.tmp
-rw-r--r-- 1 bill bill 1690862924
