similar to: RDig document processing error

Displaying 20 results from an estimated 400 matches similar to: "RDig document processing error"

2006 Jul 14
2
RDig config file problem
Hi All, Hope it is ok to post RDig queries on this forum. Just trying to get RDig working (Ubuntu 6.06, RDig 0.3.0, ferret 0.9.4, rubyful_soup 1.0.4) Here is my output: sh:~/rdigtry$ rdig -c config/rdig_config.rb discovered content extractor class: RDig::ContentExtractors::PdfContentExtractor discovered content extractor class: RDig::ContentExtractors::WordContentExtractor discovered
2007 Jan 23
3
Someone getting RDig work for Linux?
I got this root@linux:~# rdig -c configfile RDig version 0.3.4 using Ferret 0.10.14 added url file:///home/myaccount/documents/ waiting for threads to finish... root@linux:~# rdig -c configfile -q "Ruby" RDig version 0.3.4 using Ferret 0.10.14 executing query >Ruby< Query: total results: 0 root@linux:~# my configfile I changed from config to cfg, because of maybe
2006 Mar 25
1
RDig - ferret-based website crawler/indexer
Hi! RDig is a small tool to build a Ferret index for the contents of a website or intranet. It contains a simple HTTP crawler and some support for extracting textual content from the fetched pages. I built this to implement a site-wide search for a recent project that combined a Rails application with lots of static html files generated by a CMS. Any feedback is very welcome! Rubyforge
2007 Sep 18
4
basic rdig setup
I'm developing locally on Windows and I have a remote dev box that runs Linux. I'm trying to use RDig just to index using urls, no files. Both use acts_as_ferret for an administrative search that works fine. On the Windows machine, I get no errors, but get no results. On the Linux machine, I get: File Not Found Error occured at <except.c>:93 in xraise Error occured in
2007 Sep 27
2
Problem getting "extract" from RDig
Hi All, I have to have a site wide search for my current application. By search I mean I have to search the static and the dynamic contents from the database. I have been searching on this for a while on the net and RDig seems to be an apt solution. While using it I have encountered a few problems. I know these might be very basic issues but I have not been able to figure out what is wrong with
2007 Jan 05
1
adding one url to rdig index?
Hey there, I'm building a rails site using RDig as a site-wide search. I would like to be able to add just one URL (or possibly a list) to an existing index, so that when certain pages change I can update the index without reindexing the entire site. I looked through the documentation and didn't see an example on how to do this so I am looking for some guidance here :). Is
2007 Jul 29
7
RDig and AAF playing together
I have a site with two indexes. Index A is created offline by RDig and queried from the web via RDig (specifically, RDig.searcher.search). Index B is managed by AAF with :remote => true. Simple enough. However, I need to query both indexes from RDig. Usually this is ok, as I modified RDig to accept an array of search_paths with an element for index A and index B. However, when Index
2007 Feb 10
5
Adding extra fields to an index (using RDig?)
Hello everyone, I am writing an application which collects a set of web sites and caches them locally for offline viewing. I want to do searches on this collection and associate extra data with each result (e.g date collected, reason for collection, perhaps a sequence number). Now all this data exists when the harvesting is done and could be stored in a database. I want to use RDig to index my
2007 Jan 21
4
could not install in WinXP
Directory of C:\search_app 01/21/2007 19:37 <DIR> . 01/21/2007 19:37 <DIR> .. 01/21/2007 19:36 427 008 ferret-0.10.13.gem 01/21/2007 19:07 148 992 rdig-0.3.4.gem 2 File(s) 576 000 bytes 2 Dir(s) 45 135 982 592 bytes free C:\search_app>gem install ferret Building native extensions. This could
2007 Feb 15
3
Proximity searching in rdig ferret
Lucene has a syntax "foo bar"~10 for finding foo within 10 words of bar. Does ferret support this feature? (the ~ is used for fuzzy queries) Does rdig? This could be a deal breaker for me 'cos I really need proximity searches -- Posted via http://www.ruby-forum.com/.
2007 Jun 23
2
End of File Error on index optimize
I was optimizing a 650MB index using ferret (0.11.3) and I received the following error. I've seen some people have similar issues but I haven't seen any resolutions. The contents of the index directory follow the error. Has anyone seen anything like this and found a resolution? Many thanks. /mnt/apps/search/releases/20070622175637/script/../config/../vendor/
2006 Mar 29
1
Using boolean terms in PHP bindings
OK, I'm indexing my data with scriptindex. I want to be able to restrict the search by the category field. Do I need to do anything to the data itself? Like, literally prefix it with the characters "XC"? Below is my indexer for scriptindex and my PHP code... document_id : field=ref unique=Q boolean=Q search_id : field=document_id index=S document_title : field=title
2006 Sep 15
3
Crashes and tests failures again with 0.10.4
In the beginning 0.10.4 looked promising, but now that my index has grown to > 100 MB I'm getting segfaults on some searches again: >> Post.find_by_contents('rubyforum') # ok >> Post.find_by_contents('ruby-forum') /usr/local/lib/ruby/gems/1.8/gems/ferret-0.10.4/lib/ferret/index.rb:351: [BUG] Segmentation fault ruby 1.8.4 (2005-12-24)
2006 Sep 22
1
QueryParser bug?
I cooked up a little script to show what I mean. This doesn't look right to me, but maybe I just completely misunderstand QueryParser. Same output on mswin32, unix, ferret 0.9 and 0.10 Cheers, Sam require 'rubygems' require 'ferret' p Ferret::VERSION # 0.10.6 index = Ferret::Index::Index.new() index << {:title => "Programming
2007 Feb 15
0
rdig wildcard searches
Lucene has simple wildcard syntax supporting ? and * thus ruby could be matched by rub? r*by etc. This doesn't work using rdig on the command line e.g. rdig -c config.rb -q 'data:"ru?y"' gives RDig version 0.3.4 using Ferret 0.10.14 executing query >data:"ru?y"< Query: data:"ru y"~1 which is something entirely different. The
2007 Apr 14
3
Error on optimize leads to corrupt index?
The following exception occurred while trying to optimize a large index: vendor/gems/rdig-0.3.4/lib/rdig/index.rb:46:in `optimize': End-of-File Error occured at <except.c>:93 in xraise (EOFError) Error occured in store.c:216 - is_refill current pos = 0, file length = 0 Now, I get the following error any time I try to create a new index on the directory that I was trying
2007 Feb 26
4
Ferret 0.11.0 tests segfault
I have an important segfault when I create the index (via Ferret::Index::FieldInfos#create_index). I decided to run the tests, this is what I have : $> ruby test_all.rb Loading once Loaded suite test_all Started ....................EEEEEEEE./unit/../unit/index/../../unit/store/../../unit/analysis/../../unit/utils/../../unit/query_parser/../../unit/search/tc_filter.rb:11: [BUG] Segmentation
2005 Dec 19
2
Parentheses for precedence?
I'm not sure whether this is a bug or whether I'm simply expecting Ferret queries to work in a way other than they're intended. I notice that if I use a query like: (other_text:"Collaborative tools") AND NOT other_text:podcasts I'll get correct search results. However, if I put parentheses around the second part, like:
2011 Sep 23
2
understanding stemming and synonyms
I am working with version 1.2.7 and want to use stemming and synonyms. I use the perl-bindings and get some problems. First of all: the perl-bindings don't allow the QueryParser a third argument when calling parse_query! So I cannot set a default prefix (which perhaps is the solution to my problem, but later more). I have a simple test case: 3 documents, every document only has one word:
2006 Mar 29
1
Problems with Ferret 0.9.0
Hi, I upgraded from 0.3.2 to 0.9.0, and now my old search code doesn't work anymore. I get a lot of ArgumentErrors, for example: "query.add_clause(Search::BooleanClause.new(query_parser.parse(term), Search::BooleanClause::Occur::MUST))" raises: ArgumentError (wrong number of arguments (2 for 0)) "index_searcher.search_each(query)" raises: ArgumentError