Displaying 20 results from an estimated 20000 matches similar to: "rdig wildcard searches"
2007 Jan 23
3
Has anyone gotten RDig to work on Linux?
I got this
root@linux:~# rdig -c configfile
RDig version 0.3.4
using Ferret 0.10.14
added url file:///home/myaccount/documents/
waiting for threads to finish...
root@linux:~# rdig -c configfile -q "Ruby"
RDig version 0.3.4
using Ferret 0.10.14
executing query >Ruby<
Query:
total results: 0
root@linux:~#
my configfile
I changed from config to cfg, because of maybe
2007 Feb 15
3
Proximity searching in rdig ferret
Lucene has a syntax "foo bar"~10 for finding foo within 10 words of bar.
Does ferret support this feature? (the ~ is used for fuzzy queries) Does
rdig?
This could be a deal breaker for me 'cos I really need proximity
searches
--
Posted via http://www.ruby-forum.com/.
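For readers unfamiliar with the term, the idea behind "foo within 10 words of bar" can be sketched in plain Ruby. This is only an illustration of the concept: it does not use Ferret, RDig, or Lucene, and the helper name `within_words?` is hypothetical. Lucene defines its slop value in terms of position moves, so treat this as an approximation of that behaviour rather than a reimplementation.

```ruby
# Hypothetical helper, not part of Ferret or RDig: returns true when
# term_a and term_b occur at most max_distance word positions apart.
def within_words?(text, term_a, term_b, max_distance)
  # Split the text into lowercase word tokens.
  words = text.downcase.scan(/\w+/)

  # Collect every position at which each term occurs.
  positions_a = words.each_index.select { |i| words[i] == term_a.downcase }
  positions_b = words.each_index.select { |i| words[i] == term_b.downcase }

  # A match exists if any pair of positions is close enough.
  positions_a.any? do |a|
    positions_b.any? { |b| (a - b).abs <= max_distance }
  end
end

sample = "foo is some distance away from bar"
puts within_words?(sample, "foo", "bar", 10)
puts within_words?(sample, "foo", "bar", 3)
```

In the sample sentence the two terms are six word positions apart, so the first call matches and the second does not.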
2006 Mar 25
1
RDig - ferret-based website crawler/indexer
Hi!
RDig is a small tool to build a Ferret index for the contents of a
website or intranet. It contains a simple HTTP crawler and some support
for extracting textual content from the fetched pages.
I built this to implement a site-wide search for a recent project
that combined a Rails application with lots of static html files
generated by a CMS.
Any feedback is very welcome!
Rubyforge
2006 Jul 25
1
RDig document processing error
Hi all,
Am having problems using RDig:
With this rdig config...
cfg.crawler.start_urls = ['http://www.defensetech.org']
cfg.crawler.include_hosts = ['www.defensetech.org']
cfg.index.path = '/my/path/to/index'
cfg.verbose = true
...I get this output:
$ rdig -c config/rdig_config.rb
/usr/local/lib/site_ruby/1.8/ferret/index/term.rb:45:
2006 Jul 14
2
RDig config file problem
Hi All,
Hope it is ok to post RDig queries on this forum.
Just trying to get RDig working (Ubuntu 6.06, RDig 0.3.0, ferret 0.9.4,
rubyful_soup 1.0.4)
Here is my output:
sh:~/rdigtry$ rdig -c config/rdig_config.rb
discovered content extractor class:
RDig::ContentExtractors::PdfContentExtractor
discovered content extractor class:
RDig::ContentExtractors::WordContentExtractor
discovered
2007 Jul 29
7
RDig and AAF playing together
I have a site with two indexes. Index A is created offline by RDig
and queried from the web via RDig (specifically,
RDig.searcher.search). Index B is managed by AAF with :remote =>
true. Simple enough. However, I need to query both indexes from RDig.
Usually this is ok, as I modified RDig to accept an array of
search_paths with an element for index A and index B.
However, when Index
2007 Feb 10
5
Adding extra fields to an index (using RDig?)
Hello everyone,
I am writing an application which collects a set of web sites and caches
them locally for offline viewing. I want to do searches on this
collection and associate extra data with each result (e.g. date
collected, reason for collection, perhaps a sequence number).
Now all this data exists when the harvesting is done and could be stored
in a database. I want to use RDig to index my
2007 Jun 23
2
End of File Error on index optimize
I was optimizing a 650MB index using ferret (0.11.3) and I received the
following error. I've seen some people have similar issues but I
haven't seen any resolutions. The contents of the index directory
follow the error. Has anyone seen anything like this and found a
resolution? Many thanks.
/mnt/apps/search/releases/20070622175637/script/../config/../vendor/
2007 Sep 27
2
Problem getting "extract" from RDig
Hi All,
I have to have a site wide search for my current application. By search
I mean I have to search the static and the dynamic contents from the
database. I have been searching on this for a while on the net and RDig
seems to be an apt solution. While using it I have encountered a few
problems. I know these might be very basic issues but I have not been
able to figure out what is wrong with
2007 Sep 18
4
basic rdig setup
I'm developing locally on Windows and I have a remote dev box that runs
Linux. I'm trying to use RDig just to index using urls, no files.
Both use acts_as_ferret for an administrative search that works fine.
On the Windows machine, I get no errors, but get no results.
On the Linux machine, I get:
File Not Found Error occured at <except.c>:93 in xraise
Error occured in
2007 Jan 21
4
could not install in WinXP
Directory of C:\search_app
01/21/2007 19:37 <DIR> .
01/21/2007 19:37 <DIR> ..
01/21/2007 19:36 427 008 ferret-0.10.13.gem
01/21/2007 19:07 148 992 rdig-0.3.4.gem
2 File(s) 576 000 bytes
2 Dir(s) 45 135 982 592 bytes free
C:\search_app>gem install ferret
Building native extensions. This could
2006 Nov 19
1
score for wildcard searches
Hello All,
I have a rails app that maintains a movie data index and uses
"acts_as_ferret" for search. I ran into an issue with the scoring of
wildcard searches. When I search for the word "super*", the record
containing the word "superman" is ranked above the one having just
"super".
Is this normal or am I missing something? Any ideas on how scoring can
be
2007 Jan 05
1
adding one url to rdig index?
Hey there,
I'm building a rails site using RDig as a site-wide search. I would like to be able to add just one URL (or possibly a list) to an existing index, so that when certain pages change I can update the index without reindexing the entire site. I looked through the documentation and didn't see an example on how to do this so I am looking for some guidance here :). Is
2007 Apr 14
3
Error on optimize leads to corrupt index?
The following exception occurred while trying to optimize a large index:
vendor/gems/rdig-0.3.4/lib/rdig/index.rb:46:in `optimize':
End-of-File Error occured at <except.c>:93 in xraise (EOFError)
Error occured in store.c:216 - is_refill
current pos = 0, file length = 0
Now, I get the following error any time I try to create a new index
on the directory that I was trying
2007 Jan 05
3
Confused about Search Results
Hi everyone,
I'm pretty new to Lucene and Ferret, so I feel that this is most likely
myself not completely understanding the correct way to do this. I have
indexed ~2200 text files (of various sizes), and I am now running
searches on the index to get a feel for Lucene and Ferret.
In my first program, which is using Lucene, I search for 'influenza' and
get the
2006 Nov 04
0
Ferret 0.10.6 released (and some benchmarks)
Hey folks,
** Description **
Firstly for those who don't know, Ferret is a full-text search library
which makes adding search to your application a breeze. It's much
faster than MySQL full-text search as well as most other search libraries
out there. It allows you to do Boolean (+ruby + rails -jewelry) and
phrase queries ("the quick brown fox") as well as some more
2006 Aug 21
6
multiple-index searching with merged results
Hey..
i am just browsing through the lucene features and i'm wondering if this
feature is available in ferret as well ..
# multiple-index searching with merged results
this would be nice, as i'm thinking about several indexes, as i am using a
lot of wildcard queries for livesearches like google suggest. i think the
performance would increase, if i split my rather big index in
2007 Aug 05
1
IO Errors on deleting documents with Ferret
I have a large index (~6GB, ~1 million docs) that was built by RDig.
I wrote a script to iterate through the index to clear out some
duplicate information to try to reduce the size of the index.
clients.each {|client|
docs = RDig.searcher.search("+supplier_id:#{client.id}")
docs.each {|doc|
data = doc[:data].dup #the contents of the web page
new_results = {}
2007 Nov 05
6
Strange wildcard problem
Hi,
Apologies for reposting this for those who read this via ruby-forum,
but it didn't make it to the list before, and the list seems more
active...
I'm using ferret (via acts_as_ferret) in a somewhat unorthodox
manner and am having a strange wildcard problem. Before anyone wonders
why we're doing things this way, the answer is basically that it lets
us
2007 Oct 08
1
wildcard searches with german umlauts
i just noticed a weird problem.
i can successfully search with full terms like
"Flächendesinfektionsstufen" or "Regionalanästhesie" for example and get
correct hits.
but when i search for those entries with wildcards
"Flächendesinfektion*" or "Regionalanäs*" it won't find anything
while
"*chendesinfektionsstufen" or "*sthesie"