Displaying 20 results from an estimated 6000 matches similar to: "adding one url to rdig index?"
2006 Jul 25
1
RDig document processing error
Hi all,
Am having problems using RDig:
With this rdig config...
cfg.crawler.start_urls = ['http://www.defensetech.org']
cfg.crawler.include_hosts = ['www.defensetech.org']
cfg.index.path = '/my/path/to/index'
cfg.verbose = true
...I get this output:
$ rdig -c config/rdig_config.rb
/usr/local/lib/site_ruby/1.8/ferret/index/term.rb:45:
2006 Mar 25
1
RDig - ferret-based website crawler/indexer
Hi!
RDig is a small tool to build a Ferret index for the contents of a
website or intranet. It contains a simple HTTP crawler and some support
for extracting textual content from the fetched pages.
I built this to implement a site-wide search for a recent project
that combined a Rails application with lots of static html files
generated by a CMS.
Any feedback is very welcome!
Rubyforge
2006 Jul 14
2
RDig config file problem
Hi All,
Hope it is ok to post RDig queries on this forum.
Just trying to get RDig working (Ubuntu 6.06, RDig 0.3.0, ferret 0.9.4,
rubyful_soup 1.0.4)
Here is my output:
sh:~/rdigtry$ rdig -c config/rdig_config.rb
discovered content extractor class:
RDig::ContentExtractors::PdfContentExtractor
discovered content extractor class:
RDig::ContentExtractors::WordContentExtractor
discovered
2007 Jan 23
3
Someone getting RDig work for Linux?
I got this
root@linux:~# rdig -c configfile
RDig version 0.3.4
using Ferret 0.10.14
added url file:///home/myaccount/documents/
waiting for threads to finish...
root@linux:~# rdig -c configfile -q "Ruby"
RDig version 0.3.4
using Ferret 0.10.14
executing query >Ruby<
Query:
total results: 0
root@linux:~#
my configfile
I changed from config to cfg, because of maybe
2007 Jul 29
7
RDig and AAF playing together
I have a site with two indexes. Index A is created offline by RDig
and queried from the web via RDig (specifically,
RDig.searcher.search). Index B is managed by AAF with :remote =>
true. Simple enough. However, I need to query both indexes from RDig.
Usually this is ok, as I modified RDig to accept an array of
search_paths with an element for index A and index B.
However, when Index
2007 Jan 21
4
could not install in WinXP
Directory of C:\search_app
01/21/2007 19:37 <DIR> .
01/21/2007 19:37 <DIR> ..
01/21/2007 19:36 427 008 ferret-0.10.13.gem
01/21/2007 19:07 148 992 rdig-0.3.4.gem
2 File(s) 576 000 bytes
2 Dir(s) 45 135 982 592 bytes free
C:\search_app>gem install ferret
Building native extensions. This could
2007 Feb 10
5
Adding extra fields to an index (using RDig?)
Hello everyone,
I am writing an application which collects a set of web sites and caches
them locally for offline viewing. I want to do searches on this
collection and associate extra data with each result (e.g. date
collected, reason for collection, perhaps a sequence number).
Now all this data exists when the harvesting is done and could be stored
in a database. I want to use RDig to index my
2007 Sep 18
4
basic rdig setup
I'm developing locally on Windows and I have a remote dev box that runs
Linux. I'm trying to use RDig just to index using urls, no files.
Both use acts_as_ferret for an administrative search that works fine.
On the Windows machine, I get no errors, but get no results.
On the Linux machine, I get:
File Not Found Error occured at <except.c>:93 in xraise
Error occured in
2007 Sep 27
2
Problem getting "extract" from RDig
Hi All,
I have to have a site wide search for my current application. By search
I mean I have to search the static and the dynamic contents from the
database. I have been searching on this for a while on the net and RDig
seems to be an apt solution. While using it I have encountered a few
problems. I know these might be very basic issues but I have not been
able to figure out what is wrong with
2007 Jun 23
2
End of File Error on index optimize
I was optimizing a 650MB index using ferret (0.11.3) and I received the
following error. I've seen some people have similar issues but I
haven't seen any resolutions. The contents of the index directory
follow the error. Has anyone seen anything like this and found a
resolution? Many thanks.
/mnt/apps/search/releases/20070622175637/script/../config/../vendor/
2006 May 22
7
how to index the result of any instance method
Hi,
One of the AAF features is to be able to index results of methods, but I
haven't seen anywhere how to do this. I have a method that returns the
full text of a file and I'd like for this to be indexed. Can anyone out
there help me out on this one?
Tom
--
Posted via http://www.ruby-forum.com/.
2007 Jun 24
1
Example for using ferret search engine
Hi,
Is there any application where I can see the Ferret engine in use (such
as an example implementation)? I am having difficulty using it to send a
query and get the results.
Thank you,
Raj.
2006 Nov 17
4
acts_as_ferret and searching word docs
I was wondering if it is possible to search word documents using ferret.
The actual text in a word document isn't in a binary format - only the
formatting. Surely it would be possible to parse that?
2006 Dec 15
1
acts_as_ferret: reindexing is too slow
Hi,
Recently, I was playing around with AAF and found that reindexing a
table is very slow. Then I looked into Ferret's own performance, tried
it myself, and found that it's very fast. I then used Ferret directly to
index my table, and that was also very fast. All good.
So why is reindexing using AAF slow? After some time I found that in
the AAF, it uses (:key => :id) in
2007 Feb 15
3
Proximity searching in rdig ferret
Lucene has a syntax "foo bar"~10 for finding foo within 10 words of bar.
Does ferret support this feature? (the ~ is used for fuzzy queries) Does
rdig?
This could be a deal breaker for me 'cos I really need proximity
searches
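
To make the question concrete, here is a minimal plain-Ruby sketch of what "foo within 10 words of bar" means. It does not use Ferret or RDig; `within_proximity?` is a hypothetical helper written for illustration, and it simplifies Lucene's slop calculation to a plain distance between word positions.

```ruby
# Hypothetical helper, not part of Ferret or RDig.
# Returns true when term_a and term_b both occur in the text and at least
# one pair of occurrences is no more than max_distance word positions apart.
# Assumption: words are runs of \w characters, compared case-insensitively.
def within_proximity?(text, term_a, term_b, max_distance)
  words = text.downcase.scan(/\w+/)
  positions_a = words.each_index.select { |i| words[i] == term_a.downcase }
  positions_b = words.each_index.select { |i| words[i] == term_b.downcase }

  positions_a.any? do |a|
    positions_b.any? { |b| (a - b).abs <= max_distance }
  end
end

puts within_proximity?("foo one two bar", "foo", "bar", 10)
puts within_proximity?("foo one two bar", "foo", "bar", 2)
```

The first call finds the two terms three positions apart and reports a match; the second uses a tighter limit and does not. Whether a given Ferret or RDig version parses the `"foo bar"~10` syntax is a separate question that this sketch does not answer.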
2007 Jun 07
5
Advice on slowness in bootstrapping?
I am looking at trying to use ferret/aaf to supplement my querying against a
medium and large table with lots of columns. Some facts first:
Ferret 0.11.4
AAF 0.4.0
Ruby 1.8.6
Rails 1.2.3
Medium table:
105,464 rows
168 columns (mostly varchar(20))
11 actual columns indexed in aaf plus
40 virtual columns indexed in aaf (virtual is concat of two physical columns.
e.g. cast_first_name_1 +
2006 Aug 25
7
disabling automatic indexing in acts_as_ferret
I'd like to be able to enable/disable the automatic indexing of
documents acts_as_ferret does. Something like MyModel.disable_indexing
MyModel.enable_indexing would be perfect. I need this because I do some
indexing that requires visiting the parents of the model objects and my
import method imports the children first, so the information isn't there
yet. I'd like to
2007 Jun 12
5
index browser inconsistent with IndexReader
Hi,
We have an index of around 1M web pages as part of our web app. The
app uses ferret by way of RDig to perform searches. We have noticed
anecdotally that some searches don't work the way we thought they
should, as if documents were missing from the index. Yesterday we
came upon a concrete instance of this.
Our documents have several fields, one of which is called :keywords
and
2007 Apr 06
3
Double work at Model.rebuild_index
I'm noticing that every time I run Model.rebuild_index, it runs the
rebuild twice. Also, in ferret_index.log there is only one small
difference between the first and second run, see:
First time it shows:
rebuild index: []
reindexing model User
After it finishes, it automatically starts the second time and shows:
rebuild index: [["User"]]
reindexing model User
The full
2007 Feb 15
0
rdig wildcard searches
Lucene has a simple wildcard syntax supporting ? and *, so ruby could be
matched by rub?, r*by, etc.
This doesn't work using rdig on the command line,
e.g. rdig -c config.rb -q 'data:"ru?y"' gives
RDig version 0.3.4
using Ferret 0.10.14
executing query >data:"ru?y"<
Query: data:"ru y"~1
which is something entirely different. The
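
For reference, the wildcard semantics described in this post (? matches exactly one character, * matches any run of characters) can be sketched in plain Ruby. This is only an illustration of the expected matching behaviour; `wildcard_to_regexp` is a hypothetical helper and is not how Ferret or RDig parse queries.

```ruby
# Hypothetical helper, not part of Ferret or RDig.
# Converts a wildcard pattern into an anchored Regexp:
#   ?  matches exactly one character
#   *  matches zero or more characters
def wildcard_to_regexp(pattern)
  escaped = Regexp.escape(pattern)
  # Regexp.escape turns "?" into "\?" and "*" into "\*"; map those back
  # to their regular-expression equivalents.
  body = escaped.gsub('\?', '.').gsub('\*', '.*')
  Regexp.new("\\A#{body}\\z")
end

%w[rub? r*by ru?y].each do |pattern|
  puts "#{pattern}: #{wildcard_to_regexp(pattern).match?('ruby')}"
end
```

Under these semantics all three patterns match the term ruby, which is the behaviour the poster expected from the query.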