Displaying 20 results from an estimated 20000 matches similar to: "Adding extra fields to an index (using RDig?)"
2006 Mar 25
1
RDig - ferret-based website crawler/indexer
Hi!
RDig is a small tool to build a Ferret index for the contents of a
website or intranet. It contains a simple HTTP crawler and some support
for extracting textual content from the fetched pages.
I built this to implement a site-wide search for a recent project
that combined a Rails application with lots of static html files
generated by a CMS.
Any feedback is very welcome!
Rubyforge
2007 Jan 23
3
Someone getting RDig work for Linux?
I got this
root@linux:~# rdig -c configfile
RDig version 0.3.4
using Ferret 0.10.14
added url file:///home/myaccount/documents/
waiting for threads to finish...
root@linux:~# rdig -c configfile -q "Ruby"
RDig version 0.3.4
using Ferret 0.10.14
executing query >Ruby<
Query:
total results: 0
root@linux:~#
my configfile
I changed from config to cfg, because of maybe
2007 Jul 29
7
RDig and AAF playing together
I have a site with two indexes. Index A is created offline by RDig
and queried from the web via RDig (specifically,
RDig.searcher.search). Index B is managed by AAF with :remote =>
true. Simple enough. However, I need to query both indexes from RDig.
Usually this is ok, as I modified RDig to accept an array of
search_paths with an element for index A and index B.
However, when Index
2006 Jul 14
2
RDig config file problem
Hi All,
Hope it is ok to post RDig queries on this forum.
Just trying to get RDig working (Ubuntu 6.06, RDig 0.3.0, ferret 0.9.4,
rubyful_soup 1.0.4)
Here is my output:
sh:~/rdigtry$ rdig -c config/rdig_config.rb
discovered content extractor class:
RDig::ContentExtractors::PdfContentExtractor
discovered content extractor class:
RDig::ContentExtractors::WordContentExtractor
discovered
2006 Jul 25
1
RDig document processing error
Hi all,
Am having problems using RDig:
With this rdig config...
cfg.crawler.start_urls = ['http://www.defensetech.org']
cfg.crawler.include_hosts = ['www.defensetech.org']
cfg.index.path = '/my/path/to/index'
cfg.verbose = true
...I get this output:
$ rdig -c config/rdig_config.rb
/usr/local/lib/site_ruby/1.8/ferret/index/term.rb:45:
2007 Jan 21
4
could not install in WinXP
Directory of C:\search_app
01/21/2007 19:37 <DIR> .
01/21/2007 19:37 <DIR> ..
01/21/2007 19:36 427 008 ferret-0.10.13.gem
01/21/2007 19:07 148 992 rdig-0.3.4.gem
2 File(s) 576 000 bytes
2 Dir(s) 45 135 982 592 bytes free
C:\search_app>gem install ferret
Building native extensions. This could
2007 Jan 05
1
adding one url to rdig index?
Hey there,
I'm building a Rails site using RDig as a site-wide search. I would like to be able to add just one URL (or possibly a list) to an existing index, so that when certain pages change I can update the index without reindexing the entire site. I looked through the documentation and didn't see an example on how to do this so I am looking for some guidance here :). Is
2006 May 22
7
how to index the result of any instance method
Hi,
One of the AAF features is to be able to index results of methods, but I
haven't seen anywhere how to do this. I have a method that returns the
full text of a file and I'd like for this to be indexed. Can anyone out
there help me out on this one?
Tom
--
Posted via http://www.ruby-forum.com/.
2006 Aug 24
2
acts_as_ferret for Ferret 0.10
Hi all,
the current acts_as_ferret trunk is now ported to Ferret 0.10.
Get it while it's hot at
svn://projects.jkraemer.net/acts_as_ferret/trunk/plugin
Nearly everything works, besides this:
- all queries are ORed (no way to tell the QueryParser to build AND
queries by default)
- more_like_this is broken
I'm working with Dave to fix these things soon. The last Ferret 0.9.x
2006 Sep 20
5
acts_as_ferret limit on multi_search not working?
I'm using acts_as_ferret to do a query like this:
Model1.multi_search("my query",[Model2,Model3], :limit => 2)
No matter what number I set the limit to, I get 10 items in the result set. Am
I doing something wrong?
Thanks/David
--
Posted via http://www.ruby-forum.com/.
2007 Jan 22
7
memcache
Just curious, is there any way to use memcache with a Ferret index?
Thanks,
Ray
--
Posted via http://www.ruby-forum.com/.
2006 Nov 20
5
Parallel Building?
I'm trying to index ~130,000 documents [soon to grow to about 500,000
documents] and I'm wondering if it's possible to combine Ferret databases
or in some other way split up the building process.
Normally, indexing 130k documents wouldn't be that painful except that
there are different types of links between these documents and they are
not absolute (so for example
2006 Aug 30
7
Hyphens
Hi there,
I'm working with some legacy data where customer phone numbers are
stored with hyphens between the area code, exchange, and number (e.g.
555-555-5555). Is this the best way to store a phone number? Perhaps
not, but it's the way they were being stored, so I have to work with
this format.
Right, so when I save a record the log tells me acts_as_ferret indexed
the
2006 Jul 07
9
Search on data across many tables, linked by belongs_to
I am using Ferret and acts_as_ferret, as my search back-end for my Rails
project. I have a question about using acts_as_ferret on a main table
that is linked to other tables by foreign keys. Is there a way to
include the information linked by the belongs_to keyword in the search
results?
As an example, let's say I have a main table 'posts':
2006 Nov 17
4
acts_as_ferret and searching word docs
I was wondering if it is possible to search word documents using ferret.
The actual text in a Word document isn't in a binary format - only the
formatting. Surely it would be possible to parse that?
--
Posted via http://www.ruby-forum.com/.
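The idea in the post above, that readable text can be pulled out of a mostly binary file, can be sketched in plain Ruby by scanning for runs of printable ASCII, much like the Unix strings tool does. This is only an illustration under stated assumptions: the helper name and the minimum run length are made up for the example, real Word files often store text as UTF-16 (which this scan would miss), and nothing here uses Ferret or acts_as_ferret.

```ruby
# Hypothetical helper: returns runs of printable ASCII found in binary data.
# min_length is an assumed threshold that filters out short accidental matches.
def printable_runs(data, min_length = 4)
  # Work on a binary copy so bytes that are invalid UTF-8 do not raise.
  data.b.scan(/[\x20-\x7E]{#{min_length},}/n)
end

# Stand-in for file contents; a real call might pass File.binread(path).
sample = "\x00\x01Hello world\x00\xFFab\x00Ruby\x02"
runs = printable_runs(sample)
puts runs.inspect
```

The extracted strings could then be joined and handed to whatever indexer is in use; whether that gives usable search results for a given document format is something to verify on real files.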
2006 Aug 20
7
missing terms in index causing search errors
I am unable to find results for models when one or more of the terms are
not being indexed.
Let's suppose I index a User on the phrase "Ruby on Rails." If I then
search using User.find_by_contents("Ruby on Rails") I get no results,
since "on" is a common term and does not get indexed. Of course,
User.find_by_contents("Ruby Rails") works just fine.
I
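To make the stop-word behaviour described in this post concrete, here is a minimal plain-Ruby sketch of what a stop-word filter does to a phrase. It is not Ferret's analyzer: the stop-word list and the index_terms helper are assumptions invented for the illustration.

```ruby
# Hypothetical stop-word list for illustration; not Ferret's actual list.
STOP_WORDS = %w[a an and of on or the].freeze

# Lowercase the text, split it into word tokens, and drop stop words.
def index_terms(text)
  text.downcase.scan(/\w+/).reject { |term| STOP_WORDS.include?(term) }
end

indexed_terms = index_terms("Ruby on Rails")
puts indexed_terms.inspect
```

Under this toy model "on" never reaches the index, so a query that still insists on matching "on" would find nothing, while a query run through the same filter would reduce to the same two terms as the indexed phrase.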
2006 Sep 16
4
nfs shared and ferret segfault
Hi,
I use Ferret 0.10.4 with a shared index on an NFS directory.
There are 2 application servers. The web server is Mongrel 0.3.13.3
and mongrel_cluster 0.2.0. There are 20 Mongrel processes on each
server.
Each time my application updates a model, the Mongrel process
stops running with this error in its log:
/usr/lib/ruby/gems/1.8/gems/ferret-0.10.4/lib/ferret/index.rb:663:
[BUG] Segmentation fault
2006 Jun 29
13
find_by_contents not returning SearchResults?
The acts_as_ferret documentation says find_by_contents returns an
instance of SearchResults, but I see this error when I try to use the
results.
undefined method `total_hits' for []:Array
Here is the link to the documentation:
http://projects.jkraemer.net/acts_as_ferret/rdoc/classes/FerretMixin/Acts/ARFerret/ClassMethods.html#M000010
But here is the actual code:
result =
2006 Apr 03
6
Installing Ferret locally on TextDrive
I would like to give the 0.9.0 version of Ferret a try on my
application hosted on TextDrive. I am currently running on the 0.3.2
version there.
Does anyone have any tips on installing it locally there? I know just
enough about Ruby gems to get by... but I am thinking it could be as
easy as passing a -i flag to specify the install location for ferret.
Then, the only thing I am not sure about
2006 Nov 01
8
aaf and stop words; query parser
I've been trying to implement acts_as_ferret in my latest project and ran into a snag. If I do a search for 'auditor state' then the search works perfectly. If I include a stop word, as in 'auditor of state', then I get no results. I'd prefer not to set stop words to nil and index everything.
The solution, that I have yet to attempt, is to use