thr3ads.net - similar to: "indexing large tokens"

Displaying 20 results from an estimated 1000 matches similar to: "indexing large tokens"

ferret-0.11.4-mswin32 not compatible with Ruby1.8.4

2007 Apr 10

Just a quick note for future reference - at least for me, ferret won't work on Ruby 1.8.4. gem install ferret Successfully installed ferret-0.11.4-mswin32 ruby -v ruby 1.8.4 (2005-12-24) [i386-mswin32] irb irb(main):001:0> require 'ferret' A Windows error message box appears - ruby.exe - Entry Point Not Found The procedure entry point rb_w32_write could not be

In memory IndexReader bug?

2006 Jun 14

Hi All, Hope all is going well. I'm having trouble with the following code creating an in-memory index reader - it seems to be attempting to read from a file regardless. Here's the simple code: require 'rubygems' require 'ferret' a = Ferret::Index::Index.new r = Ferret::Index::IndexReader.new(nil) Running the code on my OS X machine

bug when assigning new analyzer?

2007 May 09

require 'rubygems' require 'ferret' include Ferret PATH = '/tmp/ferret_stopwords_test' index = Index::IndexWriter.new(:path => PATH, :create => true) index.analyzer = Analysis::StandardAnalyzer.new([]) index << {:title => 'a few good men', :language => 'en'} index.analyzer =

Double-quoted query with "and" fails.

2007 Jan 19

Hi, We're using Ferret 0.9.4 and we've observed the following behavior. Searching for 'fieldname: foo and bar' works fine while 'fieldname: "foo and bar"' doesn't return any results. Is there a way to make ferret recognize the 'and' inside the query as a search term and not an operator? (I hope I got the

How to do case-sensitive searches

2006 Apr 19

Forgive me if this topic has already been discussed on the list. I googled but couldn't find much. I'd like to search through text for US state abbreviations that are written in capitals. What is the best way to do this? I read somewhere that tokenized fields are stored in the index in lowercase, so I am concerned that I will lose precision. What is the best way to store a

How to add Asia token analyzer to ferret simply?

2006 Jul 07

Hi, David. Can you give me an example of how to add an analyzer to ferret for Asian languages? My web application will have to support multi-language search, which means, for example, that both Chinese and English will be searched through the form. Currently, I have decided to use the simple token principle, which means that every Chinese character will be a token, although this is not so well in some

Migrating to 0.9.1

2006 Apr 25

After migrating to 0.9.1, I got: usr/local/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:123:in `const_missing': uninitialized constant TokenFilter (NameError) Here is a snapshot of my code: ... require 'ferret' class MyFilter < Analysis::TokenFilter ... It works fine on my dev machine, but not on a production server (shared host). Any

How to deal with accentuated chars in 0.10.8?

2006 Oct 19

I'm starting to use Ferret and acts_as_ferret. I need to use something like EuropeanAnalyzer (http://olivier.liquid-concept.com/fr/pages/2006_acts_as_ferret_accentuated_chars). For example, if the user searches for "gonzalez", the search should find documents that contain the term "gonzález". The EuropeanAnalyzer is based on Ferret::Analysis::TokenFilter,

How to make custom TokenFilter?

2007 Apr 08

In the O'Reilly Ferret Short Cut, I found an example that was very useful for me. It explains how to make a custom Tokenizer. But that book doesn't explain how to make a custom Filter (especially how to implement the #text=() method). I'm a newbie and I don't understand how to create my own custom Filter. Are there some good source code examples? -- Posted via

Index::Index.new vs. Readers and Writers

2006 May 08

Hey gang, A post on the Rails forum a while back had it sound like you pretty much had to use the Index Readers & Writers if you were going to be potentially accessing an index from more than one process (i.e. multiple dispatch.fcgi's, etc.). Is this still the case, or does the main Index class do that black magic behind the scenes? =) I was having trouble implementing the

tweaking minimum word length?

2006 Jul 26

Hi, Can Ferret be configured to change the minimum word length of what it indexes? Right now it seems to drop words 3 characters or less, but I'd like to include words going down to 2 characters. How would I do that? Francis

Noise words...

2007 Mar 22

Hi, I use acts_as_ferret on an app that is in Danish and English. In Danish, English words like "and" and "under" have meaning. Is it possible to make ferret search for these words? As it is now, a search for "under" returns nothing even though I know the word is present in the index. Cheers, Mattias

Possible bug? IndexWriter#doc_count counts deleted docs after #commit

2006 Sep 14

I'm playing with "updating" docs in my index, and I think I've found a bug with IndexWriter counting deleted docs. Script and output follow: ===== require 'rubygems' require 'ferret' p Ferret::VERSION @doc = {:id => '44', :name => 'fred', :email => 'abc at

Stop words, fields, StandardAnalyzer quagmire

2007 May 05

Hello, I'm using: Ruby 1.8.6, Rails 1.2.3, ferret 0.11.4, acts_as_ferret from svn stable. I've had quite a day wrestling with trying to remove the use of stopwords. The problem was that when searching for words like "no" or "the", no results were found. I found a confusing behavior that has taken me some time to figure out, and I hope sharing it

Some basic questions

2006 Jul 18

Hi, David and everyone, I've had Ferret running fine in a production Rails application for a while now. I haven't updated Ferret or really looked at the Ferret-related code since probably January, but I recently started thinking about trying out the latest version (we were using 0.3.2, I think). I got the latest (0.9.4) and have noticed things break. In particular, I used to

Possible bug? IndexWriter#doc_count counts deleted docs after #commit

2006 Sep 14

Hi David, > Deleted documents don't get deleted until commit is called Ok, but FYI, my experiments show that #commit doesn't affect #doc_count, even across ruby sessions. On a different note, I'd like to request a variation of #add_document which returns the doc_id of the document added, as opposed to self. I'm trying to track down an issue with a large

Newb Gem Install Help!

2006 Apr 25

This seems newbish, but I can't seem to get this gem to install no matter what I do. The gem in question is login_generator, and no matter what folder I put the .gem file in, it can't read locally - am I missing something? Remote install hangs at updating the source. -- Posted via http://www.ruby-forum.com/.

Determine how many documents a term occurs in

2007 Apr 28

Is there a fast way to determine how many documents a term occurs in, besides iterating through every document with TermDocEnum? -- Best regards, Stian Grytøyr

Ferret and non latin characters support

2007 Apr 08

I've successfully installed ferret and acts_as_ferret and have no problem with UTF-8 for accented characters. It returns correct results for e.g. français. My problem is with non-Latin characters (Persian, in fact). I have tested different locales with no success on both Debian and Mac. Any idea? (ferret 0.11.4, acts_as_ferret 0.4.0, rails 1.1.6) -- Posted via http://www.ruby-forum.com/.

Short words not indexed?

2005 Dec 29

I noticed that if I have a field that contains something like "Institute for medicine" and I search using any of these queries: for, *for*, for~, nothing shows up. If I search for either of the other two words, though, that term shows up in the result set. Does this indicate that short words like "for" are not indexed? Thanks! Jen
