search for: tokenstream

Displaying 19 results from an estimated 19 matches for "tokenstream".

2007 Apr 08
3
How to make custom TokenFilter?
In the O'Reilly Ferret short cuts, I found a very useful example. It explains how to make a custom Tokenizer, but that book doesn't explain how to make a custom Filter (especially how to implement the #text=() method). I'm a newbie and I don't understand how to create my own custom Filter. Are there any good source code examples? -- Posted via
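The filter pattern being asked about can be sketched in plain Ruby without Ferret installed. This is a hedged illustration, not Ferret's actual API: `Token`, `WhitespaceTokenizer`, and `DowncaseFilter` are hypothetical stand-ins showing the protocol a Ferret-style stream follows, namely that it responds to #next (returning a token or nil) and #text= (resetting the stream on new input), and that a filter wraps another stream and delegates #text= to it:

```ruby
# Hypothetical stand-in: Ferret's real Token also carries offsets.
Token = Struct.new(:text)

class WhitespaceTokenizer
  def initialize(text)
    self.text = text
  end

  # #text= resets the stream so the same object can be reused
  def text=(text)
    @words = text.split
  end

  def next
    word = @words.shift
    word && Token.new(word)
  end
end

# A filter wraps another stream and delegates #text= to it.
class DowncaseFilter
  def initialize(input)
    @input = input
  end

  def text=(text)
    @input.text = text
  end

  def next
    token = @input.next
    return nil if token.nil?
    token.text = token.text.downcase
    token
  end
end

stream = DowncaseFilter.new(WhitespaceTokenizer.new("Hello Ferret World"))
tokens = []
while (t = stream.next)
  tokens << t.text
end
# tokens => ["hello", "ferret", "world"]
```

The key design point is that #text= is delegated down the chain, so the outermost filter can be handed new input and every wrapped stream resets along with it.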
2005 Aug 10
1
Issues with Canoo WebTest
...hread.run(Thread.java:552) [canoo] Enclosed exception: [canoo] SyntaxError: illegal character (Wrapper definition for Window.setTimeout(); line 1) [canoo] at org.mozilla.javascript.NativeGlobal.constructError (NativeGlobal.java:597) [canoo] at org.mozilla.javascript.TokenStream.reportSyntaxError(TokenStream.java: 1324) [canoo] at org.mozilla.javascript.TokenStream.getToken (TokenStream.java:1302) [canoo] at org.mozilla.javascript.Parser.memberExprTail (Parser.java:1213) [canoo] at org.mozilla.javascript.Parser.memberExpr (Parser.java:1204)...
2007 May 18
1
roll my own TokenFilter subclass
Hi all, I'd like to write my own TokenStream Filter (in lucene this would be a subclass of a TokenFilter, which ferret seems to lack) but I'm not sure how to go about it. Specifically, it's not clear how I'd create a non-trivial TokenStream to pass out to any filters that wrapped mine. Can anyone point me towards...
2015 Mar 05
3
Dovecot Full Text Search results in SolrException: undefined field text [SERIOUS]
...ype(IndexSchema.java:1269) at org.apache.solr.schema.IndexSchema$SolrQueryAnalyzer.getWrappedAnalyzer(IndexSchema.java:434) at org.apache.lucene.analysis.DelegatingAnalyzerWrapper$DelegatingReuseStrategy.getReusableComponents(DelegatingAnalyzerWrapper.java:74) at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:175) at org.apache.lucene.util.QueryBuilder.createFieldQuery(QueryBuilder.java:207) at org.apache.solr.parser.SolrQueryParserBase.newFieldQuery(SolrQueryParserBase.java:374) at org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:742) at org.apache.solr.pa...
2006 Sep 06
9
Which analyzer to use
Lucene's standard analyzer splits words separated by underscores. Ferret doesn't do this. For example, if I create an index with only the document 'test_case' and search for 'case', it doesn't find anything. Lucene, on the other hand, finds it. The same goes for words separated by colons. Which analyzer should I use to emulate
2007 Apr 13
5
[Ferret] Serious memory leak on Joyent / TextDrive / Solaris
There is a serious memory leak bug in ferret. I'm having this error on a TextDrive Container (aka Joyent Accelerators) running OpenSolaris with Ferret 0.11.4. It happens while searching for terms with accented or special characters. This makes ferret allocate lots of memory (usually reaching 3+ GB) and fail if another such query is executed. Any ideas on this, could it be locale
2006 Jun 01
8
Windows progress
Hi there, What's the current status of the Windows port? I may be in a position to lend a hand over the next couple of weeks - where should I start looking? And what's the best way to get SVN HEAD? This happens: $ svn checkout svn://www.davebalmain.com/ferret/trunk ferret svn: Can't connect to host 'www.davebalmain.com': Connection refused --
2006 Jun 13
5
Grep style output?
Hi All, Hope all is going well. Was just wondering if anyone has implemented a grep style output page of hits using Ferret as the index/query engine? Any thoughts about how best to implement it? The previous thread discusses highlighting - would that be the best approach to follow or is there a better way? Cheers, Marcus -- Posted via http://www.ruby-forum.com/.
2007 Aug 03
0
StandardTokenizer Doesn't Support token_stream method
...http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StandardTokenizer.html I ought to be able to construct a StandardTokenizer like this: t = StandardTokenizer.new(true) # true to downcase tokens and then later: stream = token_stream(ignored_field_name, some_string) To create a new TokenStream from some_string. This approach would be valuable for my application since I am analyzing many short strings -- so I'm thinking that building my 5-deep analyzer chain for each small string will be a nice savings. Unfortunately, StandardTokenizer#initialize does not work as advertised. It...
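The savings the poster is after, building the analyzer chain once and resetting it per string, can be illustrated in plain Ruby. This is a sketch of the reuse pattern only; `SimpleStream` is a hypothetical stand-in, not Ferret's StandardTokenizer, and it assumes the stream exposes a #text= reset as the Ferret docs describe:

```ruby
# Hypothetical stream: built once, reset per input via #text=.
class SimpleStream
  def initialize(text = "")
    @words = text.split
  end

  # Resetting avoids reconstructing the (possibly deep) chain per string.
  def text=(text)
    @words = text.split
  end

  def next
    @words.shift
  end
end

stream = SimpleStream.new          # chain built once
results = ["first string", "second one"].map do |s|
  stream.text = s                  # reset instead of reallocating
  out = []
  while (w = stream.next)
    out << w
  end
  out
end
# results => [["first", "string"], ["second", "one"]]
```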
2006 Oct 19
2
How to deal with accentuated chars in 0.10.8?
I'm starting to use Ferret and acts_as_ferret. I need to use something like EuropeanAnalyzer (http://olivier.liquid-concept.com/fr/pages/2006_acts_as_ferret_accentuated_chars). For example, if the user searches for "gonzalez", they can find documents that contain the term "gonzález". The EuropeanAnalyzer is based on Ferret::Analysis::TokenFilter,
2004 Aug 19
1
Festival Issues
Hey All, I now have Festival compiled, installed and running using the instructions on the Wiki page. When I try to change the voice that is being used however, I am running into a problem. I get the following in the festival server log: Cannot open file /tmp/est_10877_00000/utt.wav as tokenstream Wave load: can't open file "/tmp/est_10877_00000/utt.wav" Cannot load wavefile: /tmp/est_10877_00000/utt.wav When I look in the /tmp/est_10877_00000 folder, while the sound file is still playing according to Asterisk, the following seems to be created: total 56 drwxr-xr-x 2 darr...
2015 Mar 05
0
Dovecot Full Text Search results in SolrException: undefined field text [SERIOUS]
...) > at > org.apache.solr.schema.IndexSchema$SolrQueryAnalyzer.getWrappedAnalyzer(IndexSchema.java:434) > at > org.apache.lucene.analysis.DelegatingAnalyzerWrapper$DelegatingReuseStrategy.getReusableComponents(DelegatingAnalyzerWrapper.java:74) > at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:175) > at > org.apache.lucene.util.QueryBuilder.createFieldQuery(QueryBuilder.java:207) > at > org.apache.solr.parser.SolrQueryParserBase.newFieldQuery(SolrQueryParserBase.java:374) > at > org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.j...
2006 Oct 20
2
Bug in search matching ?
Hi :) Here's a little code reproducing something that I consider a bug; if it's not, please explain :] http://pastie.caboo.se/18693 Thanks in advance, Cheers, Jérémie 'ahFeel' BORDIER -- Posted via http://www.ruby-forum.com/.
2013 Apr 05
2
Problem with fts lucene, on solaris 10
Hi all, I'm planning to migrate my courier-imap imap server to dovecot, but I'm experiencing a strange issue with fts-lucene plugin. Basically, every time I start a search, the log starts to write: Apr 05 19:30:53 indexer: Error: Indexer worker disconnected, discarding 1 requests for XXXXXX Apr 05 19:30:53 indexer-worker(XXXXX): Fatal: master: service(indexer-worker): child 809 killed
2006 Apr 19
2
How to do case-sensitive searches
Forgive me if this topic has already been discussed on the list. I googled but couldn't find much. I'd like to search through text for US state abbreviations that are written in capitals. What is the best way to do this? I read somewhere that tokenized fields are stored in the index in lowercase, so I am concerned that I will lose precision. What is the best way to store a
2015 Mar 05
2
Dovecot Full Text Search: HTTP 500 : Unknown fieldType 'text_general' specified on field text. [SERIOUS]
...org.apache.lucene.analysis.DelegatingAnalyzerWrapper$DelegatingReuseStrategy.getReusableComponents(DelegatingAnalyzerWrapper.java:74) at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:175) at org.apache.lucene.util.QueryBuilder.createFieldQuery(QueryBuilder.java:207) at...
2007 Jul 07
2
Extending/Modifying QueryParser
...top_words = stop_words end def token_stream(field, str) ts = StandardTokenizer.new(str) ts = LowerCaseFilter.new(ts) if @lower ts = StopFilter.new(ts, @stop_words) ts = SynonymTokenFilter.new(ts, @synonym_engine) end end class SynonymTokenFilter < Ferret::Analysis::TokenStream include Ferret::Analysis def initialize(token_stream, synonym_engine) @token_stream = token_stream @synonym_stack = [] @synonym_engine = synonym_engine end def text=(text) @token_stream.text = text end def next return @synonym_stack.pop if @synonym_stac...
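The stack-based synonym filter sketched in the excerpt above can be written out as a complete, runnable illustration in plain Ruby. `Token`, `ArrayStream`, and the synonym hash here are hypothetical stand-ins for Ferret's classes; the point is the pattern of popping queued synonyms before pulling the next real token:

```ruby
# Hypothetical stand-ins for Ferret's Token and a wrapped stream.
Token = Struct.new(:text)

class ArrayStream
  def initialize(words)
    @tokens = words.map { |w| Token.new(w) }
  end

  def next
    @tokens.shift
  end
end

class SynonymFilter
  # synonyms: Hash mapping a term to an array of alternatives
  def initialize(token_stream, synonyms)
    @input = token_stream
    @synonyms = synonyms
    @stack = []
  end

  # Pop queued synonyms first; otherwise pull the next real token
  # and queue its synonyms for subsequent calls.
  def next
    return @stack.pop unless @stack.empty?
    token = @input.next
    return nil if token.nil?
    (@synonyms[token.text] || []).each { |s| @stack.push(Token.new(s)) }
    token
  end
end

stream = SynonymFilter.new(ArrayStream.new(%w[fast car]),
                           { "fast" => ["quick", "speedy"] })
out = []
while (t = stream.next)
  out << t.text
end
# out => ["fast", "speedy", "quick", "car"]
```

Note the synonyms are emitted in reverse insertion order because they come off a stack; a queue would preserve order if that matters for highlighting.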
2007 Mar 23
5
Any chance to get 0.11.3 on windows soon ?
...', ['?','?','?'] => 'y', ['?','?','?'] => 'z' } class TokenFilter < TokenStream # Construct a token stream filtering the given input. def initialize(input) @input = input end end # replace accentuated chars with ASCII one class ToASCIIFilter < TokenFilter def next() token = @input.next() unless token.nil? token.text = token....
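An alternative to the hand-built character table shown in the excerpt above is to let Unicode normalization do the folding: decompose each character, then strip the combining marks. This is a plain-Ruby stdlib sketch (the `to_ascii` helper name is made up here); it handles Latin scripts but is not a general transliteration solution:

```ruby
# Fold accented Latin characters to ASCII: NFD-decompose, then
# delete combining marks (Unicode category Mn).
def to_ascii(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

to_ascii("gonzález")  # => "gonzalez"
```

Applied inside a filter's #next, this would replace the big accent-mapping hash with a single line per token.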
2006 Nov 25
5
Metaphone analysis
...algorithm over a token stream. It's a fairly simple class, but does require the 'Text' gem be installed. require 'ferret' require 'text' module Curtis module Analysis # TODO write tests! class MetaphoneFilter < Ferret::Analysis::TokenStream def initialize(token_stream, version = :double) @input = token_stream @version = version end def next t = @input.next return nil if t.nil? t.text = @version.eql?(:double) ? Text::Metaphone.double_metaphone(t.text) : Text::Metaphone.metaphone(t.text) t end en...