In the O'Reilly Ferret ShortCut, I found an example that was very useful to me. It explains how to write a custom Tokenizer. But the book doesn't explain how to write a custom Filter (especially how to implement the #text=() method).

I'm a newbie and I don't understand how to create my own custom Filter. Are there any good source code examples?

--
Posted via http://www.ruby-forum.com/.
On 4/8/07, James Kim <sjoonk at gmail.com> wrote:
> In the O'Reilly Ferret ShortCut, I found an example that was very useful to me.
> It explains how to write a custom Tokenizer.
> But the book doesn't explain how to write a custom Filter
> (especially how to implement the #text=() method).
>
> I'm a newbie and I don't understand how to create my own custom
> Filter.
> Are there any good source code examples?

Hi James,

Thanks for buying the Ferret ShortCut. I'm assuming you're talking about implementing a custom TokenFilter and not a custom Filter (which filters search results and can itself be implemented in two different ways). Here is an example of a custom TokenFilter which reverses tokens (obviously just a toy example):

    class MyReverseTokenFilter < TokenStream
      def initialize(token_stream)
        @token_stream = token_stream
      end

      def text=(text)
        @token_stream.text = text
      end

      def next()
        if token = @token_stream.next
          token.text = token.text.reverse
        end
        token
      end
    end

Notice that I did:

    token.text = token.text.reverse

and not:

    token.text.reverse!

You can't change the string in place, as the text and text= methods are fetching and setting a string inside a C struct. Obviously the same goes for sub!, downcase!, lstrip!, etc.

Let me know if you need any more help with this.

Cheers,
Dave

--
Dave Balmain
http://www.davebalmain.com/
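To try the pattern above without Ferret installed, here is a minimal pure-Ruby sketch. DemoToken and DemoWhitespaceTokenizer are hypothetical stand-ins, not Ferret classes: DemoToken#text returns a fresh copy on every call, which is only an assumption meant to imitate a getter that builds a new Ruby String from data held in a C struct. The filter is written without a superclass purely so the sketch runs on its own.

```ruby
# Hypothetical stand-in for a Ferret token. `text` returns a fresh copy on
# every call, imitating a getter that reads a string out of a C struct.
class DemoToken
  def initialize(text)
    @text = text.dup
  end

  def text
    @text.dup
  end

  def text=(new_text)
    @text = new_text.dup
  end
end

# Hypothetical stand-in tokenizer: splits on whitespace, one token per call,
# and returns nil once the input is exhausted.
class DemoWhitespaceTokenizer
  def initialize(text)
    self.text = text
  end

  def text=(text)
    @words = text.split
  end

  def next
    word = @words.shift
    word && DemoToken.new(word)
  end
end

# Same shape as the filter in the message above: wrap a stream, delegate
# text=, and rewrite each token's text by assignment.
class MyReverseTokenFilter
  def initialize(token_stream)
    @token_stream = token_stream
  end

  def text=(text)
    @token_stream.text = text
  end

  def next
    if token = @token_stream.next
      token.text = token.text.reverse
    end
    token
  end
end

# Pull tokens until the stream returns nil and collect their text.
def collect_texts(stream)
  texts = []
  while (token = stream.next)
    texts << token.text
  end
  texts
end

stream = MyReverseTokenFilter.new(DemoWhitespaceTokenizer.new("hello ferret"))
reversed = collect_texts(stream)

# The in-place variant only mutates the temporary copy returned by `text`.
token = DemoToken.new("abc")
token.text.reverse!
unchanged = token.text
```

With these stand-ins, the assignment form changes the stored text, while the bang method modifies a throwaway copy and leaves the token as it was.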
Hi, Dave.

Thank you very much for your kind explanation; I understand it now. (I was very pleased when I bought your ShortCut; it's very useful.)

I have two more questions.

1. Is there any difference between extending the TokenStream class and just writing a custom TokenFilter without extending TokenStream?

2. What exactly is the purpose of the text=() method? As far as I know, Lucene has no method of that name. What happens if I don't implement this method?

Thank you.
On 4/10/07, James Kim <sjoonk at gmail.com> wrote:
> Hi, Dave.
>
> Thank you very much for your kind explanation; I understand it now.
> (I was very pleased when I bought your ShortCut; it's very
> useful.)
>
> I have two more questions.
>
> 1. Is there any difference between extending the TokenStream class
> and just writing a custom TokenFilter without extending TokenStream?

No, currently there is no difference, and therefore there is no need for you to extend TokenStream. I may extend the TokenStream class in the future, making it necessary or at least advantageous to extend it, but I can't see how or why I would do this at the moment. At any rate, I'll give plenty of warning if I do make it necessary to extend TokenStream, so it is up to you whether you want to.

> 2. What exactly is the purpose of the text=() method?
> As far as I know, Lucene has no method of that name.
> What happens if I don't implement this method?

It was an unnecessary "optimization". It allows you to use a single TokenStream to tokenize multiple strings. As of Ferret 0.11.4, Ferret shouldn't use it anymore (although it does still get used in the unit tests), so you should be able to leave it out of your implementation. If you do run into problems by not implementing the text=() method, then it is a bug, so please let me know.

--
Dave Balmain
http://www.davebalmain.com/
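The reuse that text=() was meant to enable can be sketched in plain Ruby. Everything here is a hypothetical stand-in (DemoWord, DemoTokenizer, DemoDowncaseFilter are not Ferret classes), and the filter deliberately has no superclass, in line with the answer to the first question. It assumes only that a stream exposes next and, optionally, text=.

```ruby
# Hypothetical stand-in for a token; not a Ferret class.
DemoWord = Struct.new(:text)

# Hypothetical tokenizer: whitespace split, nil when exhausted.
class DemoTokenizer
  def initialize(text = "")
    self.text = text
  end

  # Resets the stream so the same object can tokenize another string.
  def text=(text)
    @words = text.split
  end

  def next
    word = @words.shift
    word && DemoWord.new(word)
  end
end

# A filter that is just a plain class wrapping another stream.
class DemoDowncaseFilter
  def initialize(token_stream)
    @token_stream = token_stream
  end

  # Delegating text= is what lets the whole chain be reused.
  def text=(text)
    @token_stream.text = text
  end

  def next
    token = @token_stream.next
    token.text = token.text.downcase if token
    token
  end
end

# Pull tokens until the stream returns nil and collect their text.
def drain(stream)
  texts = []
  while (token = stream.next)
    texts << token.text
  end
  texts
end

# One filter chain, two input strings.
stream = DemoDowncaseFilter.new(DemoTokenizer.new("First Document"))
first = drain(stream)

stream.text = "Second Document"
second = drain(stream)
```

If text= is left out of the filter, the only thing lost in this sketch is the ability to reset the chain; a new chain would have to be built for each string.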