Hi, I implemented a full text search using postgres > 8.3. My performance is very good to index (using triggers/after_filter or crontab time based index update) and to search for something. With Postgres i can index only lexemes, other solutions, like sphinx based or aaf creates the index based in length. So everything bigger than 2/3 chars is indexed. I think to use a lexeme is better to avoid falses positives in a real world. My questions are: - Any good reason to dont use tsearch and choose a aaf/*sphinx solution ? - There is any way to make tsearch index the text by length and not by lexeme (i know, probably this question i should ask in a postgres forum) Regards, Victor --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
On Tue, Dec 2, 2008 at 11:43 AM, VP <pelasaco-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:> > Hi, > > I implemented a full text search using postgres > 8.3. > My performance is very good to index (using triggers/after_filter or > crontab time based index update) and to search for something. With > Postgres i can index only lexemes, other solutions, like sphinx based > or aaf creates the index based in length. So everything bigger than > 2/3 chars is indexed. I think to use a lexeme is better to avoid > falses positives in a real world. > > My questions are: > > - Any good reason to dont use tsearch and choose a aaf/*sphinx > solution ? > - There is any way to make tsearch index the text by length and not by > lexeme (i know, probably this question i should ask in a postgres > forum)Victor, As a long-time PostgreSQL advocate... I''d encourage you to look at using Sphinx and/or UltraSphinx. Why? Configuration is way less complicated for your development and production environments. I like knowing that I have the advanced features of PostgreSQL available to me... but I''ll side with working within Ruby as much as possible on our projects. Sphinx will allow you to evolve your indexing without needing to muck around the database, which should ease pains with deployments as well. If you get to a point where Sphinx is falling over from not being to handle your load, you can evaluate using Tsearch2, but in the meantime, I''d just consider it premature optimization without much benefits int he short-term. Having said that... I haven''t worked with tsearch2 in 3+ years. ;-) Good luck! Robby -- Robby Russell Chief Evangelist, Partner PLANET ARGON, LLC design // development // hosting http://www.planetargon.com/ http://www.robbyonrails.com/ aim: planetargon +1 503 445 2457 +1 877 55 ARGON [toll free] +1 815 642 4068 [fax] --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
Hey Robby, Thanks for your feedback,> > Victor, > > As a long-time PostgreSQL advocate... I''d encourage you to look at > using Sphinx and/or UltraSphinx.In fact, I used ultrasphinx, and with an average load, it wasnt uncommon to have to restart ultrasphinx daemons. Another point that, i tried to explain in my question, is the way that the data is indexed. In ultrasphinx, the document is broken in tokens and everything bigger then 2 chars is indexed. So in my opinion, it generates too much "false positives" (sorry guys, i could not find another word).. there is any way to make *sphinx index lexeme ?> Why? Configuration is way less complicated for your development and > production environments. I like knowing that I have the advanced > features of PostgreSQL available to me... but I''ll side with working > within Ruby as much as possible on our projects. Sphinx will allow you > to evolve your indexing without needing to muck around the database, > which should ease pains with deployments as well.I think when i remove an extra daemon, i''m drying my system architecture, right ? I mean, if you dont really spend some hours reading (ultra|think) sphinx code, it will be always a black box in your system that you dont understand. So am I loosing something ? maybe the learn courve to tsearch and all postgres caveats are bigger then *really* learn how *sphinx based solutions works ? Best regards, Victor --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
> In ultrasphinx, the document is broken in tokens and > everything bigger then 2 chars is indexed. So in my opinion, it > generates too much "false positives" (sorry guys, i could not find > another word).. there is any way to make *sphinx index lexeme ? >Sure, you can configure min_infix_len, min_word_len and whatever in your sphinx configuration.> I mean, if you dont really spend some hours reading (ultra|think) > sphinx code, it will be always a black box in your system that you > dont understand.Whow, so you read Postgres code before using it? ;-) Same goes for Ruby, Rails, and 99% of what tools we use. --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
> Whow, so you read Postgres code before using it? ;-) > Same goes for Ruby, Rails, and 99% of what tools we use.lol.. no, i mean at least the contrib/tsearch folder i read! ok the http://www.postgresql.org/docs/8.3/static/textsearch.html as well, and too be honest its better documented then the sphinx plugins :-) Thanks for the tip about ultrasphinx, but i would like to know if its possible to do opposite, to make it work with lexemes. Like for example if you look for run, you will find run, runs, ran and running. All of these words are forms of the same lexeme: RUN I dont wanna be troll or blame the plugins, i heard a lot of good things about thinksphinx, not so good things about ultrasphinx and almost nothing about tsearch, that for me is working fine, except that some developers here comes with the questions: "if i type "te" it dont bring me my "test" post and ultrasphinx does". But well in real world users wont search for "te" and if they search for te, they will find everything like te,teahupo, tea, teta, tendency, etc, etc, etc. Regards, Victor --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
> Thanks for the tip about ultrasphinx, but i would like to know if its > possible to do opposite, to make it work with lexemes. Like for > example if you look for run, you will find run, runs, ran and running. > All of these words are forms of the same lexeme: RUNhmm, I don''t know about "ran" but if you enable star search, then ru* wil at least find run, runs, and running. This may be possible too, but I can''t say for sure, since I didn''t need this kind of search Maybe have a look at the docs: http://www.sphinxsearch.com/docs/current.html#searching> > I dont wanna be troll or blame the plugins, i heard a lot of good > things about thinksphinx, not so good things about ultrasphinx and > almost nothing about tsearch, that for me is working fine, except that > some developers here comes with the questions: "if i type "te" it dont > bring me my "test" post and ultrasphinx does". But well in real world > users wont search for "te" and if they search for te, they will find > everything like te,teahupo, tea, teta, tendency, etc, etc, etc. >lo, I agree. It was hard enough to read through lots of tutorials and documents. We started with ultrasphinx at some time back and it worked good enough for our needs. For new projects I would think about changing that to thinksphinx. Especially if I expect the index to be combined from several tables. But for basic search both should do well enough. --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
On 4 Dec 2008, at 13:39, VP wrote:> > >> Whow, so you read Postgres code before using it? ;-) >> Same goes for Ruby, Rails, and 99% of what tools we use. > > lol.. no, i mean at least the contrib/tsearch folder i read! ok the > http://www.postgresql.org/docs/8.3/static/textsearch.html as well, and > too be honest its better documented then the sphinx plugins :-) > > Thanks for the tip about ultrasphinx, but i would like to know if its > possible to do opposite, to make it work with lexemes. Like for > example if you look for run, you will find run, runs, ran and running. > All of these words are forms of the same lexeme: RUN > > I dont wanna be troll or blame the plugins, i heard a lot of good > things about thinksphinx, not so good things about ultrasphinx and > almost nothing about tsearch, that for me is working fine, except that > some developers here comes with the questions: "if i type "te" it dont > bring me my "test" post and ultrasphinx does". But well in real world > users wont search for "te" and if they search for te, they will find > everything like te,teahupo, tea, teta, tendency, etc, etc, etc. >sphinx can do stemming. as for searching for prefixes, that''s optional. Fred> Regards, > > Victor > >--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---
thanks for the tips. As documentation thinksphinx has nice railscast about. But geocoding searches for example are not documented until now (or at least, until last week hehe) Regards, Victor On Dec 4, 3:02 pm, Frederick Cheung <frederick.che...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:> On 4 Dec 2008, at 13:39, VP wrote: > > > > > > >> Whow, so you read Postgres code before using it? ;-) > >> Same goes for Ruby, Rails, and 99% of what tools we use. > > > lol.. no, i mean at least the contrib/tsearch folder i read! ok the > >http://www.postgresql.org/docs/8.3/static/textsearch.htmlas well, and > > too be honest its better documented then the sphinx plugins :-) > > > Thanks for the tip about ultrasphinx, but i would like to know if its > > possible to do opposite, to make it work with lexemes. Like for > > example if you look for run, you will find run, runs, ran and running. > > All of these words are forms of the same lexeme: RUN > > > I dont wanna be troll or blame the plugins, i heard a lot of good > > things about thinksphinx, not so good things about ultrasphinx and > > almost nothing about tsearch, that for me is working fine, except that > > some developers here comes with the questions: "if i type "te" it dont > > bring me my "test" post and ultrasphinx does". But well in real world > > users wont search for "te" and if they search for te, they will find > > everything like te,teahupo, tea, teta, tendency, etc, etc, etc. > > sphinx can do stemming. as for searching for prefixes, that''s optional. > > Fred > > > Regards, > > > Victor > >--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en -~----------~----~----~----~------~----~------~--~---