Hi - I have strings that I need to extract keywords from. The strings might contain HTML tags, URLs, etc. I imagine I'm not the first guy to have to tackle this problem. Is there a gem I can use, or does anyone have any ideas on how to approach this?

Thanks,
dino
More detail needed about the keywords. The simple case is keywords regardless of context, separated by whitespace:

    # %w{} builds an array of words, so include? checks whole words
    KEYWORDS = %w{if else then end case when do def}

    str = "if true then false else true end"
    str.split.find_all { |s| KEYWORDS.include?(s) }
    # => ["if", "then", "else", "end"]

If you need to exclude keywords inside strings, URLs, etc., the solution is more complex.

HTH,
Jeffrey
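For that more complex case, here is a minimal sketch of one approach: blank out double-quoted string literals and http(s) URLs first, then do the same whitespace split and keyword lookup. The method and constant names are hypothetical, and the patterns are deliberately naive (no escaped quotes, and a URL is assumed to end at the next whitespace).

```ruby
# Hypothetical helper, not from the thread: ignores keywords that appear
# only inside double-quoted string literals or http(s) URLs.
KEYWORDS = %w[if else then end case when do def].freeze

# Naive patterns (assumptions): quoted strings contain no escaped quotes,
# and a URL runs until the next whitespace character.
QUOTED_RE = /"[^"]*"/
URL_RE    = %r{\bhttps?://\S+}

def keywords_outside_strings_and_urls(str)
  # Replace quoted strings and URLs with spaces, then split on whitespace
  # and keep only whole-word keyword matches.
  str.gsub(QUOTED_RE, ' ')
     .gsub(URL_RE, ' ')
     .split
     .select { |word| KEYWORDS.include?(word) }
end

sample = 'if x then puts "do end" else http://example.com/if end'
p keywords_outside_strings_and_urls(sample)
# prints ["if", "then", "else", "end"]
```

Replacing matches with a space rather than an empty string keeps the words on either side from being glued together.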
Jeff - thanks for the reply. I can deal with context in a different method. With your solution, I still grab "<a>", "test." and "&wow*&&" as keywords. I want to send this method a string and get back an array of letter-only words. If you have ideas about context, I'd love to hear those too, but the first step is just harvesting letter-only words from strings.

Thanks,
dino

On Aug 23, 9:44 pm, "Jeffrey L. Taylor" wrote:
> More detail needed about the keywords. The simple case is keywords regardless
> of context, separated by whitespace.
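A rough sketch of the letter-only harvesting dino describes, assuming a "word" is any run of letters and that regex stripping is good enough for simple markup (messy real-world HTML is better served by a parser). It removes tags, URLs and HTML entities first, then scans for letters. The method name `letter_words` and the three patterns are hypothetical.

```ruby
# Sketch under assumptions: a "word" is a run of letters, and simple
# regexes are enough to strip the markup. Names are hypothetical.
TAG_RE    = /<[^>]*>/                   # tags such as <a href="...">
URL_RE    = %r{\b(?:https?|ftp)://\S+}  # URLs up to the next whitespace
ENTITY_RE = /&(?:#\d+|#x\h+|\w+);/      # entities such as &amp; or &#39;

def letter_words(str)
  # Strip tags first so URLs inside attributes go with them, then bare
  # URLs and entities, and finally pull out runs of letters.
  str.gsub(TAG_RE, ' ')
     .gsub(URL_RE, ' ')
     .gsub(ENTITY_RE, ' ')
     .scan(/[[:alpha:]]+/)
end

p letter_words('<a href="http://x.com">test.</a> &wow*&& see http://example.com/page')
# prints ["test", "wow", "see"]
```

This keeps "test" and "wow" by discarding the punctuation around them. If you would rather drop any token that is not entirely letters, `str.split.grep(/\A[[:alpha:]]+\z/)` does that instead.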
If you are doing HTML parsing, you'll want to look into Hpricot:
http://juixe.com/techknow/index.php/2008/05/19/using-hpricot/

There are a few parsers out there; I've written a couple myself.
-- 
Posted via http://www.ruby-forum.com/.
On Aug 23, 2009, at 7:44 PM, Alpha Blue wrote:
> If you are doing HTML parsing, you'll want to look into Hpricot.

Many people are leaning toward Nokogiri (read: http://nokogiri.rubyforge.org/nokogiri/Nokogiri.html).
On Aug 24, 12:48 am, "s.ross" wrote:
> Many people are leaning toward Nokogiri (read: http://nokogiri.rubyforge.org/nokogiri/Nokogiri.html).

Agreed. With the disappearance of _why, the future of Hpricot is uncertain.