Hi,

We have a search website where the user can type in individual words
separated by spaces and/or phrases enclosed in single or double quotes.
We are looking for a way to obtain a list of words and phrases from the
search string.
Can someone help?

Thanks,
Yash

-- 
Posted via http://www.ruby-forum.com/.
Yash wrote:
> We are looking for a way to obtain a list of words and phrases from the
> search string.
> Can someone help?

Use
    string.scan(/\w+/)
to give an array of words, or
    string.split(/\W+/)
to split on non-words. You can use string.gsub(/["']/, '') if you want to
get rid of quotes.

-- 
Alex
If the input string is:
Java Ruby 'Ruby on rails' "software development" "technology"

The list of words should be:
Java
Ruby
Ruby on rails
software development
technology

With your approach, the result will be:
Java
Ruby
Ruby
on
rails
software
development
technology

Alex Young wrote:
> string.scan(/\w+/)
> to give an array of words, or
> string.split(/\W+/)
> to split on non-words. You can string.gsub(/["']/, '') if you want to
> get rid of quotes.
Yash wrote:
> If the input string is:
> Java Ruby 'Ruby on rails' "software development" "technology"
>
> The list of words should be:
> Java
> Ruby
> Ruby on rails
> software development
> technology

>> example = 'some text and \'some inside\' test "double quotes"'

Using the CSV module:

>> require 'csv'
>> CSV::parse_line(example, ' ')
=> ["some", "text", "and", "'some", "inside'", "test", "double quotes"]

Fairly elegant, but it doesn't handle single quotes the way you want.

Or:

>> example.split( / *["'](.*?)["'] *| / )
=> ["some", "text", "and", "some inside", "test", "double quotes"]

which seems to be more like what you want.

Hope that helps.

-- 
R.Livsey
http://livsey.org
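Another standard-library option, not mentioned in this thread, is Shellwords, which splits a string using Bourne-shell quoting rules and so handles both quote styles. The following is only a sketch, assuming shell-style rules are acceptable for search input; `shell_terms` is a made-up helper name, and the whitespace fallback is one possible choice rather than a recommendation.

```ruby
require 'shellwords'

# Hypothetical helper: split a search string using shell quoting rules.
# Shellwords.split raises ArgumentError on an unmatched quote (for
# example a stray apostrophe), so fall back to a plain whitespace split.
def shell_terms(query)
  Shellwords.split(query)
rescue ArgumentError
  query.split
end

p shell_terms(%q{Java Ruby 'Ruby on rails' "software development" "technology"})
# => ["Java", "Ruby", "Ruby on rails", "software development", "technology"]
```

Note that Shellwords also treats a backslash as an escape character, which may or may not be what you want for user-typed queries.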
I'm not enough of a regex guru to do it in one, so I'd probably do it in
two:

/"([^"]+)"|('[^']+)'/ to grab the quotes ... replace occurrences with ''
in the original string ...

Then use the \w|\W from below to get individual tokens ... at some point
cleaning the original string of anything you consider garbage.

On 4/5/06, Yash <yashgt@yahoo.com> wrote:
> If the input string is:
> Java Ruby 'Ruby on rails' "software development" "technology"
>
> The list of words should be:
> Java
> Ruby
> Ruby on rails
> software development
> technology
>
> With your approach, the result will be:
> Java
> Ruby
> Ruby
> on
> rails
> software
> development
> technology
>
> _______________________________________________
> Rails mailing list
> Rails@lists.rubyonrails.org
> http://lists.rubyonrails.org/mailman/listinfo/rails
Oops, I seem to be capturing the opening single quote ... should move
that ...

/"([^"]+)"|'([^']+)'/

On 4/5/06, Hank Marquardt <hmarquardt@gmail.com> wrote:
> I'm not enough of a regex guru to do it in one, so I'd probably do it in
> two:
>
> /"([^"]+)"|('[^']+)'/ to grab the quotes ... replace occurences with ''
> in original string ...
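The two-pass idea above could be sketched roughly as follows. This is an illustration rather than Hank's actual code: the names `QUOTED_PHRASE` and `two_pass_terms` are invented here, and the second pass uses a plain whitespace split as an assumption about how the leftover words should be tokenized.

```ruby
# Corrected pattern from the message above: a double-quoted or a
# single-quoted phrase. For each match exactly one group is non-nil.
QUOTED_PHRASE = /"([^"]+)"|'([^']+)'/

# Hypothetical helper illustrating the two-pass approach.
def two_pass_terms(query)
  # Pass 1: collect the text inside each quoted phrase.
  phrases = query.scan(QUOTED_PHRASE).map { |double, single| double || single }
  # Pass 2: blank out the phrases, then split the remainder on whitespace.
  words = query.gsub(QUOTED_PHRASE, ' ').split
  phrases + words
end

p two_pass_terms(%q{Java Ruby 'Ruby on rails' "software development" "technology"})
# => ["Ruby on rails", "software development", "technology", "Java", "Ruby"]
```

One consequence of doing it in two passes is that the phrases come out ahead of the single words, so the original order of the terms is not preserved. An apostrophe inside a word (as in "don't") would also be read as an opening quote by this pattern.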
On 4/5/06, Yash <yashgt@yahoo.com> wrote:
> If the input string is:
> Java Ruby 'Ruby on rails' "software development" "technology"
>
> The list of words should be:
> Java
> Ruby
> Ruby on rails
> software development
> technology

irb(main):086:0> s = 'Java Ruby \'Ruby on rails\' "software development" "technology"'
=> "Java Ruby 'Ruby on rails' \"software development\" \"technology\""

irb(main):087:0> a = s.split(/(".*")|('.*')|((?=[^"'])\w+(?=[^"']))/).find_all {|s| s.match(/\w+/)}
=> ["Java", "Ruby", "'Ruby on rails'", "\"software development\" \"technology\""]

If you don't mind having some array elements that are all whitespace
you can drop the "find_all" part.

-- 
James
On 4/5/06, James Ludlow <jamesludlow@gmail.com> wrote:
> irb(main):087:0> a = s.split(/(".*")|('.*')|((?=[^"'])\w+(?=[^"']))/).find_all {|s| s.match(/\w+/)}
> => ["Java", "Ruby", "'Ruby on rails'", "\"software development\" \"technology\""]
>
> If you don't mind having some array elements that are all whitespace
> you can drop the "find_all" part.

Of course, the second I hit "send" I realized that I pasted the wrong
regex. The quantifiers need to be non-greedy:

a = s.split(/(".*?")|('.*?')|((?=[^"'])\w+(?=[^"']))/).find_all {|x| x.match(/\w+/)}
=> ["Java", "Ruby", "'Ruby on rails'", "\"software development\"", "\"technology\""]

-- 
James
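The split-based results above still carry the quote characters around each phrase. As an alternative sketch (not taken from any message in this thread), a single `scan` with one capture group per case can return the terms in their original order with the quotes already stripped. `TERM` and `search_terms` are illustrative names, and treating an unbalanced quote as part of an ordinary word is an assumption about the desired behaviour.

```ruby
# One alternative per case: "double quoted", 'single quoted', or a bare
# run of non-whitespace characters. Only one group matches at a time.
TERM = /"([^"]*)"|'([^']*)'|(\S+)/

# Hypothetical helper: return words and phrases in input order.
def search_terms(query)
  query.scan(TERM)                           # array of [double, single, bare]
       .map { |groups| groups.compact.first } # keep whichever group matched
       .reject(&:empty?)                      # drop empty "" or '' phrases
end

p search_terms(%q{Java Ruby 'Ruby on rails' "software development" "technology"})
# => ["Java", "Ruby", "Ruby on rails", "software development", "technology"]
```

Because the bare-word alternative is `\S+`, a quote with no closing partner simply stays attached to its word instead of raising an error.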