Hi

We have a search website where the user can type in individual words separated by spaces and/or phrases enclosed in single or double quotes. We are looking for a way to obtain a list of words and phrases from the search string.

Can someone help?

Thanks,
Yash

--
Posted via http://www.ruby-forum.com/.
Yash wrote:
> Hi
>
> We have a search website where the user can type in individual words
> separated by spaces and/or phrases enclosed in single or double quotes.
> We are looking for a way to obtain a list of words and phrases from the
> search string.
> Can someone help?

string.scan(/\w+/)
to give an array of words, or
string.split(/\W+/)
to split on non-words. You can string.gsub(/["']/, '') if you want to get rid of quotes.

--
Alex
If the input string is:
Java Ruby 'Ruby on rails' "software development" "technology"

The list of words should be:
Java
Ruby
Ruby on rails
software development
technology

With your approach, the result will be:
Java
Ruby
Ruby
on
rails
software
development
technology

Alex Young wrote:
> string.scan(/\w+/)
> to give an array of words, or
> string.split(/\W+/)
> to split on non-words. You can string.gsub(/["']/, '') if you want to
> get rid of quotes.
Yash wrote:
> We have a search website where the user can type in individual words
> separated by spaces and/or phrases enclosed in single or double quotes.
> We are looking for a way to obtain a list of words and phrases from the
> search string.
>
> If the input string is:
> Java Ruby 'Ruby on rails' "software development" "technology"
>
> The list of words should be:
> Java
> Ruby
> Ruby on rails
> software development
> technology

>> example = 'some text and \'some inside\' test "double quotes"'

Using the CSV module:

>> require 'csv'
>> CSV::parse_line(example, ' ')
=> ["some", "text", "and", "'some", "inside'", "test", "double quotes"]

Fairly elegant, but it doesn't handle single quotes the way you want.

Or:

>> example.split( / *["'](.*?)["'] *| / )
=> ["some", "text", "and", "some inside", "test", "double quotes"]

Which seems to be more like what you want.

Hope that helps.

--
R.Livsey
http://livsey.org
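For anyone landing on this thread later: the split above leaves empty strings in the result when two quoted phrases sit next to each other, because the pattern eats the space between them. A minimal sketch of a single-pass alternative using String#scan follows. The method name `tokenize` is made up for illustration, and dropping empty phrases is my own assumption about what a search box would want.

```ruby
# Hypothetical helper: split a search string into words and quoted phrases.
def tokenize(query)
  # Alternatives are tried left to right at each position:
  #   "..."  double-quoted phrase (capture 1)
  #   '...'  single-quoted phrase (capture 2)
  #   \S+    any other run of non-whitespace (capture 3)
  pattern = /"([^"]*)"|'([^']*)'|(\S+)/
  query.scan(pattern)
       .map { |dq, sq, word| dq || sq || word } # exactly one capture is non-nil
       .reject(&:empty?)                        # assumption: ignore empty phrases like ""
end

p tokenize(%q{Java Ruby 'Ruby on rails' "software development" "technology"})
# => ["Java", "Ruby", "Ruby on rails", "software development", "technology"]
```

An apostrophe inside a word ("don't") is safe here because \S+ swallows the whole word, but a stray apostrophe at the start of a word can still be read as an opening quote.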
I'm not enough of a regex guru to do it in one, so I'd probably do it in two:

/"([^"]+)"|('[^']+)'/ to grab the quotes ... replace occurrences with '' in the original string ...

Then use the \w|\W approach from Alex's reply to get individual tokens ... at some point cleaning the original string of anything you consider garbage.
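A rough sketch of that two-pass idea, in case it helps. The method name `two_pass_tokens` is hypothetical, the capture groups are placed inside the quotes so the captured text excludes them, and I replace each phrase with a space rather than '' (my own tweak) so that words on either side of a phrase do not get glued together.

```ruby
# Hypothetical helper illustrating the two-pass approach.
def two_pass_tokens(query)
  phrase = /"([^"]+)"|'([^']+)'/
  # Pass 1: collect the text inside each quoted phrase.
  phrases = query.scan(phrase).map { |dq, sq| dq || sq }
  # Blank out the phrases; a space (not '') keeps neighbouring words apart.
  rest = query.gsub(phrase, ' ')
  # Pass 2: plain words from whatever is left.
  phrases + rest.scan(/\w+/)
end

p two_pass_tokens(%q{Java Ruby 'Ruby on rails' "software development" "technology"})
# => ["Ruby on rails", "software development", "technology", "Java", "Ruby"]
```

Note that the phrases come out first, so the original order of the search terms is lost. If order matters, a single scan over the string is probably the better fit.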
On 4/5/06, Yash <yashgt@yahoo.com> wrote:
> If the input string is:
> Java Ruby 'Ruby on rails' "software development" "technology"
>
> The list of words should be:
> Java
> Ruby
> Ruby on rails
> software development
> technology
>
> With your approach, the result will be:
> Java
> Ruby
> Ruby
> on
> rails
> software
> development
> technology
>
> _______________________________________________
> Rails mailing list
> Rails@lists.rubyonrails.org
> http://lists.rubyonrails.org/mailman/listinfo/rails
Oops, I seem to be capturing the opening single quote ... should move that ...

/"([^"]+)"|'([^']+)'/

On 4/5/06, Hank Marquardt <hmarquardt@gmail.com> wrote:
> I'm not enough of a regex guru to do it in one, so I'd probably do it in two:
>
> /"([^"]+)"|('[^']+)'/ to grab the quotes ... replace occurrences with '' in original string ...
On 4/5/06, Yash <yashgt@yahoo.com> wrote:
> If the input string is:
> Java Ruby 'Ruby on rails' "software development" "technology"
>
> The list of words should be:
> Java
> Ruby
> Ruby on rails
> software development
> technology

irb(main):086:0> s = 'Java Ruby \'Ruby on rails\' "software development" "technology"'
=> "Java Ruby 'Ruby on rails' \"software development\" \"technology\""

irb(main):087:0> a = s.split(/(".*")|('.*')|((?=[^"'])\w+(?=[^"']))/).find_all {|s| s.match(/\w+/)}
=> ["Java", "Ruby", "'Ruby on rails'", "\"software development\" \"technology\""]

If you don't mind having some array elements that are all whitespace you can drop the "find_all" part.

-- James
On 4/5/06, James Ludlow <jamesludlow@gmail.com> wrote:
> irb(main):087:0> a = s.split(/(".*")|('.*')|((?=[^"'])\w+(?=[^"']))/).find_all {|s| s.match(/\w+/)}
> => ["Java", "Ruby", "'Ruby on rails'", "\"software development\" \"technology\""]
>
> If you don't mind having some array elements that are all whitespace
> you can drop the "find_all" part.

Of course, the second I hit "send" I realized that I pasted the wrong regex. The quoted groups need to be non-greedy (.*? instead of .*), otherwise the two double-quoted phrases are merged into one element.

a = s.split(/(".*?")|('.*?')|((?=[^"'])\w+(?=[^"']))/).find_all {|x| x.match(/\w+/)}
=> ["Java", "Ruby", "'Ruby on rails'", "\"software development\"", "\"technology\""]

-- James
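The elements in that result still carry their surrounding quotes, while the list Yash asked for does not. One way to strip them afterwards is sketched below. `strip_outer_quotes` is a name I made up, and the input array is simply the result reported above.

```ruby
# Hypothetical post-processing step: remove one matching pair of
# surrounding quotes from each token, leaving other tokens untouched.
def strip_outer_quotes(tokens)
  tokens.map do |token|
    # \1 requires the closing quote to be the same character as the opening one.
    token.sub(/\A(["'])(.*)\1\z/m, '\2')
  end
end

tokens = ["Java", "Ruby", "'Ruby on rails'", "\"software development\"", "\"technology\""]
p strip_outer_quotes(tokens)
# => ["Java", "Ruby", "Ruby on rails", "software development", "technology"]
```

Because the pattern is anchored at both ends, a word with an apostrophe in the middle (such as "it's") passes through unchanged.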