Hi, I'm working through the "Best of Ruby Quiz" book, which some of you might be familiar with, but hey, don't worry if not, you can probably still help me :-) I've found a regular expression that does what I want, but I'm not quite sure why it works.

Given:

story = "The ((velocity)) ((colour)) ((wildbeast)) ((action)) over the ((adjective)) ((domesticbeast))"

I want to parse this into an array such that each element of the array is the string split on the "((blabla))" bits. This does that:

irb(main):052:0> story.split /\(\(.*?\)\)/
=> ["The ", " ", " ", " ", " over the ", " "]

However, I also want the sections marked "((blabla))" included as well... I fiddled a bit and got this, which works:

irb(main):053:0> story.split /(\(\(.*?\)\))+/
=> ["The ", "((velocity))", " ", "((colour))", " ", "((wildbeast))", " ", "((action))", " over the ", "((adjective))", " ", "((domesticbeast))"]

However, I'm not exactly sure what makes this work. Can anyone illuminate this for me?

glenn

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---
Glenn, this tool may help you understand what's happening: http://weitz.de/regex-coach/

It's great because you can type in a target string, play around with the expression, and see if it works. You'll need to remove the / at the beginning and end of the expression, though. There are a couple of tabs in there too where you can see the decision tree, how it will split, etc.

http://www.regular-expressions.info/reference.html also has some good explanations.

Since I don't know how well you understand regexes, forgive me if this is stating some of the obvious. In your regular expression, \( is an escaped ( symbol, so it matches a literal paren. A couple of things need a backslash, either because they are special symbols that you want to match literally, or because the backslash forms a predefined pattern like \w [\w = letters, digits, and underscore].

> story = "The ((velocity)) ((colour)) ((wildbeast)) ((action)) over the
> ((adjective)) ((domesticbeast))"
> irb(main):052:0> story.split /\(\(.*?\)\)/
> irb(main):053:0> story.split /(\(\(.*?\)\))+/

I'm not a master of regexes, but I'll take a stab at this one. You are asking split to break the story string into an array wherever the pattern matches. So your first pattern is looking for ((X)), where X is another pattern. X, or .*?, is the dot [any non-line-break character] repeated 0 to infinity times. The ? after the * makes the quantifier lazy (non-greedy), which means it finds the smallest possible match (so the .* doesn't keep going until the last )) it encounters).

I'm a little less certain here, so others may correct me or fill in the bits I miss. In the second pattern, the main difference is that the first pattern is surrounded by parens, with a +. By putting parens around your pattern you are grouping it. Generally grouping is used so another operation can be performed on the group. An example pattern might be /sh(op|irt)/, where either shop or shirt will match, but shower will not. In your case I believe it groups the ((X)) pattern so that the + applies to the whole thing and it can be matched multiple times in a row.
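To make the lazy-versus-greedy point and the grouping example concrete, here is a small sketch you could paste into irb. The helper names greedy_placeholder and lazy_placeholder and the sample string are made up for illustration; they are not from the book or the thread.

```ruby
# Hypothetical helpers, named only for this illustration.

# Greedy: .* grabs as much as it can, so the match runs to the LAST "))".
def greedy_placeholder(text)
  text[/\(\(.*\)\)/]
end

# Lazy: .*? grabs as little as it can, so the match stops at the FIRST "))".
def lazy_placeholder(text)
  text[/\(\(.*?\)\)/]
end

sample = "a ((one)) b ((two)) c"   # made-up sample string
puts greedy_placeholder(sample)    # ((one)) b ((two))
puts lazy_placeholder(sample)      # ((one))

# Grouping with alternation: (op|irt) means "op" or "irt" after "sh".
%w[shop shirt shower].each do |word|
  puts "#{word}: #{word.match?(/sh(op|irt)/)}"
end
```

The loop should report true for shop and shirt and false for shower. Note that String#match? needs Ruby 2.4 or later; on older versions `!!(word =~ /sh(op|irt)/)` does the same job.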
-- Posted via http://www.ruby-forum.com/.
On May 8, 9:01 am, glenn <g...-H7U4i/pZ250d9SLi6J12ItHuzzzSOjJt@public.gmane.org> wrote:
> Given:
>
> story = "The ((velocity)) ((colour)) ((wildbeast)) ((action)) over the
> ((adjective)) ((domesticbeast))"
>
> I want to parse this into an array such that each element of the array
> is the string split on the "((blabla))" bits.
> This does that:
>
> irb(main):052:0> story.split /\(\(.*?\)\)/
> => ["The ", " ", " ", " ", " over the ", " "]
>
> However I also want the sections marked "((blabla))" included as
> well... I fiddled a bit and got this, which works:
> irb(main):053:0> story.split /(\(\(.*?\)\))+/
> => ["The ", "((velocity))", " ", "((colour))", " ", "((wildbeast))", "
> ", "((action))", " over the ", "((adjective))", " ",
> "((domesticbeast))"]
>
> However Im not exactly sure what makes this work - can anyone
> illuminate this for me?

String#split will normally take a pattern representing a delimiter, and split the string into parts that are separated by the delimiter, returning the parts. However, if you enclose the pattern in capturing parens, then split returns both the parts *and* the delimiters. So:

>> "foo-bar-baz".split(/-/)
=> ["foo", "bar", "baz"]
>> "foo-bar-baz".split(/(-)/)
=> ["foo", "-", "bar", "-", "baz"]

Your pattern is enclosed in parens, so it will get returned along with the parts between the pattern. The pattern is:

  (\(\(.*?\)\))+

Working from the inside:

  \(\(    two literal left parens, followed by
  .*?     match shortest sequence of any char except \n, followed by
  \)\)    two literal right parens

This is wrapped in (), which are capturing parens (since they aren't escaped with a backslash).

The pattern is followed by a +, which means "occurring one or more times". You may not want this, because it would treat "((foo))((bar))" as a single delimiter.

Now when you split on this, you get all the "((sometext))" elements, together with the stuff in between them.
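As a rough sketch of the difference the trailing + makes, the snippet below splits a shortened story and a string with two placeholders back to back. The helper name split_keeping_placeholders and both sample strings are assumptions made up for this example.

```ruby
# Hypothetical helper: split on ((...)) and keep the delimiters,
# using capturing parens but WITHOUT the trailing +.
def split_keeping_placeholders(text)
  text.split(/(\(\(.*?\)\))/)
end

story = "The ((colour)) ((wildbeast)) jumps"   # shortened sample
p story.split(/\(\(.*?\)\)/)
# => ["The ", " ", " jumps"]
p split_keeping_placeholders(story)
# => ["The ", "((colour))", " ", "((wildbeast))", " jumps"]

# Two placeholders with nothing between them (made-up input).
adjacent = "a((foo))((bar))b"

# With +, both placeholders form ONE delimiter, and the capture group
# only remembers its last repetition, so "((foo))" is lost.
p adjacent.split(/(\(\(.*?\)\))+/)
# => ["a", "((bar))", "b"]

# Without +, each placeholder is its own delimiter; the empty string
# is the (empty) text between them.
p split_keeping_placeholders(adjacent)
# => ["a", "((foo))", "", "((bar))", "b"]
```

In the original story every placeholder is separated by at least a space, which is why the version with + happens to give the expected result there.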
If you just want to capture the "((sometext))" words, you should look at String#scan.
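For what it's worth, a minimal sketch of the String#scan approach might look like this. The helper names placeholders and placeholder_names are invented for the example; only scan and flatten are real Ruby methods.

```ruby
# Hypothetical helper: return every ((...)) token, parens included.
def placeholders(text)
  text.scan(/\(\(.*?\)\)/)
end

# Hypothetical helper: return just the names inside the parens.
# With a capture group, scan returns one array per match,
# so flatten turns [["a"], ["b"]] into ["a", "b"].
def placeholder_names(text)
  text.scan(/\(\((.*?)\)\)/).flatten
end

story = "The ((velocity)) ((colour)) ((wildbeast)) ((action)) over the ((adjective)) ((domesticbeast))"

p placeholders(story)
# => ["((velocity))", "((colour))", "((wildbeast))", "((action))", "((adjective))", "((domesticbeast))"]
p placeholder_names(story)
# => ["velocity", "colour", "wildbeast", "action", "adjective", "domesticbeast"]
```

Unlike split, scan throws away the text between the matches, so it only fits if you don't need "The " and " over the " as well.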