gerry.jenkins-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
2010-Mar-01 17:50 UTC
Problem with regexp split
I am trying to split some text into an array, separated by one or more <br> tags.

Here is some test code:

    s = "one<br>two<br><br>three<br><br><br>four"
    p s.split(/(<br>)+/);

It should split into ["one","two","three","four"], because the /(<br>)+/ pattern should use one or more <br> as the delimiter to split around, but it does this:

    ["one", "<br>", "two", "<br>", "three"]

Why does it do this, and what split could I use to get it to work?

Note: I know that I could just fix it by removing the <br> entries from the array after it is done, but it seems that the regular expression in split should work.

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
On Mar 1, 2010, at 9:50 AM, gerry.jenkins-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
> I am trying to split some text into an array separated by one or more <br>
>
> s = "one<br>two<br><br>three<br><br><br>four"
> p s.split(/(<br>)+/);
>
> Why does it do this and what split could I use to get it to work?

Interesting. Docs say:

    If pattern is a String, then its contents are used as the delimiter when splitting str. If pattern is a single space, str is split on whitespace, with leading whitespace and runs of contiguous whitespace characters ignored.

    If pattern is a Regexp, str is divided where the pattern matches. Whenever the pattern matches a zero-length string, str is split into individual characters.

Which seems to be saying exactly what you are describing. If a regexp is used the match isn't "eaten", but simply divided on.

You could split it on "<br>" and then remove any blank elements... not sure if that's any better than your alternative approach though.
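The documented cases quoted above can be checked with a few one-liners. This is only a minimal sketch; the variable names and the shortened sample string are illustrative and not from the thread. Note that none of these patterns contains a capture group, and in every case the delimiter itself is consumed.

```ruby
s = "one<br>two<br><br>three"

# String pattern: the contents are used as the delimiter.
by_string = s.split("<br>")
p by_string   # ["one", "two", "", "three"]

# Single space: split on whitespace, ignoring leading whitespace and runs.
by_space = "  a  b c ".split(" ")
p by_space    # ["a", "b", "c"]

# Regexp without a capture group: divided where the pattern matches.
by_regexp = s.split(/<br>/)
p by_regexp   # ["one", "two", "", "three"]

# Zero-length match: the string is split into individual characters.
by_empty = "abc".split(//)
p by_empty    # ["a", "b", "c"]
```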
gerry.jenkins-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
2010-Mar-01 18:12 UTC
Re: Problem with regexp split
Yeah, I have been using regexps and Ruby for years, and this is a puzzle.

On Mar 1, 10:00 am, Philip Hallstrom <phi...-LSG90OXdqQE@public.gmane.org> wrote:
> Which seems to be saying exactly what you are describing. If a
> regexp is used the match isn't "eaten", but simply divided on.
>
> You could split it on "<br>" and then remove any blank elements... not
> sure if that's any better than your alternative approach though.
gerry.jenkins-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
2010-Mar-01 18:17 UTC
Re: Problem with regexp split
It also does not behave as expected with this code:

    s = "onexytwoxyxythreexyxyxyfour"
    p s.split(/(xy)+/)

On Mar 1, 10:00 am, Philip Hallstrom <phi...-LSG90OXdqQE@public.gmane.org> wrote:
> Which seems to be saying exactly what you are describing. If a
> regexp is used the match isn't "eaten", but simply divided on.
gerry.jenkins-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
> Also does not behave with this code:
>
> s = "onexytwoxyxythreexyxyxyfour"
> p s.split(/(xy)+/)

Try this:

    s = "one<br>two<br><br>three<br><br><br>four"
    array = s.split('<br>')
    array.compact.reject { |i| i.nil? or i.empty? }

This will produce:

    ['one', 'two', 'three', 'four']

Regards,

Atc.,
Kirk Patrick
--
Posted via http://www.ruby-forum.com/.
array.compact.reject { |i| i.nil? or i.empty? } seemed to leave some unwanted elements, at least on my Ruby 1.8.6.

But array.delete_if { |i| i.nil? or i.empty? } worked as expected on my machine.

HTH,
Richard

On Mar 1, 2:19 pm, Kirk Patrick <li...-fsXkhYbjdPsEEoCn2XhGlw@public.gmane.org> wrote:
> Try this:
>
> s = "one<br>two<br><br>three<br><br><br>four"
> array = s.split('<br>')
> array.compact.reject { |i| i.nil? or i.empty? }
>
> This will produce:
>
> ['one', 'two', 'three', 'four']
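One possible explanation for the difference reported above, offered as an assumption since the exact session is not shown: reject (like compact) returns a new array and leaves the receiver untouched, while delete_if modifies the receiver in place. If the return value of reject is not assigned, array still contains the empty strings afterwards. The names rejected and after_reject below are illustrative only.

```ruby
s = "one<br>two<br><br>three<br><br><br>four"
array = s.split("<br>")

# reject returns a new array; the receiver is left unchanged.
rejected = array.reject { |i| i.empty? }
after_reject = array.dup   # snapshot: still contains the "" entries

# delete_if removes the matching elements from the receiver itself.
array.delete_if { |i| i.empty? }

p rejected       # ["one", "two", "three", "four"]
p after_reject   # ["one", "two", "", "three", "", "", "four"]
p array          # ["one", "two", "three", "four"]
```

With a string delimiter, split returns only strings here, so the compact call and the nil? check are not needed for this input.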
On Mon, Mar 1, 2010 at 9:50 AM, gerry.jenkins-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
> s = "one<br>two<br><br>three<br><br><br>four"
> p s.split(/(<br>)+/);
>
> Why does it do this and what split could I use to get it to work?

Gerry, you can do the following:

    p s.gsub(/<br>/, " ").split

Good luck,

-Conrad
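A caveat on the gsub approach, sketched under the assumption that the real input may contain spaces (the sample string t is made up for illustration): replacing <br> with a space and then calling split with no argument also breaks the text on every ordinary space.

```ruby
s = "one<br>two<br><br>three<br><br><br>four"
no_spaces = s.gsub(/<br>/, " ").split
p no_spaces     # ["one", "two", "three", "four"]

# Hypothetical input whose fields contain spaces.
t = "red apple<br>green pear"
via_gsub = t.gsub(/<br>/, " ").split
p via_gsub      # ["red", "apple", "green", "pear"]

# Splitting on the delimiter directly keeps each field whole.
via_split = t.split("<br>")
p via_split     # ["red apple", "green pear"]
```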
RichardOnRails wrote:
> array.compact.reject { |i| i.nil? or i.empty? } seemed to leave some
> unwanted elements, at least on my Ruby 1.8.6.
>
> But array.delete_if { |i| i.nil? or i.empty? } worked as expected on
> my machine.

My Ruby version is 1.8.7.

But the more important thing is that the problem is solved. =P
On Mar 1, 1:00 pm, Philip Hallstrom <phi...-LSG90OXdqQE@public.gmane.org> wrote:
> Which seems to be saying exactly what you are describing. If a
> regexp is used the match isn't "eaten", but simply divided on.
>
> You could split it on "<br>" and then remove any blank elements... not
> sure if that's any better than your alternative approach though.

The trick here is a feature inherited from Perl - capturing groups (in parens) in the regexp cause the captured delimiter text to be included in the result. This works like you'd expect:

    s.split(/(?:<br>)+/)

The ?: modifier tells the parens to group without providing a backref.

--Matt Jones
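The difference between a capturing and a non-capturing group in split can be seen side by side. This is a minimal sketch using the strings from the thread; the variable names are illustrative. With (<br>)+ the group holds the last <br> of each run, so one "<br>" appears in the result per run of delimiters.

```ruby
s = "one<br>two<br><br>three<br><br><br>four"

# Capturing group: the captured text is inserted into the result.
with_capture = s.split(/(<br>)+/)
p with_capture     # ["one", "<br>", "two", "<br>", "three", "<br>", "four"]

# Non-capturing group: the whole run of delimiters is simply consumed.
without_capture = s.split(/(?:<br>)+/)
p without_capture  # ["one", "two", "three", "four"]

# The same holds for the "xy" example.
xy = "onexytwoxyxythreexyxyxyfour".split(/(?:xy)+/)
p xy               # ["one", "two", "three", "four"]
```

If the string can start with a delimiter, the result begins with an empty string, which may still need to be filtered out.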