Consider the string

xyz<h1>x P0 y</h1><h1>x Q1 y</h1><h1>1 Placeholder2 2</h1>abc

and the pattern

<(h\d)>.*?Placeholder2.*?<\/\1>

the pattern matches
<h1>x P0 y</h1><h1>x Q1 y</h1><h1>1 Placeholder2 2</h1>

I want it to match

<h1>1 Placeholder2 2</h1>

How can I do this? That is, I want to find the nearest <h1> ... </h1> surrounding Placeholder2.

-- 
Posted via http://www.ruby-forum.com/.

-- 
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
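To see why the lazy pattern still grabs all three headings, here is a small reproduction in plain Ruby (the variable names are only for illustration). A regex engine tries start positions from left to right and returns the first one that can succeed. The lazy .*? only keeps the match as short as possible from that start; it does not move the start, so from the first <h1> it happily runs across the first two headings to reach Placeholder2.

```ruby
# Reproduce the problem: the lazy pattern still starts at the first <h1>.
s = "xyz<h1>x P0 y</h1><h1>x Q1 y</h1><h1>1 Placeholder2 2</h1>abc"
pattern = /<(h\d)>.*?Placeholder2.*?<\/\1>/

m = pattern.match(s)
puts m.begin(0) # offset where the match starts: 3, i.e. the first <h1>
puts m[0]       # the whole match, spanning all three headings
```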
On Feb 8, 3:46 pm, Ralph Shnelvar <li...-fsXkhYbjdPsEEoCn2XhGlw@public.gmane.org> wrote:
> How can I do this? That is, I want to find the nearest <h1> ... </h1>
> surrounding Placeholder2.

Don't know if/how to bend Ruby's regular expressions to do this, but trying to parse HTML with regular expressions is doomed to fail eventually. Use something like Nokogiri.

Fred
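A parser-based version of the lookup might look like the sketch below. Nokogiri is a gem with a native extension, so to keep the sketch dependency-free it swaps in REXML, the pure-Ruby XML library that ships with Ruby (as a bundled gem since Ruby 3.0, so under Bundler it may need a Gemfile entry). That is a different library from the one Fred suggested, and it assumes the markup is well-formed XML (every tag closed and properly nested), which real-world HTML often is not. The wrapping <root> element and the text_of and heading_containing helpers are hypothetical.

```ruby
require "rexml/document"

# Hypothetical helper: all text under a node, including text inside nested tags.
def text_of(node)
  node.children.map do |child|
    case child
    when REXML::Text    then child.value
    when REXML::Element then text_of(child)
    else ""
    end
  end.join
end

# Hypothetical helper: a heading element (h1..h6) whose text contains needle, or nil.
# Assumes html is well-formed XML; REXML raises REXML::ParseException otherwise.
def heading_containing(html, needle)
  # Wrap the input so stray text such as "xyz" and "abc" sits inside one root element.
  doc = REXML::Document.new("<root>#{html}</root>")
  REXML::XPath.match(doc, "//*").find do |el|
    el.name =~ /\Ah[1-6]\z/ && text_of(el).include?(needle)
  end
end

s = "xyz<h1>x P0 y</h1><h1>x Q1 y</h1><h1>1 Placeholder2 2</h1>abc"
puts heading_containing(s, "Placeholder2") # <h1>1 Placeholder2 2</h1>
```

Because the parser works with elements rather than characters, "the heading that contains Placeholder2" is a direct question and needs no tricks about where a match may start.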
Ralph Shnelvar wrote in post #980334:
> How can I do this? That is, I want to find the nearest <h1> ... </h1>
> surrounding Placeholder2.

First, I'll +1 Fred on using Nokogiri for parsing HTML. But you can modify your regex so that any markup '<' characters are excluded using [^<], as in:

    >> p = /<(h\d)>[^<]+Placeholder2.*?<\/\1>/
    >> s = "xyz<h1>x P0 y</h1><h1>x Q1 y</h1><h1>1 Placeholder2 2</h1>abc"
    >> p =~ s
    => 33
    >> $1
    => "h1"

Is that what you meant?

- ff
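The IRB session above can be expanded into a small script that also shows the whole match, not just the captured tag name. The second pattern, with [^<]* in place of [^<]+, is a hypothetical tweak and not from the post: because + demands at least one character between <h1> and Placeholder2, a heading that begins with the placeholder, such as <h1>Placeholder2</h1>, does not match the original pattern.

```ruby
s = "xyz<h1>x P0 y</h1><h1>x Q1 y</h1><h1>1 Placeholder2 2</h1>abc"

# Pattern from the post: [^<]+ cannot cross a '<', so a match cannot span headings.
pattern = /<(h\d)>[^<]+Placeholder2.*?<\/\1>/
m = pattern.match(s)
puts m.begin(0) # 33, the same value as p =~ s in the IRB session
puts m[1]       # h1
puts m[0]       # <h1>1 Placeholder2 2</h1>

# Hypothetical tweak: [^<]* also allows Placeholder2 right after the opening tag.
loose = /<(h\d)>[^<]*Placeholder2.*?<\/\1>/
p pattern.match("<h1>Placeholder2</h1>")  # nil
p loose.match("<h1>Placeholder2</h1>")[0] # "<h1>Placeholder2</h1>"
```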
In terms of Nokogiri and Hpricot ... I develop on a Windows machine and my ISP's machine runs Linux. Nokogiri works great on my development machine. My ISP won't support Nokogiri unless I am willing to spend the money to have him install it, which I'm not, and for political reasons I can't move to another ISP. Hpricot has given me lots and lots of problems.

So I have been reduced to parsing some HTML myself. I don't want to do it ... but I gotta.

Fearless, your solution seems to work ... but I am clueless as to how and why it works!
Ralph Shnelvar wrote in post #980369:
> Fearless, your solution seems to work ... but I am clueless as to how
> and why it works!

I'm FAR from a regex wizard, but it's worth noting:

  [abc] means match any one character that is a or b or c
  [^abc] means match any one character that is NOT a or b or c
  ergo [^<] means match any character that is NOT an open angle bracket
  [^<]+ means match one or more characters that are not open angle brackets

So /<(h\d)>[^<]+Placeholder2.*?<\/\1>/ matches an open < followed by an h followed by a digit (the h and the digit are captured as group 1, which \1 refers back to later) followed by a close >, then one or more characters as long as they are NOT <, followed by "Placeholder2", then as few characters as possible (.*?) up to the matching closing tag, e.g. </h1>.

The [^<]+ is what fixes your problem. The regex engine still tries the first <h1> first, but from there [^<]+ stops at the '<' of that heading's </h1>, so that attempt fails and the engine moves on until it reaches the <h1> that directly contains Placeholder2.

Of course, this will break as soon as someone adds attributes to the <h1> tag, such as <h1 class="navbar">, which is why we all like Nokogiri. I'm sorry your ISP doesn't agree! :)
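The attribute problem mentioned above is easy to demonstrate. The relaxed pattern below is a hypothetical sketch, not something from the thread: (?:\s[^>]*)? optionally allows attributes after the tag name, and [^<]* on both sides of Placeholder2 keeps the match inside one heading. It still assumes there is no nested markup such as <em> inside the heading and no '>' inside attribute values, exactly the kind of edge case a real parser handles for you.

```ruby
s = 'xyz<h1>x P0 y</h1><h1 class="navbar">1 Placeholder2 2</h1>abc'

# Pattern from the thread: it requires '>' right after the digit, so attributes defeat it.
strict = /<(h\d)>[^<]+Placeholder2.*?<\/\1>/
p strict.match(s) # nil

# Hypothetical relaxed pattern: optional attributes, no '<' on either side of the needle.
relaxed = /<(h\d)(?:\s[^>]*)?>[^<]*Placeholder2[^<]*<\/\1>/
p relaxed.match(s)[0] # "<h1 class=\"navbar\">1 Placeholder2 2</h1>"
```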