thr3ads.net - Rails - Help with regular expression [Dec 2006]


ryan

2006-Dec-08 01:00 UTC

Help with regular expression

Someone helped me earlier today with a method to remove the "pre" tag
block and replace it with ''.  This is because when I'm showing snippets
of a post, I don't want a piece of code to be in the snippet.  Anyway,
here's what I'm doing:

# works perfect (only for pre)
def strip_pre_block
  text.gsub(/<pre>[^<]*<\/pre>/, '')
end

Now, I want to do the same with blockquote.  How can I combine the two?
Something like this:

# doesn't work at all (for either one)
def strip_pre_and_quote_blocks
  text.gsub(/(^<pre>[^<]*<\/pre>$|^<blockquote>.*<\/blockquote>$)/, '')
end

When I try that, it doesn't replace either one of the "pre" or
"blockquote" blocks with '', but the first method works perfect for the
"pre" block only.  I can't even get the blockquote one to work by
itself.

Can someone help out with this???  Thanks in advance...

-- 
Posted via http://www.ruby-forum.com/.

--~--~---------~--~----~------------~-------~--~----~
 You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk?hl=en
-~----------~----~----~----~------~----~------~--~---

ryan

2006-Dec-08 01:02 UTC

Re: Help with regular expression

I should note, the format of my blockquotes looks like this:

<blockquote>
  <p>Quote goes here</p>
</blockquote>

I may have been ignoring the "p" tags so it didn't work.

Still can't get it.  Any help is greatly appreciated.



Phlip

2006-Dec-08 03:05 UTC

Re: Help with regular expression

ryan wrote:
> # doesn't work at all (for either one)
> def strip_pre_and_quote_blocks
>   text.gsub(/(^<pre>[^<]*<\/pre>$|^<blockquote>.*<\/blockquote>$)/, '')
> end

Here are some general suggestions. Write a test case for this regexp, have 
it print out a result, and run it over and over again as you incrementally 
add elements. I note this because you appear to have leapt directly from a 
simple to a complex regexp without incrementally examining the reaction to 
each individual symbol. Get used to writing test cases as experiments, and 
frequently running them (with one keystroke preferably).
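
One way to set up that kind of experiment, as a rough sketch: keep one sample string, run each candidate pattern against it, and print the result every time. The sample text, the names SAMPLE, STEPS and RESULTS, and the particular steps are made up for illustration; none of them come from ryan's application.

```ruby
# Hypothetical sample: an inline <pre> plus a blockquote in ryan's format.
SAMPLE = "Intro <pre>code here</pre> outro\n" \
         "<blockquote>\n  <p>Quote goes here</p>\n</blockquote>\n"

# Grow the pattern one element at a time and watch what changes.
STEPS = {
  'pre only'          => /<pre>[^<]*<\/pre>/,
  'pre with ^ and $'  => /^<pre>[^<]*<\/pre>$/,
  'pre or blockquote' => /<(pre|blockquote)>[^<]*<\/\1>/
}

# Apply every pattern to the same input so the outputs can be compared.
RESULTS = STEPS.transform_values { |pattern| SAMPLE.gsub(pattern, '') }

RESULTS.each { |label, output| puts "#{label}: #{output.inspect}" }
```

With this sample, the anchored version leaves the text untouched because the pre block does not start a line, and the last step removes the pre block but still leaves the blockquote, because [^<]* cannot get past the nested p tag.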

Next, put the ^ and $ outside the (), on principle.

Next, try (<pre>|<blockquote>).

Next, [^<]* is always tempting, but you can get less greedy IIRC with .*?. 
That will match as little as possible before the closing tag.

Next, post this to a Ruby or Regex newsgroup, because it's not about
Rails.

Next, treat your HTML as XHTML and parse it with REXML. You can use XPath to 
reach in and nab each pre, and change its tag or contents to whatever you 
want. Then write it all back as XHTML.
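
A rough sketch of the REXML route, assuming the snippet is well-formed XHTML (REXML raises a parse error otherwise). The method name strip_blocks_with_rexml and the temporary root wrapper element are invented for this example. On recent Ruby versions REXML ships as a bundled gem, so it may need to be installed separately.

```ruby
require 'rexml/document'

# Hypothetical helper: removes every <pre> and <blockquote> element
# (tags and contents) from a well-formed XHTML fragment.
def strip_blocks_with_rexml(xhtml)
  # Wrap the fragment so the document has a single root element.
  doc = REXML::Document.new("<root>#{xhtml}</root>")

  %w[pre blockquote].each do |tag|
    # XPath.match returns an array, so removing nodes while iterating is safe.
    REXML::XPath.match(doc, "//#{tag}").each(&:remove)
  end

  # Serialize whatever is left inside the wrapper element.
  doc.root.children.map(&:to_s).join
end

puts strip_blocks_with_rexml('Keep<pre>Toss</pre>Keep<blockquote><p>Toss</p></blockquote>')
```

Unlike a regexp, this handles nested tags and attributes, at the cost of requiring valid markup.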

Next, each < probably needs an escape, like \<

> When I try that, it doesn't replace either one of the "pre" or
> "blockquote" blocks with '', but the first method works perfect for the
> "pre" block only.  I can't even get the blockquote one to work by
> itself.

You could run two gsubs; one with <pre> and one with <blockquote>.

To strip, you could just yank all 4 items with 4 gsubs, too.
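
The two-gsub idea might look roughly like this. It is a sketch, not tested against ryan's data: the text parameter, the non-greedy .*?, and the m flag (so that . also matches newlines) are assumptions added here. In a Ruby regexp a bare < matches itself, so the sketch leaves it unescaped.

```ruby
# Hypothetical helper: strip <pre> and <blockquote> blocks in two passes.
def strip_pre_and_quote_blocks(text)
  # First pass removes <pre> blocks, second pass removes <blockquote> blocks.
  without_pre = text.gsub(/<pre>.*?<\/pre>/m, '')
  without_pre.gsub(/<blockquote>.*?<\/blockquote>/m, '')
end

sample = "a<pre>x\ny</pre>b<blockquote>\n  <p>Quote goes here</p>\n</blockquote>c"
puts strip_pre_and_quote_blocks(sample)
```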

-- 
  Phlip
  http://www.greencheese.us/ZeekLand <-- NOT a blog!!!



Rob Biedenharn

2006-Dec-08 14:24 UTC

Re: Help with regular expression

On Dec 7, 2006, at 10:05 PM, Phlip wrote:
> ryan wrote:
>
>> # doesn't work at all (for either one)
>> def strip_pre_and_quote_blocks
>>   text.gsub(/(^<pre>[^<]*<\/pre>$|^<blockquote>.*<\/blockquote>$)/, '')
>> end
>
> Here are some general suggestions. Write a test case for this regexp, have
> it print out a result, and run it over and over again as you incrementally
> add elements. I note this because you appear to have leapt directly from a
> simple to a complex regexp without incrementally examining the reaction to
> each individual symbol. Get used to writing test cases as experiments, and
> frequently running them (with one keystroke preferably).
>
> Next, put the ^ and $ outside the (), on principle.
>
> Next, try (<pre>|<blockquote>).
>
> Next, [^<]* is always tempting, but you can get less greedy IIRC with .*?.
> That will match as little as possible before the closing tag.
>
> Next, post this to a Ruby or Regex newsgroup, because it's not about Rails.
>
> Next, treat your HTML as XHTML and parse it with REXML. You can use XPath to
> reach in and nab each pre, and change its tag or contents to whatever you
> want. Then write it all back as XHTML.
>
> Next, each < probably needs an escape, like \<
>
>> When I try that, it doesn't replace either one of the "pre" or
>> "blockquote" blocks with '', but the first method works perfect for the
>> "pre" block only.  I can't even get the blockquote one to work by
>> itself.
>
> You could run two gsubs; one with <pre> and one with <blockquote>.
>
> To strip, you could just yank all 4 items with 4 gsubs, too.
>
> -- 
>   Phlip
>   http://www.greencheese.us/ZeekLand <-- NOT a blog!!!

If you're trying to get rid of just the tags, then this might work.
However, if you intend to remove the content between the tags, too
("The <pre>stuff doesn't</pre> work" => "The  work"), then you either
need to specifically match the newlines or just use the 'm' modifier
on your regexp to use multi-line mode.
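
To see the effect of the 'm' modifier in isolation, here is a small sketch. The sample string is a variant of the one above with a newline added inside the pre block, and the constant names are made up.

```ruby
# By default "." does not match a newline in Ruby; with the m flag it does.
TEXT = "The <pre>stuff\ndoesn't</pre> work"

WITHOUT_M = TEXT.gsub(/<pre>.*?<\/pre>/, '')   # no match, text comes back unchanged
WITH_M    = TEXT.gsub(/<pre>.*?<\/pre>/m, '')  # the whole block is removed

puts WITHOUT_M.inspect
puts WITH_M.inspect
```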

If you want to remove <pre> and <blockquote> along with the contents
(inner_html) of those, then you can try something like this:

irb>> regexp = %r{<(pre|blockquote)\b[^>]*>.*?</\1>}m
=> /<(pre|blockquote)\b[^>]*>.*?<\/\1>/m

irb>> html_fragment = <<EOF
And then someone said:
<blockquote>
When I try that, it doesn't replace either one of the "pre" or
"blockquote" blocks with '', but the first method works perfect for the
"pre" block only.  I can't even get the blockquote one to work by
itself.
</blockquote>
So I told them to use this code:
<pre class="ruby">
   regexp = %r{<(pre|blockquote)\\b[^>]*>.*?</\\1>}m
   my_string = <<-EOS
And then someone said:
<blockquote>
When I try that, it doesn't replace either one of the "pre" or
"blockquote" blocks with '', but the first method works perfect for the
"pre" block only.  I can't even get the blockquote one to work by
itself.
</blockquote>
   EOS
   my_string.gsub!(regexp, '...')
</pre>
EOF
=> "And then someone said:\n<blockquote>\nWhen I try that, it doesn't replace either one of the \"pre\" or\n\"blockquote\" blocks with '', but the first method works perfect for the\n\"pre\" block only.  I can't even get the blockquote one to work by\nitself.\n</blockquote>\nSo I told them to use this code:\n<pre class=\"ruby\">\n   regexp = %r{<(pre|blockquote)\\b[^>]*>.*?</\\1>}m\n   my_string = <<-EOS\nAnd then someone said:\n<blockquote>\nWhen I try that, it doesn't replace either one of the \"pre\" or\n\"blockquote\" blocks with '', but the first method works perfect for the\n\"pre\" block only.  I can't even get the blockquote one to work by\nitself.\n</blockquote>\n   EOS\n   my_string.gsub!(regexp, '...')\n</pre>\n"

irb>> html_fragment.gsub(regexp, '...')
=> "And then someone said:\n...\nSo I told them to use this code:\n...\n"

irb>> puts _
And then someone said:
...
So I told them to use this code:
...
=> nil

irb>>
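
Wrapped up as a method, the session above could look something like the following sketch. The method name, the text parameter and the replacement default are assumptions for illustration; only the regexp comes from the session.

```ruby
# Regexp from the irb session: \1 makes the closing tag match the opening one,
# and \b[^>]* allows attributes such as class="ruby".
BLOCK_REGEXP = %r{<(pre|blockquote)\b[^>]*>.*?</\1>}m

# Hypothetical wrapper around gsub.
def strip_pre_and_quote_blocks(text, replacement = '...')
  text.gsub(BLOCK_REGEXP, replacement)
end

puts strip_pre_and_quote_blocks(%(Keep<pre class="ruby">x\ny</pre>Keep))

# Nested blocks of the same tag are not handled: .*? stops at the first closer.
puts strip_pre_and_quote_blocks('<blockquote>a<blockquote>b</blockquote>c</blockquote>')
```

The second call shows the main limitation: a blockquote nested inside another blockquote leaves the tail of the outer one behind, which is one more argument for a real parser.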

In addition to using an XML parser as Phlip suggests, you might want
to actually read about regular expressions.  You can start with the
pickaxe (Programming Ruby, The Pragmatic Programmers' Guide, 2nd ed.)
pages 68-70, 324-328, 600-603.

-Rob

Rob Biedenharn		http://agileconsultingllc.com
Rob-xa9cJyRlE0mWcWVYNo9pwxS2lgjeYSpx@public.gmane.org


Mark Thomas

2006-Dec-08 15:28 UTC

Re: Help with regular expression

Ryan,

I agree with Phlip. This has crossed into territory better served by a
real HTML parser. Something like this (untested):

require 'hpricot'
doc = Hpricot("Keep<pre>Toss</pre>Keep<blockquote>Toss<p>Toss</p>Toss</blockquote>")

doc.search("//pre").each do |pre|
  pre.remove
end

doc.search("//blockquote").each do |bq|
  bq.remove
end

puts doc


