thr3ads.net - Rails - [Rails] html special characters. h() failure. [Jan 2006]


Neil Dugan

2006-Jan-27 08:20 UTC

[Rails] html special characters. h() failure.

I was trying to convert some text containing the (r) character so that
the character \xAE is replaced with &reg;

h(@item.description) didn't do anything.  I need to use
@item.description.gsub(/\xAE/, '&reg;') for it to work.

I think the h() function should be able to do all the codes that are
available.

Regards Neil.
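
The workaround described above can be sketched roughly as follows. This is only a minimal sketch: it assumes a UTF-8 string in a current Ruby (so the registered sign is written "\u00AE" rather than the single latin1 byte \xAE), it calls ERB::Util.html_escape, which is the method behind h(), and the sample text and variable names are made up for illustration.

```ruby
require 'erb'

# Hypothetical sample text; "\u00AE" is the registered sign.
description = "Widget\u00AE <deluxe>"

# html_escape (what h() calls) only escapes markup characters,
# so the registered sign passes through untouched.
escaped = ERB::Util.html_escape(description)

# Translating the sign to a named entity is a separate, explicit step.
with_entity = escaped.gsub("\u00AE", '&reg;')

puts escaped
puts with_entity
```

Doing the entity substitution after escaping matters: run the other way round, the "&" of "&reg;" would itself be escaped.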

Francois Paul

2006-Jan-27 09:12 UTC


[Rails] strip html tags?

Hi
I have text that I get from the user that is stored in the database
after escaping the html.

I want to display this text in the view with the markup (this is easy),
but I also want to display it in an alt_tag of an image where I would
like all markup stripped out.

I'm hoping someone can point me in the direction of an existing function
or helper method so I don't have to reinvent the wheel.

Thanks in advance,

Francois

Bob Silva

2006-Jan-27 09:22 UTC


[Rails] strip html tags?

http://railsmanual.org/module/ActionView::Helpers::TextHelper/strip_tags


Bob Silva
http://www.railtie.net/
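
For readers without Rails at hand, the idea behind the linked strip_tags helper can be approximated in plain Ruby. The sketch below is not Rails' implementation: plain_alt_text is a hypothetical name, the regular expression is deliberately naive, and it assumes the input is unescaped markup.

```ruby
require 'erb'

# Hypothetical helper: drop anything that looks like a tag, squeeze the
# leftover whitespace, then escape the result for use in an attribute.
def plain_alt_text(html)
  text = html.gsub(/<[^>]*>/, ' ').squeeze(' ').strip
  ERB::Util.html_escape(text)
end

puts plain_alt_text('<b>Red</b> bike, <i>"nearly"</i> new')
```

A regex like this is easily confused by malformed markup or by ">" inside attribute values, so inside a Rails view the real helper is the safer choice.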

Onur Turgay

2006-Jan-27 10:03 UTC


[Rails] strip html tags?

RedCloth's HTML filter is very capable. You can strip all HTML tags,
or define which tags and attributes (like alt, src etc.) may remain.
But it's a private RedCloth function, so either you make it static and
public, or you use the RedCloth filters. Or just use the fragment below
that I extracted from RedCloth. I think it's self-explanatory: the tags
in the BASIC_TAGS hash will be kept, all others will be removed.

(This is an extension to the String class.)

class String

    # Tags to keep, mapped to the attributes allowed on each one
    # (nil or [] means no attributes are kept).
    BASIC_TAGS = {
        'a' => ['href', 'title'],
        'img' => ['src', 'alt', 'title', 'align', 'width', 'height', 'border', 'class'],
        'br' => [],
        'i' => nil,
        'u' => nil,
        'b' => nil,
        'pre' => nil,
        'kbd' => nil,
        'code' => ['lang'],
        'cite' => nil,
        'strong' => nil,
        'em' => nil,
        'ins' => nil,
        'sup' => nil,
        'sub' => nil,
        'del' => nil,
        'table' => nil,
        'tr' => nil,
        'td' => ['colspan', 'rowspan'],
        'th' => nil,
        'ol' => nil,
        'ul' => nil,
        'li' => nil,
        'p' => nil,
        'h1' => nil,
        'h2' => nil,
        'h3' => nil,
        'h4' => nil,
        'h5' => nil,
        'h6' => nil,
        'blockquote' => ['cite']
    }

    # Cleans text in place: tags not listed in tags become a space, and
    # listed tags keep only their allowed attributes.
    def self.clean_html!( text, tags = BASIC_TAGS )
        text.gsub!( /<!\[CDATA\[/, '' )
        text.gsub!( /<(\/*)(\w+)([^>]*)>/ ) do
            raw = $~
            tag = raw[2].downcase
            if tags.has_key? tag
                pcs = [tag]
                # Only opening <a> tags get rel="nofollow".
                pcs << "rel=\"nofollow\"" if tag == 'a' and raw[1].empty?
                tags[tag].each do |prop|
                    # Try double-quoted, single-quoted, then unquoted values.
                    ['"', "'", ''].each do |q|
                        q2 = ( q != '' ? q : '\s' )
                        if raw[3] =~ /#{prop}\s*=\s*#{q}([^#{q2}]+)#{q}/i
                            attrv = $1
                            next if tag != 'img' and prop == 'src' and attrv !~ /^http/
                            # Use attrv here: the !~ test above can reset $1.
                            pcs << "#{prop}=\"#{attrv.gsub('"', '\\"')}\""
                            break
                        end
                    end
                end if tags[tag]
                "<#{raw[1]}#{pcs.join " "}>"
            else
                " "
            end
        end
    end

    # Non-destructive version: returns a cleaned copy of text.
    def self.clean_html( text, tags = BASIC_TAGS )
      str = text.dup
      clean_html!(str, tags)
      str
    end

    # Instance version: returns a cleaned copy of this string.
    def clean_html( tags = BASIC_TAGS )
      self.class.clean_html(self, tags)
    end
end
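
The same whitelist idea can be shown on a much smaller scale. The sketch below is not RedCloth's code: whitelist_tags and ALLOWED_TAGS are hypothetical names, and for brevity it drops every attribute instead of keeping a per-tag list.

```ruby
# Hypothetical, attribute-free whitelist: tags in the list are kept
# (stripped of all attributes), anything else is replaced by a space.
ALLOWED_TAGS = %w[b i em strong p br].freeze

def whitelist_tags(text, allowed = ALLOWED_TAGS)
  text.gsub(/<(\/?)(\w+)[^>]*>/) do
    closing = $1
    name = $2.downcase
    allowed.include?(name) ? "<#{closing}#{name}>" : ' '
  end
end

puts whitelist_tags('<b onclick="x()">hi</b><script>bad()</script>')
```

Passing an empty list strips every tag, which is roughly what the alt-text case in this thread needs.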


Ben Munat

2006-Jan-27 16:36 UTC


[Rails] html special characters. h() failure.

Yeah, someone posted yesterday that html_escape only replaces "<", ">",
and "&". I couldn't believe that but went and verified it in the ERB
source code. Seems a might bit naive to me.... it doesn't even replace
quotes (note to self: never use ERB to replace attribute values).

Anyway, the html_escape method is just a chained gsub... you could just
override that and add a bunch more chars to the chain... and then share
it with us all! ;-)

b


Francois Beausoleil

2006-Jan-27 23:09 UTC


[Rails] html special characters. h() failure.

Hi !

2006/1/27, Ben Munat <bent@munat.com>:
> Anyway, the html_escape method is just a chained gsub... you could just
> override that and add a bunch more chars to the chain... and then share
> it with us all! ;-)

Hmm, that would be a bad idea.  The purpose of html_escape is to
ESCAPE bad characters, not do translations.  If you want that, look
into the textilize helper method, or textilize_without_paragraph.

Hope that helps,
François

Ben Munat

2006-Jan-27 23:38 UTC


[Rails] html special characters. h() failure.

Francois Beausoleil wrote:
> 2006/1/27, Ben Munat <bent@munat.com>:
>> Anyway, the html_escape method is just a chained gsub... you could just
>> override that and add a bunch more chars to the chain... and then share
>> it with us all! ;-)
>
> Hmm, that would be a bad idea.  The purpose of html_escape is to
> ESCAPE bad characters, not do translations.  If you want that, look
> into the textilize helper method, or textilize_without_paragraph.

I don't follow you. I'm not talking about "translations". I'm saying that
there are a bunch more potentially "bad" characters than just gt, lt, and
amp. The purpose of the html_escape method is to *escape* any characters
in the input text to their appropriate x/html versions.

I'm simply saying that whoever wrote that method should be *at least*
escaping quotes... and probably apostrophes. Most everything else one
could live without, but as the OP pointed out, it would be nice to have
another version of (or an option passed to) html_escape to do things like
copyright (c), registered (r), etc. That might be more
textilize-territory, but well, we'd probably need to get into wrassling
mode then. (that's Amurican for we'd need to argue some more)

For that matter, I would propose that the html_escape method be removed.
Instead, the default behavior of ERB should be to replace any and all
potentially problematic characters with the appropriate entities. If, for
some reason, the user does not desire this, then they should use
something like a "no_escape" ("no"??) method to override the default
escaping. It would also be good to have an "override for this file"
method so that you can just turn it off for e.g. email templates.

I find it very amusing that the agile book counsels that you should
almost always have that "h" in your erb outs... it's easy to miss... make
sure you don't forget it! Doesn't sound particularly DRY to me.

But actually, I'm thinking I don't really want to stick with ERB too long
anyway... templating is so nineties... I'm planning on spending some
quality time with rexml, markaby, and xx.

b

Alex Young

2006-Jan-28 17:36 UTC


[Rails] html special characters. h() failure.

Ben Munat wrote:
> I don't follow you. I'm not talking about "translations". I'm saying
> that there are a bunch more potentially "bad" characters than just
> gt, lt, and amp.

No there aren't - the only other potentially bad character is &quot;,
and that's only ever (potentially) a problem in attribute values.  If
you're having problems with *any other* character, there's a problem
with character set mismatches somewhere in your application.

> The purpose of the html_escape method is to *escape* any characters
> in the input text to their appropriate x/html versions.

Which it does, with the arguable exception of &quot;.

Think about what would be needed for it to do any more than it does.  In
order to be able to translate any of the other characters meaningfully
to the HTML escaped equivalent, you need to know which character set
you're coming from, so you need to do a conversion to an unambiguous
base set anyway.  For example:  &Aacute; is the capital A acute letter.
In latin1, it's 0xC1.  In UTF-8, it's 0xC381.  If you thought you were
in latin1, but your data was actually utf-8, you'd end up with the
rather nice sequence &Atilde; followed by a stray 0x81 control
character.  You could hypothetically do:

def new_html_escape(str, charset)
   # Iconv.conv takes (to, from, str) and returns the converted string
   h( Iconv.conv('utf-8', charset, str) )
end

But if you've got enough information to make that work, why not just
arrange for the data to be in the right character set in the first
place, and avoid overcomplicating what only needs to be a simple method?
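
The byte-level mismatch described above can be reproduced in a current Ruby. This is only a sketch: it uses String#encode and force_encoding rather than Iconv (which is no longer bundled with Ruby), ERB::Util.html_escape stands in for h(), and the variable names are illustrative.

```ruby
require 'erb'

a_acute = "\u00C1" # capital A acute, held as UTF-8

utf8_bytes   = a_acute.bytes                      # two bytes: C3 81
latin1_bytes = a_acute.encode('ISO-8859-1').bytes # one byte:  C1

# Misreading the UTF-8 bytes as latin1 yields two characters:
# U+00C3 (A tilde) followed by the control character U+0081.
misread = a_acute.dup.force_encoding('ISO-8859-1').encode('UTF-8')

# Hypothetical charset-aware wrapper: convert to UTF-8 first, then escape.
def new_html_escape(str, charset)
  ERB::Util.html_escape(str.encode('UTF-8', charset))
end

puts utf8_bytes.map { |b| format('%02X', b) }.join(' ')
puts latin1_bytes.map { |b| format('%02X', b) }.join(' ')
puts misread.codepoints.map { |c| format('U+%04X', c) }.join(' ')
```

The wrapper only helps when the caller really knows the source charset, which is the point being made above.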
> Instead, the default behavior of ERB should be to replace any and all
> potentially problematic characters with the appropriate entities. If,
> for some reason, the user does not desire this, then they should use
> something like a "no_escape" ("no"??) method to override the default
> escaping.

Just...  no.  There are just as many cases where you *don't* want
escaping to happen as those where you do.  Think of all those <%= render
:partial => ... %> and <%= link_to ... %> that you'd have to turn
escaping off for.  Just as non-DRY.

-- 
Alex

Ben Munat

2006-Jan-28 18:00 UTC


[Rails] html special characters. h() failure.

Ok, you make valid points... I take it all back... except that
html_escape should do &quot; too. We agree on that. :-) And actually, I
think &apos; would be good too, since that is a valid char for enclosing
attributes.

b
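
An escaper that also covers both quote characters might look like the sketch below. escape_attr and ATTR_ESCAPES are hypothetical names, not part of ERB or Rails, and the apostrophe is written as the numeric reference &#39; because &apos; is not defined in HTML 4.

```ruby
# Hypothetical attribute-safe escaper covering both quote characters.
ATTR_ESCAPES = {
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
  '"' => '&quot;',
  "'" => '&#39;'
}.freeze

def escape_attr(value)
  # A single pass over the string, so the "&" of an inserted entity
  # is never escaped a second time.
  value.to_s.gsub(/[&<>"']/, ATTR_ESCAPES)
end

puts %(<img alt='#{escape_attr(%q{Bob's "best" <photo>})}'>)
```

With both quote characters escaped, the value is safe whichever quote style encloses the attribute.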


Philip Ross

2006-Jan-28 23:10 UTC


[Rails] Re: html special characters. h() failure.

Ben Munat wrote:
> Yeah, someone posted yesterday that html_escape only replaces "<", ">",
> and "&". I couldn't believe that but went and verified it in the ERB
> source code. Seems a might bit naive to me.... it doesn't even replace
> quotes (note to self: never use ERB to replace attribute values).

Which version of ERB are you looking at? My copy (Ruby 1.8.2) does
replace quotes:

def html_escape(s)
   s.to_s.gsub(/&/, "&amp;").gsub(/\"/, "&quot;").
     gsub(/>/, "&gt;").gsub(/</, "&lt;")
end

According to the Ruby CVS [1], html_escape has been unchanged for over 
three years.

   1. http://www.ruby-lang.org/cgi-bin/cvsweb.cgi/ruby/lib/erb.rb

Phil

-- 
Philip Ross
http://tzinfo.rubyforge.org/ -- DST-aware timezone library for Ruby
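
The point about quotes is easy to check against whatever Ruby is at hand. The snippet below is a quick sketch using ERB::Util.html_escape directly; the sample string is made up.

```ruby
require 'erb'

# Double quotes are escaped along with &, < and >.
escaped = ERB::Util.html_escape('<a title="x">Tom & Jerry</a>')

puts escaped
```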

Ben Munat

2006-Jan-29 00:06 UTC


[Rails] Re: html special characters. h() failure.

OMFG.... I looked right at that but the gsub(/\"/, "&quot;") bit was
temporarily invisible... I plead brain damage... I'm just going to crawl
back in my hole now...

:-#

b


