I was trying to convert some text containing the (R) character so that the character \xAE was replaced with &reg;. h(@item.description) didn't do anything; I had to use @item.description.gsub(/\xAE/, '&reg;') for it to work. I think the h() function should be able to handle all the entity codes that are available.

Regards, Neil.
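Here is a minimal sketch of the behaviour Neil is asking for. It assumes Ruby 1.9 or later with UTF-8 strings, since the 2006 thread predates String encodings. The text is escaped as usual, and then every non-ASCII character is turned into a numeric character reference, so ® becomes &#174;, which browsers render the same way as &reg;. The helper name h_with_entities is hypothetical and is not a Rails or ERB method.

```ruby
require 'erb'

# Hypothetical helper: HTML-escape, then encode every non-ASCII character
# as a numeric character reference (&#NNN;). Assumes a UTF-8 input string.
def h_with_entities(text)
  escaped = ERB::Util.html_escape(text)
  escaped.gsub(/[^\x00-\x7F]/) { |char| "&##{char.ord};" }
end

puts h_with_entities("Acme\u00AE <tools>")   # prints Acme&#174; &lt;tools&gt;
```

This is kept as a separate helper rather than a change to html_escape. It only gives correct results if the string's encoding is right: Latin-1 data such as Neil's \xAE byte would need converting to UTF-8 first.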
Hi,

I have text that I get from the user that is stored in the database after escaping the HTML. I want to display this text in the view with the markup (this is easy), but I also want to display it in the alt attribute of an image, where I would like all markup stripped out. I'm hoping someone can point me in the direction of an existing function or helper method so I don't have to reinvent the wheel.

Thanks in advance,
Francois
http://railsmanual.org/module/ActionView::Helpers::TextHelper/strip_tags

Bob Silva
http://www.railtie.net/

> -----Original Message-----
> From: rails-bounces@lists.rubyonrails.org [mailto:rails-bounces@lists.rubyonrails.org] On Behalf Of Francois Paul
> Sent: Friday, January 27, 2006 1:12 AM
> To: rails@lists.rubyonrails.org
> Subject: [Rails] strip html tags?
>
> _______________________________________________
> Rails mailing list
> Rails@lists.rubyonrails.org
> http://lists.rubyonrails.org/mailman/listinfo/rails
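Bob is pointing to the Rails helper. To show the same idea without Rails, here is a rough plain-Ruby sketch of preparing an alt value: remove anything that looks like a tag, collapse the whitespace, then escape the result for use in the attribute. The name alt_text is hypothetical. The regex is much cruder than Rails' strip_tags and does not handle comments or ">" inside attribute values.

```ruby
require 'erb'

# Hypothetical helper: plain text suitable for an img alt attribute.
# The regex only removes things shaped like <...>; it does not understand
# comments, CDATA, or ">" inside attribute values.
def alt_text(html)
  plain = html.gsub(/<[^>]*>/, '')      # drop anything that looks like a tag
  plain = plain.gsub(/\s+/, ' ').strip  # collapse whitespace left behind
  ERB::Util.html_escape(plain)          # escape &, <, > and quotes for the attribute
end

puts %(<img src="photo.jpg" alt="#{alt_text('<b>Sunset</b> over the <i>bay</i>')}">)
# prints <img src="photo.jpg" alt="Sunset over the bay">
```

If the stored text is already entity-escaped, as Francois describes, it has to be unescaped first (for example with CGI.unescapeHTML). Otherwise the tags show up as &lt;b&gt; and the regex never sees them.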
RedCloth's HTML filter is very capable. You can strip all HTML tags, or define which tags and attributes (like alt, src, etc.) can remain. But it's a private RedCloth method, so either make it public, use RedCloth's filters, or just use the fragment below that I extracted from RedCloth. I think it's self-explanatory: the tags in the BASIC_TAGS hash will be kept, all others will be removed. (This is an extension to String.)

    class String
      # Tags that survive cleaning, mapped to the attributes each may keep
      # (nil means the tag is kept but all of its attributes are dropped).
      BASIC_TAGS = {
        'a' => ['href', 'title'],
        'img' => ['src', 'alt', 'title', 'align', 'width', 'height', 'border', 'class'],
        'br' => [],
        'i' => nil, 'u' => nil, 'b' => nil, 'pre' => nil, 'kbd' => nil,
        'code' => ['lang'],
        'cite' => nil, 'strong' => nil, 'em' => nil, 'ins' => nil,
        'sup' => nil, 'sub' => nil, 'del' => nil,
        'table' => nil, 'tr' => nil, 'td' => ['colspan', 'rowspan'], 'th' => nil,
        'ol' => nil, 'ul' => nil, 'li' => nil, 'p' => nil,
        'h1' => nil, 'h2' => nil, 'h3' => nil, 'h4' => nil, 'h5' => nil, 'h6' => nil,
        'blockquote' => ['cite']
      }

      # Cleans text in place: whitelisted tags are rebuilt with only their
      # allowed attributes, every other tag is replaced by a space.
      def self.clean_html!(text, tags = BASIC_TAGS)
        text.gsub!(/<!\[CDATA\[/, '')
        text.gsub!(/<(\/*)(\w+)([^>]*)>/) do
          raw = $~
          tag = raw[2].downcase
          if tags.has_key?(tag)
            pcs = [tag]
            # Only opening <a> tags get rel="nofollow", not closing </a> tags.
            pcs << 'rel="nofollow"' if tag == 'a' && raw[1].empty?
            tags[tag].each do |prop|
              # Try double-quoted, single-quoted, then unquoted attribute values.
              ['"', "'", ''].each do |q|
                q2 = (q != '' ? q : '\s')
                if raw[3] =~ /#{prop}\s*=\s*#{q}([^#{q2}]+)#{q}/i
                  attrv = $1
                  next if tag != 'img' and prop == 'src' and attrv !~ /^http/
                  # Escape embedded double quotes as an HTML entity.
                  pcs << "#{prop}=\"#{attrv.gsub('"', '&quot;')}\""
                  break
                end
              end
            end if tags[tag]
            "<#{raw[1]}#{pcs.join ' '}>"
          else
            ' '
          end
        end
        text # gsub! returns nil when nothing matched, so return text explicitly
      end

      # Non-destructive version: returns a cleaned copy.
      def self.clean_html(text, tags = BASIC_TAGS)
        clean_html!(text.dup, tags)
      end

      # Instance shortcut, e.g. "<b>hi</b><script>x</script>".clean_html
      def clean_html(tags = BASIC_TAGS)
        self.class.clean_html(self, tags)
      end
    end
Yeah, someone posted yesterday that html_escape only replaces "<", ">", and "&". I couldn't believe that, but went and verified it in the ERB source code. Seems a might bit naive to me.... It doesn't even replace quotes (note to self: never use ERB to replace attribute values). Anyway, the html_escape method is just a chained gsub... you could just override that and add a bunch more chars to the chain... and then share it with us all! ;-)

b
Hi!

2006/1/27, Ben Munat <bent@munat.com>:
> Anyway, the html_escape method is just a chained gsub... you could just override that and
> add a bunch more chars to the chain... and then share it with us all! ;-)

Hmm, that would be a bad idea. The purpose of html_escape is to ESCAPE bad characters, not do translations. If you want that, look into the textilize helper method, or textilize_without_paragraph.

Hope that helps,
François
Francois Beausoleil wrote:
> 2006/1/27, Ben Munat <bent@munat.com>:
>> Anyway, the html_escape method is just a chained gsub... you could just override that and
>> add a bunch more chars to the chain... and then share it with us all! ;-)
>
> Hmm, that would be a bad idea. The purpose of html_escape is to
> ESCAPE bad characters, not do translations. If you want that, look
> into the textilize helper method, or textilize_without_paragraph.

I don't follow you. I'm not talking about "translations". I'm saying that there are a bunch more potentially "bad" characters than just gt, lt, and amp. The purpose of the html_escape method is to *escape* any characters in the input text to their appropriate x/html versions. I'm simply saying that whoever wrote that method should be *at least* escaping quotes... and probably apostrophes. Most everything else one could live without, but as the OP pointed out, it would be nice to have another version of (or an option passed to) html_escape to do things like copyright (c), registered (r), etc. That might be more textilize territory, but well, we'd probably need to get into wrassling mode then. (That's Amurican for we'd need to argue some more.)

For that matter, I would propose that the html_escape method be removed. Instead, the default behavior of ERB should be to replace any and all potentially problematic characters with the appropriate entities. If, for some reason, the user does not desire this, then they should use something like a "no_escape" ("no"??) method to override the default escaping. It would also be good to have an "override for this file" method so that you can just turn it off for e.g. email templates.

I find it very amusing that the Agile book counsels that you should almost always have that "h" in your ERB outputs... it's easy to miss... make sure you don't forget it! Doesn't sound particularly DRY to me. But actually, I'm thinking I don't really want to stick with ERB too long anyway... templating is so nineties... I'm planning on spending some quality time with REXML, Markaby, and xx.

b
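Ben's escape-by-default proposal, and Alex's objection to it in the next message, can be made concrete with a small sketch. The names SafeHTML, output and no_escape are hypothetical and are not ERB or Rails 1.x APIs. Values are escaped unless they are explicitly marked as markup, so helpers like render or link_to would have to return the marked type. That extra bookkeeping is exactly what Alex objects to. Rails 3 later made escaping the default, using String#html_safe and the raw helper.

```ruby
require 'erb'

# Hypothetical marker type: content wrapped in SafeHTML is already markup
# (e.g. the result of a partial or a link helper) and is never escaped again.
class SafeHTML
  def initialize(html)
    @html = html
  end

  def to_s
    @html
  end
end

# Hypothetical override, playing the role of Ben's "no_escape" method.
def no_escape(html)
  SafeHTML.new(html)
end

# Escape-by-default output: anything not explicitly marked safe is escaped.
def output(value)
  value.is_a?(SafeHTML) ? value.to_s : ERB::Util.html_escape(value)
end

puts output('<script>alert(1)</script>')       # prints &lt;script&gt;alert(1)&lt;/script&gt;
puts output(no_escape('<a href="/">Home</a>')) # prints <a href="/">Home</a>
```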
Ben Munat wrote:
> I don't follow you. I'm not talking about "translations". I'm saying
> that there are a bunch more potentially "bad" characters than just
> gt, lt, and amp.

No, there aren't: the only other potentially bad character is ", and that's only ever (potentially) a problem in attribute values. If you're having problems with *any other* character, there's a problem with character set mismatches somewhere in your application.

> The purpose of the html_escape method is to *escape* any characters
> in the input text to their appropriate x/html versions.

Which it does, with the arguable exception of ".

Think about what would be needed for it to do any more than it does. In order to be able to translate any of the other characters meaningfully to the HTML-escaped equivalent, you need to know which character set you're coming from, so you need to do a conversion to an unambiguous base set anyway. For example: Á is the capital A acute letter. In Latin-1, it's 0xC1. In UTF-8, it's 0xC3 0x81. If you thought you were in Latin-1, but your data was actually UTF-8, you'd end up with the rather nice sequence Ã followed by the invisible control character 0x81. You could hypothetically do:

    def new_html_escape(str, charset)
      h(Iconv.conv('utf-8', charset, str))
    end

But if you've got enough information to make that work, why not just arrange for the data to be in the right character set in the first place, and avoid overcomplicating what only needs to be a simple method?

> Instead, the default behavior of ERB should be to replace any and all
> potentially problematic characters with the appropriate entities. If,
> for some reason, the user does not desire this, then they should use
> something like a "no_escape" ("no"??) method to override the default
> escaping.

Just... no. There are just as many cases where you *don't* want escaping to happen as those where you do. Think of all those <%= render :partial => ... %> and <%= link_to ... %> that you'd have to turn escaping off for. Just as non-DRY.

--
Alex
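Alex's Á example can be checked byte by byte. This is a sketch that assumes Ruby 1.9 or later, where String#force_encoding and String#encode take the place of Iconv. The helper escape_from is a hypothetical counterpart to his new_html_escape, not a library method.

```ruby
require 'erb'

# "Á" (U+00C1) is the single byte 0xC1 in Latin-1 but two bytes in UTF-8.
a_acute = "\u00C1"
p a_acute.bytes.map { |b| format('%02X', b) }          # prints ["C3", "81"]

# Reading those UTF-8 bytes as if they were Latin-1 yields two characters:
# "Ã" (U+00C3) and the invisible C1 control character U+0081.
misread = a_acute.dup.force_encoding('ISO-8859-1').encode('UTF-8')
p misread.codepoints.map { |c| format('U+%04X', c) }   # prints ["U+00C3", "U+0081"]

# Hypothetical helper: convert from the declared source charset to UTF-8
# first, then escape. String#encode stands in for Iconv here.
def escape_from(str, charset)
  ERB::Util.html_escape(str.dup.force_encoding(charset).encode('UTF-8'))
end

puts escape_from("\xC1 < B", 'ISO-8859-1')             # prints Á &lt; B
```

As Alex says, the conversion only works if the caller already knows the source charset. That is why fixing the data's encoding upstream is the simpler design.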
Ok, you make valid points... I take it all back... except that html_escape should do " too. We agree on that. :-) And actually, I think ' would be good too, since that is a valid char for enclosing attributes.

b
Ben Munat wrote:
> Yeah, someone posted yesterday that html_escape only replaces "<", ">",
> and "&". I couldn't believe that but went and verified it in the ERB
> sourcecode. Seems a might bit naive to me.... it doesn't even replace
> quotes (note to self: never use ERB to replace attribute values).

Which version of ERB are you looking at? My copy (Ruby 1.8.2) does replace quotes:

    def html_escape(s)
      s.to_s.gsub(/&/, "&amp;").gsub(/\"/, "&quot;").
        gsub(/>/, "&gt;").gsub(/</, "&lt;")
    end

According to the Ruby CVS [1], html_escape has been unchanged for over three years.

1. http://www.ruby-lang.org/cgi-bin/cvsweb.cgi/ruby/lib/erb.rb

Phil
--
Philip Ross
http://tzinfo.rubyforge.org/ -- DST-aware timezone library for Ruby
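For comparison, here is a quick check of ERB::Util.html_escape, which is also available as h, in a current Ruby rather than the 1.8.2 version quoted above. Besides &, <, > and ", it now escapes ' as &#39;, which covers Ben's point about single-quoted attributes.

```ruby
require 'erb'

# ERB::Util.html_escape in current Ruby versions also escapes the apostrophe.
input = %q(<a title="x">Tom & Jerry's</a>)
puts ERB::Util.html_escape(input)
# prints &lt;a title=&quot;x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;

# h is a module-function alias of html_escape.
puts ERB::Util.h('5 > 3')   # prints 5 &gt; 3
```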
OMFG.... I looked right at that but the gsub(/\"/, "&quot;") bit was temporarily invisible... I plead brain damage... I'm just going to crawl back in my hole now... :-#

b