Wes Gamble
2006-May-24 22:16 UTC
[Rails] HTMLEntities.decode_entities - problems with output
All,

I am trying to use the HTMLEntities library to translate HTML entities into their character equivalents so that I can print a text version of some HTML to a file. However, I am having trouble understanding how to emit the converted text as a string without ending up with weird UTF-8 characters in front of the converted characters.

Referencing the irb session below, I'm attempting to do the Iconv conversion from ASCII to UTF-8 because I'm assuming that I have to in order to use the HTMLEntities calls. I am trying to convert back to ISO-8859-1 because I'm assuming that I need to.

If I just try to print the output of HTMLEntities.decode_entities to a file without doing any iconv conversions, I get A-circumflex before every modified character. A-circumflex is the ISO-8859-1 equivalent of \302.

What am I missing here? How can I successfully display &nbsp; as a space in a file that I am writing to?

I'd rather not have to do my own gsubs on each character entity, although I am prepared to do that. I also thought of substituting the \302 character with '' (if I could only figure out how to do that).

Any help is appreciated,

Wes

=============================================================================

C:\eclipse\workspace>irb
irb(main):001:0> require 'iconv'
=> true
irb(main):002:0> require 'rubygems'
=> false
irb(main):003:0> require 'HTMLEntities'
=> true
irb(main):004:0> x = '&nbsp;xyz'
=> "&nbsp;xyz"
irb(main):006:0> conv = Iconv.new("ASCII", "UTF-8")
=> #<Iconv:0x2c297f8>
irb(main):007:0> y = conv.iconv(x)
=> "&nbsp;xyz"
irb(main):008:0> HTMLEntities.decode_entities(y)
=> "\302\240xyz"
irb(main):009:0> conv = Iconv.new("UTF-8", "ISO-8859-1")
=> #<Iconv:0x2c12ed8>
irb(main):010:0> conv.iconv(y)
=> "&nbsp;xyz"

--
Posted via http://www.ruby-forum.com/.
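
For anyone following the byte values in the session above, here is a minimal sketch of what "\302\240" is and where the A-circumflex comes from. It assumes a Ruby with String#encode (1.9 or later), which is newer than the Ruby in the irb session, and it uses only the standard library. The string "\u00A0xyz" is a stand-in for the decode_entities result, and the variable names are hypothetical. This is an illustration of the encoding behaviour, not necessarily how HTMLEntities or Iconv should be used here.

```ruby
# Stand-in for the decoded output: a non-breaking space (U+00A0) plus "xyz".
# In UTF-8 the nbsp is the two bytes \302\240 (hex C2 A0).
decoded = "\u00A0xyz"
utf8_bytes = decoded.bytes

# Reading those UTF-8 bytes as if they were ISO-8859-1 turns the first
# byte (0xC2) into a separate character: A-circumflex (U+00C2).
misread = decoded.dup.force_encoding("ISO-8859-1").encode("UTF-8")

# Transcoding from UTF-8 to ISO-8859-1 instead gives a single byte, 0xA0.
latin1 = decoded.encode("ISO-8859-1")
latin1_bytes = latin1.bytes

# If an ordinary space is wanted in the text file, swap the nbsp out
# before writing.
plain = decoded.gsub("\u00A0", " ")
```

On the Ruby 1.8 setup in the session, the transcoding step would presumably go through Iconv instead. Note that Iconv.new takes the target encoding first and the source encoding second.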