Wes Gamble
2006-May-24 22:16 UTC
[Rails] HTMLEntities.decode_entities - problems with output
All,
I am trying to use the HTMLEntities library to translate HTML entities
into their character equivalents so that I can print a text version of
some HTML to a file.
However, I am having trouble understanding how to successfully emit the
converted text as a string without ending up with weird UTF-8 characters
in front of the converted characters.
Referencing the irb session below, I'm attempting to do the Iconv
conversion from ASCII to UTF-8 because I'm assuming that I have to in
order to use the HTMLEntities calls. I am trying to convert back to
ISO-8859-1 because I'm assuming that I need to.
If I just try to print the output of HTMLEntities.decode_entities to a
file without doing any iconv conversions, I get A-circumflex before
every modified character. A-circumflex is the ISO-8859-1 equivalent of
\302.
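To make the A-circumflex observation concrete, here is a minimal sketch of how the same two bytes read under each encoding. It assumes a Ruby with encoding-aware strings (1.9 or later, UTF-8 source file), which is newer than the interpreter in the session below; the variable names are illustrative only.

```ruby
# Bytes returned by decode_entities in the session, in the same octal notation.
decoded = "\302\240xyz"

# Read as UTF-8, the first two bytes form one character: U+00A0 (no-break space).
first_codepoint = decoded.codepoints.first

# Read as ISO-8859-1, every byte is its own character, so \302 becomes
# A-circumflex (U+00C2) and \240 becomes the no-break space.
misread = decoded.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts format("U+%04X", first_codepoint)
puts misread.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
```

So the stray character is not extra output from the library; it is the first byte of a two-byte UTF-8 sequence being displayed by something that expects ISO-8859-1.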
What am I missing here? How can I successfully display &nbsp; as a
space in a file that I am writing to? I'd rather not have to do my own
gsubs on each character entity, although I am prepared to do that.
I also thought of substituting the \302 character with '' (if I could
only figure out how to do that).
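One possible way to do that substitution, sketched under the assumption that the decoded text is UTF-8 and that a plain space is an acceptable stand-in for the no-break space. The helper name nbsp_to_space is hypothetical, and the sketch assumes Ruby 1.9 or later. It replaces the whole two-byte character rather than \302 alone, since removing only \302 would leave a stray \240 byte behind.

```ruby
# U+00A0, the character that &nbsp; decodes to (bytes \302\240 in UTF-8).
NBSP = "\u00A0"

# Hypothetical helper: swap every no-break space for an ordinary ASCII space.
def nbsp_to_space(text)
  text.gsub(NBSP, " ")
end

puts nbsp_to_space("\302\240xyz").inspect
```

On an interpreter without encoding-aware strings, the equivalent byte-level call would be gsub("\302\240", " ").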
Any help is appreciated,
Wes
=============================================================================
C:\eclipse\workspace>irb
irb(main):001:0> require 'iconv'
=> true
irb(main):002:0> require 'rubygems'
=> false
irb(main):003:0> require 'HTMLEntities'
=> true
irb(main):004:0> x = '&nbsp;xyz'
=> "&nbsp;xyz"
irb(main):006:0> conv = Iconv.new("ASCII", "UTF-8")
=> #<Iconv:0x2c297f8>
irb(main):007:0> y = conv.iconv(x)
=> "&nbsp;xyz"
irb(main):008:0> HTMLEntities.decode_entities(y)
=> "\302\240xyz"
irb(main):009:0> conv = Iconv.new("UTF-8", "ISO-8859-1")
=> #<Iconv:0x2c12ed8>
irb(main):010:0> conv.iconv(y)
=> "&nbsp;xyz"
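Two things stand out in the session above. Iconv.new takes the target encoding first and the source encoding second, so Iconv.new("UTF-8", "ISO-8859-1") converts from ISO-8859-1 to UTF-8, and the final call is applied to y (the undecoded string) rather than to the result of decode_entities. A minimal sketch of the intended last step, converting the decoded UTF-8 text to ISO-8859-1, follows. It uses String#encode from Ruby 1.9 or later instead of Iconv, and the helper name to_latin1 is hypothetical.

```ruby
# Hypothetical helper: transcode UTF-8 text to ISO-8859-1 for writing to a
# Latin-1 file. Characters with no Latin-1 equivalent are replaced with "?".
def to_latin1(utf8_text)
  utf8_text.encode("ISO-8859-1", invalid: :replace, undef: :replace, replace: "?")
end

# The decoded string from the session: no-break space followed by "xyz".
latin1 = to_latin1("\302\240xyz")

# The two-byte UTF-8 sequence collapses to the single Latin-1 byte \240.
puts latin1.bytes.map { |b| format("\\%03o", b) }.join
```

With Iconv itself, the corresponding converter would be Iconv.new("ISO-8859-1", "UTF-8"), applied to the return value of decode_entities.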
