thr3ads.net - Markdown Discuss - converting html with \xa9 to Markdown and using iconv? [Mar 2007]

Jeremy C. Reed

2007-Mar-22 20:52 UTC

converting html with \xa9 to Markdown and using iconv?

The HTML document contains various characters like
	\xa0  (non-breaking space)
©	\xa9  (Copyright symbol)
(and others).

I tried using html2text.py but it didn't like these characters.

Any ideas on how I can use iconv or another tool to convert documents like 
this so I can then convert to Markdown?

I don't want to do this manually, as I have 500+ documents.


  Jeremy C. Reed

Milian Wolff

2007-Mar-22 21:03 UTC

On Thursday, 22 March 2007, Jeremy C. Reed wrote:
> The HTML document contains various characters like
> 	\xa0  (non-breaking space)
> ©	\xa9  (Copyright symbol)
> (and others).
>
> I tried using html2text.py but it didn't like these characters.
>
> Any ideas on how I can use iconv or another tool to convert documents like
> this so I can then convert to Markdown?
>
> I don't want to do this manually, as I have 500+ documents.
>
>
>   Jeremy C. Reed

If I understand you correctly, you are looking for a converter that supports
UTF-8 / Unicode characters?

My PHP script (a port of html2text.py) leaves those characters unchanged, so
in theory it should work. Try it out at [1].

But: it's PHP, so unless you have command-line access or write a little PHP
script to run locally, it will be of no use to you. The latter should be
pretty easy, though: simply recurse through your files and folders, apply
html2text to each one, and save the output somewhere. You might want to allow
longer execution times for PHP scripts in the meantime.
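
The batch loop described above could look roughly like the following. This is
a Python sketch rather than PHP, and it is not the script from [1]: the
function name convert_tree and the convert callable are hypothetical
placeholders for whichever HTML-to-Markdown converter you end up using, and it
assumes the input files are already UTF-8.

```python
from pathlib import Path
from typing import Callable, List


def convert_tree(src: Path, dst: Path,
                 convert: Callable[[str], str],
                 encoding: str = "utf-8") -> List[Path]:
    """Run `convert` on every .html file under `src`.

    The directory layout is mirrored under `dst`, with each output
    file given a .md suffix. `convert` is a placeholder for any
    function that maps an HTML string to a Markdown string.
    """
    written = []
    for html_file in sorted(src.rglob("*.html")):
        out_file = (dst / html_file.relative_to(src)).with_suffix(".md")
        out_file.parent.mkdir(parents=True, exist_ok=True)
        text = html_file.read_text(encoding=encoding)
        out_file.write_text(convert(text), encoding="utf-8")
        written.append(out_file)
    return written
```

Calling convert_tree(Path("html"), Path("markdown"), my_converter) would then
process a whole folder in one go, where my_converter stands for your converter
of choice.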

Another alternative would be to use one of the other converters. I know there
are some, but I don't have their URLs at hand. Maybe someone else will be able
to help you.

 [1]: http://milianw.de/projects/html2text/

-- 
Milian Wolff
http://milianw.de

John MacFarlane

2007-Mar-22 23:00 UTC

You could try html2markdown, which uses iconv, tidy, and pandoc.
It should have no trouble with these characters. It's included in the
pandoc distribution: http://sophos.berkeley.edu/macfarlane/pandoc/
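
For reference, the re-encoding step that iconv performs can be sketched in a
few lines of Python. This is not how html2markdown itself is implemented; it
only illustrates the idea, and it assumes the source files are ISO-8859-1
(the thread does not say which encoding they use). The function name reencode
is made up for this example.

```python
def reencode(data: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Re-encode raw bytes as UTF-8.

    Roughly what `iconv -f ISO-8859-1 -t UTF-8` does. The source
    encoding is an assumption; adjust it to match your files.
    """
    return data.decode(source_encoding).encode("utf-8")


# The two bytes from the original question, as they would appear in a file.
sample = b"Copyright \xa9 2007\xa0Example"
converted = reencode(sample)
```

After this step, \xa9 becomes the two-byte UTF-8 sequence \xc2\xa9, which
UTF-8-aware tools read as the copyright sign.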

JM

+++ Jeremy C. Reed [Mar 22 07 15:52 ]:
> The HTML document contains various characters like
> 	\xa0  (non-breaking space)
> ©	\xa9  (Copyright symbol)
> (and others).
> 
> I tried using html2text.py but it didn't like these characters.
> 
> Any ideas on how I can use iconv or another tool to convert documents like 
> this so I can then convert to Markdown?
> 
> I don't want to do this manually, as I have 500+ documents.
> 
> 
>   Jeremy C. Reed

Julian Tarkhanov

2007-Mar-23 14:43 UTC

On Mar 22, 2007, at 9:52 PM, Jeremy C. Reed wrote:
> I tried using html2text.py but it didn't like these characters.
>
> Any ideas on how I can use iconv or another tool to convert  
> documents like
> this so I can then convert to Markdown?
This might have to do with Python and its silly-silly Unicode strings.

I have done hundreds of docs with all kinds of weird characters in them, in
many languages. As long as they were UTF-8, the Perl Markdown, the PHP one and
the Ruby one all worked fine. You've got mucho problemo if you use some odd
8-bit legacy encoding for your special chars.
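
One way to tell the two cases apart before converting is to attempt a strict
UTF-8 decode and fall back to a legacy encoding only when that fails. The
sketch below is in Python; the name decode_html and the choice of cp1252 as
the fallback are assumptions made for illustration, not something stated in
this thread.

```python
def decode_html(data: bytes, fallback: str = "cp1252") -> str:
    """Decode bytes as UTF-8 if they are valid UTF-8.

    Otherwise treat them as a legacy 8-bit encoding. The fallback
    encoding is a guess; bytes it cannot map are replaced with
    U+FFFD instead of raising an error.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback, errors="replace")
```

A lone \xa9 byte is not valid UTF-8, so it takes the fallback path and comes
out as the copyright sign, while the UTF-8 sequence \xc2\xa9 decodes to the
same character directly.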

Besides, you don't actually need to convert them to entities: normal browsers
just render these Unicode chars verbatim if your font has them.

-- 
Julian 'Julik' Tarkhanov
please send all personal mail to
me at julik.nl

andrew at shedside.com

2007-Mar-23 15:22 UTC

On Thu, 22 Mar 2007 15:52:01 -0500 (CDT), Jeremy C. Reed wrote:
> I tried using html2text.py but it didn't like these characters.
If you're familiar with XSLT, another option is:

	http://www.lowerelement.com/Geekery/XML/XHTML-to-Markdown.html

(You can use Perl's XML::LibXML and XML::LibXSLT to parse your HTML as 
XHTML and then transform it into Markdown using the above stylesheet, 
with the UTF-8 intact.)

Cheers,
Andrew.
