thr3ads.net - Markdown Discuss - Detab should be multi-byte aware? [Oct 2006]


Allan Odgaard

2006-Oct-09 17:02 UTC

Detab should be multi-byte aware?

A user has table-formatted data that contains accented characters, and
his tables become misaligned after going through Markdown.

This is because he aligned them using tab characters: Markdown converts
tabs to spaces even in pre-formatted text, and it is not multi-byte
aware.

This raises two questions:

  1. Should Markdown convert tabs to spaces in pre-formatted text?
  2. If yes, should Markdown be aware of multi-byte characters?

I'd say yes to #1 -- Markdown converts to (X)HTML, which does not
define the tab size, and a good rule of thumb is to always convert to
spaces before publishing on the net.

As for #2, Markdown doesn't know the encoding of the source document,
so that would mean it can't really be aware of things such as UTF-8
mb sequences. OTOH, if it changes my pre-formatted text, I would like
it to do the right thing.
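The misalignment can be reproduced with a minimal Python sketch. This is an illustration, not Markdown.pl's code: `detab` is a hypothetical helper that pads each tab out to the next tab stop, and a tab width of 4 is assumed (Markdown.pl's default). Because it counts whatever units it is given, running it on raw UTF-8 bytes treats the two-byte "é" as two columns, while running it on decoded text treats it as one.

```python
TAB_WIDTH = 4  # assumed tab width (Markdown.pl's default)


def detab(line, tab_width=TAB_WIDTH):
    """Hypothetical helper: replace each tab with spaces up to the next tab stop.

    Works on a single line of str (one column per character)
    or bytes (one column per byte).
    """
    space, tab = (b" ", b"\t") if isinstance(line, bytes) else (" ", "\t")
    pieces = line.split(tab)
    out = pieces[0]
    for piece in pieces[1:]:
        # Pad to the next multiple of tab_width, measured in the input's units.
        out += space * (tab_width - len(out) % tab_width)
        out += piece
    return out


rows = ["abc\t1", "déf\t2"]  # both first fields are three characters wide

by_char = [detab(row) for row in rows]                                  # decoded text
by_byte = [detab(row.encode("utf-8")).decode("utf-8") for row in rows]  # raw UTF-8

for label, lines in (("characters:", by_char), ("bytes:", by_byte)):
    # Column where the second field starts (just after the last padding space).
    print(label, [line.rindex(" ") + 1 for line in lines])
# characters: [4, 4]
# bytes: [4, 7]
```

For a single line, Python's built-in `str.expandtabs(4)` gives the same result as the character-based path.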

John Gruber

2006-Oct-09 18:19 UTC


Detab should be multi-byte aware?

Allan Odgaard <29mtuz102@sneakemail.com> wrote on 10/9/06 at 11:02 PM:
> This raises two questions:
>   1. Should Markdown convert tabs to spaces in pre-formatted text?
>   2. If yes, should Markdown be aware of multi-byte characters?
> I'd say yes to #1 -- Markdown converts to (X)HTML, which does not
> define the tab size, and a good rule of thumb is to always convert
> to spaces before publishing on the net.
For #1, that's exactly why it does it.

> As for #2, Markdown doesn't know the encoding of the source
> document, so that would mean it can't really be aware of
> things such as UTF-8 mb sequences, OTOH if it changes my
> pre-formatted text, I would like to have it do the right thing.
If Markdown.pl ever gains explicit support for text encodings, the
rules will be simple: UTF-8 in, UTF-8 out, no exceptions.

This would break the way some people are using it, I'm sure. I
don't really have much sympathy for people who are clinging to
other encodings, though.
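A strict "UTF-8 in, UTF-8 out" rule could be sketched roughly like this in Python. It is a hypothetical illustration of the policy described above, not code from Markdown.pl: `detab_utf8` is an invented name, the tab width of 4 is assumed, and malformed input raises an error instead of being guessed at. Python's built-in `str.expandtabs` does the per-character tab expansion.

```python
TAB_WIDTH = 4  # assumed tab width


def detab_utf8(raw: bytes, tab_width: int = TAB_WIDTH) -> bytes:
    """Hypothetical 'UTF-8 in, UTF-8 out' detab step.

    Input that is not valid UTF-8 is rejected rather than guessed at.
    Columns are counted per code point, which suits accented Latin text
    but still ignores combining marks and double-width East Asian characters.
    """
    try:
        text = raw.decode("utf-8")  # strict by default: raises on bad bytes
    except UnicodeDecodeError as err:
        raise ValueError(f"input is not valid UTF-8 (byte offset {err.start})") from err
    # Expand tabs on decoded characters, then encode the result back to UTF-8.
    return text.expandtabs(tab_width).encode("utf-8")


print(detab_utf8("déf\t2".encode("utf-8")).decode("utf-8"))  # déf 2
try:
    detab_utf8("déf\t2".encode("latin-1"))  # a legacy single-byte encoding
except ValueError as exc:
    print(exc)  # input is not valid UTF-8 (byte offset 1)
```

The Latin-1 case shows the trade-off mentioned in the thread: under a no-exceptions rule, text in other encodings is refused outright.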

I don't think the rules for the syntax (as opposed to the
implementation) need to mention it, though, at least not yet.

I say "yet" because from the get-go I've always considered using
non-ASCII punctuation characters for certain features.

I don't think there's any reason that someone couldn't write a
UTF-8 savvy Markdown implementation using the 1.0 syntax, though.

-J.G.

Michel Fortin

2006-Oct-09 18:52 UTC


Detab should be multi-byte aware?

On 9 Oct 2006, at 17:02, Allan Odgaard wrote:
> As for #2, Markdown doesn't know the encoding of the source
> document, so that would mean it can't really be aware of things
> such as UTF-8 mb sequences, OTOH if it changes my pre-formatted
> text, I would like to have it do the right thing.
Currently, Markdown.pl and my own PHP implementation of Markdown both
support any superset of ASCII; that includes UTF-8. UTF-8 multi-byte
sequences have the interesting property of being composed entirely of
bytes above 127, outside the ASCII range. So while Markdown isn't really
"aware" of these multi-byte sequences in the sense of treating each one
as a single character, it doesn't alter them either.
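This property can be checked directly. The Python sketch below (the function name is hypothetical) encodes every code point from U+0080 upward, skipping surrogates, and confirms that no resulting byte is 127 or below. It then runs an ASCII-only, emphasis-like regex over raw UTF-8 bytes to show that the sequence for "é" passes through unchanged; the regex is a toy stand-in, not Markdown's actual emphasis rule.

```python
import re


def multibyte_sequences_avoid_ascii() -> bool:
    """True if every byte of every multi-byte UTF-8 sequence is above 127."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code points cannot be encoded as UTF-8
        if any(byte <= 0x7F for byte in chr(cp).encode("utf-8")):
            return False
    return True


print(multibyte_sequences_avoid_ascii())  # True

# An ASCII-only substitution on raw bytes cannot match inside the
# two-byte sequence for "é", so it survives intact.
raw = "*café*".encode("utf-8")
html = re.sub(rb"\*(.+?)\*", rb"<em>\1</em>", raw)
print(html.decode("utf-8"))  # <em>café</em>
```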

From your description of the problem, I believe you're not using UTF-8.


Michel Fortin
michel.fortin@michelf.com
michelf.com
