A user has table-formatted data containing accented characters, and his tables come out misaligned after going through Markdown. He aligned the columns with tab characters; Markdown converts tabs to spaces, even in pre-formatted text, and it is not multi-byte aware, so a multi-byte character is counted as more than one column.

This raises two questions:

1. Should Markdown convert tabs to spaces in pre-formatted text?
2. If yes, should Markdown be aware of multi-byte characters?

I'd say yes to #1 -- Markdown converts to (X)HTML, which does not define the tab size, and a good rule of thumb is to always convert to spaces before publishing on the net.

As for #2, Markdown doesn't know the encoding of the source document, so it can't really be aware of things such as UTF-8 multi-byte sequences. OTOH, if it changes my pre-formatted text, I would like it to do the right thing.
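To make the misalignment concrete, here is a minimal Python sketch of tab expansion. It is not Markdown.pl's actual code; the function name `expand_tabs`, the `count_bytes` flag, and the tab width of 4 are assumptions chosen for illustration. With `count_bytes=True` the column position advances by the UTF-8 byte length of each character, which is roughly what a byte-oriented implementation would do.

```python
def expand_tabs(line: str, tab_width: int = 4, count_bytes: bool = False) -> str:
    """Replace each tab with spaces up to the next tab stop.

    count_bytes=False: every character advances the column by 1.
    count_bytes=True:  every character advances the column by its
                       UTF-8 byte length (simulates a byte-oriented tool).
    """
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            pad = tab_width - (column % tab_width)
            out.append(" " * pad)
            column += pad
        else:
            out.append(ch)
            column += len(ch.encode("utf-8")) if count_bytes else 1
    return "".join(out)


plain = "cafe\tx"
accented = "caf\u00e9\tx"  # "café" with a precomposed e-acute (2 bytes in UTF-8)

# Character-aware expansion keeps the two rows aligned.
aligned_plain = expand_tabs(plain)
aligned_accented = expand_tabs(accented)

# Byte-counting expansion pads the accented row with one space too few.
misaligned_accented = expand_tabs(accented, count_bytes=True)
```

In this sketch the `x` lands in the same column for both rows when characters are counted, and one column early for the accented row when bytes are counted. Combining characters and double-width characters are not handled here.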
Allan Odgaard <29mtuz102@sneakemail.com> wrote on 10/9/06 at 11:02 PM:

> This raises two questions:
> 1. Should Markdown convert tabs to spaces in pre-formatted text?
> 2. If yes, should Markdown be aware of multi-byte characters?
> I'd say yes to #1 -- Markdown converts to (X)HTML which
> does not define the tab size, and a good rule of thumb is to
> always convert to spaces before publishing on the net.

For #1, that's exactly why it does it.

> As for #2, Markdown doesn't know the encoding of the source
> document, so that would mean it can't really be aware of
> things such as UTF-8 mb sequences, OTOH if it changes my
> pre-formatted text, I would like to have it do the right thing.

If Markdown.pl ever gains explicit support for text encodings, the rules will be simple: UTF-8 in, UTF-8 out, no exceptions. This would break the way some people are using it, I'm sure. I don't really have much sympathy for people who are clinging to other encodings, though.

I don't think the rules for the syntax (as opposed to the implementation) need to mention it, though, at least not yet. I say "yet" because from the get-go I've always considered using non-ASCII punctuation characters for certain features.

I don't think there's any reason that someone couldn't write a UTF-8 savvy Markdown implementation using the 1.0 syntax, though.

-J.G.
On Oct 9, 2006, at 17:02, Allan Odgaard wrote:

> As for #2, Markdown doesn't know the encoding of the source
> document, so that would mean it can't really be aware of things
> such as UTF-8 mb sequences, OTOH if it changes my pre-formatted
> text, I would like to have it do the right thing.

Currently, Markdown.pl and my own PHP implementation of Markdown both support any superset of ASCII; that includes UTF-8. UTF-8 multi-byte sequences have the interesting property of being composed entirely of bytes above 127, outside the ASCII range. So while Markdown isn't really "aware" of these multi-byte sequences, in the sense that it does not treat each of them as one character, it isn't changing them into anything either.

From your description of the problem, I believe you're not using UTF-8.

Michel Fortin
michel.fortin@michelf.com
http://www.michelf.com/
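The byte-range property mentioned above can be checked with a short Python sketch. The helper name `multibyte_sequences_are_high` is hypothetical, and the sample string is only an illustration. The check confirms that every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so an ASCII byte such as a tab (0x09) never appears inside one.

```python
def multibyte_sequences_are_high(text: str) -> bool:
    """Return True if every multi-byte UTF-8 sequence in `text`
    consists only of bytes >= 0x80 (i.e. outside the ASCII range)."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return False
    return True


# Accented Latin letters plus a few CJK characters, written as escapes.
sample = "d\u00e9j\u00e0 vu, na\u00efve, \u65e5\u672c\u8a9e"

all_high = multibyte_sequences_are_high(sample)

# The two bytes that encode U+00E9 (e-acute) in UTF-8.
e_acute_bytes = list("\u00e9".encode("utf-8"))

# A byte-oriented search for a tab cannot match inside a multi-byte character.
tab_inside_multibyte = b"\t" in "\u00e9\u65e5".encode("utf-8")
```

This only shows that byte-oriented processing leaves the sequences intact; it says nothing about column counting, where a byte-oriented tool would still see one character as several bytes.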