Python-markdown currently replaces all straight quotes (`"`) with the html entity (`&quot;`). Someone recently complained about this in a [bug report][]. As it turns out, the quote was only the symptom that brought the real problem to light[^1]. In any event, the reporter pointed out that markdown.pl does not replace straight quotes with the html entity. I know John Gruber has mentioned before that just because markdown.pl does something doesn't mean it's right, but the more I think about it, the more I think markdown.pl's behavior is right here.

1. First of all, using the straight quote is not invalid html or xhtml, so why do we care? In fact, I can't seem to find one reference that explains why the html entity should be used in this case.

2. If I'm not mistaken, having markdown output the html entity makes it impossible for smartypants to then convert straight quotes into "curly" quotes - which may be undesirable to some users.

So, my question is: Which implementation do you consider correct and why?

[bug report]: https://sourceforge.net/tracker/index.php?func=detail&aid=1862742&group_id=153041&atid=790198

[^1]: The reporter was running Django template syntax through markdown. As the quotes were all converted to entities, he no longer had valid template syntax. Of course, the real problem is that the template syntax should be treated like raw html, and python-markdown's extension api makes that easy to do. Otherwise you'd get a tag soup of various blocks of syntax wrapped in unwanted paragraphs and other such undesirable results. Of interest to me is that this is the second bug report we (python-markdown) have received in less than a week because people are running documents with template syntax through markdown (the other reporter was using Mako). I had never considered this use case before (although Yuri apparently has), and my knee-jerk reaction was to not care and suggest that they use raw html in their templates. Is this really a common use case?

-- 
----
Waylan Limberg
waylan at gmail.com
On 2008-01-04 at 23:43, Waylan Limberg wrote:

> 1. First of all, using the straight quote is not invalid html or
> xhtml, so why do we care? In fact, I can't seem to find one reference
> that explains why the html entity should be used in this case.

You need to escape it inside attributes (such as the title attribute or URLs), but outside attributes that's unnecessary. Perhaps the same escaping function is used for attributes and other text that needs escaping.

    <a href="..." title="My &quot;friend&quot;">...</a>

> 2. If I'm not mistaken, having markdown output the html entity makes
> it impossible for smartypants to then convert straight quotes into
> "curly" quotes - which may be undesirable to some users.

That's right. In fact, PHP Markdown uses that trick to disable SmartyPants on escaped characters so you don't have to double-escape characters such as the dot or the backtick. For instance:

    I don't want an ellipsis character\... and here I want a double backtick: \``

Pass that through the Markdown & SmartyPants filter on John Gruber's Markdown Dingus and you'll see that you actually need to write:

    I don't want an ellipsis character\\\... and here I want a double backtick: \\\``

to get the expected behaviour.

> So, my question is: Which implementation do you consider correct
> and why?

I'd say both are correct since, within HTML or XML character data, a character entity is semantically equivalent to the literal character it represents. But for the sake of compatibility with other processing tools, I'd avoid using an entity for straight quotes (or any other character for that matter) when it's not necessary.

Michel Fortin
michel.fortin at michelf.com
http://michelf.com/
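To make the attribute-versus-character-data distinction above concrete, here is a minimal sketch using `html.escape` from the Python standard library. It is only an illustration of the two escaping contexts; it is not how python-markdown or PHP Markdown escape their output, and the sample strings are made up.

```python
from html import escape

sample = 'My "friend" said <hi> & left'

# Character data (text between tags): only &, < and > need escaping,
# so straight quotes can be left as literal characters.
as_text = escape(sample, quote=False)

# Attribute value: the quote delimits the value, so it must be escaped.
as_attr = escape('My "friend"', quote=True)
link = '<a href="..." title="%s">...</a>' % as_attr

print(as_text)
print(link)
```

With `quote=False` the quotes survive untouched, which is what a later SmartyPants pass would need in order to convert them.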
I agree with Michel that it's better not to escape things where it's not necessary. If you want to fix it, the place to do it is the method Document.normalizeEntities(), which is now called in two places:

1. in Element.toxml(), separately for each attribute
2. in TextNode.toxml(), for the whole thing

The quickest fix would be to add an extra keyword parameter to this function, something like "skip_quotes", and call it with this parameter in the second case. Note that "<", ">", and "&" need to be replaced in both cases.

- yuri

On Jan 5, 2008 4:51 AM, Michel Fortin <michel.fortin at michelf.com> wrote:

> > So, my question is: Which implementation do you consider correct
> > and why?
>
> I'd say both are correct since, within HTML or XML character data, a
> character entity is semantically equivalent to the literal character
> it represents. But for the sake of compatibility with other
> processing tools, I'd avoid using an entity for straight quotes (or any
> other character for that matter) when it's not necessary.

-- 
http://sputnik.freewisdom.org/
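A rough sketch of the "skip_quotes" idea suggested in this message might look like the following. The function below is a hypothetical stand-in, not the actual Document.normalizeEntities() from python-markdown; its body, and the assumption that the real method simply replaces each special character, are guesses made for illustration.

```python
def normalize_entities(text, skip_quotes=False):
    """Hypothetical stand-in for Document.normalizeEntities()."""
    # "&" is replaced first so the entities added below are not re-escaped.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    # Quotes are only escaped when the caller has not opted out.
    if not skip_quotes:
        text = text.replace('"', "&quot;")
    return text


# Attribute values (the Element.toxml() case) keep the default behaviour.
attr_value = normalize_entities('My "friend"')

# Text nodes (the TextNode.toxml() case) leave straight quotes alone.
body_text = normalize_entities('She said "hi" & <left>', skip_quotes=True)

print(attr_value)
print(body_text)
```

Defaulting `skip_quotes` to `False` keeps existing callers unchanged, so only the text-node call site would need to be touched.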