Hello everybody,

I've just noticed that markdown doesn't always generate XHTML. In particular, the input

    <script src="http://evilserver.net/evil.js">

generates the output:

    <p><script src="http://evilserver.net/evil.js"></p>

(This is the markdown dingus at Daring Fireball, and the markdownj implementation exhibits the same problem. I haven't checked other implementations of markdown.)

I have two issues with this:
1. The script tag isn't closed, which means it's not valid XML (and thus not valid XHTML).
2. It's a security issue if you allow visitors to enter markdown text and display it on a page, e.g., in a forum, as it allows certain HTML injection attacks.

I've looked at the mailing list archives without finding any note that this is a known issue.

Would you consider this a bug or a feature? If it's a feature, then unfortunately I won't be able to use markdown for a forum I'm administering due to the security implications.

Cheers,

-- Ulf

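The first point can be checked mechanically. The following is a minimal sketch, assuming Python's standard XML parser is an acceptable stand-in for "valid XML" in the sense used above (strictly, it tests well-formedness). The helper name `is_well_formed_xml` is made up for this illustration and is not part of any Markdown implementation.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(fragment: str) -> bool:
    """Return True if the fragment parses as one well-formed XML element."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


# Output reported in the message above: the <script> element is never closed.
reported_output = '<p><script src="http://evilserver.net/evil.js"></p>'
# The same markup with the element closed, for comparison.
closed_output = '<p><script src="http://evilserver.net/evil.js"></script></p>'

print(is_well_formed_xml(reported_output))  # False
print(is_well_formed_xml(closed_output))    # True
```

A check like this only answers the XHTML question. It says nothing about the second point, since a properly closed script element is just as dangerous in a forum.
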
On Friday, 14 March 2008, Ulf Ochsenfahrt wrote:
> I have two issues with this:
> 1. The script tag isn't closed, which means it's not valid XML (and thus
> not valid XHTML).

This is a bug in my eyes.

> 2. It's a security issue if you allow visitors to enter markdown text
> and display it on a page, e.g., in a forum, as it allows certain HTML
> injection attacks.
>
> Would you consider this a bug or a feature? If it's a feature, then
> unfortunately I won't be able to use markdown for a forum I'm
> administrating due to the security implications.

The security issue is not Markdown's. You'll have to supply your own validation and input filtering mechanisms. A *good* editor could want to include `<script>` tags, and it's not Markdown's philosophy to stand in the way here.

There are tons of pretty decent filtering functions out there. Which programming language do you use?

-- 
Milian Wolff
http://milianw.de
OpenPGP key: CD1D1393

On Fri, Mar 14, 2008 at 12:11 PM, Milian Wolff <mail at milianw.de> wrote:
> > I have two issues with this:
> > 1. The script tag isn't closed, which means it's not valid XML (and thus
> > not valid XHTML).
>
> This is a bug in my eyes.

Is it markdown's business to correct bad markup input (which, I understand, it ignores)?

On Fri, Mar 14, 2008 at 3:20 PM, Joseph Lorenzo Hall <joehall at gmail.com> wrote:
> Is it markdown's business to correct bad markup input (which, I
> understand, it ignores)?

Right, raw HTML (or anything which looks like HTML, which is pretty much anything between < and >) is simply passed through unchanged. If the input is not valid, then the output will not be either. That is known and expected behavior. I'd say a feature! Definitely NOT a bug. Now, if you want to put your own mechanisms in place to address that, either before or after markdown is run, you are more than welcome to do so.

Regarding the security issues, I understand your concerns, but there are some situations where all document authors are trusted (authenticated) users and have a legitimate need for that feature. We can't cut them off for everyone else. However, I know that Python-Markdown has an option to not allow any HTML in a document (this "safe_mode" can be set to either replace with a customizable message, remove completely, or escape the HTML). Of course, to stay in line with the Markdown standard, it is off by default, but very easy to turn on in your code. Other implementations may offer a similar option.

-- 
----
Waylan Limberg
waylan at gmail.com

On Mar 14, 2008, at 9:38 PM, Waylan Limberg wrote:
> I know that
> Python-Markdown has an option to not allow any html in a document
> (this "safe_mode" can be set to either replace with a customizable
> message, remove completely, or escape the html). Of course, to stay in
> line with the Markdown standard, it is off by default, but very easy
> to turn on in your code. Other implementations may offer a similar
> option.

Or one could preprocess the text directly before rendering it, e.g.:

    aText = aText:gsub( '(`?)(<.->)(`?)', '`%2`' )
    aText = markdown( aText )
    aText = aText:gsub( '(`)(<.->)(`)', '%2' )

http://dev.alt.textdrive.com/browser/HTTP/WikiService.lua#L281

That way anything between angle brackets is escaped.

Or at least this is what Nanoki, a wiki engine implemented in Lua, does to protect the innocent from shooting themselves in the foot :)

http://svr225.stepx.com:3388/nanoki

Try to edit the online demo:

http://svr225.stepx.com:3388/test

In theory, functional anomalies aside, Nanoki's pages should always render as valid XHTML.

-- 
PA
http://alt.textdrive.com/nanoki/

Preprocessing markdown is something extremely difficult to get right...

On 2008-03-14 at 16:57, Petite Abeille wrote:
> Or one could preprocess the text directly before rendering it, e.g.:
>
>     aText = aText:gsub( '(`?)(<.->)(`?)', '`%2`' )
>     aText = markdown( aText )
>     aText = aText:gsub( '(`)(<.->)(`)', '%2' )

That looks rather hackish and trivial to work around if you want to inject random HTML. It may protect the user from accidentally inserting HTML, but it will only delay by a couple of seconds someone deliberately seeking to do it. What would it do with this, for instance?

    ``<script <!--
    alert("Hello world!")
    </script <>```

If I understand the preprocessing, the example string would be unchanged after preprocessing. When passed through PHP Markdown and Markdown.pl (tried on the Dingus), this will pop an alert.

> Or at least this is what Nanoki, a wiki engine implemented in Lua,
> does to protect the innocent from shooting themselves in the foot :)
>
> http://svr225.stepx.com:3388/nanoki
>
> Try to edit the online demo:
>
> http://svr225.stepx.com:3388/test

As expected, the example above pops an alert. I'm not sure why, but even this one works:

    <script <!--
    alert("Hello world!")
    </script <>

> In theory, functional anomalies aside, Nanoki's pages should always
> render as valid XHTML.

Practice always defies theory once the theory is put in practice.

Michel Fortin
michel.fortin at michelf.com
http://michelf.com/

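The claim that the crafted string survives the preprocessing unchanged can be reproduced outside Lua. This is a rough sketch under stated assumptions: the Lua pattern from the quoted snippet is translated by hand to a Python regular expression (Lua's lazy `.-` becomes `.*?`, and `re.DOTALL` is used because Lua's `.` also matches line breaks), and the names `WRAP_TAGS` and `preprocess` are invented for the illustration. It mirrors only the first `gsub`, not Nanoki as a whole.

```python
import re

# Hand translation of the first Lua gsub quoted above (an assumption, not
# Nanoki's code). One optional backtick on each side is absorbed.
WRAP_TAGS = re.compile(r"(`?)(<.*?>)(`?)", re.DOTALL)


def preprocess(text: str) -> str:
    """Wrap every <...> run in single backticks."""
    return WRAP_TAGS.sub(r"`\2`", text)


# Ordinary markup gets wrapped, so Markdown would treat it as a code span.
print(preprocess("Click <b>here</b>"))  # Click `<b>`here`</b>`

# The crafted input from the message above: two backticks in front, three
# behind, and no '>' until the very end.
crafted = '``<script <!--\nalert("Hello world!")\n</script <>```'
print(preprocess(crafted) == crafted)  # True
```

The pattern swallows one backtick on each side and puts one back, so the unbalanced run of two and three backticks is left exactly as it was, and what reaches the Markdown converter is the attacker's original text.
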
> However, I know that
> Python-Markdown has an option to not allow any html in a document
> (this "safe_mode" can be set to either replace with a customizable
> message, remove completely, or escape the html). Of course, to stay in
> line with the Markdown standard, it is off by default, but very easy
> to turn on in your code. Other implementations may offer a similar
> option.

Pandoc has the option --sanitize-html, which sanitizes HTML (in markdown or HTML input) using a whitelist. Unsafe tags are replaced by HTML comments; unsafe attributes are omitted.

John

On 2008-03-14 at 16:38, Waylan Limberg wrote:
> However, I know that
> Python-Markdown has an option to not allow any html in a document
> (this "safe_mode" can be set to either replace with a customizable
> message, remove completely, or escape the html). Of course, to stay in
> line with the Markdown standard, it is off by default, but very easy
> to turn on in your code. Other implementations may offer a similar
> option.

"Safe mode" you say? PHP Markdown also has a no-markup mode which would filter script tags and any other HTML tags. But this doesn't prevent anyone from inserting their own script on the page. Do you know you can inject a script in a URL? Guess what this does:

    [link](javascript:alert%28'Hello%20world!'%29)

There is also a browser (IE, I think) which automatically executes JavaScript used as the source URL for an image, so the same kind of URL used as an image source could also work in some cases.

Michel Fortin
michel.fortin at michelf.com
http://michelf.com/

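One way to close the URL hole described above is to check the scheme of every link and image target against an allowlist, rather than trying to blacklist `javascript:`. The following is only a sketch: the set `ALLOWED_SCHEMES` and the function `is_safe_url` are hypothetical names chosen for this example, and it assumes the URL has already been extracted from the rendered HTML and entity-decoded.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; adjust to whatever the site actually needs.
ALLOWED_SCHEMES = {"http", "https", "mailto"}


def is_safe_url(url: str) -> bool:
    """Accept relative URLs and URLs whose scheme is on the allowlist."""
    # Drop whitespace and control characters first, since browsers tolerate
    # some of them inside or in front of the scheme. This is used only for
    # the check; the URL itself is not rewritten.
    cleaned = "".join(ch for ch in url if ch > " ")
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:
        return False  # unparseable URLs are rejected outright
    return scheme == "" or scheme in ALLOWED_SCHEMES


print(is_safe_url("javascript:alert%28'Hello%20world!'%29"))  # False
print(is_safe_url("http://michelf.com/"))                     # True
print(is_safe_url("/wiki/HomePage"))                          # True
```

Allowing only known schemes also rejects `data:` and `vbscript:` URLs without having to list them.
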
Waylan Limberg wrote:
> Regarding the security issues, I understand your concerns, but there
> are some situations were all document authors are trusted
> (authenticated) users and have a legitimate need for that feature. We
> can't cut them off for everyone else.

Yes, there are situations where all document authors are trusted (authentication isn't trust though), but the fact remains that this makes markdown completely unusable for anything else. And worse, people are not made aware of this fact. I only encountered this by coincidence, because one of my users entered what looked like HTML tags into the forum.

In summary:
Markdown wasn't designed to handle this situation. Some implementations provide a 'safe mode' which aims to filter the code either before or after markdown conversion.

Markdownj (Java, which I've been using) doesn't provide such an option.

Markdown.pl doesn't provide such an option.

Nanoki tries to, and fails (see related mail by Michel Fortin) on:

    <script <!--
    alert("Hello world!")
    </script <>

PHP Markdown has something like this, and it has to be enabled in the source (?). It fails when no_markup=true and no_entities=false on:

    <script>alert('hallo');</script>

Python markdown has such an option and it appears to work for simple tests. Looking at the code, python markdown apparently creates an XML document tree and serializes it, making sure that the generated code is always valid XML (that's a very good design choice if I may say so).

I haven't tried Pandoc, which was also mentioned by John MacFarlane.

OK, thanks everyone for their input. There remains the potential issue of some browsers automatically executing JavaScript when it is used as a source URL for an image, and possibly also the use of javascript: URLs in links. Other than that, I think I'll move forward using the python markdown implementation via Jython or start porting it to Java.

Again, thanks everyone for their quick replies.

Cheers,

-- Ulf

On Sat, Mar 15, 2008 at 3:49 AM, Ulf Ochsenfahrt <ulf at ofahrt.de> wrote:
> In summary:
> Markdown wasn't designed to handle this situation. Some implementations
> provide a 'safe mode' which aims to filter the code either before or
> after markdown conversion.
>
> Nanoki tries to, and fails (see related mail by Michel Fortin) on:
>
> <script <!--
> alert("Hello world!")
> </script <>
>
> Python markdown has such an option and it appears to work for simple
> tests. Looking at the code, python markdown apparently creates an XML
> document tree and serializes it, making sure that the generated code is
> always valid XML (that's a very good design choice if I may say so).
>
> I havn't tried Pandoc, which was also mentioned by John MacFarlane.

Thanks for the summary. For completeness, Maruku's output is:

    <pre class='markdown-html-error'>HTML parse error:
    <script <!-- alert("Hello world!") </script <></pre>

You see, Maruku is used inside Jacques Distler's math-enabled branch of Instiki [1], which outputs well-formed XHTML + MathML + SVG. You can't really leave anything to chance. If there is even one error anywhere, the document does not validate and therefore it does not render.

Maruku's treatment of raw XML is that it requires it to be well-formed XML, with some user-friendly exceptions inspired by HTML (the user doesn't have to close <br>, <img>, etc.). If it isn't well-formed, it triggers an error (nicely displayed on stderr, or intercepted by the API). Parsing goes on, but for convenience it outputs the error in the document (see above; it can be hidden by CSS).

Jacques also did some work on sanitizing the XHTML document, but this logically happens after Maruku.

[1]: http://golem.ph.utexas.edu/instiki/show/HomePage

-- 
Andrea Censi
PhD student, Control & Dynamical Systems, Caltech
http://www.cds.caltech.edu/~andrea/
"Life is too important to be taken seriously" (Oscar Wilde)

Ulf Ochsenfahrt wrote:
> Yes, there are situations where all document authors are trusted
> (authentication isn't trust though), but the fact remains that this
> makes markdown completely unusable for anything else.

Ulf,

No, it doesn't. All it does is make Markdown *alone* inappropriate for content generated by untrusted users. But that shouldn't be surprising. Markdown is designed to work as a preprocessor, not as an alternative to HTML or as a sanitizer. If you need an HTML sanitizer, there are lots of them available, and there should be nothing stopping you from using Markdown in order to generate the HTML and then an appropriate second tool to sanitize it:

    $body = Markdown($source);
    $body = WhitelistBasedFilter($body);

In fact that's precisely what a lot of Markdown consumers (e.g. WordPress with PHP Markdown turned on for comments) do.

> And worse, people are not made aware of this fact.

Made aware of what? John Gruber's documentation is certainly quite explicit that Markdown allows for raw HTML; that's part of the point of Markdown, as opposed to other plaintext syntaxes that try to replace HTML entirely. If you expect it to be something it's not (e.g. a validating producer or a sanitizer) then you'll no doubt be disappointed, but I don't think it's fair to claim that Markdown implementers are the ones leading you to expect some other kind of behavior than what you get.

-C

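To make the second step of that pipeline concrete, here is a small sketch of what a whitelist-based filter could look like, written with Python's standard library only. It is an illustration and not the filter used by WordPress or any Markdown implementation: `ALLOWED`, `SAFE_SCHEMES`, `AllowlistFilter` and `allowlist_filter` are hypothetical names, the policy is deliberately tiny, and the code does not rebalance unclosed tags, so a maintained sanitizer is still the better choice for a real forum.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical policy for illustration: allowed tag -> attributes kept on it.
ALLOWED = {
    "p": set(), "em": set(), "strong": set(), "code": set(), "pre": set(),
    "blockquote": set(), "ul": set(), "ol": set(), "li": set(),
    "a": {"href", "title"},
}
SAFE_SCHEMES = {"", "http", "https", "mailto"}  # "" means a relative URL


def _safe_href(value: str) -> bool:
    """Reject href values whose scheme is not on the allowlist."""
    cleaned = "".join(ch for ch in value if ch > " ")
    try:
        return urlsplit(cleaned).scheme.lower() in SAFE_SCHEMES
    except ValueError:
        return False


class AllowlistFilter(HTMLParser):
    """Re-emit only allowlisted tags and attributes; escape all text."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop the tag itself; text is still escaped on output
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            if name == "href" and not _safe_href(value):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def allowlist_filter(html: str) -> str:
    """Run the filter over HTML that a Markdown converter has produced."""
    parser = AllowlistFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(allowlist_filter('<p>Hello <em>world</em></p>'))
# <p>Hello <em>world</em></p>
print(allowlist_filter('<p><a href="javascript:alert(1)" onclick="x()">link</a></p>'))
# <p><a>link</a></p>
```

Because the filter runs on the converter's output, it sees raw HTML passed through by Markdown and links generated from Markdown syntax alike, which is the reason for sanitizing after conversion instead of before.
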
Rad Geek wrote:
> Ulf Ochsenfahrt wrote:
> > And worse, people are not made aware of this fact.
>
> Made aware of what? John Gruber's documentation is certainly quite
> explicit that Markdown allows for raw HTML; that's part of the point of
> Markdown, as opposed to other plaintext syntaxes that try to replace
> HTML entirely. If you expect it to be something it's not (e.g. a
> validating producer or a sanitizer) then you'll no doubt be
> disappointed, but I don't think it's fair to claim that Markdown
> implementers are the ones leading you to expect some other kind of
> behavior than what you get.

Apparently, I attribute a different meaning to the word 'explicit'.

First of all, the main page on Daring Fireball says:

> Markdown is a text-to-HTML conversion tool for web writers. Markdown
> allows you to write using an easy-to-read, easy-to-write plain text
> format, then convert it to structurally valid *XHTML* (or HTML).

(Emphasis mine.) That appears to be quite a strong statement if you ask me.

On the Syntax page, it says that it allows inline HTML. But it does not say that this is potentially dangerous. True, it doesn't say that the inlined HTML is sanity-checked, either. However, it does list HTML tag names, and it even goes so far as saying that markdown does process text inside span-level tags, so it must be aware of them to some extent at least.

I guess I should have researched more thoroughly before I started using markdown for that forum. But I politely disagree with you when you say that 'John Gruber's documentation is quite explicit' that markdown is dangerous (the word you chose was 'inappropriate') when used in this context.

Cheers,

-- Ulf

On Friday, 14 March 2008, Joseph Lorenzo Hall wrote:
> On Fri, Mar 14, 2008 at 12:11 PM, Milian Wolff <mail at milianw.de> wrote:
> > > I have two issues with this:
> > > 1. The script tag isn't closed, which means it's not valid XML (and
> > > thus not valid XHTML).
> >
> > This is a bug in my eyes.
>
> Is it markdown's business to correct bad markup input (which, I
> understand, it ignores)?

Yes, of course you are right. Only the HTML generated by Markdown should be valid XHTML; if you input invalid stuff, it's your own problem. Where was I...

-- 
Milian Wolff
http://milianw.de
OpenPGP key: CD1D1393