We recently received the following bug report for the python-markdown implementation:

> The "&" are escaped in URLs.
>
> An example:
> [Link](http://www.site.com/?param1=value1&param2=value1)
>
> Should output:
> <a href="http://www.site.com/?param1=value1&param2=value1">Link</a>
>
> Currently outputs:
> <a href="http://www.site.com/?param1=value1&amp;param2=value1">Link</a>
>
> So the "&" must not be escaped!

A fix is easy, but it occurred to me that perhaps links should be urlencoded -- at least some chars should be. Specifically, the "unsafe" chars listed in RFC 1738 [1]. The "reserved" chars probably should be too when not used in their approved manner (i.e., a colon should only be allowed after the scheme (http://) or in the location (usr:pass@host:port), but should be encoded anywhere else). Of course, that involves extra work.

So I went to check what other implementations do [2] and discovered that every one of them escapes the ampersand as an HTML entity. Is there something I'm missing or is this a bug? As far as I can tell, the "&amp;" breaks the query string.

[1]: http://www.rfc-editor.org/rfc/rfc1738.txt
[2]: http://babelmark.bobtfish.net/?markdown=%5BLink%5D%28http%3A%2F%2Fwww.site.com%2F%3Fparam1%3Dvalue1%26param2%3Dvalue1%29&normalize=on&src=1&dest=2

-- 
----
Waylan Limberg
waylan at gmail.com
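The urlencoding idea raised above is a separate layer from HTML entity escaping. As a rough sketch only, assuming Python's standard-library `urllib.parse.quote`: the helper name `encode_unsafe` and the exact set of characters left untouched are illustrative assumptions, not what python-markdown does.

```python
from urllib.parse import quote

# Characters left as-is (an illustrative assumption): the RFC 1738
# reserved and "special" characters, plus "%" so that existing escapes
# are not double-encoded and "#" so that fragments survive.
KEEP = ";/?:@=&" + "$+!*'()," + "%#"

def encode_unsafe(url):
    """Percent-encode every character outside KEEP and the unreserved set."""
    return quote(url, safe=KEEP)

plain = "http://www.site.com/?param1=value1&param2=value1"
messy = "http://www.site.com/a b?q=<x>&r=1"

print(encode_unsafe(plain))  # nothing here needs percent-encoding
print(encode_unsafe(messy))  # the space and angle brackets are encoded
```

Note that the ampersand passes through this step unchanged. Whether it is then written as an entity is a question for the HTML layer, not the URL layer.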
On Tuesday, 1 April 2008, Waylan Limberg wrote:
> We recently received the following bug report for the python-markdown
> implementation:
> > The "&" are escaped in URLs.
> >
> > An example:
> > [Link](http://www.site.com/?param1=value1&param2=value1)
> >
> > Should output:
> > <a href="http://www.site.com/?param1=value1&param2=value1">Link</a>

No, it shouldn't. This is invalid (X)HTML.

> > Currently outputs:
> > <a href="http://www.site.com/?param1=value1&amp;param2=value1">Link</a>

Which is valid (X)HTML!

> > So the "&" must not be escaped!

It must! See also <http://htmlhelp.com/tools/validator/problems.html#amp>

-- 
Milian Wolff
http://milianw.de
OpenPGP key: CD1D1393
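To illustrate why the escaped form is harmless: an HTML parser decodes `&amp;` inside an attribute value back to a plain `&` before the URL is ever used. A minimal sketch using only Python's standard library follows; the class name `HrefCollector` is made up for this example, and the sketch is not taken from any Markdown implementation.

```python
from html import escape
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collect the decoded href values of all <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs holds (name, value) pairs; entity references in the
        # values have already been decoded by the parser.
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href")

url = "http://www.site.com/?param1=value1&param2=value1"

# Escape the URL for use inside a double-quoted attribute.
markup = '<a href="%s">Link</a>' % escape(url, quote=True)

parser = HrefCollector()
parser.feed(markup)
parser.close()

print(markup)
print(parser.hrefs[0] == url)
```

The markup carries `&amp;`, while the URL recovered by the parser is the original one with a bare `&`, so the query string reaches the server intact.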
Aristotle Pagaltzis
2008-Apr-01 04:45 UTC
On ampersands in query strings (was: HTML entities in URLs and urlencoding)
* Waylan Limberg <waylan at gmail.com> [2008-04-01 03:50]:
> As far as I can tell, the "&amp;" breaks the query string.

No, it doesn't, as you found out.

However, on a tangential note: if you write web apps, *please* make sure that you support the semicolon as a query parameter separator as well as the ampersand:

http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.2

More importantly, please **please** make sure that the URIs your code generates use semicolons rather than ampersands. Semicolons need not be escaped in HTML and XML, which makes copy-pasting users much less likely to produce invalid markup regardless of the context they're working in.

Even though this W3C recommendation is over a decade old, use of ampersands in query strings persists. (In fact, PHP not only does not emit URIs with semicolon-separated query strings, by default it cannot even parse them! You need to set an unbreak-me config option to make it recognise the semicolon.)

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>
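As a rough illustration of the semicolon suggestion, here is a Python sketch that emits and parses a semicolon-separated query string. It assumes a Python version in which `urllib.parse.parse_qs` accepts the `separator` argument (3.9.2 and later); the helper `build_query` is hypothetical, written only for this example.

```python
from html import escape
from urllib.parse import parse_qs, quote_plus

def build_query(params, sep=";"):
    """Join percent-encoded key=value pairs with the given separator."""
    return sep.join(
        "%s=%s" % (quote_plus(key), quote_plus(value)) for key, value in params
    )

query = build_query([("param1", "value1"), ("param2", "value1")])

# Parse it back, telling parse_qs to split on ";" instead of "&".
parsed = parse_qs(query, separator=";")

print(query)
print(parsed)
# HTML escaping leaves the semicolon-separated form unchanged.
print(escape(query) == query)
```

Because the generated string contains no ampersand, it can be pasted into an HTML or XML attribute without any entity escaping.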
The escaped ampersand doesn't break any of the URLs that I use regularly. In fact, when I write straight XHTML I encode them in my URLs so that my pages still pass XHTML 1.0 Strict validation.

On Mar 31, 2008, at 9:45 PM, Waylan Limberg wrote:
> Is there something I'm missing or is this a bug? As far as I
> can tell, the "&amp;" breaks the query string.

/david

-- 
david herren - shoreham, vt us na terra solsys orionarm
"I didn't attend the funeral, but I sent a nice letter saying I approved of it." --- Mark Twain