Thomas Leitner
2010-Sep-03 09:43 UTC
RFC: Lazy syntax for paragraphs, blockquotes and lists
Hi everybody,

it was requested that kramdown (a Markdown parser in Ruby, see <http://kramdown.rubyforge.org>) supports the lazy syntax of Markdown. So I sat down, thought about it, skimmed through the Markdown ML on issues regarding lazy indentation as done with Markdown and now I have some rough idea on how to do this in kramdown.

First: I'd like to say that there is no way to satisfy everyone. Lazy indentation has some pros and cons and we have to find a middle ground!

Second: This is a rather long mail but worth the read, especially if you want to influence how kramdown implements the lazy syntax!

Third: I have cross-posted this email to the Markdown ML because it provides a nice explanation of why the behaviour of the lazy syntax in Markdown.pl might be as it is.

In the beginning there was...
============================

Markdown was created by John Gruber because he wanted a nice text format that is inspired by how email messages are written. There is a requirement that lines in plain text email messages should not be longer than 78 characters and therefore many mail (transport) programs hard-wrap text to a specific line length.

I think that this is the reason why we have lazy indentation or generally long line wrapping in Markdown. If we said that each paragraph must be one long line, there would obviously be problems when messages get automatically wrapped by (email) programs. Therefore Markdown allows paragraphs to continue on the following lines.

The Markdown syntax
==================

This is just a short summary of how and in which elements Markdown supports lazy indentation (taken more or less from the [Markdown Syntax Page][1]).

## Blockquotes

A blockquote starts with a `>` character. All following lines with a `>` character belong to the same blockquote. However, you may be lazy and put the `>` character only before the first line of a blockquote:

    > This is a normal paragraph in
    a blockquote.

    > The blockquote is continued here!!!

A blank line between two blockquotes does *not* separate the blockquotes, it's just one large blockquote.

## Lists

As with blockquotes, the content of a list item need not be indented correctly. For example:

    * This is a normal paragraph in
    a list item.

This is even allowed for other paragraphs in the list:

    * This is a paragraph.

      This is a paragraph with
    a lazy indentation.

Problems/Ambiguities
===================

The lazy indentation syntax provides Markdown users with many chances to get some unexpected output... Additionally, since both lists and blockquotes support lazy indentation it is sometimes not clear what the outcome is when those two elements are combined.

Here are some issues taken from the Markdown ML.

PA1. First example:

    * this is list item
    > * this item is in a block quote
    more block quoting?

PA2. Second example:

    * > list item with quoting
    more text here
    * > list item with quoting
    more text here
    * another list item

PA3. Third example:

    > > I wrote something
    > you replied
    and now here is my reply to your reply.

PA4. Fourth example:

    > * foo
    > > bar
    > > baz

The above examples can be interpreted in one way or another. This means that we won't find a solution that satisfies all needs. We can only try to find a solution that is based on a general rule which feels natural to the user and does what most people would expect.

Michel Fortin wrote [this][2] on the Markdown ML regarding the lazy syntax:

> Basically, I'd eliminate any "half-lazy" syntax were you can be lazy
> about list item indentation while not being lazy on blockquote
> markers. This just creates confusion; syntax markers shouldn't be
> allowed to be lazy.
>
> Removing half-lazy things would also fix a surprising issue with
> blockquotes:
>
>     > foo
>     > > bar
>     > baz
>
> This would be seen as a blockquote containing a "foo" paragraph, a
> nested "bar" blockquote and a "baz" paragraph, instead of the
> completly counter-intuitive output produced today. To make "baz"
> part of the nested blockquote, you would either go the explicit route:
>
>     > foo
>     > > bar
>     > > baz
>
> or the lazy route:
>
>     > foo
>     > > bar
>     baz
>
> but not something in between.

kramdown "lazy" syntax
=====================

I thought about how I would like things to work, considering all of the above and I came to the following solution. Note, however, that I do *not* recommend using the lazy syntax when writing a document!

Since the problem of the lazy syntax arises from the problem of line wrapping, why not just use that to specify how the lazy syntax should work? Before we go into details consider the following:

The kramdown syntax page lists the following structural block level elements:

* Blank lines
* Paragraphs
* Headers
* Blockquotes
* Code blocks
* Lists (incl. footnote definitions)
* Tables
* Horizontal rules
* Math blocks
* HTML blocks

We can leave out all elements which do not inherently support line wrapping, namely blank lines (no text to wrap), code blocks (should be output as is), tables, horizontal rules, math blocks (same as with code blocks) and HTML blocks. Headers can also be left out assuming that a header text is not long enough to trigger line wrapping (this has also been discussed on the Markdown ML and I think that the consensus was that longer header texts should be written directly in HTML).

This leaves us with three elements: paragraphs, blockquotes and lists. However, blockquotes and lists are just "wrappers" around paragraphs and therefore the only element that really contains any text in a kramdown (Markdown) document is a paragraph (I also count the compact list text that is not wrapped in `<p>` tags as a paragraph because conceptually it is one).

So once we know how long lines in paragraphs are wrapped, the behaviour of long lines in blockquotes and lists is easy to derive.

## Requirements

There are two requirements regarding line wrapping and "lazy" syntax:

* Line wrapping may be done like it is done by dumb editors, i.e. a long line is split on whitespace before the maximal line length and the text continues on the next line (i.e. *no* blank line in-between). This means that the additional lines belong to the line (and therefore a certain paragraph) to which line wrapping has been applied!

* It must be possible to blockquote a kramdown document (which possibly contains lazy lines) and preserve the structure of the quoted document.

## Paragraphs

So how to lazy wrap simple paragraphs? This is the easiest one since the [Markdown syntax description][1] already tells us how: just hard-wrap your lines and separate multiple consecutive paragraphs with one or more blank lines. For example:

    This is one long long long long long long long long long line

gets wrapped to:

    This is one long long
    long long long long
    long long long line

So the paragraph rule as stated on the [Markdown syntax page][1] is actually needed to support being lazy when writing paragraphs - and to support programs that hard-wrap long lines.

## Blockquotes

By following the two requirements as stated above, it is clear what the lazy syntax for blockquotes has to look like. The following examples modify this document:

    This is one long long long long long long long long long line

BQ1. After blockquoting:

    > This is one long long long long long long long long long line

BQ2. After line wrapping and blockquoting:

    > This is one long long
    > long long long long
    > long long long line

BQ3. After line wrapping, blockquoting and blockquoting:

    > > This is one long long
    > > long long long long
    > > long long long line

BQ4. After blockquoting and line wrapping:

    > This is one long long
    long long long long
    long long long line

BQ5. After blockquoting, line wrapping and blockquoting:

    > > This is one long long
    > long long long long
    > long long long line

As can be seen in the last example, the "half-lazy" syntax described by Michel Fortin arises naturally when blockquoting and line wrapping are combined in a certain way. However, I think it should not make any difference whether a document is first line-wrapped and then blockquoted or the other way around. Therefore I would allow this "half-lazy" syntax.

What happens if line wrapping is done several times? BQ5 with additional line wrapping:

    > > This is one
    long long
    > long long
    long long
    > long long
    long line

This looks a bit scary, I admit, but it is still one paragraph embedded in two blockquotes... I don't suggest that anyone writes his documents in this way though...

Due to line wrapping we now also have to require the use of blank lines between a blockquote and a following paragraph. Otherwise it is impossible to know whether example BQ4 contains just a blockquote or a blockquote followed by a paragraph.

I don't think that requiring a blank line is a burden on writers. If you look through the kramdown or the Markdown ML, you will see that in nearly all emails quoted text is separated from the response by at least one blank line.

Note that kramdown would generate two separate blockquotes if they are separated by a blank line (Markdown.pl merges the blockquotes):

    > This is one blockquote with
    a long line.

    > This is another blockquote
    with a long line.

If you run the examples BQ1 to BQ5 through Markdown.pl, you will find that it produces the expected output (as defined above). This is no coincidence, I think, since Markdown.pl has been designed with email messages in mind. However, the requirements as stated above haven't been written down anywhere (at least I don't know of it) and with those the behaviour of Markdown.pl is easily explained.

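The two requirements and the BQ examples can be sketched in Python. This is only an illustration under stated assumptions, not kramdown's implementation (kramdown is written in Ruby): the names `split_markers`, `parse`, `quote` and `dumb_wrap` are made up for this sketch, it handles only paragraphs and blockquotes, and the rule that a line with *more* `>` markers than the current paragraph starts a new paragraph is an assumption that the proposal does not spell out.

```python
import re
import textwrap

# Zero or more '>' markers, each optionally preceded by up to three spaces
# and followed by one optional space.
MARKERS = re.compile(r'^((?:[ ]{0,3}>[ ]?)*)(.*)$')


def split_markers(line):
    """Return (number of leading '>' markers, remaining text)."""
    prefix, rest = MARKERS.match(line).groups()
    return prefix.count('>'), rest.strip()


def parse(text):
    """Return a list of (blockquote depth, paragraph text) tuples."""
    paragraphs = []
    current = None                      # [depth, [text pieces]] being extended
    for line in text.splitlines():
        depth, rest = split_markers(line)
        if not rest:                    # blank line (bare or only markers)
            current = None
        elif current is None or depth > current[0]:
            current = [depth, [rest]]   # the first line fixes the depth
            paragraphs.append(current)
        else:
            current[1].append(rest)     # lazy or half-lazy continuation
    return [(depth, ' '.join(parts)) for depth, parts in paragraphs]


def quote(text):
    """Blockquote a document by prefixing every line with '> '."""
    return '\n'.join('> ' + line for line in text.splitlines())


def dumb_wrap(text, width=23):
    """Hard-wrap each line on whitespace; continuation lines get no marker."""
    out = []
    for line in text.splitlines():
        out.extend(textwrap.wrap(line, width) or [''])
    return '\n'.join(out)


doc = 'This is one ' + 'long ' * 9 + 'line'
bq2 = quote(dumb_wrap(doc))             # line wrapping, then blockquoting
bq4 = dumb_wrap(quote(doc))             # blockquoting, then line wrapping
bq5 = quote(dumb_wrap(quote(doc)))      # BQ4 blockquoted once more
print(bq5)
print(parse(bq2) == parse(bq4), parse(bq5))
```

With this rule the order of wrapping and blockquoting does not matter: `bq2` and `bq4` both parse to a single paragraph at depth 1, and `bq5` to a single paragraph at depth 2. A paragraph that follows a blockquote without a blank line is absorbed into the blockquote, which is why the blank line is required.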
## Lists

The content of lists, footnote definitions and all other content (except code blocks) that is defined via indentation, also has to support the lazy syntax. We will start with this document:

    * This is one long long long long long long long long long line

      This is one long long long long long long long long long line

    * Another very very very very very very very very long line

LI1. After line wrapping:

    * This is one long long
    long long long long
    long long long line

      This is one long long
    long long long long
    long long long line

    * Another very very
    very very very very
    very very long line

So line wrapping inside lists can also be explained in terms of the requirements. And the line wrapping behaviour is identical to that of Markdown.pl.

How to interpret the stated problems/ambiguities
===============================================

After having specified how the kramdown lazy syntax would work, here is how the initially given problems would be interpreted:

PA1. A list with one item, followed by a blockquote containing a list with one item. Markdown.pl interprets it in more or less the same way but using invalid HTML.

PA2. A list with three items: the first and the second item contain a blockquote with a paragraph, the third item contains just text. Again, Markdown.pl shows the same behaviour.

PA3. Two nested blockquotes containing one paragraph with all the text. Markdown.pl shows the same behaviour.

PA4. A blockquote containing a) a list with one item and b) a blockquote with a paragraph containing the text "bar baz". Markdown.pl's behaviour differs - it puts the inner blockquote inside the list item - again we have to disregard the invalid HTML it produces.

There is always the problem with blockquote and list markers: if they appear inside a paragraph and line wrapping is applied, they may potentially end up at the beginning of a line... I don't think that this can be avoided.

Any other problems/ambiguities/edge cases that need to be addressed?

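For the list case (LI1 above), a minimal Python sketch of lazy continuation lines might look as follows. It is an illustration only, not kramdown code: the function name `parse_list` is made up, and it assumes that list markers are `* ` in the first column, that the first line of a further paragraph inside an item is indented, and that nested lists and blockquotes are out of scope.

```python
def parse_list(text):
    """Return a list of items; each item is a list of paragraph strings."""
    items = []
    para = None                          # text pieces of the open paragraph
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:                 # a blank line closes the paragraph
            para = None
        elif line.startswith('* '):      # a list marker opens a new item
            para = [stripped[2:].strip()]
            items.append([para])
        elif para is not None:           # lazy (or indented) continuation
            para.append(stripped)
        elif items and line.startswith(' '):
            para = [stripped]            # indented: new paragraph in the item
            items[-1].append(para)
        else:
            raise ValueError('top-level paragraph, outside this sketch')
    return [[' '.join(p) for p in item] for item in items]


li1 = """\
* This is one long long
long long long long
long long long line

  This is one long long
long long long long
long long long line

* Another very very
very very very very
very very long line
"""

for item in parse_list(li1):
    print(item)
```

The wrapped document yields the same two items (the first with two paragraphs) as the unwrapped one, because a non-blank line that directly follows a paragraph line is always treated as part of that paragraph.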
Conclusion
=========

The proposed lazy syntax for kramdown is identical to that of the original Markdown implementation - some edge cases are handled differently though. However, in contrast to Markdown.pl more reasons are given why this lazy syntax is useful and how it arises naturally when looking at email messages and how they are processed by MTAs and email programs.

I haven't looked at how to implement this in kramdown but it shouldn't be too difficult. Before I do that I would like to hear your opinions on this matter! :-)

Best regards and thanks for staying with me through this long email,
Thomas

[1]: http://daringfireball.net/projects/markdown/syntax
[2]: http://osdir.com/ml/text.markdown.general/2007-05/msg00031.html
Michel Fortin
2010-Sep-03 12:13 UTC
RFC: Lazy syntax for paragraphs, blockquotes and lists
On 2010-09-03 at 5:43, Thomas Leitner wrote:

> Note that kramdown would generate two separate blockquotes if they are
> separated by a blank line (Markdown.pl merges the blockquotes):
>
>> This is one blockquote with
> a long line.
>
>> This is another blockquote
> with a long line.
>
> If you run the example BQ1 to BQ5 through Markdown.pl, you will find
> that it produces the expected output (as defined above). This is no
> coincidence, I think, since Markdown.pl has been designed with email
> messages in mind. However, the requirements as stated above
> haven't been written down anywhere (at least I don't know of it) and
> with those the behaviour of Markdown.pl is easily explained.

This last example is quite similar to the example of a lazy blockquote in the Markdown syntax page. It states:

> Markdown allows you to be lazy and only put the > before the first line of a hard-wrapped paragraph:
>
>     > This is a blockquote with two paragraphs. Lorem ipsum dolor sit amet,
>     consectetuer adipiscing elit. Aliquam hendrerit mi posuere lectus.
>     Vestibulum enim wisi, viverra nec, fringilla in, laoreet vitae, risus.
>
>     > Donec sit amet nisl. Aliquam semper ipsum sit amet velit. Suspendisse
>     id sem consectetuer libero luctus adipiscing.

<http://daringfireball.net/projects/markdown/syntax#blockquote>

Considering the first line of this example in the spec says "this is a blockquote with two paragraphs" and that it follows a non-lazy example where all this text is part of a single blockquote, I wouldn't consider the spec ambiguous. The Markdown syntax description is ambiguous about a lot of things, but not about this one.

Most Markdown implementations, but not all, do it as Markdown.pl:

<http://babelmark.bobtfish.net/?markdown=%3E++This+is+one+blockquote+with%0D%0A+++a+long+line.%0D%0A%0D%0A%3E++This+is+another+blockquote%0D%0A+++with+a+long+line.%0D%0A&normalize=on&src=1>

This isn't a disapproval of how you're planning to do things. I just wanted to make it clear that your last example with two consecutive blockquoted paragraphs is clearly a single blockquote per the Markdown syntax description.

--
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/
david parsons
2010-Sep-03 17:55 UTC
RFC: Lazy syntax for paragraphs, blockquotes and lists
On Fri, Sep 03, 2010 at 11:43:49AM +0200, Thomas Leitner wrote:

> In the beginning there was...
> ============================
>
> Markdown was created by John Gruber because he wanted a nice text
> format that is inspired by how email messages are written. There is a
> requirement that lines in plain text email message should not be longer
> than 78 characters and therefore many mail (transport) programs
> hard-wrap text to a specific line length.

For what it's worth I don't believe that very many (if any) MTAs reformat messages in that way. The Sendmail milter library interface allows filter programs to rewrite messages, but as far as I know the extent of that rewriting is to add, delete, or modify headers, or to do content sanitisation.

78 character line length is an artifact of ttys traditionally being 80 columns wide, as well as an artifact of the annoying habit of some ttys to force a newline if you write a character into the last cell on the screen.

-david parsons