thr3ads.net - Markdown Discuss - Data loss issue: Adjacent List Types [Jun 2011]


Alan Hogan

2011-Jun-06 17:20 UTC

Data loss issue: Adjacent List Types

Esteemed human authors and robotic parse-bots:

I recently discovered that most or all Markdown implementations, including
Gruber's original in Perl, behave oddly when one list directly follows
another. Namely, a bulleted list followed by a numbered list, or vice versa,
is marked up as if it were part of the first list (and of the first list's
type).

For example, consider the following input:

~~~~~~

- Bulleted item
- Second bulleted item

1. Numbered list
2. Second numbered item

~~~~~~

It will yield an output of one UL element with four LI children (themselves
containing some number of P tags, varying by implementation).

Now, I realize full well that a blank line between list items causes the list
items to be given <p> tags. But the blank line above, to any reasonable
*human*, isn't separating list items but rather *lists.*

There is a fundamental problem in the above code: that it triggers **non-obvious
data loss.**

The data is of course the numbering.

The non-obviousness is due to the way the output formatting is essentially
correct, and only the list item markers are unexpected. A cursory scan of the
Markdown-transformed text (e.g., looking over a blog post before publishing)
will show no structural problems. Success, publish! ... How long until the author
realizes his/her reference to "step #2" is actually referring to the fourth
bullet in an awkward list?

One of the nicest things about Markdown is that once you get it, and it doesn't
take long, there are precious few surprises it will throw at you.

If for no other reason, I think the counter-intuitiveness and the "crap, do I
really have to remember that you can't follow a list by a list?" moment are in
and of themselves reasons to change the behavior. Besides the data loss.

I also struggle to imagine anyone who would be upset at the change. After all,
what end-user would *rely* on this feature to munge their list types?
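
The proposed behavior can be illustrated with a toy model. This is only a sketch under stated assumptions: `split_lists`, `BULLET`, and `NUMBERED` are hypothetical names, the code is not taken from any real Markdown implementation, and it ignores nesting, indentation, and paragraph wrapping. It shows one rule only: a change of marker kind starts a new list, while a blank line alone does not.

```python
import re

# Hypothetical patterns for this sketch; real parsers handle far more cases.
BULLET = re.compile(r"^[*+-]\s+(.*)$")
NUMBERED = re.compile(r"^\d+\.\s+(.*)$")

def split_lists(text):
    """Return (kind, items) pairs, opening a new list whenever the marker
    kind switches between bulleted ("ul") and numbered ("ol")."""
    lists = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # a blank line alone does not end a list here
        bullet = BULLET.match(line)
        numbered = NUMBERED.match(line)
        if bullet:
            kind, item = "ul", bullet.group(1)
        elif numbered:
            kind, item = "ol", numbered.group(1)
        else:
            current = None  # any non-list line closes the open list
            continue
        if current is None or current[0] != kind:
            current = (kind, [])
            lists.append(current)
        current[1].append(item)
    return lists

sample = """\
- Bulleted item
- Second bulleted item

1. Numbered list
2. Second numbered item
"""
print(split_lists(sample))
```

With the sample input from above, this yields one "ul" entry followed by one "ol" entry, each with two items, instead of a single four-item list.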

Alan

Lasar Liepins

2011-Jun-06 17:57 UTC


Hello,

while I agree that this is technically an issue, I don't think it
is an often seen issue in actual human-written text. Markdown is
plain text formatted by and for humans. I don't think there are
many cases where you would want to put two lists after each other
without an introduction of sorts.

And on a side note: Gruber notes in the Markdown spec that the
actual numbers used in a numbered list are ignored. So data loss
is already occurring here.

Greetings,

_Lasar


-- 
_Lasar Liepins
lasar at liepins.net
http://liepins.net/
http://10110101.net/

David Chambers

2011-Jun-06 18:07 UTC


I agree with Lasar that such cases arise infrequently. I do support such a
change in theory, but I'm not sure how difficult it would be to implement,
given that blank lines between list items are already used to have the items
wrapped in `p` tags.

Alan Hogan <contact at alanhogan.com> wrote:

> After all, what end-user would *rely* on this feature to munge their
> list types?

Good point.

David

Alan Hogan

2011-Jun-06 18:26 UTC


Quoth _Lasar: 
> while I agree that this is technically an issue, I don't think it
> is an often seen issue in actual human-written text. Markdown is
> plain text formatted by and for humans. I don't think there are
> many cases where you would want to put two lists after each other
> without an introduction of sorts.


I must of course agree that it is not an exceedingly common case, nor a terribly
sensible decision to make.

That said:

- Consider a student quickly taking notes, or a liveblogger publishing quickly.
  They may not have time to write an intro for each list, or may not realize
  that they skipped it.
- I personally have experienced this issue, so it does happen.
- Even if only a small fraction of users run into this issue (half a percent,
  say), if I am providing a service to two hundred thousand users (and I do),
  that's a thousand people affected.



> And on a side note: Gruber notes in the Markdown spec that the
> actual numbers used in a numbered list are ignored. So data loss
> is already occurring here.

Now that is true.

However:

- Existing data loss doesn't mean we should be okay with more data loss.
- The numbers couldn't really always be matched in the output anyway, given how
  HTML works.
- I personally made a mistake by starting a paragraph with "1999." today, so
  this too can cause problems. (At least it's part of the Markdown spec, though.)
- I am personally disappointed that the `start` attribute (?) isn't used, based
  off the first number in the list; this would also help catch mistakes.
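
As an illustration of the `start` idea, here is a minimal sketch. It assumes a hypothetical helper, `render_ordered`, that receives the lines of one numbered list; it is not how any existing implementation works. The only real-world fact it relies on is that HTML's `<ol>` element accepts a `start` attribute.

```python
import html
import re

# Hypothetical pattern for this sketch: "<number>. <text>"
NUMBERED = re.compile(r"^(\d+)\.\s+(.*)$")

def render_ordered(lines):
    """Render numbered Markdown lines as an <ol>, taking `start` from the
    first item's number. Numbers on later items are still ignored."""
    items = []
    start = None
    for line in lines:
        match = NUMBERED.match(line.strip())
        if not match:
            continue  # skip anything that is not a numbered item
        if start is None:
            start = int(match.group(1))
        items.append(html.escape(match.group(2)))
    if not items:
        return ""
    # Omit the attribute for the default case so ordinary lists are unchanged.
    attr = "" if start == 1 else f' start="{start}"'
    body = "".join(f"<li>{item}</li>" for item in items)
    return f"<ol{attr}>{body}</ol>"

print(render_ordered(["3. Third step", "4. Fourth step"]))
```

Under this scheme, a paragraph that accidentally begins with "1999." would come out as a list starting at 1999, which is far easier to spot in a preview than a list silently renumbered from 1.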



Given that I still struggle to see a downside to making my proposed change, I'm
really hoping we can achieve a rough consensus here.






John MacFarlane

2011-Jun-06 20:23 UTC


+++ Alan Hogan [Jun 06 11 10:20 ]:
> Esteemed human authors and robotic parse-bots:
>
> I recently discovered that most or all Markdown implementations, including
> Gruber's original in Perl, behave oddly when one list directly follows
> another. Namely, a bulleted list followed by a numbered list, or vice versa,
> is marked up as if it were part of the first list (and of the first list's
> type).
>
> For example, consider the following input:
>
> ~~~~~~
>
> - Bulleted item
> - Second bulleted item
>
> 1. Numbered list
> 2. Second numbered item
>
> ~~~~~~

I strongly agree that this should be parsed as an unordered list
followed by an ordered list.  That is how any normal person would
construe it.

I also think that the following should be interpreted as two different
unordered lists (that is, the change of bullet character should be
significant):

~~~~~~
* one
* two

- new
- list
~~~~~~

Finally, I think that the starting number of an ordered list should be
significant. Otherwise there is no way to have a running list with
commentary in between (but not part of) the items.

(By the way, pandoc implements this last feature, and the next major
version will implement the first two.)
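
The bullet-character rule can be sketched in a few lines. This is an illustration only, assuming a hypothetical `group_by_bullet` helper; it is not pandoc's code and it skips blank and non-list lines rather than treating them as separators.

```python
import re
from itertools import groupby

# Hypothetical pattern for this sketch: a bullet character, then the item text.
ITEM = re.compile(r"^([*+-])\s+(.*)$")

def group_by_bullet(lines):
    """Split consecutive bulleted lines into separate lists whenever the
    bullet character changes. Blank and non-list lines are skipped."""
    matches = [m for m in (ITEM.match(line.strip()) for line in lines) if m]
    return [
        (marker, [m.group(2) for m in run])
        for marker, run in groupby(matches, key=lambda m: m.group(1))
    ]

print(group_by_bullet(["* one", "* two", "", "- new", "- list"]))
```

For the example above, this produces a `*` list with "one" and "two" followed by a separate `-` list with "new" and "list".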

John

David Chambers

2011-Jun-06 20:34 UTC


John MacFarlane <jgm at berkeley.edu> wrote:

> I also think that the following should be interpreted as two different
> unordered lists (that is, the change of bullet character should be
> significant):
>
> ~~~~~~
> * one
> * two
>
> - new
> - list
> ~~~~~~

+1 to this, too.

David
