thr3ads.net - Markdown Discuss - [ANN] PHP Markdown 1.0.2b7 [Sep 2006]


Michel Fortin

2006-Sep-16 17:24 UTC

[ANN] PHP Markdown 1.0.2b7

This is a new release of PHP Markdown, following Markdown.pl 1.0.2b7
from a few weeks ago. It fixes the same bugs, and some more; it also
introduces more radical backend changes. It can be downloaded here:

<http://www.michelf.com/docs/projets/php-markdown-1.0.2b7.zip>

and you can test it on the PHP Markdown Dingus:

<http://www.michelf.com/projects/php-markdown/dingus/>

This version inaugurates the truly extensible version of PHP Markdown,
which should make it a lot easier to write extensions, like my own
PHP Markdown Extra. If you want to create your own extension, I
suggest taking a look at the code for the new PHP Markdown Extra,
which I'll release in a few minutes. The most interesting part is
probably the constructor for the MarkdownExtra_Parser class.

Another big change is the automatic hashing of all Markdown-generated
HTML content. Previous versions of PHP Markdown Extra were already
doing this, but it was limited to block-level elements and was
done to reduce the number of calls to the expensive HTML block parser.
This has been ported to the more basic PHP Markdown, and in addition
to hashing block-level content it now also hashes span-level elements.
This has the benefit of preventing bad nesting of elements, so
something like this:

     *Some **strange* emphasis**

will now give valid HTML:

     *Some <strong>strange* emphasis</strong>

It should be noted however that fixing this introduced other nesting  
problems -- like being able to put a [link [inside](#) a link](#).  
These problems will be addressed in a future release.
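
The hashing idea described above can be sketched roughly as follows. This is a toy illustration in Python, not PHP Markdown's actual code: the class name `SpanHasher`, the token format, and the two simplified emphasis patterns are all assumptions made for the example. It only shows why a stray `*` ends up as literal text once the `<strong>` element has been swapped for an opaque token.

```python
import re


class SpanHasher:
    """Toy model of hashing generated HTML (hypothetical, not PHP Markdown)."""

    def __init__(self):
        self.hashes = {}

    def hash_span(self, html):
        # Swap generated HTML for an opaque token that no later pattern matches.
        key = "\x1a%d\x1a" % len(self.hashes)
        self.hashes[key] = html
        return key

    def unhash(self, text):
        # Restore tokens; loop because restored HTML may itself contain tokens.
        pattern = re.compile("\x1a\\d+\x1a")
        while pattern.search(text):
            text = pattern.sub(lambda m: self.hashes[m.group(0)], text)
        return text

    def render(self, text):
        # Strong emphasis runs first and its output is hashed immediately.
        text = re.sub(
            r"\*\*(.+?)\*\*",
            lambda m: self.hash_span("<strong>%s</strong>" % m.group(1)),
            text,
        )
        # Plain emphasis only sees asterisks that are still outside a token.
        text = re.sub(
            r"\*(.+?)\*",
            lambda m: self.hash_span("<em>%s</em>" % m.group(1)),
            text,
        )
        return self.unhash(text)


print(SpanHasher().render("*Some **strange* emphasis**"))
```

Because the second asterisk of the would-be `<em>` pair is hidden inside the token, the emphasis pattern finds no closing delimiter and the leading `*` is left alone.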


Original improvements in PHP Markdown 1.0.2b7:

*   Changed span and block gamut methods so that they loop over a
     customizable list of methods. This makes subclassing the parser a
     more interesting option for creating syntax extensions.

*   Also added a "document" gamut loop which can be used to hook
     document-level methods (like for stripping link definitions).

*   Changed all methods which were inserting HTML code so that they now
     return a hashed representation of the code. New methods `hashSpan`
     and `hashBlock` are used to hash respectively span- and block-level
     generated content. This has a couple of significant effects:

     1.  It prevents invalid nesting of Markdown-generated elements which
         could occur with constructs like `*something [link*][1]`.
     2.  It prevents problems occurring with deeply nested lists on which
         paragraphs were ill-formed.
     3.  It removes the need to call `hashHTMLBlocks` twice during the
         block gamut.

     Hashes are turned back to HTML prior to output.

*   Made the block-level HTML parser smarter using a specially-crafted
     regular expression capable of handling nested tags.

*   Solved backtick issues in tag attributes by rewriting the HTML
     tokenizer to be aware of code spans. All these lines should work
     correctly now:
	
		<span attr='`ticks`'>bar</span>
		<span attr='``double ticks``'>bar</span>
		`<test a="` content of attribute `">`

*   Changed the parsing of HTML comments to match simply from `<!--` to
     `-->` instead of using the more complicated SGML-style rule with
     paired `--`. This is how most browsers parse comments and how XML
     defines them too.

*   `<address>` has been added to the list of block-level elements and is
     no longer being incorrectly wrapped within paragraph tags.
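
To give an idea of what a gamut that loops over a customizable list of methods makes possible, here is a minimal sketch in Python. It is not the PHP Markdown API: the class names, the method names, the priority numbers, and the `~~strike~~` syntax are all hypothetical, chosen only to show how a subclass can slot a new step into the loop without touching the base parser.

```python
import re


class Parser:
    # Method name -> priority; lower numbers run first (assumed convention).
    span_gamut = {"do_code_spans": 10, "do_emphasis": 30}

    def run_span_gamut(self, text):
        # Call every registered method in priority order.
        for name, _priority in sorted(self.span_gamut.items(), key=lambda kv: kv[1]):
            text = getattr(self, name)(text)
        return text

    def do_code_spans(self, text):
        return re.sub(r"`([^`]+)`", r"<code>\1</code>", text)

    def do_emphasis(self, text):
        return re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)


class StrikeParser(Parser):
    """Hypothetical extension adding a ~~strike~~ syntax."""

    def __init__(self):
        # Copy the table so the base class is left untouched, then
        # register the new method between code spans and emphasis.
        self.span_gamut = dict(Parser.span_gamut, do_strike=20)

    def do_strike(self, text):
        return re.sub(r"~~([^~]+)~~", r"<del>\1</del>", text)


print(StrikeParser().run_span_gamut("*a* ~~b~~ `c`"))
```

The extension only declares its method and where it belongs in the order; the loop itself stays in the base class.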


Improvements borrowed from Markdown.pl:

*   Now only trim trailing newlines from code blocks, instead of trimming
     all trailing whitespace characters.

*   Fixed bug where this:

         [text](http://m.com "title" )
		
     wasn't working as expected, because the parser wasn't allowing for
     spaces before the closing paren.

*   Filthy hack to support markdown='1' in div tags.

*   _DoAutoLinks() now supports the 'dict://' URL scheme.

*   PHP- and ASP-style processor instructions are now protected as
     raw HTML blocks.

		<? ... ?>
		<% ... %>

*	Experimental support for [this] as a synonym for [this][].

*   Fix for escaped backticks still triggering code spans:

         There are two raw backticks here: \` and here: \`, not a code span
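
One way to picture the escaped-backtick fix is sketched below in Python. This is an assumption about the general approach (hide escaped backticks before looking for code spans, then restore them), not a copy of what Markdown.pl or PHP Markdown does; the placeholder token and the function name are made up for the example, and the sketch ignores the case of a backslash appearing inside a real code span.

```python
import re

ESCAPED_TICK = "\x1aTICK\x1a"  # hypothetical placeholder token


def do_code_spans(text):
    # Hide escaped backticks so the code span pattern cannot see them.
    text = text.replace("\\`", ESCAPED_TICK)
    # Match a run of backticks, the content, and the same run again.
    text = re.sub(
        r"(`+)(.+?)(?<!`)\1(?!`)",
        lambda m: "<code>%s</code>" % m.group(2).strip(),
        text,
    )
    # Put the escaped backticks back as literal characters.
    return text.replace(ESCAPED_TICK, "`")


print(do_code_spans("There are two raw backticks here: \\` and here: \\`, not a code span"))
```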


Michel Fortin
michel.fortin@michelf.com
http://www.michelf.com/

John Gruber

2006-Sep-17 20:14 UTC

[ANN] PHP Markdown 1.0.2b7

Michel Fortin <michel.fortin@michelf.com> wrote on 9/16/06 at 5:23 PM:
> Another big change is the automatic hashing of all Markdown-generated  
> HTML content. Previous versions of PHP Markdown Extra were already  
> doing this, but it was limited to block-level elements and was
> done to reduce the number of calls to the expensive HTML block parser.
> This has been ported to the more basic PHP Markdown, and in addition
> to hashing block-level content it now also hashes span-level elements.
> This has the benefit of preventing bad nesting of elements, so
> something like this:
> 
>      *Some **strange* emphasis**
> 
> will now give valid HTML:
> 
>      *Some <strong>strange* emphasis</strong>

That's interesting, and because the output is valid, it's probably
better than what Markdown.pl currently generates.

I've been thinking that a better solution for input like that
would be to generate markup like this:

    <em>Some <strong>strange</strong></em><strong> emphasis</strong>

Which is more of a "do what I mean" solution. However, I've given
no thought whatsoever to how this would be done algorithmically.

> *   Made the block-level HTML parser smarter using a
> specially-crafted regular expression capable of handling
> nested tags.

A single pattern that matches nested tags?!

$me == "downloading now";

-J.G.

John Gruber

2006-Sep-17 20:14 UTC

Markdown comments

Michel Fortin <michel.fortin@michelf.com> wrote on 9/16/06 at 5:23 PM:
> *   Changed the parsing of HTML comments to match simply from
> `<!--` to `-->` instead of using the more complicated
> SGML-style rule with paired `--`. This is how most browsers
> parse comments and how XML defines them too.

Interesting. I had no idea that SGML comment rules were being
officially or semi-officially abandoned for HTML parsers. I
certainly welcome this change.

This page, and the included tests, seem like a good resource:

  <http://www.howtocreate.co.uk/SGMLComments.html>

Test 4 is interesting:

  <http://www.howtocreate.co.uk/sgml/test4.html>

The test is of this:

    <!-- -- -->bar<!-- -- -->

SGML-compliant browsers will treat that not as two comments with
"bar" in the middle, but instead as a comment tag containing three
different comments:

    -- --
    -->bar<!--
    -- --

Safari 2.0.4 (419.3) treats it the simple (obvious) way, as two
comments with "bar" in the middle.

OmniWeb 5.5 treats it the SGML way. (I hate the way OmniWeb
overrides WebKit in so many edge cases, but that's another story.)

Firefox 1.5.0.6 and Camino 1.0.1 treat it the SGML way.

I also downloaded a nightly build of Firefox, and it still treats
it the SGML way.

Opera 9.0 treats it the simple, obvious way.
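
The two readings of Test 4 can be sketched with a pair of regular expressions. This is only an illustration in Python, assuming the usual reading of the SGML rule (a declaration is `<!`, then any number of `-- ... --` comments separated by optional whitespace, then `>`); the pattern and function names are made up for the example and are not taken from Markdown.pl, PHP Markdown, or any browser.

```python
import re

# Simple rule: a comment runs from "<!--" to the first "-->".
SIMPLE_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

# SGML-style rule: "<!" + zero or more "-- ... --" pairs + ">".
SGML_COMMENT_DECL = re.compile(r"<!(?:--(?:[^-]|-[^-])*--\s*)*>")
SGML_COMMENT = re.compile(r"--((?:[^-]|-[^-])*)--")


def strip_simple(html):
    return SIMPLE_COMMENT.sub("", html)


def strip_sgml(html):
    return SGML_COMMENT_DECL.sub("", html)


def sgml_comments(declaration):
    # Drop the "<!" and ">" and list the text of each "-- ... --" pair.
    return SGML_COMMENT.findall(declaration[2:-1])


test4 = "<!-- -- -->bar<!-- -- -->"
print(repr(strip_simple(test4)))
print(repr(strip_sgml(test4)))
print(sgml_comments(test4))
```

Under the simple rule "bar" survives; under the SGML-style rule the whole string is a single declaration and "bar" sits inside its second comment.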

* * *

I'd like to make this change to Markdown, because I genuinely
believe HTML comments *should* follow the simple, obvious rule,
not the SGML rules.

But I'd feel better about making this change in Markdown if Gecko
were on board.

-J.G.

Allan Odgaard

2006-Sep-17 20:17 UTC

Markdown comments

On 18/9/2006, at 1:30, John Gruber gruber-at-fedora.net |Markdown|  
wrote:
> [...] Interesting. I had no idea that SGML comment rules were being
> officially or semi-officially abandoned for HTML parsers. I
> certainly welcome this change.

The HTML specification sort of defines comments [1]:

    > [...] A common error is to include a string of hyphens
    > ("---") within a comment. Authors should avoid putting two
    > or more adjacent hyphens inside comments.

So it is an error, and authors *should* avoid it, which I read as,
don't do it because it gives problems, but it's not really illegal.

Of course another place [2] in the specification they write:

    > Please consult the SGML standard for information about rules
    > governing elements (e.g., they must be properly nested, an
    > end tag closes, back to the matching start tag, all unclosed
    > intervening start tags with omitted end tags (section
    > 7.5.1), etc.).

I.e. HTML is an SGML application, and thus normal SGML rules apply,  
hinting that the bit about comments is really just trying to say that  
because of de facto standards, one should not expect full SGML  
comments to be supported.

I find the above (quoted) paragraph a bit ironic, considering that I  
have not found a browser which fully adheres to the rule they quote  
(about an end-tag closing back to the matching start tag).

[1] http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.4
[2] http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.1

Magnus Lie Hetland

2006-Sep-19 03:06 UTC

Markdown comments

On Sep 19, 2006, at 0:01, A. Pagaltzis wrote:
> * Michel Fortin <michel.fortin@michelf.com> [2006-09-18 03:55]:
>> Maybe Markdown should do something about double-hyphens inside
>> comments. This way, a user could comment out any piece of
>> Markdown  text -- like this paragraph -- without having to
>> worry about the  double hyphens it contains.
>
> FWIW, HTML Tidy converts hyphen sequences in comments into equals
> sign sequences.

How about entity-encoding the hyphens? Or will that not help wrt. the
XML syntax? (The semantics of the text would at least remain the  
same... No?)

-- 
Magnus Lie Hetland
http://hetland.org

A. Pagaltzis

2006-Sep-19 20:32 UTC

Markdown comments

* Magnus Lie Hetland <magnus@hetland.org> [2006-09-19 09:10]:
> How about entity-encoding the hyphens? Or will that not help
> wrt. the XML syntax? (The semantics of the text would at least
> remain the same... No?)

It would make the comment legal, but it would not preserve
semantics, no. Comment text is always verbatim; no entity (or
other) parsing is performed on the text between comment markers.

I don't see why anyone would care about lossy munging, though.
Does anyone roundtrip Markdown-generated HTML documents back to
Markdown and if so, do they really need to have their hyphens
preserved exactly? It seems to me that simply substituting a
similar-enough-looking character is Good Enough.

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>
