The following benchmarks have been obtained using the TextMate manual
as the input source:
<http://macromates.com/textmate/manual/source.tbz>
Using PHP Markdown, parsing the 24 files separately (with the
reference file appended to each of them), I get this (on an iBook G4
1.2 GHz):
                   Total   Avg.   Min.   Q1.   Med.   Q3.   Max.
Parse Time (ms):    2616    109     13    64     89   125    433
Diff. Min. (ms):    2292     95      0    51     75   112    419
Doing the same with Markdown.pl 1.0.2b8:
                   Total   Avg.   Min.   Q1.   Med.   Q3.   Max.
Parse Time (ms):   11912    496    206   241    273   387   2064
Diff. Min. (ms):    6966    290      0    35     67   181   1858
Of interest is the same thing with Markdown.pl 1.0.1:
                   Total   Avg.   Min.   Q1.   Med.   Q3.   Max.
Parse Time (ms):    5883    245    148   168    194   220    957
Diff. Min. (ms):    2310     96      0    19     46    71    808
The older version takes half the time. I think we're seeing here the
drop in performance caused by Markdown.pl 1.0.2's new HTML block
parser. Note that, the way Markdown.pl is built, the HTML block
parser is called on each piece of Markdown-generated output, which
means that you don't need to have HTML blocks in the source to
experience a noticeable drop in performance when the HTML block
parser gets slower. PHP Markdown used to work the same way, but this
changed in version 1.0.1d.
Now, the interesting part of the test: combining all the documents
together and parsing them in one shot (352 KB). With PHP Markdown it
takes 29 seconds; with Markdown.pl 1.0.1 it takes 71 seconds. Besides
the obvious speed difference between PHP Markdown and Markdown.pl
(probably due to what I mentioned above), this test shows that
neither PHP Markdown nor Markdown.pl scales well for big documents.
I'll let you draw your own conclusions.
Michel Fortin
michel.fortin at michelf.com
http://www.michelf.com/
On 8/27/07, Michel Fortin <michel.fortin at michelf.com> wrote:

> The following benchmarks have been obtained using the TextMate manual
> as the input source:
>
> <http://macromates.com/textmate/manual/source.tbz>
>
> [...]
>
> Now, the interesting part of the test: combining all the documents
> together and parsing them in one shot (352 KB). With PHP Markdown it
> takes 29 seconds; with Markdown.pl 1.0.1 it takes 71 seconds. Besides
> the obvious speed difference between PHP Markdown and Markdown.pl
> (probably due to what I mentioned above), this test shows that
> neither PHP Markdown nor Markdown.pl scales well for big documents.

Maruku takes 8 seconds for parsing (on my PowerBook G4 1.5 GHz).
(Please note that Ruby, per se, is much slower than Perl.)

I guess that if you plot [time for parsing] versus [length of the
document], you get a curve which grows more than linearly for
Markdown.pl and PHP Markdown. This is the same behaviour that I
observed in BlueCloth (a straight port of Markdown.pl to Ruby): if I
remember well, time was O(length^2). By comparison, Maruku and other
real parsers take O(length).

At the time, I concluded that it was due to a naive implementation of
regexp substitution in Ruby. But I don't know much about regexps in
the end, and know even less about Perl and PHP, so I'll shut up and
ask you: what do you think this scaling problem is due to?

> I'll let you draw your own conclusions.

That we need a real grammar! And real parsers!

--
Andrea Censi
"Life is too important to be taken seriously" (Oscar Wilde)
Web: http://www.dis.uniroma1.it/~censi
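The super-linear growth described above is easy to check empirically. The following is a small sketch (in Python, not tied to any particular Markdown implementation): time a parser on a document, then on the same document doubled. An O(n^2) parser takes roughly 4x as long on the doubled input; an O(n) parser only about 2x.

```python
import time

def scaling_ratio(parse, text):
    """Return (time to parse doubled text) / (time to parse text).

    For an O(n) parser this ratio is about 2; for an O(n^2)
    parser it is about 4. `parse` is any callable taking a string.
    """
    t0 = time.perf_counter()
    parse(text)
    t1 = time.perf_counter()
    parse(text * 2)
    t2 = time.perf_counter()
    return (t2 - t1) / (t1 - t0)
```

Feeding this the merged TextMate manual and its doubled version would distinguish the O(length^2) behaviour observed in BlueCloth from the linear behaviour of a real parser.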
Here's a little followup on what I wrote earlier with a few more
details and some good news for PHP Markdown.
First, there was an error in my previous benchmarks. Merging all
the documents together creates a 176 KB file, not 352 KB as
previously mentioned. I'm not sure how this happened, but it seems I
performed the tests on an oversized file which contained the manual
twice. Performing the tests again on the right file, I get this:
PHP Markdown 1.0.1g: 12 seconds
Markdown.pl 1.0.1: 17 seconds
(iBook G4 1.2 GHz)
which is still much slower than parsing each file separately, but is
nevertheless better than the previous results (obviously, since the
file is smaller).
Now, here is what I found about parsing.
At the core of PHP Markdown's speed problem is the "unhash" method,
which uses PHP's str_replace function with an array of all the
hashed values to replace all the hashed content it can find. This
array of hashed values grows more or less linearly with the content
size, and looping through each of these values for each paragraph
makes the parser O(n^2).
Now, one thing of interest is the result for the latest release of
PHP Markdown (1.0.1h), still for the same 176 KB file as above:
PHP Markdown 1.0.1h: 66 seconds
Ouch! Not much has changed between 1.0.1g and 1.0.1h, but something
clearly isn't right. Version 1.0.1h is calling "unhash" much more
than its predecessor, resulting in much worse performance, especially
noticeable with big files.
With "unhash" fixed now (using a regular expression!) and with some
other speed improvements, I can announce that the next version of PHP
Markdown will parse the one-document TextMate manual in about 1.5
seconds. This is now half a second faster than parsing each of the
documents separately.
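A regex-based unhash could look roughly like this (again a Python sketch with an assumed placeholder format, not PHP Markdown's real marker syntax): each paragraph is scanned once, and only the keys that actually occur are looked up, so the cost no longer depends on the total number of stored hashes.

```python
import re

def unhash_regex(paragraph, hashes):
    """Single-pass unhash: find placeholder markers with one regex
    scan and look up only the keys that actually appear.

    Assumes placeholders look like "\\x1akey\\x1a" (hypothetical);
    `hashes` maps bare keys to the original HTML blocks.
    """
    return re.sub(r"\x1a(\w+)\x1a",
                  lambda m: hashes[m.group(1)],
                  paragraph)
```

The design point is that the per-paragraph cost becomes proportional to the paragraph's length rather than to the number of hashed blocks in the whole document, which is what restores roughly linear overall behaviour.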
I think I've also reached O(n) with PHP Markdown, at least in the
general case. This is supported by parsing the big 352 KB document in
about 3 seconds: twice the size, twice the time.
Also, I've included the TextMate manual in my local installation of
the MDTest test suite so I don't end up releasing a version of PHP
Markdown that doesn't scale well in the future.
Michel Fortin
michel.fortin at michelf.com
http://www.michelf.com/