thr3ads.net - Redcloth upwards - Benchmarking the pure-ruby parser [Jun 2009]

If this information is useful, please help other people find it:
Share via:

Jason Garber

2009-Jun-11 14:49 UTC

Benchmarking the pure-ruby parser

Benchmarking the pure-ruby version of the parser, which is a new  
option in 4.2...

~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/ 
spec.opts
Benchmarking version 4.2.0 compiled in Ruby...
Finished in 30.703538 seconds

30 seconds is a long time compared to...

~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/ 
spec.opts
Benchmarking version 3.0.4 compiled in ruby-regex...
Finished in 0.346966 seconds

~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/ 
spec.opts
Benchmarking version 4.0.0 compiled in C...
Finished in 0.055147 seconds

~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/ 
spec.opts
Benchmarking version 4.1.1 compiled in C...
Finished in 0.05756 seconds

~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/ 
spec.opts
Benchmarking version 4.1.9 compiled in C...
Finished in 0.140867 seconds

~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/ 
spec.opts
Benchmarking version 4.2.0 compiled in C...
Finished in 0.13385 seconds

Uh, yeah.  I knew it was slow, but until now I had no idea it was  
_that_ slow compared to 3.0.4.  The tests in 3.0.4 took a long time to  
run, so I just figured the two all-ruby versions would be about the  
same.  Now that I think about it, it does make sense that even lots of  
Regexes would be faster than doing all the looping, character  
comparison, and concatenation in Ruby.

Also note that RedCloth has been getting slower with each version,  
except this last one where I tried hard to reduce complexity in some  
areas.  Fixing bugs usually means making the machine more specific,  
which means more complexity.  The parser binary has more than doubled  
in size since 4.0.0.

Let''s see about Ruby 1.8 vs 1.9...

~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/ 
spec.opts
Benchmarking version 3.0.4 compiled in ruby-regex...
Finished in 0.320067 seconds

~/Documents/redcloth(master) $ spec19 spec/benchmark_spec.rb -O spec/ 
spec.opts
Benchmarking version 3.0.4 compiled in ruby-regex...
Finished in 0.566467 seconds

So 3.0.4 is _slower_ in Ruby 1.9.  Interesting.  Is 4.2.0?

~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/ 
spec.opts
Benchmarking version 4.2.0 compiled in C...
Finished in 0.148924 seconds

~/Documents/redcloth(master) $ spec19 spec/benchmark_spec.rb -O spec/ 
spec.opts
Benchmarking version 4.2.0 compiled in C...
Finished in 0.107696 seconds

Thankfully, no.

My conclusion is, I need to do something different with RedCloth in  
the long term if I want it available as a pure-Ruby library and if I  
don''t want the size and parse time to keep ballooning.

Jason

Gaspard Bucher

2009-Jun-11 16:44 UTC

head link

Benchmarking the pure-ruby parser

Thanks for the very interesting comparaisons Jason. Pure ruby could
definitely be an option.

As I said, I really need a good, ruby extendable textile parser for
zena in the long run so I''d be glad to help. I have implemented
several pure ruby (regex based) parsers (parser for
http://zenadmin.org/zafu is one) and other ragel based (json/scripts
for http://rubyk.org for example).

Regarding 3.0.4: I see most regexp are not left anchored (starting
with /\A.../). I think we could get a nice speedup (and a more robust
parser) if we used some kind of state machine instead of raw split and
gsub. The zafu parser for example uses contextual regexps and eats
input data as it matches. This means that input data is only parsed a
minimal amount of times.

Gaspard

On Thu, Jun 11, 2009 at 4:49 PM, Jason Garber<jg at jasongarber.com>
wrote:> Benchmarking the pure-ruby version of the parser, which is a new option in
> 4.2...
>
> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O
spec/spec.opts
> Benchmarking version 4.2.0 compiled in Ruby...
> Finished in 30.703538 seconds
>
> 30 seconds is a long time compared to...
>
> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O
spec/spec.opts
> Benchmarking version 3.0.4 compiled in ruby-regex...
> Finished in 0.346966 seconds
>
> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O
spec/spec.opts
> Benchmarking version 4.0.0 compiled in C...
> Finished in 0.055147 seconds
>
> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O
spec/spec.opts
> Benchmarking version 4.1.1 compiled in C...
> Finished in 0.05756 seconds
>
> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O
spec/spec.opts
> Benchmarking version 4.1.9 compiled in C...
> Finished in 0.140867 seconds
>
> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O
spec/spec.opts
> Benchmarking version 4.2.0 compiled in C...
> Finished in 0.13385 seconds
>
> Uh, yeah. ?I knew it was slow, but until now I had no idea it was _that_
> slow compared to 3.0.4. ?The tests in 3.0.4 took a long time to run, so I
> just figured the two all-ruby versions would be about the same. ?Now that I
> think about it, it does make sense that even lots of Regexes would be
faster
> than doing all the looping, character comparison, and concatenation in
Ruby.
>
> Also note that RedCloth has been getting slower with each version, except
> this last one where I tried hard to reduce complexity in some areas.
?Fixing
> bugs usually means making the machine more specific, which means more
> complexity. ?The parser binary has more than doubled in size since 4.0.0.
>
> Let''s see about Ruby 1.8 vs 1.9...
>
> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O
spec/spec.opts
> Benchmarking version 3.0.4 compiled in ruby-regex...
> Finished in 0.320067 seconds
>
> ~/Documents/redcloth(master) $ spec19 spec/benchmark_spec.rb -O
> spec/spec.opts
> Benchmarking version 3.0.4 compiled in ruby-regex...
> Finished in 0.566467 seconds
>
> So 3.0.4 is _slower_ in Ruby 1.9. ?Interesting. ?Is 4.2.0?
>
> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O
spec/spec.opts
> Benchmarking version 4.2.0 compiled in C...
> Finished in 0.148924 seconds
>
> ~/Documents/redcloth(master) $ spec19 spec/benchmark_spec.rb -O
> spec/spec.opts
> Benchmarking version 4.2.0 compiled in C...
> Finished in 0.107696 seconds
>
> Thankfully, no.
>
> My conclusion is, I need to do something different with RedCloth in the
long
> term if I want it available as a pure-Ruby library and if I don''t
want the
> size and parse time to keep ballooning.
>
> Jason
> _______________________________________________
> Redcloth-upwards mailing list
> Redcloth-upwards at rubyforge.org
> http://rubyforge.org/mailman/listinfo/redcloth-upwards
>

Jason Garber

2009-Jun-11 17:57 UTC

head link

Benchmarking the pure-ruby parser

On Jun 11, 2009, at 1:24 PM, Hugh Sasse wrote:> Is the ruby code to do this in the RedCloth-4.2.0 distro?  I can''t
see
> any parsers in there (grep -lir parser .) except the C code.
You have to get the source code yourself and build it with rake  
pureruby compile.  I didn''t include it in any package/distro because I
want to discourage its use (it is 100x slower, after all!).

Jason Garber

2009-Jun-11 18:00 UTC

head link

Benchmarking the pure-ruby parser

Gaspard, thanks for offering to help!  I''ll surely take you up on it? 
seems like you have some very helpful experience.  Want to do a little  
more research on it myself first, though.  Then maybe you and I can  
get together and work out some prototypes.

Jason

On Jun 11, 2009, at 12:44 PM, Gaspard Bucher wrote:
> Thanks for the very interesting comparaisons Jason. Pure ruby could
> definitely be an option.
>
> As I said, I really need a good, ruby extendable textile parser for
> zena in the long run so I''d be glad to help. I have implemented
> several pure ruby (regex based) parsers (parser for
> http://zenadmin.org/zafu is one) and other ragel based (json/scripts
> for http://rubyk.org for example).
>
> Regarding 3.0.4: I see most regexp are not left anchored (starting
> with /\A.../). I think we could get a nice speedup (and a more robust
> parser) if we used some kind of state machine instead of raw split and
> gsub. The zafu parser for example uses contextual regexps and eats
> input data as it matches. This means that input data is only parsed a
> minimal amount of times.
>
> Gaspard
>
> On Thu, Jun 11, 2009 at 4:49 PM, Jason Garber<jg at jasongarber.com>
> wrote:
>> Benchmarking the pure-ruby version of the parser, which is a new  
>> option in
>> 4.2...
>>
>> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/ 
>> spec.opts
>> Benchmarking version 4.2.0 compiled in Ruby...
>> Finished in 30.703538 seconds
>>
>> 30 seconds is a long time compared to...
>>
>> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/ 
>> spec.opts
>> Benchmarking version 3.0.4 compiled in ruby-regex...
>> Finished in 0.346966 seconds
>>
>> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/ 
>> spec.opts
>> Benchmarking version 4.0.0 compiled in C...
>> Finished in 0.055147 seconds
>>
>> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/ 
>> spec.opts
>> Benchmarking version 4.1.1 compiled in C...
>> Finished in 0.05756 seconds
>>
>> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/ 
>> spec.opts
>> Benchmarking version 4.1.9 compiled in C...
>> Finished in 0.140867 seconds
>>
>> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/ 
>> spec.opts
>> Benchmarking version 4.2.0 compiled in C...
>> Finished in 0.13385 seconds
>>
>> Uh, yeah.  I knew it was slow, but until now I had no idea it was  
>> _that_
>> slow compared to 3.0.4.  The tests in 3.0.4 took a long time to  
>> run, so I
>> just figured the two all-ruby versions would be about the same.   
>> Now that I
>> think about it, it does make sense that even lots of Regexes would  
>> be faster
>> than doing all the looping, character comparison, and concatenation  
>> in Ruby.
>>
>> Also note that RedCloth has been getting slower with each version,  
>> except
>> this last one where I tried hard to reduce complexity in some  
>> areas.  Fixing
>> bugs usually means making the machine more specific, which means more
>> complexity.  The parser binary has more than doubled in size since  
>> 4.0.0.
>>
>> Let''s see about Ruby 1.8 vs 1.9...
>>
>> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/ 
>> spec.opts
>> Benchmarking version 3.0.4 compiled in ruby-regex...
>> Finished in 0.320067 seconds
>>
>> ~/Documents/redcloth(master) $ spec19 spec/benchmark_spec.rb -O
>> spec/spec.opts
>> Benchmarking version 3.0.4 compiled in ruby-regex...
>> Finished in 0.566467 seconds
>>
>> So 3.0.4 is _slower_ in Ruby 1.9.  Interesting.  Is 4.2.0?
>>
>> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/ 
>> spec.opts
>> Benchmarking version 4.2.0 compiled in C...
>> Finished in 0.148924 seconds
>>
>> ~/Documents/redcloth(master) $ spec19 spec/benchmark_spec.rb -O
>> spec/spec.opts
>> Benchmarking version 4.2.0 compiled in C...
>> Finished in 0.107696 seconds
>>
>> Thankfully, no.
>>
>> My conclusion is, I need to do something different with RedCloth in  
>> the long
>> term if I want it available as a pure-Ruby library and if I
don''t
>> want the
>> size and parse time to keep ballooning.
>>
>> Jason
>> _______________________________________________
>> Redcloth-upwards mailing list
>> Redcloth-upwards at rubyforge.org
>> http://rubyforge.org/mailman/listinfo/redcloth-upwards
>>
> _______________________________________________
> Redcloth-upwards mailing list
> Redcloth-upwards at rubyforge.org
> http://rubyforge.org/mailman/listinfo/redcloth-upwards

Gaspard Bucher

2009-Jun-11 18:25 UTC

head link

Benchmarking the pure-ruby parser

For benchmarking, I think it could make sense to have some
representative data of a typical application. This might mean large
amounts of pure text without any textile tags (except paragraphs), or
it might be a mixture of very short texts with a few very long ones. I
could give you the data from http://zenadmin.org, but the links and
images are not very typical:

"":45 (internal link to node 45)
> Want to do a little more research on it myself first, though.
Of course ! Please continue to give us feedback. This kind of
experimenting can provide very useful information on the different
parsing solutions.

G.

Hugh Sasse

2009-Jun-11 18:26 UTC

head link

Benchmarking the pure-ruby parser

On Thu, 11 Jun 2009, Jason Garber wrote:
> On Jun 11, 2009, at 1:24 PM, Hugh Sasse wrote:
> > Is the ruby code to do this in the RedCloth-4.2.0 distro?  I
can''t see
> > any parsers in there (grep -lir parser .) except the C code.
> 
> You have to get the source code yourself and build it with rake pureruby
> compile.  I didn''t include it in any package/distro because I want
to
> discourage its use (it is 100x slower, after all!).
Ok, just wondered if I could see any speedups in a few mins.  I''ll
leave
this for now.  I will have a bit more time in a few weeks I think.

       Hugh > 
> _______________________________________________
> Redcloth-upwards mailing list
> Redcloth-upwards at rubyforge.org
> http://rubyforge.org/mailman/listinfo/redcloth-upwards

Justin Collins

2009-Jun-11 22:07 UTC

head link

Benchmarking the pure-ruby parser

Hugh Sasse wrote:> Is the ruby code to do this in the RedCloth-4.2.0 distro?  I can''t
see
> any parsers in there (grep -lir parser .) except the C code.
>
> I''m wondering, much of the work of a parser will be looking for
> fixed tokens, won''t it?  I know for numbers and arbitrary strings
> this isn''t true... I ask because one of the few weaknesses in Ruby
> is that there is an over-emphasis on Regexps for processing strings
> in the language, and the only way I know to test whether some string
> (str) begins with a substring (sought), other than regexps, is
> str.index(sought).zero? .  Have I missed something obvious?  It seems
> to me that constructing something as complex as a regexp for such an
> operation would be expensive.  Even str.scan which will accept a string
> rather than a regexp seems to treat such a string as a regexp, from a
> quick look at the c code.
>
>         Hugh
>   
Using index() to check the beginning of the string will be very slow, 
since will search the entire string for the substring, although you are 
only interested in the beginning of the string. I would suggest 
str[0,sought.length] == sought, which is considerably faster, especially 
for short substrings. For longer ones, it seems (from some quick 
benchmarking) that each approach is about equal.

-Justin


require ''benchmark''
include Benchmark

str = "adiaodibasd" * 1000

sought = "hello"

bmbm do |t|
    t.report "Index" do
        100000.times do
            str.index(sought) == 0
        end
    end

    t.report "Regexp" do
        reg = /^#{sought}/
        100000.times do
            str =~ reg
        end
    end

    t.report "Slice" do
        100000.times do
            str[0,sought.length] == sought
        end
    end
end

Gaspard Bucher

2009-Jun-16 20:29 UTC

head link

Starting the next version of RedCloth

I followed some of the discussions on the treetop mailing list and it
sure looks great. I''ll have a look at the subset you are parsing and
if that''s not too much, I''ll try an Oniguruma based parser.
>From what I understand, Treetop builds an abstract syntax tree withrendering methods attached. This means that the parser knows about
everything there is to know about parsing *and* rendering (Interpreter
pattern).

I''ll try to build something that could be totally agnostic of what we
actually do with the AST (or S-expression), may it be pdf, html, latex
or other SM gears (Visitor pattern).

Gaspard


On Tue, Jun 16, 2009 at 10:02 PM, Jason Garber<jg at jasongarber.com>
wrote:> Gaspard (and others who are interested),
> I''ve been researching some tools in this domain to figure out what
they''re
> good at and what they''re not. ?I like the advantages of parsing
expression
> grammars (PEGs) over regular expressions so I started a test rewrite with
> Treetop to find out what the gotchas are (there are always gotchas).
?I''ve
> hit surprisingly few and mostly they were indicative that I was thinking
> about the problem wrongly. ?Overall it''s going great,
it''s fast (so far) and
> very elegant. ?I can get down and spec individual parts of the grammar or
> even individual rules!
> I''m interested to see what you would come up with using contextual
regexps
> like you mentioned for the zafu parser. ?I''d welcome you to build
a little
> prototype like I''m building with Treetop
> (http://github.com/jgarber/redcloth-treetop). ?If yours implements the same
> subset of Textile, then we can do some benchmarking, compare the grammars
> and so forth. ?Don''t feel obligated?just know that I''m
open-minded about it.
> I think I''d really like to stay away from C/Java unless someone
demonstrates
> that it''s the only way to parse with any speed. ?My experience so
far with
> Treetop indicates a Ruby parser can?be fast enough (and if speed were
really
> more important to us than some other factors, we wouldn''t have
picked Ruby
> in the first place!).
> Either way, I also agree that it needs to be durable over the long-term.
> ?I''m making websites in Textile right and left and I
can''t be updating them
> every time the parser changes. ?I think lots of unit tests are the only
sure
> way to prevent regression (like happened in 4.2.0) and I''ve been
very
> pleased with how I can spec the behavior of individual rules and grammars
in
> my treetop prototype.
> So, if you want to start a prototype regexp parser, that''s cool,
or if you
> like what you see on my prototype project, you''re welcome to join
that as
> well.
> Regards,
> Jason
> On Jun 11, 2009, at 2:25 PM, Gaspard Bucher wrote:
>
> For benchmarking, I think it could make sense to have some
> representative data of a typical application. This might mean large
> amounts of pure text without any textile tags (except paragraphs), or
> it might be a mixture of very short texts with a few very long ones. I
> could give you the data from http://zenadmin.org, but the links and
> images are not very typical:
>
> "":45 (internal link to node 45)
>
> Want to do a little more research on it myself first, though.
>
> Of course ! Please continue to give us feedback. This kind of
> experimenting can provide very useful information on the different
> parsing solutions.
>
> G.
> _______________________________________________
> Redcloth-upwards mailing list
> Redcloth-upwards at rubyforge.org
> http://rubyforge.org/mailman/listinfo/redcloth-upwards
>
>
> _______________________________________________
> Redcloth-upwards mailing list
> Redcloth-upwards at rubyforge.org
> http://rubyforge.org/mailman/listinfo/redcloth-upwards
>

Jason Garber

2009-Jun-16 20:35 UTC

head link

Starting the next version of RedCloth

Yes, you can use Treetop for the interpreter pattern out of the box,  
but I''m doing following the visitor pattern like Cucumber did.  Just  
started working on the visitor, actually.  And my posts to the treetop  
list are a little old?I''ve moved past the problems I described there? 
so don''t read them as the current state of things.

Hope I''m using all the terminology right.  I''m new to all this
(no
CompSci background).

On Jun 16, 2009, at 4:29 PM, Gaspard Bucher wrote:
> I followed some of the discussions on the treetop mailing list and it
> sure looks great. I''ll have a look at the subset you are parsing
and
> if that''s not too much, I''ll try an Oniguruma based
parser.
>
>> From what I understand, Treetop builds an abstract syntax tree with
> rendering methods attached. This means that the parser knows about
> everything there is to know about parsing *and* rendering (Interpreter
> pattern).
>
> I''ll try to build something that could be totally agnostic of what
we
> actually do with the AST (or S-expression), may it be pdf, html, latex
> or other SM gears (Visitor pattern).
>
> Gaspard
>
>
> On Tue, Jun 16, 2009 at 10:02 PM, Jason Garber<jg at jasongarber.com>
> wrote:
>> Gaspard (and others who are interested),
>> I''ve been researching some tools in this domain to figure out
what
>> they''re
>> good at and what they''re not.  I like the advantages of
parsing
>> expression
>> grammars (PEGs) over regular expressions so I started a test  
>> rewrite with
>> Treetop to find out what the gotchas are (there are always  
>> gotchas).  I''ve
>> hit surprisingly few and mostly they were indicative that I was  
>> thinking
>> about the problem wrongly.  Overall it''s going great,
it''s fast (so
>> far) and
>> very elegant.  I can get down and spec individual parts of the  
>> grammar or
>> even individual rules!
>> I''m interested to see what you would come up with using
contextual
>> regexps
>> like you mentioned for the zafu parser.  I''d welcome you to
build a
>> little
>> prototype like I''m building with Treetop
>> (http://github.com/jgarber/redcloth-treetop).  If yours implements  
>> the same
>> subset of Textile, then we can do some benchmarking, compare the  
>> grammars
>> and so forth.  Don''t feel obligated?just know that
I''m open-minded
>> about it.
>> I think I''d really like to stay away from C/Java unless
someone
>> demonstrates
>> that it''s the only way to parse with any speed.  My experience
so
>> far with
>> Treetop indicates a Ruby parser can be fast enough (and if speed  
>> were really
>> more important to us than some other factors, we wouldn''t have
>> picked Ruby
>> in the first place!).
>> Either way, I also agree that it needs to be durable over the long- 
>> term.
>>  I''m making websites in Textile right and left and I
can''t be
>> updating them
>> every time the parser changes.  I think lots of unit tests are the  
>> only sure
>> way to prevent regression (like happened in 4.2.0) and I''ve
been very
>> pleased with how I can spec the behavior of individual rules and  
>> grammars in
>> my treetop prototype.
>> So, if you want to start a prototype regexp parser, that''s
cool, or
>> if you
>> like what you see on my prototype project, you''re welcome to
join
>> that as
>> well.
>> Regards,
>> Jason
>> On Jun 11, 2009, at 2:25 PM, Gaspard Bucher wrote:
>>
>> For benchmarking, I think it could make sense to have some
>> representative data of a typical application. This might mean large
>> amounts of pure text without any textile tags (except paragraphs), or
>> it might be a mixture of very short texts with a few very long  
>> ones. I
>> could give you the data from http://zenadmin.org, but the links and
>> images are not very typical:
>>
>> "":45 (internal link to node 45)
>>
>> Want to do a little more research on it myself first, though.
>>
>> Of course ! Please continue to give us feedback. This kind of
>> experimenting can provide very useful information on the different
>> parsing solutions.
>>
>> G.
>> _______________________________________________
>> Redcloth-upwards mailing list
>> Redcloth-upwards at rubyforge.org
>> http://rubyforge.org/mailman/listinfo/redcloth-upwards
>>
>>
>> _______________________________________________
>> Redcloth-upwards mailing list
>> Redcloth-upwards at rubyforge.org
>> http://rubyforge.org/mailman/listinfo/redcloth-upwards
>>
> _______________________________________________
> Redcloth-upwards mailing list
> Redcloth-upwards at rubyforge.org
> http://rubyforge.org/mailman/listinfo/redcloth-upwards

Gaspard Bucher

2009-Jun-17 07:59 UTC

head link

Starting the next version of RedCloth

Hi Hugh !

I went through a description of the top down parser using Python
(http://effbot.org/zone/simple-top-down-parsing.htm) and it''s funny
how we keep reinventing the wheel: I just implemented something
similar for querybuilder, using a ruby array as stack and poping the
stack depending on operator precedence (http://tinyurl.com/mgj6et).

Using operator precedence could be an interesting solution to solve
inherent textile ambiguities but it does not remove the curse of
tokenizing.

I''ll keep this operator precedence idea in mind when writing a regexp
based prototype.

G.

On Wed, Jun 17, 2009 at 12:45 AM, Hugh Sasse<hgs at dmu.ac.uk>
wrote:> It sounds like you are not looking for alternative techniques any
> more. That''s good, but I''ve found Top Down Operator
Precedence
> parsers, as explained in Chapter 9 of "Beautiful Code" [Oram and
> Wilson, O''Reilly] particularly easy to understand, and relatively
OK
> to debug as well. ?The chapter is an implementation and explanation
> of a paper by Vaughan Pratt in 1973, but it''s an ACM paper, so
access
> to it may be a problem.
>
> ? ? ? ?HTH
> ? ? ? ?Hugh
>
> On Tue, 16 Jun 2009, Jason Garber wrote:
>
>> Yes, you can use Treetop for the interpreter pattern out of the box,
but I''m
>> doing following the visitor pattern like Cucumber did. ?Just started
working
>> on the visitor, actually. ?And my posts to the treetop list are a
little
>> old?I''ve moved past the problems I described there?so
don''t read them as the
>> current state of things.
>>
>> Hope I''m using all the terminology right. ?I''m new to
all this (no CompSci
>> background).
>>
>> On Jun 16, 2009, at 4:29 PM, Gaspard Bucher wrote:
>>
>> > I followed some of the discussions on the treetop mailing list and
it
>> > sure looks great. I''ll have a look at the subset you are
parsing and
>> > if that''s not too much, I''ll try an Oniguruma
based parser.
>> >
>> > > From what I understand, Treetop builds an abstract syntax
tree with
>> > rendering methods attached. This means that the parser knows about
>> > everything there is to know about parsing *and* rendering
(Interpreter
>> > pattern).
>> >
>> > I''ll try to build something that could be totally
agnostic of what we
>> > actually do with the AST (or S-expression), may it be pdf, html,
latex
>> > or other SM gears (Visitor pattern).
>> >
>> > Gaspard
>> >
>> >
>> > On Tue, Jun 16, 2009 at 10:02 PM, Jason Garber<jg at
jasongarber.com> wrote:
>> > > Gaspard (and others who are interested),
>> > > I''ve been researching some tools in this domain to
figure out what they''re
>> > > good at and what they''re not. ?I like the advantages
of parsing expression
>> > > grammars (PEGs) over regular expressions so I started a test
rewrite with
>> > > Treetop to find out what the gotchas are (there are always
gotchas). ?I''ve
>> > > hit surprisingly few and mostly they were indicative that I
was thinking
>> > > about the problem wrongly. ?Overall it''s going
great, it''s fast (so far)
>> > > and
>> > > very elegant. ?I can get down and spec individual parts of
the grammar or
>> > > even individual rules!
>> > > I''m interested to see what you would come up with
using contextual regexps
>> > > like you mentioned for the zafu parser. ?I''d welcome
you to build a little
>> > > prototype like I''m building with Treetop
>> > > (http://github.com/jgarber/redcloth-treetop). ?If yours
implements the
>> > > same
>> > > subset of Textile, then we can do some benchmarking, compare
the grammars
>> > > and so forth. ?Don''t feel obligated?just know that
I''m open-minded about
>> > > it.
>> > > I think I''d really like to stay away from C/Java
unless someone
>> > > demonstrates
>> > > that it''s the only way to parse with any speed. ?My
experience so far with
>> > > Treetop indicates a Ruby parser can be fast enough (and if
speed were
>> > > really
>> > > more important to us than some other factors, we
wouldn''t have picked Ruby
>> > > in the first place!).
>> > > Either way, I also agree that it needs to be durable over the
long-term.
>> > > I''m making websites in Textile right and left and I
can''t be updating them
>> > > every time the parser changes. ?I think lots of unit tests
are the only
>> > > sure
>> > > way to prevent regression (like happened in 4.2.0) and
I''ve been very
>> > > pleased with how I can spec the behavior of individual rules
and grammars
>> > > in
>> > > my treetop prototype.
>> > > So, if you want to start a prototype regexp parser,
that''s cool, or if you
>> > > like what you see on my prototype project, you''re
welcome to join that as
>> > > well.
>> > > Regards,
>> > > Jason
>> > > On Jun 11, 2009, at 2:25 PM, Gaspard Bucher wrote:
>> > >
>> > > For benchmarking, I think it could make sense to have some
>> > > representative data of a typical application. This might mean
large
>> > > amounts of pure text without any textile tags (except
paragraphs), or
>> > > it might be a mixture of very short texts with a few very
long ones. I
>> > > could give you the data from http://zenadmin.org, but the
links and
>> > > images are not very typical:
>> > >
>> > > "":45 (internal link to node 45)
>> > >
>> > > Want to do a little more research on it myself first, though.
>> > >
>> > > Of course ! Please continue to give us feedback. This kind of
>> > > experimenting can provide very useful information on the
different
>> > > parsing solutions.
>> > >
>> > > G.
>> > > _______________________________________________
>> > > Redcloth-upwards mailing list
>> > > Redcloth-upwards at rubyforge.org
>> > > http://rubyforge.org/mailman/listinfo/redcloth-upwards
>> > >
>> > >
>> > > _______________________________________________
>> > > Redcloth-upwards mailing list
>> > > Redcloth-upwards at rubyforge.org
>> > > http://rubyforge.org/mailman/listinfo/redcloth-upwards
>> > >
>> > _______________________________________________
>> > Redcloth-upwards mailing list
>> > Redcloth-upwards at rubyforge.org
>> > http://rubyforge.org/mailman/listinfo/redcloth-upwards
>>
>> _______________________________________________
>> Redcloth-upwards mailing list
>> Redcloth-upwards at rubyforge.org
>> http://rubyforge.org/mailman/listinfo/redcloth-upwards
> _______________________________________________
> Redcloth-upwards mailing list
> Redcloth-upwards at rubyforge.org
> http://rubyforge.org/mailman/listinfo/redcloth-upwards
>

Redcloth upwards - Jun 2009 - Benchmarking the pure-ruby parser

Benchmarking the pure-ruby parser

Benchmarking the pure-ruby parser

Benchmarking the pure-ruby parser

Benchmarking the pure-ruby parser

Benchmarking the pure-ruby parser

Benchmarking the pure-ruby parser

Benchmarking the pure-ruby parser

Starting the next version of RedCloth

Starting the next version of RedCloth

Starting the next version of RedCloth