thr3ads.net - Redcloth upwards - Need your advice on RedCloth [Jun 2009]

Gaspard Bucher
2009-Jun-08 11:06 UTC
Need your advice on RedCloth

Hi Jason !

I've looked at the Ragel code for Textile and you are right: it has
become quite hard to understand. I have gone through the list of
difficult defects and through the current Textile reference, and I have
the feeling that the current parser is quite complicated for the task
at hand. Textile does not look like such a complicated grammar (at
least not what is listed in the reference page), but maybe I'm wrong
and there are many places where determinism is not easily attained.

I really feel that the parts that are difficult for the parser are
also difficult for the reader when editing text. And most of these
hard-to-parse and hard-to-read features in Textile (except for tables)
are not related to describing content but to styling. Something like
setting an "id" in an article seems really bad to me: what if you
display two articles on one page and they both define a "hot" id? The
same goes for "em" padding: that's not content, that's styling.

I am very concerned about all these issues related to Textile
because I am building a CMS in which my clients put *everything*:
letters, comments, documents, quality certification stuff, control
lists, etc. So I really need a Textile parser that can survive in the
long run (10 years). To achieve this goal, we need to:

a. have a parser that is easy to extend for new needs without
breaking old text
b. have a grammar that is easy to parse

For point "a", I think we can live with S-expression generation, with
customization happening while the S-expression tree is processed. For
example, an image with a caption would be parsed as:

!file.jpg (foo bar baz)! ==> [:image, "file.jpg (foo bar baz)"]

The processor then runs a Ruby regex to "finish the work". This means
the parser in C is kept simple, and if someone wants to add more
features to the "image" tag, she just has to change the Ruby regex.
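
A minimal sketch of what that second, Ruby-side step could look like. The
node shape [:image, "..."] comes from the example above; the names
IMAGE_RE and process_image, and the choice to return a hash with :src and
:caption, are assumptions made for illustration and are not part of
RedCloth.

```ruby
# Hypothetical second-stage processor. The C parser is assumed to emit
# S-expression nodes such as [:image, "file.jpg (foo bar baz)"] and to
# leave the details of the payload to Ruby.
IMAGE_RE = /\A(?<src>[^\s(]+)\s*(?:\((?<caption>[^)]*)\))?\z/

def process_image(node)
  tag, raw = node
  raise ArgumentError, "not an image node: #{tag.inspect}" unless tag == :image

  payload = raw.strip
  m = IMAGE_RE.match(payload)
  # Fall back to treating the whole payload as the source if it does not
  # match the expected "src (caption)" shape.
  return { src: payload, caption: nil } unless m

  { src: m[:src], caption: m[:caption] }
end

p process_image([:image, "file.jpg (foo bar baz)"])
```

Adding a feature to the image tag then means changing IMAGE_RE and the
hash it fills, without touching the C parser.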

For point "b": we need to *not* support shortcut syntax for styling
features such as the "id" thing or "em" padding (at least not at the
C parser level). If someone really wants em padding, she should use
HTML (it's not nice to use, and this is an indication that it is bad
practice):

<div style='padding-left:4em;'>
# one
# two
</div>

Since I *really* need such a tool, I could help refactor RedCloth
into a two-step parser (half in C, half in Ruby).

What do you think ?

Gaspard
On Mon, Jun 8, 2009 at 6:40 AM, Jason Garber <jg at jasongarber.com> wrote:
> Gaspard, here's a copy of my complaining to _why, which contains a few
> examples of what I'm up against. This was a while back, so I've moved on a
> bit, but what you were asking about still applies.
> There's probably a way to do everything I need to in Ragel, but I can't
> figure it out. I spent a few hours figuring out a different capture
> mechanism and thought I was being quite clever, but in the end it didn't
> work out. I was relieved because it felt like I was reinventing something a
> tool should provide anyway.
> Jason
>
> Begin forwarded message:
>
> From: Jason Garber <jg at jasongarber.com>
> Date: June 1, 2009 8:50:23 AM EDT
> To: why the lucky stiff <why at whytheluckystiff.net>
> Subject: Need your advice on RedCloth
> Hi, why. I need your advice on RedCloth and Ragel. The current mark and
> capture mechanism has just gotten too ugly for me to handle. Since I took
> over the project, we've had to add several more variables/macros/actions to
> mark fallback captures and I had to add a separate machine to parse
> attributes. It's been livable, but as I fix more bugs it keeps heading in
> that direction and I don't like it.
>
> I've reduced the problem down to the deterministic nature of the machine.
> For example, merging in PyTextile-style table attributes creates a conflict
> when recognizing these two possibilities:
> (#myid)# This is a list item with an id
> (# This is a list item with padding-left:1em
>
> It has to get to the third character to know whether the first character was
> a left indent or the start of the id, but by then it's too late: the indent
> has already been stored. You wind up getting the same output from (#myid)#
> one as you would from ((#myid)# one. I solved it easily enough with a
> conditional action that looks to see if p+2 is a space, but I feel like I
> shouldn't have to.
>
> I wish that from the start state there were two '(' transitions, one marking
> the indent, one the id. The branches would have different things both
> leading to a final state. At the final state, an action would discard the
> captured bits that were not on the path to the final state, leaving only the
> things that "stuck." Basically, the same thing backtracking regex engines
> do when matching /(\()?(\(#([a-z]+)\))?# (.+)/
>
> Plus, there's the matter of having to think about nondeterminism at every
> step of the way, like when writing cite = "??" mtext "??". It never
> occurred to me that book titles might end with or contain a question mark.
> It took me nearly an hour to get that working, but a regex would have just
> done it without me wasting brain cycles.
>
> I've also had to write duplicate patterns that don't have embedded actions
> so that I can look ahead (for extended blocks and such). So I wind up with
> duplicate patterns A and A_noactions, C and C_noactions...
>
> I'm new to all this stuff, but it seems Ragel produces a DFA, not an NFA, so
> what I describe above isn't possible. Is there a way to accomplish it with
> Ragel? It's tempting to just switch to Oniguruma. I'll bet it wouldn't be
> too much slower if we interfaced with it directly in C and did all the
> string manipulations in C. Might have distribution problems.
>
> What do you think?
>
> Jason
>
> _______________________________________________
> Redcloth-upwards mailing list
> Redcloth-upwards at rubyforge.org
> http://rubyforge.org/mailman/listinfo/redcloth-upwards
>
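
The list-item ambiguity described in the forwarded message can be
reproduced with a backtracking regex in Ruby. This is only a sketch of the
behaviour Jason says he wishes he had, not RedCloth code: the pattern is
the one quoted in his message with the id group's parentheses balanced,
and the names LIST_ITEM_RE and parse_list_item are invented for the
example.

```ruby
# Backtracking version of the list-item match. The engine first tries to
# read the leading "(" as an indent; if the rest of the line then fails to
# match, it backtracks and reads it as the start of "(#id)" instead, so
# only the captures on the successful path survive.
LIST_ITEM_RE = /\A(?<indent>\()?(?:\(#(?<id>[a-z]+)\))?# (?<text>.+)\z/

def parse_list_item(line)
  m = LIST_ITEM_RE.match(line)
  return nil unless m

  { indent: m[:indent] ? 1 : 0, id: m[:id], text: m[:text] }
end

[
  "(#myid)# This is a list item with an id",
  "(# This is a list item with padding-left:1em",
  "((#myid)# one"
].each { |line| p parse_list_item(line) }
```

The first line yields an id and no indent, the second an indent and no id,
and the third both, with no explicit lookahead at p+2.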