thr3ads.net - Redcloth upwards - 2 bugs when parsing emphasized or bold text [Jul 2006]

Bas Kloet

2006-Jul-02 10:13 UTC

2 bugs when parsing emphasized or bold text

I've found 2 bugs that produce (imho) incorrect rendering results:

1) The regexp for strong (*) and bold (**) is greedy, which produces
very strange results. 

The simplest way to show the problem is to give an example.

This is the original code:

====
Strong:
Lets do a little test *t*
this should not be strong *u*.

Bold:
Lets do another test **t**
this should not be bold **u**.
====
And this is the (relevant part of) the html that is produced:

====
        <p>Strong:
Lets do a little test <strong>t*
this should not be strong *u</strong>.</p>


        <p>Bold:
Lets do another test <b>t<strong>*
this should not be bold *</strong>u</b>.</p>
====
As you can see, the html produced is not exactly what you would expect.

2) Using _TEXT_ to emphasize a string doesn't work if TEXT spans
multiple lines.

If you want to emphasize a piece of text that spans multiple lines, then
_TEXT_ does not work, the underscores are simply shown in the generated
text, even if there are no hard linebreaks.



I filed these bugs both with the Debian bugtracker and the tracker on
RubyForge a couple of weeks ago, but there was no response. Though I'm a
pretty decent Ruby coder, the RedCloth code is way over my head, so I
was wondering if anyone here has a solution to one or both of the above
problems?

Thanks,
Bas

Christoffer Sawicki

2006-Jul-02 22:20 UTC

2 bugs when parsing emphasized or bold text

> I've found 2 bugs that produce (imho) incorrect rendering results:
>
> 1) The regexp for strong (*) and bold (**) is greedy, which produces
> very strange results.
*snip*

I haven't looked at the relevant RedCloth code, but the non-greedy
modifier in Ruby is "?". In other words: .* is greedy while .*? isn't.
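
To illustrate the difference between the two quantifiers, here is a small sketch in plain Ruby. It does not use RedCloth; the sample string and the variable names are made up for this illustration.

```ruby
# Illustrative sample only; not taken from the RedCloth source.
text = "*t* and *u*"

# Greedy: .* grabs as much as possible, so the capture runs from just
# after the first asterisk to just before the last one.
greedy = text[/\*(.*)\*/, 1]

# Non-greedy: .*? grabs as little as possible, so the capture stops at
# the first asterisk that can close the match.
lazy = text[/\*(.*?)\*/, 1]

puts greedy  # prints: t* and *u
puts lazy    # prints: t
```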

-- 
Christoffer Sawicki

Bas Kloet

2006-Jul-03 07:05 UTC

2 bugs when parsing emphasized or bold text

On Mon, Jul 03, 2006 at 12:20:53AM +0200, Christoffer Sawicki wrote:
> > I've found 2 bugs that produce (imho) incorrect rendering results:
> >
> > 1) The regexp for strong (*) and bold (**) is greedy, which produces
> > very strange results.
>
> *snip*
>
> I haven't looked at the relevant RedCloth code, but the non-greedy
> modifier in Ruby is "?". In other words: .* is greedy while .*? isn't.

Thanks, but that's not my real problem. I have a pretty good idea of
where in the code this is happening, and I know basic regular expression
syntax, but the following regexp code is just a bit too complicated for me:

=====
QTAGS = [
    ['**', 'b'],
    ['*', 'strong'],
    ['??', 'cite', :limit],
    ['-', 'del', :limit],
    ['__', 'i'],
    ['_', 'em', :limit],
    ['%', 'span', :limit],
    ['+', 'ins', :limit],
    ['^', 'sup'],
    ['~', 'sub']
]
# C is a constant defined elsewhere in the RedCloth source.
QTAGS.collect! do |rc, ht, rtype|
    rcq = Regexp::quote rc
    re =
        case rtype
        when :limit
            # :limit tags need a non-word character on both sides; no /m flag
            /(\W)
            (#{rcq})
            (#{C})
            (?::(\S+?))?
            (\S.*?\S|\S)
            #{rcq}
            (?=\W)/x
        else
            # all other tags: no boundary checks, and /m lets . match newlines
            /(#{rcq})
            (#{C})
            (?::(\S+))?
            (\S.*?\S|\S)
            #{rcq}/xm
        end
    [rc, ht, re, rtype]
end
=====
My main problem is that any trial-and-error modification of the code to fix
one problem spawns 2 new ones.

I hope this makes my problem a bit clearer.

Thanks,
Bas

Mark van Eijk

2006-Jul-05 09:56 UTC

2 bugs when parsing emphasized or bold text

On Sun, Jul 02, 2006 at 12:13:43PM +0200, Bas Kloet wrote:
> I've found 2 bugs that produce (imho) incorrect rendering results:
> 
> 1) The regexp for strong (*) and bold (**) is greedy, which produces
> very strange results. 
> 
> The simplest way to show the problem is to give an example.
> 
> This is the original code:
> 
> ====
> Strong:
> Lets do a little test *t*
> this should not be strong *u*.
> 
> Bold:
> Lets do another test **t**
> this should not be bold **u**.
> ====
>
> And this is the (relevant part of) the html that is produced:
> 
> ====
>         <p>Strong:
> Lets do a little test <strong>t*
> this should not be strong *u</strong>.</p>
> 
> 
>         <p>Bold:
> Lets do another test <b>t<strong>*
> this should not be bold *</strong>u</b>.</p>
> ====
>
> As you can see, the html produced is not exactly what you would expect.

I've taken a quick look at it and minimized your example a bit:
====
*t* not strong *u*.
====
This produces:
====
<p><strong>t* not *u</strong></p>
====
But the funny thing is that the following:
====
*tt* not strong *u*.
====
produces:
====
<p><strong>tt</strong> not <strong>u</strong></p>
====
So I don't think the matching is really greedy. It just doesn't handle
1-character cases very well.
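
The 1-character behaviour can be reproduced outside RedCloth with a cut-down version of the pattern quoted earlier in the thread. The sketch below is only an approximation: it drops the attribute group (C) and the optional cite group, and the names ORIGINAL, SWAPPED and strong_spans are hypothetical. It suggests that the alternation (\S.*?\S|\S) tries the two-or-more-character branch first, and that branch can succeed by running on to a later asterisk before the single-character branch is ever tried. Swapping the branches is shown purely as an experiment, not as a tested fix for RedCloth.

```ruby
# Cut-down stand-in for the non-:limit "*" pattern from QTAGS
# (attribute and cite groups omitted; an assumption of this sketch).
ORIGINAL = /\*(\S.*?\S|\S)\*/m

# Hypothetical variant that tries the single-character branch first.
SWAPPED = /\*(\S|\S.*?\S)\*/m

# Returns the text captured between each pair of asterisks.
def strong_spans(text, pattern)
  text.scan(pattern).flatten
end

p strong_spans("*t* not strong *u*.", ORIGINAL)   # ["t* not strong *u"]
p strong_spans("*tt* not strong *u*.", ORIGINAL)  # ["tt", "u"]
p strong_spans("*t* not strong *u*.", SWAPPED)    # ["t", "u"]
p strong_spans("*tt* not strong *u*.", SWAPPED)   # ["tt", "u"]
```

With the original ordering, "t" followed by "*" already satisfies \S.*?\S, so the engine keeps extending the lazy part until it finds a non-space character followed by an asterisk, which is the "u" near the end.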

Mark

Bas Kloet

2006-Jul-05 21:07 UTC

2 bugs when parsing emphasized or bold text

On Wed, Jul 05, 2006 at 11:56:57AM +0200, Mark van Eijk wrote:
> So I don't think the matching is really greedy. It just doesn't handle
> 1-character cases very well.

Thanks, that makes the problem a lot clearer.

I found a fix for the _ problem when text spans multiple lines myself. I
removed the :limit from the following line:
---
['_', 'em', :limit]
---

There's probably a good reason why the :limit was there, but all the
tests I've run produced correct results, so for the moment I'm happy
about that.
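
Judging from the QTAGS code quoted earlier, removing :limit switches "_" to the other regexp branch, which differs in two ways: it carries the /m flag (so . also matches newlines) and it drops the \W checks around the markers. The sketch below uses cut-down stand-ins for the two branches; the constant names and sample strings are hypothetical, and the attribute and cite groups are omitted. It shows why multi-line emphasis starts working, and one plausible reason the :limit was there in the first place.

```ruby
# Cut-down stand-ins for the two "_" patterns (assumptions of this sketch).
LIMITED   = /(\W)_(\S.*?\S|\S)_(?=\W)/x  # :limit branch: \W on both sides, no /m
UNLIMITED = /_(\S.*?\S|\S)_/xm           # other branch: no boundaries, /m set

multi_line = "some _emphasized\ntext_ here"
identifier = "call my_helper_method now"

# Without /m the dot cannot cross the newline, so nothing matches.
p LIMITED.match(multi_line)        # nil
# With /m the emphasis spans both lines.
p UNLIMITED.match(multi_line)[1]   # "emphasized\ntext"

# The boundary checks leave underscores inside words alone ...
p LIMITED.match(identifier)        # nil
# ... while the unrestricted pattern would emphasize part of the name.
p UNLIMITED.match(identifier)[1]   # "helper"
```

So text containing names with underscores may be worth adding to the test inputs before relying on this change.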

Thanks for looking into the problem further.

Greetings,
Bas
