Hey guys,
I have been working with RedCloth for the past week or so, trying to
upgrade a university's blog system (blogs.warwick.ac.uk) to Textile2,
having used a bespoke Java-based Textile1 implementation in the past. An
important part of this is that existing Textile1 markup gives, as far as
possible, the same results as before, but with the added features
(footnotes, tables) of Textile2.
The RedCloth 3.0.4 release gave me too many headaches, so I've been working
with the latest out of SVN, and I think I've uncovered a few bugs. I'd be
happy to share the code for the fixes if anyone wants it. (All of the
changes are to base.rb.)
Auto-magic URLs:
There were a couple of things that were a little iffy. Firstly, I changed it
so that the auto-magic linking of both normal links AND emails is done twice
(I added each of the two rules twice to TEXTILE_RULES):
    TEXTILE_RULES = [:refs_textile, :block_textile_table,
                     :block_textile_lists, :block_textile_defs,
                     :block_textile_prefix, :inline_textile_image,
                     :inline_textile_link, :inline_textile_code,
                     :inline_textile_span, :glyphs_textile,
                     # do these twice to accommodate URLs separated by a
                     # single space or line break
                     :inline_textile_autolink_urls,
                     :inline_textile_autolink_urls,
                     :inline_textile_autolink_emails,
                     :inline_textile_autolink_emails]
This fixes any problems with two URLs appearing with a single space or
linebreak between them.
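To illustrate why a second pass helps, here is a minimal sketch. It uses a deliberately simplified pattern rather than the real AUTO_LINK_RE, and SIMPLE_LINK_RE and autolink_once are made-up names. The point is that each match consumes the whitespace after the URL, so a URL that follows immediately has no leading character left to match against in the same pass:

```ruby
# Simplified stand-in for the autolink rule (an assumption for illustration,
# not RedCloth's own pattern): leading whitespace or line start, a URL, then
# trailing whitespace or line end.
SIMPLE_LINK_RE = /(\s|^)(http:\/\/\S+)(\s|$)/

# One pass of the rule: wrap each matched URL in an anchor tag, keeping the
# surrounding whitespace.
def autolink_once(text)
  text.gsub(SIMPLE_LINK_RE) { "#{$1}<a href=\"#{$2}\">#{$2}</a>#{$3}" }
end

text  = "http://a.example http://b.example"
once  = autolink_once(text)   # only the first URL is linked
twice = autolink_once(once)   # the second pass picks up the other one

puts once
puts twice
```

On the second pass the already-linked URL is left alone, because inside the generated tag it is preceded by a quote or ">" rather than whitespace.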
Also, the AUTO_LINK_RE regular expression has a bug: it matches any
character in the leading punctuation position. (I think the intention was to
have a list of punctuation not to be matched, but that means it matches any
character that isn't punctuation, including word characters.) I changed this
to a list of "whitelisted" punctuation (I only needed the open bracket at
the moment) and also to match any whitespace preceding the link:
    AUTO_LINK_RE = /
        (                          # leading text
          <\w+.*?>|                # leading HTML tag, or
          [\(]|                    # selected punctuation, or
          \s|                      # whitespace, or
          ^                        # beginning of line
        )
        (
          (?:http[s]?:\/\/)|       # protocol spec, or
          (?:www\.)                # www.*
        )
        (
          ([\w]+[=?&:%\/\.\~\-]*)* # url segment
          \w+[\/]?                 # url tail
          (?:\#\w*)?               # trailing anchor
        )
        ([[:punct:]]|\s|<|$)       # trailing text
        /x
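As a quick check of the whitelisting behaviour, the pattern can be tried on its own outside RedCloth. The sketch below copies the expression in condensed form; the helper link_parts and the sample strings are made up for illustration:

```ruby
# The expression from this post, condensed onto fewer lines.
AUTO_LINK_RE = /
  (<\w+.*?>|[\(]|\s|^)                          # leading text
  ((?:http[s]?:\/\/)|(?:www\.))                 # protocol spec or www
  (([\w]+[=?&:%\/\.\~\-]*)*\w+[\/]?(?:\#\w*)?)  # url segments, tail, anchor
  ([[:punct:]]|\s|<|$)                          # trailing text
/x

# Hypothetical helper: returns [leading, protocol, url, trailing] for the
# first match, or nil when nothing in the text qualifies as a link.
def link_parts(text)
  m = AUTO_LINK_RE.match(text)
  m && [m[1], m[2], m[3], m[5]]
end

p link_parts("See (www.example.com) for details")
# => ["(", "www.", "example.com", ")"]

p link_parts("foowww.example.com")
# => nil, because a word character is not an accepted leading character
```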
The IMAGE_RE regular expression also has a bug, where it matches any
character (.) for the start of a line. I replaced this with any whitespace.
Also, I made it so that the image URL (the src) *must* have a . in it:
    IMAGE_RE = /
        (\<p\>|\s|^)               # start of line?
        \!                         # opening
        (\<|\=|\>)?                # optional alignment atts
        (#{C})                     # optional style,class atts
        (?:\. )?                   # optional dot-space
        ([^\s(!]+?\.[^\s(!]+?)     # presume this is the src
        \s?                        # optional space
        (?:\(((?:[^\(\)]|\([^\)]+\))+?)\))? # optional title
        \!                         # closing
        (?::#{ HYPERLINK })?       # optional href
        /x
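The effect of the two changes can be seen in isolation with a cut-down pattern. This is only a sketch: SRC_RE keeps just the leading-text and src parts of the expression above and drops the alignment, style, title and href parts (which depend on RedCloth's C and HYPERLINK constants), and image_src is a made-up helper:

```ruby
# Cut-down stand-in for IMAGE_RE (an assumption for illustration): whitespace
# or line start, "!", a src that must contain a dot, then the closing "!".
SRC_RE = /(\s|^)!([^\s(!]+?\.[^\s(!]+?)!/

# Hypothetical helper: returns the captured src, or nil if there is no image.
def image_src(text)
  m = SRC_RE.match(text)
  m && m[2]
end

p image_src("!photo.jpg!")        # => "photo.jpg"
p image_src("Wow !really! nice")  # => nil, the text between the "!" has no dot
p image_src("foo!photo.jpg!")     # => nil, the "!" is not preceded by whitespace
```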
I made a couple of other minor alterations, such as adding the sterling (£)
symbol to the glyphs to be replaced with its HTML entity, and adding a
newline after <br /> when hard_breaks is turned on, to help readability.
In order to work well with our implementation, I also added an escape
capability, to allow a user to write \*text\* without the asterisks being
converted to <strong> tags.
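For anyone curious what such an escape could look like, here is one possible approach. It is a guess at the general idea rather than the code I described above, and the names with_escapes and ESCAPED_RE, the set of escapable characters, and the toy formatter are all assumptions. Escaped characters are swapped for placeholders before formatting and restored afterwards:

```ruby
# Hypothetical sketch: which characters can be escaped is an assumption.
ESCAPED_RE = /\\([*_])/

# Replaces each backslash-escaped character with a numbered placeholder,
# yields the shielded text to the formatter, then restores the characters.
def with_escapes(text)
  held = []
  shielded = text.gsub(ESCAPED_RE) do
    held << $1
    "\x01#{held.size - 1}\x02"
  end
  rendered = yield(shielded)
  rendered.gsub(/\x01(\d+)\x02/) { held[$1.to_i] }
end

# Toy formatter standing in for the real Textile rules: *text* to <strong>.
out = with_escapes('\*literal\* and *bold*') do |t|
  t.gsub(/\*(.+?)\*/, '<strong>\1</strong>')
end

puts out  # *literal* and <strong>bold</strong>
```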
If anyone wants the code for this then feel free to ask; I'm wary about
committing anything to SVN unless it's really wanted/needed.
Regards,
Mat
--
Mat Mannion
Web Developer
e-lab, University of Warwick