I've just spent a happy couple of days writing a text file for formatting with
Markdown. The results are phenomenal! It's easy to write, and attractive to
read. I'll try to send a Christmas present to the writer :-)

But when I looked at the Perl source, I also found that it was small and
well-structured. Of course, I couldn't resist adding a few things that appealed
to me. I'll share them here in case anyone has comments ...

Firstly, I wanted to be able to get to inline font markup other than italic and
bold. In e-mail, I write /italic/ and *bold*, but when I tried that in
_DoItalicsAndBold() it didn't work, because of the /s that had already been
added as part of close-tags. So I decided to use |italic|, *bold*,
^superscript^, _subscript_, =underline=, ~strikethrough~, $big$ and %small% to
go with `typewriter` (which seems to be done differently), implemented like
this:

    sub _DoItalicsAndBold {
        my $text = shift;

        $text =~ s{ (\|) (?=\S) (.+?) (?<=\S) \1 } {<i>$2</i>}gsx;
        $text =~ s{ (\*) (?=\S) (.+?) (?<=\S) \1 } {<b>$2</b>}gsx;
        $text =~ s{ (\^) (?=\S) (.+?) (?<=\S) \1 } {<sup>$2</sup>}gsx;
        $text =~ s{ (_) (?=\S) (.+?) (?<=\S) \1 } {<sub>$2</sub>}gsx;
        $text =~ s{ (=) (?=\S) (.+?) (?<=\S) \1 } {<u>$2</u>}gsx;
        $text =~ s{ (~) (?=\S) (.+?) (?<=\S) \1 } {<s>$2</s>}gsx;
        $text =~ s{ (\$) (?=\S) (.+?) (?<=\S) \1 } {<big>$2</big>}gsx;
        $text =~ s{ (%) (?=\S) (.+?) (?<=\S) \1 } {<small>$2</small>}gsx;

        # was:
        #     # <strong> must go first:
        #     $text =~ s{ (\*\*|__) (?=\S) (.+?[*_]*) (?<=\S) \1 }
        #         {<strong>$2</strong>}gsx;
        #
        #     $text =~ s{ (\*|_) (?=\S) (.+?) (?<=\S) \1 }
        #         {<em>$2</em>}gsx;

        return $text;
    }
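
The same idea can be tried outside Markdown.pl. What follows is only a minimal sketch, not the code from the modified script: the names @RULES and inline_markup are made up for illustration, and the eight substitutions are driven from a table instead of being written out one by one. The rules are applied in the same order as in the function above.

```perl
use strict;
use warnings;

# Hypothetical table of delimiter => HTML tag pairs, in the order the
# substitutions are applied (illustrative names, not part of Markdown.pl).
my @RULES = (
    [ '|' => 'i'     ],
    [ '*' => 'b'     ],
    [ '^' => 'sup'   ],
    [ '_' => 'sub'   ],
    [ '=' => 'u'     ],
    [ '~' => 's'     ],
    [ '$' => 'big'   ],
    [ '%' => 'small' ],
);

sub inline_markup {
    my ($text) = @_;
    for my $rule (@RULES) {
        my ($delim, $tag) = @$rule;
        my $d = quotemeta $delim;    # escape regex metacharacters such as | and $
        # Delimiter, non-space, shortest possible run, non-space, delimiter.
        $text =~ s{ $d (?=\S) (.+?) (?<=\S) $d }{<$tag>$1</$tag>}gsx;
    }
    return $text;
}

print inline_markup('H_2_O is |not| *heavy* water, E=mc^2^'), "\n";
# prints: H<sub>2</sub>O is <i>not</i> <b>heavy</b> water, E=mc<sup>2</sup>
```

Note that the lone = in "E=mc" is left alone, because each rule needs a matching pair of delimiters.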

I was happy with this, but I'm a techie guy, and I like to use mathematical and
Greek characters. So I also added to the same function:
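
    # {unicode character name}: replace with a numeric character reference
    $text =~ s{ \{ (?=\S) (.+?) (?<=\S) \} }
              {
                  my $name = lc $1;             # table keys are lowercase
                  $name =~ s/[ \n\r]+/ /g;      # a name may wrap across lines
                  "&#" . $g_unidata{$name} . ";"
              }egsx;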

where %g_unidata is created like this:
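
    # Table of Unicode character names: lowercase name => code point
    my %g_unidata;
    open (my $unidata, '<', "C:/Program Files/Markdown/UnicodeData.txt") or
        die "UnicodeData.txt not found: $!";
    while (my $line = <$unidata>)
    {
        my @field = split (/;/, $line);
        my $code  = hex $field[0];
        $g_unidata{lc $field[1]} = $code;     # current character name
        $g_unidata{lc $field[10]} = $code     # old (Unicode 1.0) name, if any
            if defined $field[10] && length $field[10];
    }
    close ($unidata);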

You can get UnicodeData.txt from "www.unicode.org". This lets me write things
like {greek small letter pi} {approximately equal to} 22{division sign}7
\3.1{non-spacing dot above}42857{non-spacing dot above}, San Jose{non-spacing
acute}, %CA% with nice results: it's not too bad to read in source, and it
wasn't hard to implement. In Firefox and IE, the {non-spacing acute} is
correctly positioned over the "e" (more or less). Amazing!
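
The lookup can be tried without the full data file. The sketch below is an illustration only: it builds the table from three sample lines in the UnicodeData.txt format held in memory, and the names @SAMPLE, %code_for and expand_names are hypothetical. Unlike the fragment above, it leaves an unknown {name} untouched instead of emitting an empty reference; that is a choice made for this sketch.

```perl
use strict;
use warnings;

# Three sample lines in the UnicodeData.txt format (fields separated by ";").
# Field 0 is the code point in hex, field 1 the name, field 10 the old name.
my @SAMPLE = (
    '00F7;DIVISION SIGN;Sm;0;ON;;;;;N;;;;;',
    '0307;COMBINING DOT ABOVE;Mn;230;NSM;;;;;N;NON-SPACING DOT ABOVE;;;;',
    '03C0;GREEK SMALL LETTER PI;Ll;0;L;;;;;N;;;03A0;;03A0',
);

my %code_for;    # lowercase name => code point
for my $line (@SAMPLE) {
    my @field = split /;/, $line, -1;    # -1 keeps trailing empty fields
    my $code  = hex $field[0];
    $code_for{ lc $field[1] }  = $code;
    $code_for{ lc $field[10] } = $code if length $field[10];
}

# Hypothetical helper: expand {name} into a numeric character reference.
sub expand_names {
    my ($text) = @_;
    $text =~ s{ \{ (?=\S) (.+?) (?<=\S) \} }{
        my $raw  = $1;              # save before the inner substitution runs
        my $name = lc $raw;
        $name =~ s/\s+/ /g;         # a name may wrap across source lines
        exists $code_for{$name} ? "&#$code_for{$name};" : "{$raw}";
    }egsx;
    return $text;
}

print expand_names("{greek small letter pi} is near 22{division sign}7"), "\n";
# prints: &#960; is near 22&#247;7
```

The old-name column is what makes {non-spacing dot above} resolve here: in the sample line for 0307 the current name is COMBINING DOT ABOVE, and the older name sits in field 10.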

My logic for selecting characters was to take out of ASCII the characters that
are used for writing regular prose (in accordance with the Markdown philosophy
as described on the web site). By my reckoning these are 52 letters, 10 digits
and ! " ' ( ) , - . / : ; ? (arguably [ ] are used for editorial comments and *
is used to mark footnotes, and arguably / isn't used in "real" prose, but this
seems like a fair choice). This leaves us with # $ % & * + < = > @ [ \ ] ^ _ ` {
| } ~ to have fun with:

    #   marks headers
    $   <big>    (mnemonic: $1 is 100 cents---it's big!)
    %   <small>  (mnemonic: 1% is 1/100: it's small!)
    &
    *   <b>
    +
    <   opens a tag
    =   <u>      (mnemonic: it's a hyphen, underlined)
    >   closes a tag
    @
    [   opens links
    \   escapes one of these characters
    ]   closes links
    ^   <sup>
    _   <sub>
    `   <tt>
    {   opens a character name
    |   <i>
    }   closes a character name
    ~   <s>      (mnemonic: looks like a handwritten strikethrough)
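
The character count above can be checked mechanically. This is a small sketch under one assumption: "ASCII" is taken to mean the 94 printable characters other than the space (codes 0x21 to 0x7E). The variable names are illustrative.

```perl
use strict;
use warnings;

# Printable ASCII, excluding the space: codes 0x21 through 0x7E.
my @printable = map { chr($_) } 0x21 .. 0x7E;

# The twelve punctuation marks counted above as ordinary prose.
my %prose = map { $_ => 1 } split //, q{!"'(),-./:;?};

# Whatever is neither a letter, a digit, nor prose punctuation is left over.
my @markup = grep { !/[A-Za-z0-9]/ && !$prose{$_} } @printable;

print scalar(@markup), " characters left: @markup\n";
# prints: 20 characters left: # $ % & * + < = > @ [ \ ] ^ _ ` { | } ~
```

So 94 - 52 - 10 - 12 = 20, and the twenty leftovers are exactly the characters listed in the table.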

I suppose in real e-mail, _thing_ would normally indicate underlining. But I'm
too much of a TeX fan to want _ for anything but subscripting.

I also added nested inline elements: *|bold italic|*, |_italic subscript_| etc.,
but the code is too bad to share :-(

I hope this message is received in the spirit in which it's intended. If there
are serious Markdown standardisation efforts underway, I'm not intending to
undermine them. The above was done for my own use and amusement, and could never
have happened without John Gruber's fine creation: a work of genius!

I've looked back a little at the recent mailing list archives, and I see there's
a discussion of block-level markup. I'd like to have a way to right-align,
centre or justify a paragraph, for example. Hard to think of a "natural" way
that people do that in e-mail. Maybe
---If the first source line starts with 4 spaces, the para is ragged-left, else
left-justified.
---If the first source line ends with 4 spaces, the para is ragged-right, else
right-justified.
You would use it in things like signatures, and if Markdown ever does tables (I
suppose using lots of | - + characters) the same principle might be recyclable.
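
The two rules for the first source line could be sketched as follows. This is one possible reading, not an implementation from Markdown.pl: the helper name paragraph_alignment is hypothetical, and so is the mapping of the four combinations onto the CSS text-align values left, right, center and justify (ragged on both sides is taken to mean centred).

```perl
use strict;
use warnings;

# Hypothetical helper: classify a paragraph by its first source line.
sub paragraph_alignment {
    my ($paragraph) = @_;
    my ($first) = split /\n/, $paragraph, 2;
    $first = '' unless defined $first;

    my $ragged_left  = $first =~ /^ {4}/;    # starts with 4 spaces
    my $ragged_right = $first =~ / {4}$/;    # ends with 4 spaces

    return 'center'  if $ragged_left && $ragged_right;
    return 'right'   if $ragged_left;        # ragged left, flush right
    return 'left'    if $ragged_right;       # flush left, ragged right
    return 'justify';                        # flush on both sides
}

print paragraph_alignment("    Jonathan Coxhead\nBelmont CA 94002"), "\n";
# prints: right
```

Under this reading an ordinary paragraph with no extra spaces comes out fully justified, which follows from the two "else" branches of the rules.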

Anyway, thanks for lots of fun! :-) Here's my right-aligned signature (with
embedded break):
... Jonathan Coxhead
Belmont CA 94002