I have a text file (attached) that defines equivalents among characters
in Latin-1 (ISO-8859-1), numeric &#xxx; codes, HTML entities,
and LaTeX equivalents. A portion of the file is shown inline below, but
it may not render well in this email.
I'd like to read this into R to use as a character translation table,
but I am stuck on two things:
- The 5 fields in the file are column-aligned and are separated by 2+
whitespace characters.
In Perl this is trivial to read and parse via something like

    my @entries = split /\n/, $charTable;
    foreach (@entries) {
        # split each line on runs of two or more whitespace characters
        my ($desc, $char, $code, $html, $tex) = split /\s\s+/;
    }

AFAIK, the only function for reading such data is utils::read.fwf, but
then I have to specify the field widths.
I don't know of any function that accepts even a simple regex like this
as a sep= argument.
- The TeX field contains many backslashed codes that need to be escaped
in R. Is it necessary to manually edit the file to change
'\pounds' --> '\\pounds', '\S' --> '\\S', etc., or is there something
like a raw input mode that would do this where necessary?
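
For what it's worth, here is a minimal sketch of one approach that might
work, not a tested solution for the full file. It assumes that
readLines() plus strsplit() with a regular expression can stand in for
the Perl split above, and that rows with no TeX entry simply have fewer
fields. The names sample_lines, tmp, n_fields and char_table are made up
for illustration, and the few sample rows are a stand-in for the
attachment. As far as I understand it, backslashes only need doubling
inside R source-code string literals; text read from a file is taken
literally.

```r
# Hypothetical excerpt standing in for the attached file. The doubled
# backslashes are R source-code escapes only: the file written below
# contains single backslashes, as in the real attachment.
sample_lines <- c(
  "Description     Char  Code    HTML      TeX",
  "ampersand       &     &#38;   &amp;     \\&",
  "apostrophe      '     &#39;   &apos;",
  "less than       <     &#60;   &lt;      $<$",
  "pound sterling  ?     &#163;  &pound;   \\pounds"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# readLines() returns the text as-is, so no manual escaping of the file
raw <- readLines(tmp, encoding = "latin1")

# Split on runs of 2+ whitespace characters, like Perl's split(/\s\s+/)
fields <- strsplit(raw, "\\s{2,}", perl = TRUE)

# Pad rows that lack a trailing TeX field with NA, up to 5 fields
n_fields <- 5
mat <- t(vapply(fields,
                function(x) { length(x) <- n_fields; x },
                character(n_fields)))

# First row holds the column names; the rest is the translation table
char_table <- as.data.frame(mat[-1, , drop = FALSE],
                            stringsAsFactors = FALSE)
names(char_table) <- mat[1, ]

# The TeX entry holds one literal backslash: "\pounds" is 7 characters
nchar(char_table$TeX[char_table$Description == "pound sterling"])
```

print() shows such a value as "\\pounds" because it displays the escaped
form, whereas cat() or writeLines() show the single backslash that is
actually stored.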

Description                          Char  Code    HTML      TeX
double quote                         "     &#34;   &quot;
ampersand                            &     &#38;   &amp;     \&
apostrophe                           '     &#39;   &apos;
less than                            <     &#60;   &lt;      $<$
greater than                         >     &#62;   &gt;      $>$
non-breaking space                   .     &#160;  &nbsp;    ~
inverted exclamation                 ¡     &#161;  &iexcl;   !`
cent sign                            ¢     &#162;  &cent;
pound sterling                       £     &#163;  &pound;   \pounds
general currency sign                ¤     &#164;  &curren;
yen sign                             ¥     &#165;  &yen;
broken vertical bar                  ¦     &#166;  &brvbar;
section sign                         §     &#167;  &sect;    \S
umlaut (dieresis)                    ¨     &#168;  &uml;     \"{}
copyright                            ©     &#169;  &copy;    \copyright
feminine ordinal                     ª     &#170;  &ordf;    $^a$
left angle quote, guillemotleft      «     &#171;  &laquo;   \guillemotleft
not sign                             ¬     &#172;  &not;
soft hyphen                          ­     &#173;  &shy;
registered trademark                 ®     &#174;  &reg;     \textregistered
macron accent                        ¯     &#175;  &macr;
degree sign                          °     &#176;  &deg;     $^o$
plus or minus                        ±     &#177;  &plusmn;  $\pm$
superscript two                      ²     &#178;  &sup2;    $^2$
superscript three                    ³     &#179;  &sup3;    $^3$
acute accent                         ´     &#180;  &acute;   \'{}
micro sign                           µ     &#181;  &micro;   $\mu$
paragraph sign                       ¶     &#182;  &para;    \P
middle dot                           ·     &#183;  &middot;  $\cdot$
cedilla                              ¸     &#184;  &cedil;   \c{}
superscript one                      ¹     &#185;  &sup1;    $^1$
masculine ordinal                    º     &#186;  &ordm;    $^o$
right angle quote, guillemotright    »     &#187;  &raquo;   \guillemotright
fraction one-fourth                  ¼     &#188;  &frac14;  $\frac14$
fraction one-half                    ½     &#189;  &frac12;  $\frac12$
fraction three-fourths               ¾     &#190;  &frac34;  $\frac34$
inverted question mark               ¿     &#191;  &iquest;  ?`
capital A, grave accent              À     &#192;  &Agrave;  \`A
capital A, acute accent              Á     &#193;  &Aacute;  \'A
capital A, circumflex accent         Â     &#194;  &Acirc;   \^A
capital A, tilde                     Ã     &#195;  &Atilde;  \~A
capital A, dieresis or umlaut mark   Ä     &#196;  &Auml;    \"A
capital A, ring                      Å     &#197;  &Aring;   \AA
capital AE diphthong (ligature)      Æ     &#198;  &AElig;   \AE

--
Michael Friendly Email: friendly at yorku.ca
Professor, Psychology Dept.
York University Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street http://datavis.ca
Toronto, ONT M3J 1P3 CANADA
-------------- next part --------------
Description                          Char  Code    HTML      TeX
double quote                         "     &#34;   &quot;
ampersand                            &     &#38;   &amp;     \&
apostrophe                           '     &#39;   &apos;
less than                            <     &#60;   &lt;      $<$
greater than                         >     &#62;   &gt;      $>$
non-breaking space                   .     &#160;  &nbsp;    ~
inverted exclamation                 ¡     &#161;  &iexcl;   !`
cent sign                            ¢     &#162;  &cent;
pound sterling                       £     &#163;  &pound;   \pounds
general currency sign                ¤     &#164;  &curren;
yen sign                             ¥     &#165;  &yen;
broken vertical bar                  ¦     &#166;  &brvbar;
section sign                         §     &#167;  &sect;    \S
umlaut (dieresis)                    ¨     &#168;  &uml;     \"{}
copyright                            ©     &#169;  &copy;    \copyright
feminine ordinal                     ª     &#170;  &ordf;    $^a$
left angle quote, guillemotleft      «     &#171;  &laquo;   \guillemotleft
not sign                             ¬     &#172;  &not;
soft hyphen                          ­     &#173;  &shy;
registered trademark                 ®     &#174;  &reg;     \textregistered
macron accent                        ¯     &#175;  &macr;
degree sign                          °     &#176;  &deg;     $^o$
plus or minus                        ±     &#177;  &plusmn;  $\pm$
superscript two                      ²     &#178;  &sup2;    $^2$
superscript three                    ³     &#179;  &sup3;    $^3$
acute accent                         ´     &#180;  &acute;   \'{}
micro sign                           µ     &#181;  &micro;   $\mu$
paragraph sign                       ¶     &#182;  &para;    \P
middle dot                           ·     &#183;  &middot;  $\cdot$
cedilla                              ¸     &#184;  &cedil;   \c{}
superscript one                      ¹     &#185;  &sup1;    $^1$
masculine ordinal                    º     &#186;  &ordm;    $^o$
right angle quote, guillemotright    »     &#187;  &raquo;   \guillemotright
fraction one-fourth                  ¼     &#188;  &frac14;  $\frac14$
fraction one-half                    ½     &#189;  &frac12;  $\frac12$
fraction three-fourths               ¾     &#190;  &frac34;  $\frac34$
inverted question mark               ¿     &#191;  &iquest;  ?`
capital A, grave accent              À     &#192;  &Agrave;  \`A
capital A, acute accent              Á     &#193;  &Aacute;  \'A
capital A, circumflex accent         Â     &#194;  &Acirc;   \^A
capital A, tilde                     Ã     &#195;  &Atilde;  \~A
capital A, dieresis or umlaut mark   Ä     &#196;  &Auml;    \"A
capital A, ring                      Å     &#197;  &Aring;   \AA
capital AE diphthong (ligature)      Æ     &#198;  &AElig;   \AE
capital C, cedilla                   Ç     &#199;  &Ccedil;  \c{C}
capital E, grave accent              È     &#200;  &Egrave;  \`E
capital E, acute accent              É     &#201;  &Eacute;  \'E
capital E, circumflex accent         Ê     &#202;  &Ecirc;   \^E
capital E, dieresis or umlaut mark   Ë     &#203;  &Euml;    \"E
capital I, grave accent              Ì     &#204;  &Igrave;  \`I
capital I, acute accent              Í     &#205;  &Iacute;  \'I
capital I, circumflex accent         Î     &#206;  &Icirc;   \^I
capital I, dieresis or umlaut mark   Ï     &#207;  &Iuml;    \"I
capital Eth, Icelandic               Ð     &#208;  &ETH;
capital N, tilde                     Ñ     &#209;  &Ntilde;  \~N
capital O, grave accent              Ò     &#210;  &Ograve;  \`O
capital O, acute accent              Ó     &#211;  &Oacute;  \'O
capital O, circumflex accent         Ô     &#212;  &Ocirc;   \^O
capital O, tilde                     Õ     &#213;  &Otilde;  \~O
capital O, dieresis or umlaut mark   Ö     &#214;  &Ouml;    \"O
multiply sign                        ×     &#215;  &times;   $\times$
capital O, slash                     Ø     &#216;  &Oslash;  {\O}
capital U, grave accent              Ù     &#217;  &Ugrave;  \`U
capital U, acute accent              Ú     &#218;  &Uacute;  \'U
capital U, circumflex accent         Û     &#219;  &Ucirc;   \^U
capital U, dieresis or umlaut mark   Ü     &#220;  &Uuml;    \"U
capital Y, acute accent              Ý     &#221;  &Yacute;  \'Y
capital THORN, Icelandic             Þ     &#222;  &THORN;   \TH
small sharp s, German (sz ligature)  ß     &#223;  &szlig;   \ss
small a, grave accent                à     &#224;  &agrave;  \`a
small a, acute accent                á     &#225;  &aacute;  \'a
small a, circumflex accent           â     &#226;  &acirc;   \^a
small a, tilde                       ã     &#227;  &atilde;  \~a
small a, dieresis or umlaut mark     ä     &#228;  &auml;    \"a
small a, ring                        å     &#229;  &aring;   \aa
small ae diphthong (ligature)        æ     &#230;  &aelig;   \ae
small c, cedilla                     ç     &#231;  &ccedil;  \c{c}
small e, grave accent                è     &#232;  &egrave;  \`e
small e, acute accent                é     &#233;  &eacute;  \'e
small e, circumflex accent           ê     &#234;  &ecirc;   \^e
small e, dieresis or umlaut mark     ë     &#235;  &euml;    \"e
small i, grave accent                ì     &#236;  &igrave;  \`i
small i, acute accent                í     &#237;  &iacute;  \'i
small i, circumflex accent           î     &#238;  &icirc;   \^i
small i, dieresis or umlaut mark     ï     &#239;  &iuml;    \"i
small eth, Icelandic                 ð     &#240;  &eth;
small n, tilde                       ñ     &#241;  &ntilde;  \~n
small o, grave accent                ò     &#242;  &ograve;  \`o
small o, acute accent                ó     &#243;  &oacute;  \'o
small o, circumflex accent           ô     &#244;  &ocirc;   \^o
small o, tilde                       õ     &#245;  &otilde;  \~o
small o, dieresis or umlaut mark     ö     &#246;  &ouml;    \"o
division sign                        ÷     &#247;  &divide;  $\div$
small o, slash                       ø     &#248;  &oslash;  {\o}
small u, grave accent                ù     &#249;  &ugrave;  \`u
small u, acute accent                ú     &#250;  &uacute;  \'u
small u, circumflex accent           û     &#251;  &ucirc;   \^u
small u, dieresis or umlaut mark     ü     &#252;  &uuml;    \"u
small y, acute accent                ý     &#253;  &yacute;  \'y
small thorn, Icelandic               þ     &#254;  &thorn;   \th
small y, dieresis or umlaut mark     ÿ     &#255;  &yuml;    \"y