Files
Mojca Miklavec abae7cb019 minor changes
2008-06-13 04:47:36 +02:00

563 lines
9.7 KiB
Plaintext
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
remember:
set lccode for - and '
TODO checklist:
- no comments were left out during conversion
- same set of patterns
- no commands and \'-like sequences left
already converted:
# language original_file code supported_encoding(s)
croatian hrhyph.tex hr[_HR] ec
========
čćđšž
^^a3 -> č ccaron
^^a2 -> ć cacute
^^9e -> đ dcroat
^^b2 -> š scaron
^^ba -> ž zcaron
serbian sh
=======
latin shhyphl.tex sr-latn ec
-----
\'c -> ć cacute
\v c -> č ccaron
\v s -> š scaron
\v z -> ž zcaron
^^a3 -> č ccaron
^^a2 -> ć cacute
^^9e -> đ dcroat
^^b2 -> š scaron
^^ba -> ž zcaron
cyrillic srhyphc.tex sr-cyrl t2a
--------
convert from iso-8859-5
dutch nehyph96.tex nl[_NL] ec
=====
^^e4 -> ä adieresis
^^e7 -> ç ccedilla
^^e8 -> è egrave
^^e9 -> é eacute
^^ea -> ê ecircumflex
^^eb -> ë edieresis
^^ef -> ï idieresis
^^ee -> î icircumflex
^^f1 -> ñ ntilde
^^f6 -> ö odieresis
^^fc -> ü udieresis
^^fb -> û ucircumflex
patterns can be loaded as ec only, and the same patterns are usable under texnansi as well (same position of letters)
finnish fihyph.tex fi[_FI] ec
=======
åäö
^^e4 -> ä adieresis
^^f6 -> ö odieresis
apparently there is no å in patterns?
(šž may accur in foreign words)
patterns can be loaded as ec only, and the same patterns are usable under texnansi as well (same position of letters)
italian ithyph.tex it[_IT] ascii/none
========
only ' is treated as a letter, patterns don't include any other accented character
how does that interact with mapping=tex-text?
polish plhyph.tex pl[_PL] qx
======
ąćęłńóśźż
/a -> ą aogonek
/c -> ć cacute
/e -> ę eogonek
/l -> ł lslash
/n -> ń nacute
/o -> ó oacute
/s -> ś sacute
/x -> ź zacute
/z -> ż zdotaccent
portuguese pthyph.tex pt ec
==========
^^e0 -> à - agrave (not used)
^^e1 -> á - aacute
^^e2 -> â - acircumflex
^^e3 -> ã - atilde
^^e7 -> ç - ccedilla
^^e8 -> è - egrave (not used)
^^e9 -> é - eacute
^^ea -> ê - ecircumflex
^^ed -> í - iacute
^^ee -> î - icircumflex
^^ef -> ï - idieresis (not used)
^^f3 -> ó - oacute
^^f4 -> ô - ocircumflex
^^f5 -> õ - otilde
^^f6 -> ö - odieresis (not used)
^^fa -> ú - uacute
^^fb -> û - ucircumflex (not used)
patterns can be loaded as ec only, and the same patterns are usable under texnansi as well (same position of letters)
slovenian (slovene) sihyph.tex sl[-si] ec
=========
čšž
"c -> č ccaron
"s -> š scaron
"z -> ž zcaron
swedish svhyph.tex sv[-se] ec
=======
åäö
^^e5 -> å aring
^^e4 -> ä adieresis
^^f6 -> ö odieresis
àé are considered a variant of the same letter
^^e9 -> é eacute
patterns can be loaded as ec only, and the same patterns are usable under texnansi as well (same position of letters)
"ukenglish" ukhyphen.tex en-gb ascii/none
===========
spanish (espanol) eshyph.tex es ec
=======
áéíóúüñ
convert from latin1 to utf-8
^^f1 ntilde
^^e1 aacute
^^e9 eacute
^^ed iacute
^^f3 oacute
^^fa uacute
^^fc udieresis
patterns can be loaded as ec only, and the same patterns are usable under texnansi as well (same position of letters)
catalan cahyph.tex ca ec
=======
^^e0 -> à agrave
^^e7 -> ç - ccedilla
^^e8 -> è - egrave
^^e9 -> é - eacute
^^ed -> í - iacute
^^ef -> ï - idieresis
^^f2 -> ò - ograve
^^f3 -> ó - oacute
^^fa -> ú - uacute
^^fc -> ü - udieresis
patterns can be loaded as ec only, and the same patterns are usable under texnansi as well (same position of letters)
galician glhyph.tex gl ec
========
áéíóú ñ üï
source in latin1 -> convert to utf-8
patterns can be loaded as ec only, and the same patterns are usable under texnansi as well (same position of letters)
uppersorbian sorhyph.tex hsb ec
============
^^a3 -> č - ccaron
^^a2 -> ć - cacute
^^a5 -> ě - ecaron
^^aa -> ł - lslash
^^ab -> ń - nacute
^^f3 -> ó - oacute
^^b0 -> ř - rcaron
^^b2 -> š - scaron
^^ba -> ž - zcaron
^^b9 -> ź - zacute
welsh cyhyph.tex cy ec
=====
^^e1 -> á - aacute
^^e2 -> â - acircumflex
^^ea -> ê - ecircumflex
^^f4 -> ô - ocircumflex
^^eb -> ë - edieresis
^^ef -> ï - idieresis
^^f6 -> ö - odieresis
irish gahyph.tex ga ec
=====
^^e1 -> á - aacute
^^e9 -> é - eacute
^^ed -> í - iacute
^^f3 -> ó - oacute
^^fa -> ú - uacute
pinyin pyhyph.tex zh-latn ec
======
^^fc -> ü - udieresis
patterns can be loaded as ec only, and the same patterns are usable under texnansi as well (same position of letters)
interlingua iahyphen.tex ia ascii
===========
pure copy
romanian rohyphen.tex ro ec
========
ăâîșț ĂÂÎȘȚ
"a -> ă
"A -> â
"i -> î
"s -> ș
"t -> ț
"a = \u{a}
"A = \^{a}
"i = \^{\i}
"s = \c{s}
"t = \c{t}
% [-] \u{A} [not encoded]
% "A = \^{a} [-] \^{A} [not encoded]
% "i = \^{\i} "I = \^{I}
% "s = \c{s} "S = \c{S}
% "t = \c{t} "T = \c{T}
"a -> ^^a0
"A -> ^^e2
"i -> ^^ee
"s -> ^^b3
"t -> ^^b5
estonian ethyph.tex et ec
========
šžäöüõ
^^b2 -> š - scaron
^^ba -> ž - zcaron
^^e4 -> ä - adieresis
^^f6 -> ö - odieresis
^^fc -> ü - udieresis
^^f5 -> õ - otilde
hungarian huhyphn.tex hu ec
=========
saved in ec encoding (that no editor can read)
^^e1
^^e9
^^f3
^^f6
^^ae
^^fc
^^fa
^^b6
^^ed
^^e4
\lccode"E1="E1 % á - aacute
\lccode"E9="E9 % é - eacute
\lccode"ED="ED % í - iacute
\lccode"F3="F3 % ó - oacute
\lccode"FA="FA % ú - uacute
\lccode"AE="AE % ő - ohungarumlaut
\lccode"B6="B6 % ű - uhungarumlaut
\lccode"E4="E4 % ä - adieresis
\lccode"F6="F6 % ö - odieresis
\lccode"FC="FC % ü - udieresis
icelandic icehyph.tex is ec
=========
^^e1 á - aacute
^^e9 é - eacute
^^ed í - iacute
^^f3 ó - oacute
^^fa ú - uacute
^^fd ý - yacute
^^fe þ - thorn
^^e6 æ - ae
^^f6 ö - odieresis
^^f0 ð - eth
turkish tkhyph.tex tr ec
=======
weird conversion, see additional notes
^^11 -> ı - dotlessi % error
^^e2 -> â - acircumflex
^^ee -> î - icircumflex
^^f4 -> ô - ocircumflex
^^f6 -> ö - odieresis
^^fc -> ü - udieresis
^^e7 -> ç - ccedilla
^^a7 -> ğ - gbreve
^^f1 -> ñ - ntilde
^^b3 -> ş - scedilla
---------------------
czech - not to be included this year ? - works fine with ec, but probably includes other tricks as well
=====
\v e -> ě ecaron
\v c -> č ccaron
\v d -> ď dcaron
\v l -> ľ lcaron (not used)
\v n -> ň ncaron
\v r -> ř rcaron
\v s -> š scaron
\v t -> ť tcaron
\v z -> ž zcaron
\r u -> ů uring
\'a -> á aacute
\'e -> é eacute
\'i -> í iacute
\'o -> ó oacute
\'u -> ú uacute
\'r -> ŕ racute (not used)
\'y -> ý yacute
\"a -> ä adieresis (not used)
\^o -> ô ocircumflex (not used)
slovak
======
\v e -> ě ecaron (not used)
\v c -> č ccaron
\v d -> ď dcaron
\v l -> ľ lcaron
\v n -> ň ncaron
\v r -> ř rcaron (not used)
\v s -> š scaron
\v t -> ť tcaron
\v z -> ž zcaron
\r u -> ů uring (not used)
\'a -> á aacute
\'e -> é eacute
\'i -> í iacute
\'o -> ó oacute
\'u -> ú uacute
\'r -> ŕ racute
\'y -> ý yacute
\"a -> ä adieresis
\^o -> ô ocircumflex
csaccents that Jonathan has written have some more characters available:
\"o
\"u
\'l
\`a
german - not to be included this year
======
ec, also texnansi; original "supports" ot1 as well
äöü are positioned at equal places in both ec & texnansi, while ß is at a different place in ec/texnansi/OT1
0x19 0xDF 0xFF
ec ı SS ß
texnansi ß ß ÿ
OT1 ß - -
äöüß
"a -> ä
"o -> ö
"u -> ü
/3 -> ß
/9 duplicated entry for ß, only inside \c
\n - keep pattern
\c - delete pattern (duplicated)
german - old dehypht.tex
german - new (ngerman) dehyphn.tex
hyphenation exceptions apparently loaded separately
latin
=====
\ae -> æ
\oe -> œ
\n - delete pattern (duplicated)
french frhyph fr[_FR] ec - not to be included this year
======
=patois
=francais
also kind-of-supports \oe in ot1, TODO: check for texnansi
æ - ae (*unused*)
œ - oe
\`a à e0
\`e è e8
\`u ù f9 (*unused*)
\'e é e9
\^a â e2
\^e ê ea
\^i î ee
\^o ô f4
\^u û fb
\"e ë eb (*unused*)
\"i ï ef
\"u ü fc (*unused*)
\"y ÿ b8 (*unused*)
\cc ç e7 ccedilla
\oe œ f7 oe
\n - remove pattern (only duplicated \oe inside)
remove 0 from \oe0
% For \oe which exists in T1 _and_ OT1 encoded fonts but with
% different glyph codes, patterns for both glyphs are included.
% Thus you can use either T1 encoded fonts, or OT1 encoded fonts
% and MLTeX's character substitution definition.
danish
======
the same problem
danish dkhyph.tex
dkcommon.tex
dkhyph.tex
dkspecial.tex
X -> æ
Y -> ø
Z -> å
esperanto latin3
=========
^c -> ĉ (E6)
^g -> ĝ (F8)
^h -> ĥ (B6)
^j -> ĵ (BC)
^s -> ŝ (FE)
^u -> ŭ (FD)
^C -> Ĉ (C6)
^G -> Ĝ (D8)
^H -> Ĥ (A6)
^J -> Ĵ (AC)
^S -> Ŝ (DE)
^U -> Ŭ (DD)
special converter needed (and already done):
coptic xu-copthyph.tex utf8-copthyph.tex copthyph.tex
english hyphen.tex % do not change!
=usenglish/USenglish/american
%
% ushyphmax.tex, on the other hand, includes Gerard Kuiken's additional
% patterns; it is not frozen.
usenglishmax ushyphmax.tex
TODO (delete all the three entries once patterns are converted):
norsk xu-nohyphbx.tex
=norwegian
nynorsk nnhyph.tex
bokmal nbhyph.tex
indonesian inhyph.tex
welsh cyhyph.tex
greek variants:
- ibycus ibyhyph.tex
- greek xu-grphyph4.tex
=polygreek
- monogreek xu-grmhyph4.tex
- ancientgreek xu-grahyph4.tex
cyrilic variants:
- bulgarian xu-bghyphen.tex
- mongolian xu-mnhyph.tex
- mongolian2a mnhyphn.tex
- serbian xu-srhyphc.tex
- russian xu-ruhyphen.tex
- ukrainian xu-ukrhyph.tex
TODO:
waiting list:
czech
slovak
german xu-dehypht.tex
ngerman xu-dehyphn.tex
latin xu-lahyph.tex
esperanto xu-eohyph.tex
consult the authors:
basque xu-bahyph.tex
bahyph.sh
bahyph.tex
galician xu-glhyph.tex
turkish xu-tkhyph.tex
xu-cp866nav.tex
xu-dehyphtex.tex
xu-eohyph.tex
xu-nohyphbx.tex
xu-ruhyphen.tex
xu-ukrhyph.tex
cyhyph.tex
eohyph.tex
grahyph4.tex
grmhyph4.tex
grphyph4.tex
hyphen.tex
hypht1.tex
ibyhyph.tex
inhyph.tex
ushyphmax.tex