The Cork encoding (also known as T1 or EC) is a character encoding used for encoding glyphs in fonts.[1] It is named after the city of Cork in Ireland, where a new encoding for LaTeX was introduced at a TeX Users Group (TUG) conference in 1990.[1] It contains 256 characters and supports most Western and Eastern European languages written in the Latin alphabet.[2]
Details
In 8-bit TeX engines the font encoding has to match the encoding of the hyphenation patterns, and hyphenation patterns are where this encoding is most commonly used.[3] In LaTeX one can switch to this encoding with \usepackage[T1]{fontenc}, while in ConTeXt MkII it is already the default encoding. In modern engines such as XeTeX and LuaTeX, Unicode is fully supported and the 8-bit font encodings are obsolete.
Character set
Cork encoding
|    | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
| 0x | ` 0060 | ´ 00B4 | ˆ 02C6 | ˜ 02DC | ¨ 00A8 | ˝ 02DD | ˚ 02DA | ˇ 02C7 | ˘ 02D8 | ¯ 00AF | ˙ 02D9 | ¸ 00B8 | ˛ 02DB | ‚ 201A | ‹ 2039 | › 203A |
| 1x | “ 201C | ” 201D | „ 201E | « 00AB | » 00BB | – 2013 | — 2014 | ZWSP | ₀[a] 2080 | ı[b] 0131 | ȷ[b] 0237 | ff FB00 | fi FB01 | fl FB02 | ffi FB03 | ffl FB04 |
| 2x | SP | ! | " | # | $ | % | & | ’ 2019 | ( | ) | * | + | , | - | . | / |
| 3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6x | ‘ 2018 | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7x | p | q | r | s | t | u | v | w | x | y | z | { | | | } | ~ | SHY[c] |
| 8x | Ă 0102 | Ą 0104 | Ć 0106 | Č 010C | Ď 010E | Ě 011A | Ę 0118 | Ğ 011E | Ĺ 0139 | Ľ 013D | Ł 0141 | Ń 0143 | Ň 0147 | Ŋ 014A | Ő 0150 | Ŕ 0154 |
| 9x | Ř 0158 | Ś 015A | Š 0160 | Ș 0218 | Ť 0164 | Ț 021A | Ű 0170 | Ů 016E | Ÿ 0178 | Ź 0179 | Ž 017D | Ż 017B | IJ 0132 | İ 0130 | đ 0111 | § 00A7 |
| Ax | ă 0103 | ą 0105 | ć 0107 | č 010D | ď 010F | ě 011B | ę 0119 | ğ 011F | ĺ 013A | ľ 013E | ł 0142 | ń 0144 | ň 0148 | ŋ 014B | ő 0151 | ŕ 0155 |
| Bx | ř 0159 | ś 015B | š 0161 | ș 0219 | ť 0165 | ț 021B | ű 0171 | ů 016F | ÿ 00FF | ź 017A | ž 017E | ż 017C | ij 0133 | ¡ 00A1 | ¿ 00BF | £ 00A3 |
| Cx | À | Á | Â | Ã | Ä | Å | Æ | Ç | È | É | Ê | Ë | Ì | Í | Î | Ï |
| Dx | Ð[d] | Ñ | Ò | Ó | Ô | Õ | Ö | Œ 0152 | Ø | Ù | Ú | Û | Ü | Ý | Þ | SS[e] 1E9E |
| Ex | à | á | â | ã | ä | å | æ | ç | è | é | ê | ë | ì | í | î | ï |
| Fx | ð | ñ | ò | ó | ô | õ | ö | œ 0153 | ø | ù | ú | û | ü | ý | þ | ß 00DF |
Notes
- Hexadecimal values shown with the characters in the table are their Unicode code points.
 - The first 12 characters are often used as combining characters.
 
- a. 0x18 is only a "trailing zero", used to compose ‰ or ‱ (or arbitrarily smaller quantities) from the percent sign (%).
- b. Dotless i and dotless j may be used to compose accented variants such as i with macron (ī).
- c. 0x7F is the hyphenation character (not really a soft hyphen).
- d. 0xD0 is used both as Eth (Ð, U+00D0) and as D with stroke (Đ, U+0110), which can cause problems in some situations, such as copying text from a PDF or hyphenation.
- e. 0xDF contains SS (two letters S). It allows TeX to automatically convert the German lowercase ß into its uppercase form.
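
The slot layout above can be read as a lookup from byte values to Unicode code points. The following is a minimal sketch in Python, not part of any TeX distribution: the names `CORK_SUBSET` and `decode_cork` and the `d_stroke` parameter are hypothetical, and only a handful of slots from the table are included. It assumes that printable ASCII slots (other than 0x27 and 0x60) and slots 0xC0 to 0xFE (other than 0xD7, 0xDF and 0xF7) sit at the same positions as in Latin-1, and it leaves every other slot undecoded.

```python
# Illustrative subset of Cork (T1) slots, copied from the table above.
# A complete decoder would need all 256 slots.
CORK_SUBSET = {
    0x10: "\u201C",  # left double quotation mark
    0x11: "\u201D",  # right double quotation mark
    0x15: "\u2013",  # en dash
    0x19: "\u0131",  # dotless i
    0x27: "\u2019",  # right single quotation mark
    0x60: "\u2018",  # left single quotation mark
    0x8A: "\u0141",  # L with stroke
    0xAA: "\u0142",  # l with stroke
    0xB9: "\u017A",  # z with acute
    0xD7: "\u0152",  # OE ligature
    0xF7: "\u0153",  # oe ligature
    0xFF: "\u00DF",  # sharp s
}


def decode_cork(data: bytes, d_stroke: bool = False) -> str:
    """Decode Cork-encoded bytes using the partial table above.

    Slot 0xD0 is shared by Eth and D with stroke; d_stroke (a hypothetical
    switch) selects D with stroke (U+0110) instead of Eth (U+00D0).
    Slots not covered by this sketch become U+FFFD.
    """
    out = []
    for slot in data:
        if slot == 0xD0 and d_stroke:
            out.append("\u0110")
        elif slot in CORK_SUBSET:
            out.append(CORK_SUBSET[slot])
        elif 0x20 <= slot <= 0x7E or (0xC0 <= slot <= 0xFE and slot != 0xDF):
            # Same position as in ASCII / Latin-1.
            out.append(chr(slot))
        else:
            out.append("\uFFFD")
    return "".join(out)


if __name__ == "__main__":
    print(decode_cork(bytes([0x8A, 0xF3, 0x64, 0xB9])))  # Łódź
    print(decode_cork(b"\xd0"), decode_cork(b"\xd0", d_stroke=True))
```

The `d_stroke` switch makes the ambiguity of slot 0xD0 explicit: the byte alone does not say which letter was meant, so the caller has to supply that knowledge, for example from the language of the text.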
 
Supported languages
The encoding supports most European languages written in the Latin alphabet. Notable exceptions are:
- Esperanto (using IL3)
- Latvian and Lithuanian (using L7X)
- Welsh
 
Languages with slightly suboptimal support include:
- Galician, Portuguese and Spanish, due to the lack of the characters ª and º. These are not simply superscript versions of lowercase "a" and "o": superscripts are thinner, and ª and º are often underlined.
- Croatian, Bosnian and Serbian, due to the shared use of the slot for Đ
- Turkish, due to dotless i having different uppercase and lowercase combinations than in other languages
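
The Turkish casing issue can be illustrated outside TeX. The sketch below is an analogy in Python rather than a description of TeX's own case tables: Python's `str.upper()` and `str.lower()` are locale-independent, so they pair i with I, whereas Turkish pairs i with İ and ı with I. The helper names `turkish_upper` and `turkish_lower` are hypothetical.

```python
# Turkish case pairs: dotted i <-> İ (U+0130), dotless ı (U+0131) <-> I.
def turkish_upper(text: str) -> str:
    """Uppercase text using the Turkish pairing of i and ı."""
    # Handle the two Turkish-specific letters first, then fall back
    # to the default mapping for everything else.
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


def turkish_lower(text: str) -> str:
    """Lowercase text using the Turkish pairing of I and İ."""
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


if __name__ == "__main__":
    # The default mapping treats i/I as a pair, as in most other languages.
    print("istanbul".upper())         # ISTANBUL
    # The Turkish mapping keeps the dot on the capital letter.
    print(turkish_upper("istanbul"))  # İSTANBUL
    print(turkish_lower("ISPARTA"))   # ısparta
```

A single fixed case mapping therefore cannot serve Turkish and other languages at the same time, which is the sense in which the support is slightly suboptimal.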
 
References
- [1] Petrlik, Lukas (1996-06-19). "The Czech and Slovak Character Encoding Mess Explained". cs-encodings-faq. 1.10. Archived from the original on 2016-06-21. Retrieved 2016-06-21.
- [2] Ferguson, Michael (1990). "Report on Multilingual Activities" (PDF). TUGboat. 11 (4): 514–516.
- [3] TeX hyphenation patterns