File locations: All schemas are in the schemas/ directory relative to this page. Click any card to download the file directly. Composite index schemas reference their constituent language schemas by relative URI — keep all files together in the same schemas/ directory.
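Because composite indexes point at their constituent schemas by relative URI, a missing file only surfaces when the index is resolved. The following is a minimal sketch of a pre-flight check. It assumes that each reference carries its URI in an `href` attribute; that attribute name and the helper `missing_references` are illustrative assumptions, not taken from the schema files themselves.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import unquote


def missing_references(index_path):
    """Return the hrefs in an index schema whose target files are absent.

    Assumption: references are expressed as an `href` attribute holding a
    URI relative to the index file (hypothetical; check the real schemas).
    """
    index_path = Path(index_path)
    root = ET.parse(index_path).getroot()
    missing = []
    for element in root.iter():
        href = element.get("href")
        if href is None:
            continue
        # Percent-decoding matters for names such as nb-norwegian_bokmål.crepdl.
        target = index_path.parent / unquote(href)
        if not target.is_file():
            missing.append(href)
    return missing
```

Run against `schemas/eu-languages-index.crepdl`, an empty list would indicate that every referenced file sits alongside the index.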

Latin Script

Individual language schemas for the 24 official EU languages plus Norwegian Bokmål, Norwegian Nynorsk, and Icelandic. A composite index schema covers all EU languages together. (Bulgarian is covered in the Cyrillic section; Greek in its own section — both are included in the EU index by reference.)

eu-languages-index.crepdl Composite index: union of all 24 official EU language repertoires as defined under EU Regulation No 1 of 1958 (as amended). References 24 individual language schemas by URI. schemas/eu-languages-index.crepdl
English (en)

Standard 26-letter Latin alphabet, ASCII printable range, plus common typographic punctuation.

schemas/en-english.crepdl
German (de)

Latin with Umlaut letters Ä, Ö, Ü and the Eszett ß.

schemas/de-german.crepdl
French (fr)

Latin with acute, grave, circumflex, diaeresis, cedilla, and ligatures œ and æ.

schemas/fr-french.crepdl
Spanish (es)

Latin with inverted punctuation ¡ ¿, tilde Ñ, and accented vowels.

schemas/es-spanish.crepdl
Portuguese (pt)

Latin with acute, grave, circumflex, tilde, cedilla: ã, â, ç, ê, õ, ô and others.

schemas/pt-portuguese.crepdl
Italian (it)

Latin with grave and acute accents on vowels: à, è, é, ì, ò, ó, ù.

schemas/it-italian.crepdl
Dutch (nl)

Latin with diaeresis on vowels ë, ï, ü, ä, ö and acute/grave/circumflex forms.

schemas/nl-dutch.crepdl
Polish (pl)

Latin with ogonek ą ę, acute ć ń ó ś ź, overdot ż, and stroke ł.

schemas/pl-polish.crepdl
Romanian (ro)

Latin with comma-below letters ș ț and circumflex/breve â î ă.

schemas/ro-romanian.crepdl
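Legacy Romanian text often carries the cedilla forms ş ţ (U+015F, U+0163) instead of the comma-below letters ș ț (U+0219, U+021B). A minimal sketch of a pre-validation fix-up follows, assuming the schema admits only the comma-below forms named on the card; whether it also admits the cedilla forms is not stated here, and `to_comma_below` is a hypothetical helper.

```python
# Cedilla letters seen in legacy data, mapped to their comma-below equivalents.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015E": "\u0218",  # Ş -> Ș
    "\u015F": "\u0219",  # ş -> ș
    "\u0162": "\u021A",  # Ţ -> Ț
    "\u0163": "\u021B",  # ţ -> ț
})


def to_comma_below(text):
    """Replace cedilla S/T with comma-below S/T; leave everything else alone."""
    return text.translate(CEDILLA_TO_COMMA)
```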
Czech (cs)

Latin with háček, čárka, and kroužek diacritics: á, č, ď, é, ě, í, ň, ó, ř, š, ť, ú, ů, ý, ž.

schemas/cs-czech.crepdl
Hungarian (hu)

Latin with unique double acute accent vowels ő Ő ű Ű, plus á, é, í, ó, ö, ü.

schemas/hu-hungarian.crepdl
Swedish (sv)

Latin with three extra vowels beyond English: å, ä, ö.

schemas/sv-swedish.crepdl
Danish (da)

Latin with three extra letters: Æ/æ, Ø/ø, Å/å.

schemas/da-danish.crepdl
Finnish (fi)

Latin with ä and ö (and å in Swedish loanwords).

schemas/fi-finnish.crepdl
Slovak (sk)

Latin with caron č š ž ď ľ ň ť, acute á é í ó ú ý, and unique ŕ ĺ.

schemas/sk-slovak.crepdl
Croatian (hr)

Latin with diacritic letters č, ć, đ, š, ž and the digraphs dž, lj, nj.

schemas/hr-croatian.crepdl
Slovenian (sl)

Latin with three caron letters: č, š, ž.

schemas/sl-slovenian.crepdl
Lithuanian (lt)

Latin with ogonek ą ę į ų, macron ū, caron č š ž, and dot above ė.

schemas/lt-lithuanian.crepdl
Latvian (lv)

Latin with macrons ā ē ī ū, cedillas ģ ķ ļ ņ ŗ, and caron č š ž.

schemas/lv-latvian.crepdl
Estonian (et)

Latin with additional letters ä, ö, õ, ü, š, ž.

schemas/et-estonian.crepdl
Irish (ga)

Latin with síneadh fada (acute accent) on vowels: á, é, í, ó, ú.

schemas/ga-irish.crepdl
Maltese (mt)

Latin with unique characters ħ (H-bar), għ (digraph), ċ (C-dot), and ġ (G-dot).

schemas/mt-maltese.crepdl
Norwegian Bokmål (nb)

Latin 29-letter Norwegian alphabet including Æ/æ, Ø/ø, Å/å. The dominant written standard (~85–90% of Norwegian writing).

schemas/nb-norwegian_bokmål.crepdl
Norwegian Nynorsk (nn)

Latin 29-letter Norwegian alphabet, identical repertoire to Bokmål. The second official Norwegian written standard (~10–15% of written use).

schemas/nn-norwegian_nynorsk.crepdl
Icelandic (is)

Latin 32-letter Icelandic alphabet with unique letters Ð/ð (Eth) and Þ/þ (Thorn), preserved from Old Norse. ~370,000 native speakers.

schemas/is-icelandic.crepdl

Greek Script

Modern Greek with monotonic orthography. Also referenced in the EU languages composite index.

Cyrillic Script

Schemas for Cyrillic-script languages across Eastern Europe and Central Asia. Bulgarian is also referenced in the EU index. Several languages (Uzbek, Azerbaijani, Kazakh, Tajik, Turkmen, Kyrgyz) use Cyrillic alongside Latin or Arabic — their schemas cover all scripts in use.

Arabic / Perso-Arabic Script

Right-to-left schemas for Arabic and languages that use Arabic-derived scripts: Persian, Pashto, Kurdish (Soranî), Urdu, Sindhi, Kashmiri, Uyghur, Hausa (Ajami), and West Punjabi (Shahmukhi). Several multilingual schemas (Kazakh, Uzbek, Malay, Kurdish) also include Arabic alongside other scripts.
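For the multi-script schemas, a repertoire check alone does not say which direction a given string runs in. A small sketch using the bidirectional class of the first strong character is shown here; it is a simplification of the Unicode Bidirectional Algorithm's paragraph-level rule, and `first_strong_direction` is an illustrative name.

```python
import unicodedata


def first_strong_direction(text):
    """Return "rtl", "ltr", or None based on the first strongly typed character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-type or Arabic-type letter
            return "rtl"
        if bidi == "L":          # Latin, Cyrillic, and other left-to-right letters
            return "ltr"
    return None  # only digits, punctuation, or whitespace
```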

Arabic (ar)
RTL

Arabic block (U+0600–06FF), Arabic Supplement, Arabic Extended-A/B, and Arabic Presentation Forms-A/B. Covers Modern Standard Arabic including tashkeel vowel marks. ~422M speakers across 22+ countries.

schemas/ar-arabic.crepdl
Persian / Farsi (fa)
RTL

Perso-Arabic (Nastaliq style). Adds پ چ ژ گ and Extended Arabic-Indic digits ۰–۹ (U+06F0–06F9) to the Arabic base. ~80M speakers in Iran, Afghanistan (Dari), and Tajikistan.

schemas/fa-persian.crepdl
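The Persian digits ۰–۹ are separate code points from ASCII 0–9 and from the Arabic digits ٠–٩, so text that mixes digit sets can pass one schema and fail another. A minimal sketch that folds any Unicode decimal digit to ASCII before numeric processing is given below; `to_ascii_digits` is a hypothetical helper, not part of the schemas.

```python
import unicodedata


def to_ascii_digits(text):
    """Replace every decimal digit (general category Nd) with its ASCII form."""
    return "".join(
        str(unicodedata.decimal(ch)) if unicodedata.category(ch) == "Nd" else ch
        for ch in text
    )
```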
Pashto (ps)
RTL

Perso-Arabic with additional letters specific to Pashto phonology. ~45M speakers in Afghanistan and Pakistan.

schemas/ps-pashto.crepdl
Kurdish (ku)
RTL (Soranî)

Latin (Kurmanji/Badini) plus Perso-Arabic (Soranî). ~26M speakers across Turkey, Iraq, Iran, and Syria.

schemas/ku-kurdish.crepdl
Urdu (ur)
RTL

Perso-Arabic (Nastaliq style). National language of Pakistan; scheduled language of India.

schemas/ur-urdu.crepdl
West Punjabi / Shahmukhi (pnb)
RTL

Shahmukhi (Arabic-script Punjabi) as used in Punjab, Pakistan. ~90M total speakers.

schemas/pnb-west_punjabi_shahmukhi.crepdl
Uyghur (ug)
RTL (primary)

Perso-Arabic (official in China) plus Latin and Cyrillic used in the diaspora. ~15M speakers.

schemas/ug-uyghur.crepdl
Hausa (ha)

Latin (Boko, standard) plus Arabic (Ajami, traditional). ~75M speakers in West Africa.

schemas/ha-hausa.crepdl
Saraiki / Siraiki (skr)
RTL

Perso-Arabic. ~20M speakers in southern Punjab, Pakistan.

schemas/skr-saraiki_siraiki.crepdl
South Azerbaijani (azb)
RTL

Perso-Arabic, the primary script for Azerbaijani as spoken in Iran. ~14M speakers.

schemas/azb-south_azerbaijani_southern_azerbaijani.crepdl

CJK — Chinese, Japanese, Korean

CJK schemas are of particular value given that Unicode encodes roughly 97,000 CJK unified ideographs through Extension H. All five main CJK schemas share a common foundation of CJK Unified Ideograph blocks (Extensions A–H) and add their language-specific scripts on top. Chinese dialect schemas cover Wu, Min Nan, Hakka, Jinyu, Xiang, and Gan.
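The shared ideograph foundation can be pictured as a set of code point ranges. The sketch below lists the block boundaries from the Unicode code charts and reports which block a character falls in. The helper name `cjk_block` is illustrative, and block boundaries slightly overstate the number of assigned characters because some blocks end with unassigned code points.

```python
# Block boundaries for the main CJK Unified Ideographs block and Extensions A-H.
CJK_UNIFIED_BLOCKS = {
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
    "Extension A": (0x3400, 0x4DBF),
    "Extension B": (0x20000, 0x2A6DF),
    "Extension C": (0x2A700, 0x2B73F),
    "Extension D": (0x2B740, 0x2B81F),
    "Extension E": (0x2B820, 0x2CEAF),
    "Extension F": (0x2CEB0, 0x2EBEF),
    "Extension G": (0x30000, 0x3134F),
    "Extension H": (0x31350, 0x323AF),
}


def cjk_block(ch):
    """Return the name of the CJK ideograph block containing ch, or None."""
    cp = ord(ch)
    for name, (lo, hi) in CJK_UNIFIED_BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return None
```

Kana, Hangul, and Bopomofo fall outside these ranges, which is why each language schema adds its own scripts on top.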

cjk-languages-index.crepdl Composite index: union of Japanese, Simplified Chinese, Traditional Chinese, Cantonese, and Korean repertoires. schemas/cjk-languages-index.crepdl
Japanese (ja)

Hiragana (U+3040–309F), Katakana (U+30A0–30FF), Kanji (CJK Unified Ideographs + Extensions A–H), Latin rōmaji, Kana Supplement/Extended, and Hentaigana. ~125M speakers.

schemas/ja-japanese.crepdl
Chinese, Simplified (zh-hans)

Simplified Hanzi (CJK Unified Ideographs), Bopomofo, Pinyin (Latin), and CJK Extensions A–H. Standard written form in mainland China.

schemas/zh-hans-chinese-simplified.crepdl
Chinese, Traditional (zh-hant)

Traditional Hanzi, Bopomofo, Jyutping/Yale Latin, and CJK Extensions. Standard written form in Taiwan and Hong Kong.

schemas/zh-hant-chinese-traditional.crepdl
Cantonese (yue)

Traditional Hanzi (Cantonese usage) plus Jyutping/Yale romanisation. ~85M speakers in Guangdong, Hong Kong, and the diaspora.

schemas/yue-cantonese.crepdl
Korean (ko)

Hangul syllable block (U+AC00–D7A3), Hangul Jamo, Hanja (CJK Ideographs), and Latin. ~82M speakers in South and North Korea.

schemas/ko-korean.crepdl
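The range U+AC00–D7A3 is laid out arithmetically: 19 leading consonants, 21 vowels, and 28 trailing positions (including "none") give 11,172 precomposed syllables. A short sketch of that arithmetic, as defined in the Unicode Standard, follows; `decompose_hangul` is an illustrative name.

```python
S_BASE, S_LAST = 0xAC00, 0xD7A3
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def decompose_hangul(syllable):
    """Return (lead, vowel, tail) indices for one precomposed Hangul syllable."""
    cp = ord(syllable)
    if not S_BASE <= cp <= S_LAST:
        raise ValueError("not a precomposed Hangul syllable")
    index = cp - S_BASE
    lead, rest = divmod(index, V_COUNT * T_COUNT)
    vowel, tail = divmod(rest, T_COUNT)  # tail 0 means no final consonant
    return lead, vowel, tail


print(decompose_hangul("\ud55c"))  # 한 -> (18, 0, 4)
```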
Wu Chinese / Shanghainese (wuu)

Traditional Hanzi (Wu-specific usage), CJK Extensions, plus Latin romanisation. ~74M native speakers.

schemas/wuu-wu_chinese_shanghainese.crepdl
Min Nan / Hokkien / Taiwanese (nan)

Traditional Hanzi, Tai-lo/POJ Latin romanisation, and Min Nan-specific characters. ~75M total speakers.

schemas/nan-min_nan_hokkien_-_taiwanese.crepdl
Hakka Chinese / 客家話 (hak)

Traditional Hanzi (Hakka usage), CJK Extensions, plus Latin romanisation. ~47M native speakers.

schemas/hak-hakka_chinese_客家話.crepdl
Jinyu Chinese / 晉語 (cjy)

Simplified Hanzi (Jinyu usage) and CJK Extensions. ~46M native speakers in Shanxi and adjacent areas.

schemas/cjy-jinyu_chinese_晉語.crepdl
Xiang Chinese / 湘語 (hsn)

Simplified Hanzi (Xiang/Hunanese usage) and CJK Extensions. ~36M native speakers in Hunan.

schemas/hsn-xiang_chinese_湘語.crepdl
Gan Chinese / 贛語 (gan)

Simplified Hanzi (Gan usage) and CJK Extensions. ~22M native speakers in Jiangxi.

schemas/gan-gan_chinese_贛語.crepdl
Zhuang / Cuengh (za)

Latin (standard Zhuang orthography) plus CJK characters (Sawndip traditional script). ~16M native speakers in Guangxi.

schemas/za-zhuang_cuengh.crepdl

Indic Scripts

The 22 constitutionally scheduled languages of India (Urdu is listed under Arabic / Perso-Arabic Script) plus Bhojpuri, Chhattisgarhi, Magahi, Sylheti, and Sinhala. Scripts covered include Devanagari, Bengali/Assamese, Gurmukhi, Gujarati, Odia, Tamil, Telugu, Kannada, Malayalam, Meetei Mayek, and Ol Chiki. A composite index covers all 22 scheduled Indian languages.

india-languages-index.crepdl Composite index: union of all 22 languages recognised under the Eighth Schedule to the Constitution of India, spanning 10 scripts. schemas/india-languages-index.crepdl
Hindi (hi)

Devanagari abugida plus Devanagari Extended (Vedic accent marks). ~600M L1+L2 speakers; Union official language of India.

schemas/hi-hindi.crepdl
Marathi (mr)

Devanagari plus Modi script (historical). Official language of Maharashtra; also used for official purposes in Goa.

schemas/mr-marathi.crepdl
Nepali (ne)

Devanagari. Official language of Nepal; scheduled language of India and an official language of Sikkim.

schemas/ne-nepali.crepdl
Bengali (bn)

Bengali/Bangla script (U+0980–09FF). Official language of Bangladesh and of West Bengal and Tripura; scheduled language of India. ~230M speakers.

schemas/bn-bengali.crepdl
Assamese (as)

Bengali script (Assamese variant). Official language of Assam, with distinct letterforms from Bengali.

schemas/as-assamese.crepdl
Punjabi (pa)

Gurmukhi script (U+0A00–0A7F). Official script of Punjabi in India. ~125M total speakers.

schemas/pa-punjabi.crepdl
Gujarati (gu)

Gujarati script (U+0A80–0AFF). Official language of Gujarat. ~60M speakers.

schemas/gu-gujarati.crepdl
Odia (or)

Odia/Oriya script (U+0B00–0B7F). Official language of Odisha.

schemas/or-odia.crepdl
Tamil (ta)

Tamil script (U+0B80–0BFF). Official language of Tamil Nadu and Puducherry; one of the world's oldest classical languages. ~80M speakers.

schemas/ta-tamil.crepdl
Telugu (te)

Telugu script (U+0C00–0C7F). Official language of Andhra Pradesh and Telangana. ~95M speakers.

schemas/te-telugu.crepdl
Kannada (kn)

Kannada script (U+0C80–0CFF). Official language of Karnataka. ~60M speakers.

schemas/kn-kannada.crepdl
Malayalam (ml)

Malayalam script (U+0D00–0D7F). Official language of Kerala and Lakshadweep. ~38M speakers.

schemas/ml-malayalam.crepdl
Maithili (mai)

Devanagari plus Tirhuta script. Spoken in Bihar and Jharkhand.

schemas/mai-maithili.crepdl
Sanskrit (sa)

Devanagari plus Sharada script. Classical language with pan-India scholarly and religious use.

schemas/sa-sanskrit.crepdl
Konkani (kok)

Devanagari plus Kannada, Malayalam, and Latin scripts. Official language of Goa.

schemas/kok-konkani.crepdl
Sindhi (sd)

Arabic/Naskh (primary) plus Devanagari. No home state in India.

schemas/sd-sindhi.crepdl
Kashmiri (ks)

Arabic (primary) plus Devanagari and Sharada (historical). Official language of Jammu & Kashmir.

schemas/ks-kashmiri.crepdl
Bodo (brx)

Devanagari. Scheduled language spoken in Bodoland, Assam.

schemas/brx-bodo.crepdl
Dogri (dgo)

Devanagari plus Takri (historical) script. Spoken in Jammu & Kashmir and Himachal Pradesh.

schemas/dgo-dogri.crepdl
Manipuri / Meitei (mni)

Meetei Mayek script plus Bengali. Official language of Manipur.

schemas/mni-manipuri.crepdl
Santali (sat)

Ol Chiki script plus Devanagari, Bengali, and Odia. Austroasiatic (Munda) language of Jharkhand.

schemas/sat-santali.crepdl
Chhattisgarhi (hne)

Devanagari. ~16M speakers in Chhattisgarh.

schemas/hne-chhattisgarhi.crepdl
Magahi (mag)

Devanagari. ~21M native speakers in Bihar and Jharkhand.

schemas/mag-magahi.crepdl
Bhojpuri (bho)

Devanagari plus Kaithi (historical script). ~52M speakers in Bihar, Uttar Pradesh, and the diaspora.

schemas/bho-bhojpuri.crepdl
Sylheti (syl)

Bengali script (Sylheti variant). ~12M native speakers in Bangladesh's Sylhet Division and northeast India.

schemas/syl-sylheti.crepdl
Sinhala (si)

Sinhala script (Brahmi-derived, U+0D80–0DFF). Official language of Sri Lanka. ~17M speakers.

schemas/si-sinhala.crepdl

Southeast Asian Scripts

National and major regional languages of Southeast Asia, covering Thai, Myanmar/Burmese, Khmer, Lao, and Latin-script languages. Regional scripts for Javanese (Hanacaraka), Balinese (Aksara Bali), Sundanese (Aksara Sunda), and Buginese (Lontara) are also included. A composite index covers the full set of 12 languages.

sea-languages-index.crepdl Composite index: union of 12 national and major regional Southeast Asian language repertoires, covering 5 distinct scripts plus Latin. schemas/sea-languages-index.crepdl
Thai (th)

Thai abugida script (U+0E00–0E7F), a Brahmi-derived script encoding vowels as diacritics with tone marks. ~70M speakers.

schemas/th-thai.crepdl
Northeastern Thai / Isan (tts)

Thai script (same repertoire as standard Thai). ~15M native speakers in the Isan region of northeast Thailand.

schemas/tts-northeastern_thai_isan_-_ภาษาอีสาน.crepdl
Burmese / Myanmar (my)

Myanmar script (U+1000–109F), a Brahmi-derived abugida. National language of Myanmar.

schemas/my-burmese.crepdl
Khmer (km)

Khmer script (U+1780–17FF), an abugida often cited as having the longest alphabet of any language. Official language of Cambodia.

schemas/km-khmer.crepdl
Lao (lo)

Lao script (U+0E80–0EFF), a Brahmi-derived abugida. Official language of Laos.

schemas/lo-lao.crepdl
Vietnamese (vi)

Latin-based Quốc Ngữ with extensive diacritics: five tone marks and modified letters ă, â, ê, ô, ơ, ư, and đ. ~90M speakers.

schemas/vi-vietnamese.crepdl
Indonesian (id)

Latin with minimal diacritics. ~270M L1+L2 speakers; national language of Indonesia.

schemas/id-indonesian.crepdl
Malay (ms)

Latin (Rumi, standard) plus Arabic (Jawi, traditional). National language of Malaysia, Brunei, and Singapore.

schemas/ms-malay.crepdl
Filipino / Tagalog (fil)

Latin plus Baybayin (traditional Philippine script). Official language of the Philippines.

schemas/fil-filipino-tagalog.crepdl
Javanese (jv)

Latin plus Hanacaraka (Javanese script, U+A980–A9DF). ~82M speakers in Java, Indonesia.

schemas/jv-javanese.crepdl
Balinese (ban)

Latin plus Aksara Bali (Balinese script, U+1B00–1B7F). Spoken in Bali, Indonesia.

schemas/ban-balinese.crepdl
Sundanese (su)

Latin plus Aksara Sunda (Sundanese script, U+1B80–1BBF). ~42M speakers in West Java, Indonesia.

schemas/su-sundanese.crepdl
Buginese (bug)

Latin plus Lontara (Buginese script, U+1A00–1A1F). Spoken in Sulawesi, Indonesia.

schemas/bug-buginese.crepdl
Cebuano (ceb)

Latin plus Baybayin (traditional Philippine script). ~27M speakers in the Visayas and Mindanao.

schemas/ceb-cebuano.crepdl
Hiligaynon / Ilonggo (hil)

Latin. ~10M native speakers in the Western Visayas, Philippines.

schemas/hil-hiligaynon_ilonggo.crepdl
Ilocano / Ilokano (ilo)

Latin. ~10M native speakers in the Ilocos Region and Cagayan Valley, Philippines.

schemas/ilo-ilocano_ilokano.crepdl

Other Scripts

Languages using distinct writing systems not covered in earlier groups: Hebrew, Ethiopic/Ge'ez (Amharic, Tigrinya, Oromo), Georgian, Armenian, Tibetan, and the many Latin-script languages of Africa and the Middle East. Three composite index schemas cover grouped subsets. Languages whose schemas span multiple scripts (Uzbek, Azerbaijani, Hausa, Wolof etc.) are listed under their primary script above but their schemas cover all scripts used.

remaining-languages-index.crepdl Composite index of 20 major world languages not covered by earlier index files: Arabic, Amharic, Hausa, Somali, Hebrew, Persian, Pashto, Kurdish, Russian, Ukrainian, Serbian, Turkish, Uzbek, Azerbaijani, Kazakh, Yoruba, Igbo, Swahili, Sinhala, and Georgian. schemas/remaining-languages-index.crepdl
set6-languages-index.crepdl Composite index of 20 languages with newly introduced scripts: Bhojpuri, Oromo, Fula (Adlam), Cebuano, Zulu, Malagasy, Xhosa, Uyghur, Afrikaans, Tigrinya, Tajik, Belarusian, Albanian, Armenian, Turkmen, Mongolian, Tibetan, Kyrgyz, Shona, and Bambara (N'Ko). schemas/set6-languages-index.crepdl
set7-languages-index.crepdl Composite index of 25 additional languages with 10M+ speakers: Lingala, West Punjabi, Wu Chinese, Min Nan, Hakka, Jinyu, Xiang, Gan, Zhuang, Northern Sotho, Sesotho, Setswana, Kinyarwanda, Chhattisgarhi, Magahi, Saraiki, Ilocano, Northeastern Thai, South Azerbaijani, Sylheti, Hiligaynon, Kirundi, Wolof, Nigerian Pidgin, and Cameroonian Pidgin. schemas/set7-languages-index.crepdl
Hebrew (he)
RTL

22-letter Hebrew consonantal alphabet with optional nikud (vowel points, U+05B0–05C7). ~9M speakers; official language of Israel.

schemas/he-hebrew.crepdl
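Since nikud is optional, pointed and unpointed spellings of the same word are both valid against the schema but do not compare equal. A minimal sketch that removes the points before comparison is shown here. It strips only nonspacing marks inside U+05B0–05C7, because that range also contains punctuation such as maqaf (U+05BE); `strip_nikud` is a hypothetical helper.

```python
import unicodedata


def strip_nikud(text):
    """Drop Hebrew points (nonspacing marks in U+05B0-05C7); keep everything else."""
    return "".join(
        ch for ch in text
        if not (0x05B0 <= ord(ch) <= 0x05C7 and unicodedata.category(ch) == "Mn")
    )
```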
Amharic (am)

Ethiopic / Ge'ez script (fidel syllabary, U+1200–137F). Official language of Ethiopia. ~57M speakers.

schemas/am-amharic.crepdl
Tigrinya (ti)

Ethiopic script (Ge'ez fidel). Official language of Eritrea; co-official in the Tigray region of Ethiopia. ~9M speakers.

schemas/ti-tigrinya.crepdl
Oromo / Afaan Oromo (om)

Qubee Latin (standard since 1991) plus Ethiopic script. ~42M speakers in Ethiopia and Kenya.

schemas/om-oromo_afaan_oromo.crepdl
Georgian (ka)

Mkhedruli script (U+10D0–10FF) plus Asomtavruli (U+10A0–10CF) and Georgian Extended (Mtavruli). One of the world's oldest and most distinctive alphabets. ~4M speakers.

schemas/ka-georgian.crepdl
Armenian (hy)

Armenian script / Aybuben (U+0530–058F), created ~405 CE by Mesrop Mashtots. 38-letter alphabet covering Classical (Grabar) and Modern Eastern/Western Armenian. ~7M speakers.

schemas/hy-armenian.crepdl
Tibetan (bo)

Uchen / Tibetan script (U+0F00–0FFF), a Brahmi-derived abugida created ~620 CE. Covers Standard and Classical Tibetan (Chöke). ~6M speakers.

schemas/bo-tibetan.crepdl
Turkish (tr)

Latin 29-letter alphabet with Ç, Ğ, İ, Ö, Ş, Ü (note also dotless ı). ~90M speakers in Turkey and Cyprus.

schemas/tr-turkish.crepdl
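The dotted and dotless i pairs matter for validation because default Unicode case mapping is not Turkish-aware: lowercasing İ yields i followed by U+0307 (combining dot above), and lowercasing I yields i rather than ı. A sketch of a Turkish-aware lowercase step follows; `turkish_lower` is an illustrative helper, and whether the schema admits U+0307 is not stated on this page.

```python
def turkish_lower(text):
    """Lowercase with the Turkish pairs İ -> i and I -> ı handled explicitly."""
    # Map the two language-specific letters first, then lowercase the rest.
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()


# Default mapping introduces a combining mark after the i:
print([hex(ord(c)) for c in "\u0130".lower()])  # ['0x69', '0x307']
```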
Azerbaijani (az)

Latin (official in Azerbaijan since 1991) plus Cyrillic and Arabic scripts. ~32M total speakers.

schemas/az-azerbaijani.crepdl
Uzbek (uz)

Latin (official since 1995) plus Cyrillic (still widely used) and Arabic (historical). ~35M speakers.

schemas/uz-uzbek.crepdl
Turkmen (tk)

Latin (official since 1993) plus Cyrillic (legacy) and Perso-Arabic (historical). ~7M speakers.

schemas/tk-turkmen.crepdl
Albanian (sq)

Latin 36-letter alphabet with Ë/ë and the digraph Rr/rr. ~8M speakers in Albania, Kosovo, and North Macedonia.

schemas/sq-albanian.crepdl
Somali (so)

Latin (standard official script since 1972). Official language of Somalia and Djibouti. ~22M speakers.

schemas/so-somali.crepdl
Swahili / Kiswahili (sw)

Standard 26-letter Latin alphabet. ~71M speakers; national/official language of Tanzania, Kenya, Uganda, and the DRC.

schemas/sw-swahili.crepdl
Yoruba (yo)

Latin with dot-below letters ẹ, ọ, ṣ and tone marks. ~47M speakers in Nigeria and Benin.

schemas/yo-yoruba.crepdl
Igbo (ig)

Latin with dot-below letters ị, ọ, ụ and tone marks. ~17M speakers in southeastern Nigeria.

schemas/ig-igbo.crepdl
Fula / Fulfulde (ff)

Latin (standard) plus Arabic (Ajami) plus Adlam (U+1E900–1E95F), an indigenous script created in the 1980s. ~35M speakers across West Africa.

schemas/ff-fula_fulfulde.crepdl
Bambara / Bamanankan (bm)

Latin plus N'Ko script (U+07C0–07FF, RTL). ~15M speakers; the lingua franca of Mali.

schemas/bm-bambara_bamanankan.crepdl
Afrikaans (af)

Latin with diacritics ê, ë, î, ï, ô, û and the unique â. ~17M speakers; official language of South Africa.

schemas/af-afrikaans.crepdl
Zulu / isiZulu (zu)

Standard 26-letter Latin alphabet. ~28M speakers; official language of South Africa.

schemas/zu-zulu_isizulu.crepdl
Xhosa / isiXhosa (xh)

Standard 26-letter Latin alphabet. ~19M speakers; official language of South Africa.

schemas/xh-xhosa_isixhosa.crepdl
Shona (sn)

Latin. ~15M speakers; major language of Zimbabwe.

schemas/sn-shona.crepdl
Northern Sotho / Sepedi (nso)

Latin. ~14M total speakers; official language of South Africa.

schemas/nso-northern_sotho_sesotho_sa_leboa_-_sepedi.crepdl
Sesotho / Southern Sotho (st)

Latin. ~14M total speakers; official language of South Africa and Lesotho.

schemas/st-sesotho_southern_sotho.crepdl
Setswana / Tswana (tn)

Latin. ~14M total speakers; official language of South Africa and Botswana.

schemas/tn-setswana_tswana.crepdl
Kinyarwanda (rw)

Latin. ~12M total speakers; official language of Rwanda.

schemas/rw-kinyarwanda.crepdl
Kirundi / Rundi (rn)

Latin. ~12M total speakers; official language of Burundi.

schemas/rn-kirundi_rundi.crepdl
Lingala (ln)

Latin with open-e ɛ and open-o ɔ. ~45M total L1+L2 speakers in the DRC and Republic of Congo.

schemas/ln-lingala.crepdl
Wolof (wo)

Latin. ~12M native speakers; the most widely spoken indigenous language of Senegal.

schemas/wo-wolof.crepdl
Malagasy (mg)

Latin (standard) plus Arabic (Sorabe, historical). ~26M speakers; official language of Madagascar.

schemas/mg-malagasy.crepdl
Nigerian Pidgin / Naijá (pcm)

Latin with ọ and ẹ (dot-below) from the standardised Naijá orthography. ~75M total L1+L2 speakers.

schemas/pcm-nigerian_pidgin_naijá.crepdl
Cameroonian Pidgin / Kamtok (wes)

Latin. ~12M total speakers; the most widely spoken lingua franca of Anglophone Cameroon.

schemas/wes-cameroonian_pidgin_kamtok.crepdl

Special Characters & Symbols

Script-agnostic schemas for punctuation, currency symbols, typographic characters, mathematical operators, and other non-alphabetic code points commonly required in publishing workflows. These schemas can be used standalone or combined with language schemas via CREPDL <union> to extend a language repertoire with a permitted symbol set.
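Combining a language schema with a symbol schema amounts to wrapping two references in a `<union>`. The sketch below generates such a wrapper. The namespace URI, the `ref` element, and its `href` attribute are assumptions about CREPDL syntax that should be checked against ISO/IEC 19757-7 and the existing index files; `build_union` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed CREPDL namespace; verify against the schemas in this release.
CREPDL_NS = "http://purl.oclc.org/dsdl/crepdl/ns/structure/2.0"


def build_union(hrefs):
    """Return a CREPDL-style <union> document referencing each href in order."""
    ET.register_namespace("", CREPDL_NS)
    union = ET.Element(f"{{{CREPDL_NS}}}union")
    for href in hrefs:
        # Assumption: constituents are referenced as <ref href="..."/>.
        ET.SubElement(union, f"{{{CREPDL_NS}}}ref", {"href": href})
    return ET.tostring(union, encoding="unicode")


# Planned filename from the cards in this section; the file may not exist yet.
print(build_union(["de-german.crepdl", "special-currency-symbols.crepdl"]))
```

Saving the result in the same schemas/ directory keeps the relative references resolvable.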

Coming soon: Special character schema files are not yet included in this release. The cards below show the planned schemas; filenames will be confirmed once the files are available.
Typographic Punctuation

General Punctuation block (U+2000–206F): typographic spaces, dashes (en, em, figure), quotation marks, daggers, bullets, ellipsis, and editorial marks used across European publishing.

schemas/special-punctuation-typographic.crepdl
Currency Symbols

Currency Symbols block (U+20A0–20CF) plus commonly used symbols from Basic Latin ($) and Latin-1 Supplement (¢ £ ¤ ¥), suitable for multilingual financial publishing.

schemas/special-currency-symbols.crepdl
Mathematical Operators

Mathematical Operators (U+2200–22FF) and Supplemental Mathematical Operators (U+2A00–2AFF) for STM publishing and technical documentation.

schemas/special-mathematical-operators.crepdl
Combining Diacritical Marks

Combining Diacritical Marks (U+0300–036F) and Combining Diacritical Marks Supplement (U+1DC0–1DFF) for use with Latin, Greek, or Cyrillic base characters in linguistic and critical-edition contexts.

schemas/special-combining-diacritics.crepdl
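Whether text needs this schema depends on its normalization form: é stored as U+00E9 (NFC) contains no combining mark, while the same letter stored as e + U+0301 (NFD) does. A small sketch that lists the combining marks a string would contain once decomposed is shown below; `combining_marks` is an illustrative helper limited to the U+0300–036F block.

```python
import unicodedata


def combining_marks(text):
    """List marks from U+0300-036F present after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return [f"U+{ord(ch):04X}" for ch in decomposed if 0x0300 <= ord(ch) <= 0x036F]


print(combining_marks("\u00e9"))  # ['U+0301']
```

Normalizing to NFC before validation keeps most European text within the precomposed letters of the language schemas; critical-edition text with unusual stacks still needs the combining marks themselves.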
Letterlike Symbols

Letterlike Symbols block (U+2100–214F): ℃ ℉ № ™ ℗ ℠ and other frequently used symbols in legal, scientific, and commercial publishing. (© and ® belong to Latin-1 Supplement, not this block.)

schemas/special-letterlike-symbols.crepdl
Arrows

Arrows block (U+2190–21FF) plus Supplemental Arrows-A/B/C for technical documentation, flow diagrams, and instructional publishing.

schemas/special-arrows.crepdl
Geometric Shapes & Box Drawing

Geometric Shapes (U+25A0–25FF), Box Drawing (U+2500–257F), and Block Elements (U+2580–259F) for tabular layouts and diagrammatic use.

schemas/special-geometric-shapes.crepdl
Superscripts & Subscripts

Superscripts and Subscripts block (U+2070–209F) plus Number Forms (U+2150–218F, including Roman numerals and vulgar fractions) for scientific and legal publishing.

schemas/special-superscripts-subscripts.crepdl