Solutions from Ukraine: new electronic database of Crimean Tatar language unveiled for extensive research use
In Ukraine, the national corpus of the Crimean Tatar language was recently launched. It is an electronic archive of texts that offers extensive opportunities for language study and serves as a valuable database.
The Ministry of Reintegration of the Temporarily Occupied Territories reported this.
What is the problem?
According to Ukraine's 2001 population census, 231,400 people in Ukraine were registered as native speakers of the Crimean Tatar language. Of these, 230,200 lived on the Crimean Peninsula.
A new census, conducted in Crimea by the occupation administration alongside the Russian census in October 2021, showed that only 148,600 people on the peninsula named Crimean Tatar as their native language, of whom 133,100 use it in everyday life.
In government-controlled Ukraine, only 20-25% of Crimean Tatars speak their native language, according to the Ministry of Reintegration.
What is the solution?
The national corpus of the Crimean Tatar language is intended to improve the situation in Ukraine.
"For a qualitative change in this situation, systemic solutions are needed. One of them is the linguistic corpus," the ministry noted.
According to the Ministry of Reintegration, this is a practical tool for linguists, students, and developers who will create systems and projects using the Crimean Tatar language.
How does it work?
Work on the project lasted a year and brought together about 30 participants from different parts of Ukraine and the world.
According to the ministry, more than 900 materials were analyzed during this time, including fiction, scientific literature, and periodicals.
The corpus will be a comprehensive tool for language research.
In addition, it will become the foundation for the introduction of Crimean Tatar in:
- operating systems,
- online translators,
- spell check programs.
The non-governmental organization QIRI'M Young launched the project with the support of the representative office of the President of Ukraine in Crimea, the Ministry of Reintegration, Taras Shevchenko Kyiv National University, and the program "Electronic governance for government accountability and community participation" (EGAP), which is implemented by the charity organization "Eastern Europe Foundation" and financed by Switzerland.
For reference:
The Latin-based Crimean Tatar alphabet was officially approved in Ukraine by a Cabinet of Ministers [Ukraine's government – ed.] resolution of September 22, 2021. It consists of 31 letters, seven of which carry diacritical marks, and is identical to the alphabet approved for use in Crimea by the Verkhovna Rada of the Autonomous Republic of Crimea in 1997.
The ministry also announced plans to launch an award for special contributions to the development of the Crimean Tatar language.
In addition, a Ukrainian startup is teaching AI to recognize the Crimean Tatar language.