About the written language of the manuscripts

The written language of the manuscripts shared with the HTREC challenge reflects the Greek of Byzantine times, which is often considered a deficient form of Classical Greek or an immature form of Modern Greek. In these texts, morphological categories such as the optative, the pluperfect and the perfect have disappeared, while others, such as the dative case, have gradually declined. Infinitives and participles are still present as remnants of the classical tradition, which encourages treating the language as a distinct variant, different from Modern Greek. Several spelling conventions deviate from the older orthographic rules, while the ancient punctuation marks are still in use, albeit not always with the same function.

For more texts, a publicly available and potentially helpful resource is the Medieval Greek Texts (in CSV) corpus.
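As a minimal sketch of how a CSV corpus of this kind might be read, the Python snippet below loads rows with the standard csv module. The column names ("id", "text") and the two sample lines are invented for illustration only; the actual layout of the corpus is not described here and should be checked against the downloaded file.

```python
import csv
import io


def read_corpus(handle):
    """Return (column_names, rows) from an open CSV file handle."""
    reader = csv.DictReader(handle)
    rows = list(reader)  # each row is a dict keyed by the header names
    return reader.fieldnames, rows


# Tiny in-memory stand-in for the real file; the columns are hypothetical.
sample = io.StringIO("id,text\n1,καὶ ἐγένετο\n2,τοῦ βασιλέως\n")
columns, rows = read_corpus(sample)
print(columns, len(rows))
```

For a file on disk, the same function could be called on a handle opened with open(path, newline="", encoding="utf-8"), assuming the corpus is UTF-8 encoded, so that polytonic Greek characters are preserved.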

Vivian Platanou, Holger Essler, and John Pavlopoulos.