Aranea Go Middle East: Persicum

Publication

Abstract

Our paper describes the creation and annotation of Araneum Persicum, a new web-crawled Persian corpus. It discusses some of the problems encountered during filtering and annotation, and introduces the ensemble approach adopted for lemmatization and morphosyntactic annotation.

It is also argued that romanization can be helpful in developing corpora for languages that are not written in the Latin script.