This paper describes the creation and annotation of Araneum Persicum, a new web-crawled corpus of Persian. We discuss problems encountered during filtering and annotation, and introduce an ensemble approach to lemmatization and morphosyntactic annotation.
We also argue that Romanization can be helpful in developing corpora for languages written in non-Latin scripts.