Workflows for kickstarting RBMT in virtually No-Resource Situation

Publication

Abstract

In this article we describe work-in-progress best practices for starting work on rule-based machine translation for a language that has virtually no pre-existing digital resources for NLP use. We use the Karelian language as a case study: at the beginning of our project there were no publicly available corpora, either parallel or analysed monolingual, no analysers, and no translation tools or language models.

We show workflows that we have found useful for curating and developing the necessary NLP resources for the language. Our workflow is also aimed at no-resource work in the sense of no funding and scarce access to native informants; we show that building core NLP resources in parallel can alleviate these problems.

Keywords

rule-based machine translation; low-resource languages; Karelian