Corpona: The Pythonic Way of Processing Corpora

Publication

Abstract

NLP researchers routinely work with corpora encoded as XML or JSON. This often involves writing one-off code that serves a single, very specific purpose.

Corpona is meant to streamline workflows that involve XML- and JSON-based corpora by offering simple, reusable functionality. It currently supports parsing and accessing XML files, accessing sub-items in a nested JSON structure, and visualizing complex data structures.
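To make these three tasks concrete, the following is a minimal sketch using only the Python standard library. It does not use or reproduce Corpona's own API, which is not described here. The helper names `get_nested` and `outline`, the dot-separated path syntax, and the sample data are all assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any

# Hypothetical sample data, invented for this illustration.
XML_SAMPLE = "<corpus><s id='1'><w lemma='be'>is</w><w lemma='good'>good</w></s></corpus>"
JSON_SAMPLE = '{"doc": {"sentences": [{"tokens": ["is", "good"]}]}}'


def get_nested(data: Any, path: str, default: Any = None) -> Any:
    """Follow a dot-separated path through nested dicts and lists.

    Hypothetical helper, not part of Corpona. Returns `default` as soon
    as any step of the path is missing.
    """
    current = data
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif isinstance(current, list) and key.isdigit() and int(key) < len(current):
            current = current[int(key)]
        else:
            return default
    return current


def outline(data: Any, depth: int = 0) -> list:
    """Return indented lines summarising the shape of a nested structure.

    Hypothetical helper, not part of Corpona. Only the first element of
    each list is inspected, on the assumption that lists are homogeneous.
    """
    pad = "  " * depth
    lines = []
    if isinstance(data, dict):
        for key, value in data.items():
            lines.append(f"{pad}{key}: {type(value).__name__}")
            lines.extend(outline(value, depth + 1))
    elif isinstance(data, list) and data:
        lines.append(f"{pad}[0]: {type(data[0]).__name__}")
        lines.extend(outline(data[0], depth + 1))
    return lines


# XML: parse the document and collect an attribute from every <w> element.
root = ET.fromstring(XML_SAMPLE)
lemmas = [w.get("lemma") for w in root.iter("w")]

# JSON: reach a deeply nested item with one call instead of chained lookups.
doc = json.loads(JSON_SAMPLE)
second_token = get_nested(doc, "doc.sentences.0.tokens.1")

# Visualization: print a compact outline of the nested structure.
print("\n".join(outline(doc)))
```

Each helper here is the kind of small, purpose-specific utility that tends to be rewritten from project to project, which is the repetition a reusable library is intended to remove.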

Corpona is fully open source and is available on GitHub and Zenodo.