Charles Explorer logo
🇬🇧

From Genesis to Creole Language: Transfer Learning for Singlish Universal Dependencies Parsing and POS Tagging

Publication

Abstract

Singlish can be interesting to the computational linguistics community both linguistically, as a major low-resource creole based on English, and computationally, for information extraction and sentiment analysis of regional social media. In our conference paper, Wang et al. (2017), we investigated part-of-speech (POS) tagging and dependency parsing for Singlish by constructing a treebank under the Universal Dependencies scheme and successfully used neural stacking models to integrate English syntactic knowledge for boosting Singlish POS tagging and dependency parsing, achieving the state-of-the-art accuracies of 89.50% and 84.47% for Singlish POS tagging and dependency, respectively.

In this work, we substantially extend Wang et al. (2017) by enlarging the Singlish treebank to more than triple the size and with much more diversity in topics, as well as further exploring neural multi-task models for integrating English syntactic knowledge. Results show that the enlarged treebank has achieved significant relative error reduction of 45.8% and 15.5% on the base model, 27% and 10% on the neural multi-task model, and 21% and 15% on the neural stacking model for POS tagging and dependency parsing, respectively.

Moreover, the state-of-the-art Singlish POS tagging and dependency parsing accuracies have been improved to 91.16% and 85.57%, respectively. We make our treebanks and models available for further research.