Omitted subjects revealed: A quantitative-descriptive approach

Publication

Abstract

In this paper, we present descriptive and computational studies of omitted subjects. First, we carry out a quantitative-descriptive study of three corpora covering the journalistic, literary and encyclopedic genres.

Specifically, we quantify the sentences with omitted subjects in each corpus: omitted subjects were found in 24%, 41% and 46% of the sentences, respectively. Second, using rule-based strategies, we reconstitute those subjects and reinsert them into the corpora in order to evaluate how much subject omission affects the automatic learning of syntactic dependencies.
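The per-corpus proportions reported above are sentence-level rates. The sketch below illustrates how such a rate could be computed over dependency-annotated sentences. It is not the authors' procedure: the `Token` layout, the Universal Dependencies style labels, and the heuristic itself (a sentence counts when its verbal root has no `nsubj` or `csubj` dependent) are all assumptions made for illustration, and the two toy sentences are invented.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """Hypothetical minimal token record (not taken from the paper)."""
    id: int
    form: str
    upos: str
    head: int      # 0 marks the sentence root
    deprel: str


# Assumed subject relations, in Universal Dependencies style.
SUBJECT_RELS = ("nsubj", "csubj")


def root_lacks_subject(sentence: List[Token]) -> bool:
    """Assumed heuristic: the root is a verb and has no subject dependent."""
    roots = [t for t in sentence if t.head == 0]
    if not roots or roots[0].upos != "VERB":
        return False
    root_id = roots[0].id
    return not any(
        t.head == root_id and t.deprel.split(":")[0] in SUBJECT_RELS
        for t in sentence
    )


def omitted_subject_rate(corpus: List[List[Token]]) -> float:
    """Share of sentences flagged by the heuristic above."""
    if not corpus:
        return 0.0
    return sum(root_lacks_subject(s) for s in corpus) / len(corpus)


# Invented toy sentences: one without an overt subject, one with.
toy_corpus = [
    [Token(1, "Cheguei", "VERB", 0, "root"),
     Token(2, "cedo", "ADV", 1, "advmod"),
     Token(3, ".", "PUNCT", 1, "punct")],
    [Token(1, "Ela", "PRON", 2, "nsubj"),
     Token(2, "chegou", "VERB", 0, "root"),
     Token(3, "cedo", "ADV", 2, "advmod"),
     Token(4, ".", "PUNCT", 2, "punct")],
]

rate = omitted_subject_rate(toy_corpus)
```

A root-only check is deliberately crude: it ignores subordinate and coordinated clauses, so a fuller treatment would inspect every finite predicate rather than the root alone.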

The results indicate that formal subject reconstitution can improve the learning of syntactic dependencies by up to 2% according to the CLAS metric, highlighting the relevance of linguistic modeling to the automatic learning process. (C) 2021, Universidade Federal de Minas Gerais, Faculdade de Letras. All rights reserved.
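For readers unfamiliar with CLAS (content-word labeled attachment score), the sketch below shows one simplified way to compute it. It assumes that gold and predicted parses share the same tokenization, that only the universal part of a relation label is compared, and that the excluded relations are the function-word relations plus punctuation as listed in `EXCLUDED`. The function name and the input layout are hypothetical, and this is not the evaluation script used in the paper.

```python
from typing import List, Tuple

# Assumed set of relations left out of CLAS: function words and punctuation.
EXCLUDED = {"aux", "case", "cc", "clf", "cop", "det", "mark", "punct"}

Arc = Tuple[int, str]  # (head index, dependency relation) for one token


def _universal(deprel: str) -> str:
    """Drop any language-specific subtype, e.g. 'nsubj:pass' -> 'nsubj'."""
    return deprel.split(":")[0]


def _is_content(deprel: str) -> bool:
    return _universal(deprel) not in EXCLUDED


def clas(gold: List[Arc], system: List[Arc]) -> float:
    """F1 over content-word arcs, assuming one-to-one token alignment."""
    if len(gold) != len(system):
        raise ValueError("gold and system must have the same number of tokens")
    gold_n = sum(_is_content(d) for _, d in gold)
    system_n = sum(_is_content(d) for _, d in system)
    correct = sum(
        1
        for (gh, gd), (sh, sd) in zip(gold, system)
        if _is_content(gd) and gh == sh and _universal(gd) == _universal(sd)
    )
    if gold_n == 0 or system_n == 0 or correct == 0:
        return 0.0
    precision = correct / system_n
    recall = correct / gold_n
    return 2 * precision * recall / (precision + recall)


# Invented four-token example: the parser mislabels the subject as an object.
gold_arcs = [(2, "nsubj"), (0, "root"), (2, "advmod"), (2, "punct")]
system_arcs = [(2, "obj"), (0, "root"), (2, "advmod"), (2, "punct")]
score = clas(gold_arcs, system_arcs)
```

In this toy case three tokens carry content relations and two of them are attached and labeled correctly, so precision and recall are both 2/3.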

Keywords

Computational linguistics; Corpus linguistics; Linguistic description; Machine learning; Omitted subject; Syntactic dependencies