
Recognizing Preferred Grammatical Gender in Russian Anonymous Online Confessions

Publication

Abstract

We present annotation results for a dataset of public anonymous online confessions in Russian, drawn from posts tagged #family in the “Overheard/Podslushano” group on VKontakte. Unlike most online social network data, intentionally anonymous posts carry no explicit metadata such as age or gender.

We consider the problem of predicting the author’s preferred grammatical gender for self-reference, which proved surprisingly hard and is not reducible to simple morphological analysis. We describe an expert labeling of a dataset for this problem, report the findings of a predictive analysis, and introduce rule-based and machine learning approaches.
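
To make the difficulty concrete, here is a minimal sketch of the kind of naive morphological heuristic the abstract says does not suffice. It is not the authors' rule-based system. The function name `guess_gender`, the label strings, and the rule itself (inspect only the word directly after the pronoun "я" for a past-tense ending) are assumptions made for illustration.

```python
import re
from collections import Counter


def guess_gender(text):
    """Hypothetical naive baseline, not the method from the paper.

    Looks only at the word directly following the pronoun "я" and votes
    "feminine" for a "-ла" ending or "masculine" for a "-л" ending,
    after stripping a reflexive suffix ("-ся" / "-сь").
    """
    tokens = re.findall(r"\w+", text.lower())
    votes = Counter()
    for prev, word in zip(tokens, tokens[1:]):
        if prev != "я":
            continue
        stem = re.sub(r"(ся|сь)$", "", word)  # drop the reflexive suffix
        if stem.endswith("ла"):
            votes["feminine"] += 1
        elif stem.endswith("л"):
            votes["masculine"] += 1
    # No evidence, or a tie between the two labels: give up.
    if votes["feminine"] == votes["masculine"]:
        return "unknown"
    return votes.most_common(1)[0][0]


print(guess_gender("Я пошла домой"))            # verb right after "я"
print(guess_gender("Я устал, и я ушёл"))        # two masculine past forms
print(guess_gender("Я вчера пошла в магазин"))  # an adverb intervenes
print(guess_gender("Муж сказал, что я виновата"))  # short adjective, not a verb
```

The last two calls return "unknown" even though a human reader sees a feminine self-reference in both: the rule misses a verb separated from the pronoun and ignores adjectives entirely. It would also be misled by quoted speech from other people, which is one plausible reason the task does not reduce to surface morphology.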