Feature Engineering in the NLI Shared Task 2013: Charles University Submission Report

Publication at Faculty of Mathematics and Physics | 2013

Abstract

Our goal is to predict the first language (L1) of the authors of English essays with the help of the TOEFL11 corpus, in which the L1, prompts (topics) and proficiency levels are provided. We therefore approach the problem as a classification task and employ machine learning methods.

Among the key concepts of machine learning, we focus on feature engineering. We design features across all the L1 languages without making use of knowledge of the prompt or the proficiency level.

During system development, we experimented with various techniques for feature filtering and feature combination, optimized with respect to mutual information and information gain. We trained four different SVM models and combined them through majority voting, achieving an accuracy of 72.5%.
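
The two ingredients named above, scoring features by mutual information and combining classifiers by majority voting, can be illustrated with a minimal sketch. It is not the submitted system: the function names `mutual_information` and `majority_vote` are hypothetical, features are assumed to be discrete (for example, presence or absence of an n-gram), and the tie-breaking rule is an assumption, since the abstract does not say how ties were resolved.

```python
import math
from collections import Counter


def mutual_information(feature_values, labels):
    """Mutual information I(X; Y) in bits between a discrete feature and the L1 label.

    Illustrative helper (hypothetical name), estimated from raw co-occurrence counts.
    """
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    feature_counts = Counter(feature_values)
    label_counts = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with counts
        mi += (count / n) * math.log2(
            count * n / (feature_counts[x] * label_counts[y])
        )
    return mi


def majority_vote(predictions):
    """Return the label predicted by most models.

    Assumption: ties go to the label proposed by the earliest model in the list.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Toy data: four essays, two L1 classes, two binary features.
labels = ["DE", "DE", "FR", "FR"]
informative = [1, 1, 0, 0]    # perfectly separates the two classes
uninformative = [1, 0, 1, 0]  # independent of the class

scores = {
    "informative": mutual_information(informative, labels),
    "uninformative": mutual_information(uninformative, labels),
}

# Predictions of four hypothetical models for a single essay.
combined = majority_vote(["DE", "FR", "DE", "ES"])
```

In a filtering step, features would be ranked by such a score and only the highest-scoring ones kept; the voting function would then be applied per essay to the outputs of the four trained models.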