This contribution concerns the treatment of a corpus consisting of a weekly financial rubric. In particular, we focused on extracting document-level indices and textual variables.
Furthermore, we compared several variable extraction methods to evaluate their predictive ability. The results confirm the hypothesis that vectors derived from word embeddings do not improve predictive capacity compared with other variable extraction methods, but remain a fundamental resource for understanding the semantics of the texts.