Introduction. Course Overview.
Evaluation methodology (examples from tagging). Precision, Recall, Accuracy, F-measure. Natural Language Corpora.
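The four measures can be illustrated with a small Python sketch. It assumes tagger output is represented as a set of (token position, tag) pairs, so a tagger that may return several candidate tags per token can have precision and recall that differ; the function names and the toy data are hypothetical. F-measure is computed as the weighted harmonic mean F_β = (1 + β²)PR / (β²P + R), which is F1 for β = 1.

```python
from typing import Sequence, Set, Tuple

def precision_recall_f(gold: Set[Tuple[int, str]], predicted: Set[Tuple[int, str]],
                       beta: float = 1.0) -> Tuple[float, float, float]:
    """Precision, recall and F-beta over sets of (position, tag) items (hypothetical representation)."""
    true_pos = len(gold & predicted)  # items that are both predicted and correct
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f

def accuracy(gold_tags: Sequence[str], predicted_tags: Sequence[str]) -> float:
    """Share of tokens whose single predicted tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("sequences must have the same length")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)

gold = {(0, "DT"), (1, "NN"), (2, "VBZ")}
# Hypothetical ambiguous output: two candidate tags for token 1, a wrong tag for token 2.
predicted = {(0, "DT"), (1, "NN"), (1, "VB"), (2, "NNS")}
p, r, f1 = precision_recall_f(gold, predicted)
print(p, r, round(f1, 4))  # 0.5 0.6666666666666666 0.5714
print(accuracy(["DT", "NN", "VBZ"], ["DT", "NN", "NNS"]))
```

When the tagger outputs exactly one tag per token, the number of predicted items equals the number of gold items, so precision, recall and accuracy coincide; they only diverge when the output may contain more or fewer tags than tokens.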
The task of Tagging. Tagsets, Morphology, Lemmatization. Morphological Analysis and Generation. Tagging Methods. Manually Designed Rules and Grammars. Statistical Methods (overview). HMM Tagging (Supervised, Unsupervised). Statistical Transformation-Rule-Based Tagging.
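At decoding time, HMM tagging amounts to finding the most probable tag sequence, usually with the Viterbi algorithm. The Python sketch below assumes a bigram (first-order) HMM whose start, transition and emission probabilities have already been estimated, for instance by relative frequencies from a tagged corpus in the supervised case. The tiny probability tables are invented for illustration and do not come from any real corpus.

```python
import math
from typing import Dict, List

Prob = Dict[str, Dict[str, float]]

def log(p: float) -> float:
    """Log-probability, with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(words: List[str], tags: List[str], start_p: Dict[str, float],
            trans_p: Prob, emit_p: Prob) -> List[str]:
    """Most probable tag sequence for `words` under a bigram HMM (computed in log space)."""
    # best[i][t]: best log-probability of a tag path for words[:i+1] that ends in tag t
    best = [{t: log(start_p.get(t, 0.0)) + log(emit_p[t].get(words[0], 0.0)) for t in tags}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # predecessor tag s maximizing best[i-1][s] + log P(t | s)
            prev = max(tags, key=lambda s: best[i - 1][s] + log(trans_p[s].get(t, 0.0)))
            best[i][t] = (best[i - 1][prev] + log(trans_p[prev].get(t, 0.0))
                          + log(emit_p[t].get(words[i], 0.0)))
            back[i][t] = prev
    # follow the back-pointers from the best final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Hypothetical toy model.
tags = ["DT", "NN", "VB"]
start_p = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
trans_p = {"DT": {"NN": 0.9, "VB": 0.1},
           "NN": {"NN": 0.3, "VB": 0.6, "DT": 0.1},
           "VB": {"DT": 0.5, "NN": 0.5}}
emit_p = {"DT": {"the": 1.0},
          "NN": {"dog": 0.6, "barks": 0.4},
          "VB": {"barks": 0.7, "dog": 0.3}}
print(viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p))  # ['DT', 'NN', 'VB']
```

Working in log space avoids numerical underflow on long sentences; zero probabilities become -inf and simply never win the maximization.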
Introduction to Parsing. Generative Grammars. Properties of Regular and Context-free Grammars. Non-statistical Parsing Algorithms (An Overview). Simple top-down parser with backtracking. Shift-reduce parser. Treebanks and Treebanking. Evaluation of Parsers.
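A shift-reduce parser can be sketched as a search over two actions: shift the next input word onto a stack, or reduce a sequence on top of the stack that matches some rule's right-hand side to that rule's left-hand side. The Python sketch below explores both choices with backtracking, so it is a recognizer (it answers yes or no) rather than a tree builder. It assumes a grammar without empty (ε) rules and without unary cycles such as A → B, B → A, either of which could make the search loop; the toy grammar is hypothetical.

```python
from typing import List, Tuple

# A rule is (lhs, rhs); terminals appear directly in right-hand sides.
Rule = Tuple[str, Tuple[str, ...]]

def shift_reduce_recognize(tokens: List[str], grammar: List[Rule], start: str = "S") -> bool:
    """True if `tokens` can be reduced to `start`; backtracks over shift/reduce choices.

    Assumes no empty rules and no unary cycles (otherwise the search may not terminate).
    """
    def search(stack: Tuple[str, ...], buffer: Tuple[str, ...]) -> bool:
        if not buffer and stack == (start,):
            return True
        # Reduce: try every rule whose right-hand side matches the top of the stack.
        for lhs, rhs in grammar:
            n = len(rhs)
            if n <= len(stack) and stack[-n:] == rhs:
                if search(stack[:-n] + (lhs,), buffer):
                    return True
        # Shift: move the next input token onto the stack.
        if buffer:
            return search(stack + (buffer[0],), buffer[1:])
        return False

    return search((), tuple(tokens))

grammar: List[Rule] = [
    ("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("V", "NP")), ("VP", ("V",)),
    ("Det", ("the",)), ("N", ("dog",)), ("N", ("cat",)), ("V", ("saw",)), ("V", ("sleeps",)),
]
print(shift_reduce_recognize("the dog saw the cat".split(), grammar))  # True
print(shift_reduce_recognize("the dog the".split(), grammar))          # False
```

The backtracking is what resolves shift-reduce conflicts here: on "the dog saw the cat" the parser first reduces "saw" to VP and reaches S too early, fails, and then backs up to shift "the" instead. Search time is exponential in the worst case.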
Probabilistic Parsing. Introduction. PCFG Parameter Estimation. PCFG: Best parse. Probability of a string. Lexicalized PCFG.
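For a PCFG in Chomsky normal form, the probability of a string is the inside probability of the start symbol over the whole input, which can be computed bottom-up with a CKY-style chart. Replacing the sum over split points and rules with a maximum gives the probability of the best parse (the Viterbi variant). The Python sketch below assumes rule probabilities are supplied as dictionaries; the toy grammar has a prepositional-phrase attachment ambiguity, so "astronomers saw stars with ears" has exactly two parses.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def cky_inside(tokens: List[str],
               lexical: Dict[Tuple[str, str], float],
               binary: Dict[Tuple[str, str, str], float],
               start: str = "S", best: bool = False) -> float:
    """Inside score of `start` over all of `tokens` for a PCFG in Chomsky normal form.

    lexical[(A, w)] = P(A -> w); binary[(A, B, C)] = P(A -> B C).
    best=False sums over parses (string probability); best=True keeps the max (best parse).
    """
    n = len(tokens)
    # chart[i, j, A]: inside score of nonterminal A spanning tokens[i:j]
    chart: Dict[Tuple[int, int, str], float] = defaultdict(float)
    for i, word in enumerate(tokens):
        for (a, w), p in lexical.items():
            if w == word:
                chart[i, i + 1, a] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for (a, b, c), p in binary.items():
                    score = p * chart[i, k, b] * chart[k, j, c]
                    if best:
                        chart[i, j, a] = max(chart[i, j, a], score)
                    else:
                        chart[i, j, a] += score
    return chart[0, n, start]

# Hypothetical toy grammar; probabilities of rules with the same left-hand side sum to 1.
lexical = {("NP", "astronomers"): 0.1, ("NP", "ears"): 0.18, ("NP", "saw"): 0.04,
           ("NP", "stars"): 0.18, ("NP", "telescopes"): 0.1,
           ("V", "saw"): 1.0, ("P", "with"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("PP", "P", "NP"): 1.0, ("VP", "V", "NP"): 0.7,
          ("VP", "VP", "PP"): 0.3, ("NP", "NP", "PP"): 0.4}
sentence = "astronomers saw stars with ears".split()
print(cky_inside(sentence, lexical, binary))             # ≈ 0.0015876 (sum of both parses)
print(cky_inside(sentence, lexical, binary, best=True))  # ≈ 0.0009072 (NP-attachment parse)
```

The two parse probabilities are 0.0009072 (PP attached to "stars") and 0.0006804 (PP attached to the verb phrase); their sum is the string probability.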
Statistical Machine Translation (MT). Alignment and Parameter Estimation for MT.
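Word-alignment parameters for statistical MT are commonly estimated with EM. As one concrete instance, the Python sketch below runs the EM loop of IBM Model 1 for the lexical translation probabilities t(e | f) on a hypothetical three-sentence German-English toy corpus. To keep it short, it omits the NULL source word that Model 1 normally includes and starts from uniform t(e | f); both are simplifications assumed for this sketch, not necessarily how the course presents the model.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Corpus = List[Tuple[List[str], List[str]]]  # (English tokens, foreign tokens)

def ibm_model1(corpus: Corpus, iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """EM estimation of t(e | f) for IBM Model 1 (no NULL word, uniform initialization)."""
    english_vocab = {e for es, _ in corpus for e in es}
    uniform = 1.0 / len(english_vocab)
    t: Dict[Tuple[str, str], float] = defaultdict(lambda: uniform)
    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)  # expected counts c(e, f)
        total: Dict[str, float] = defaultdict(float)              # expected counts c(f)
        # E-step: distribute each English word over the foreign words of its sentence
        for es, fs in corpus:
            for e in es:
                norm = sum(t[e, f] for f in fs)
                for f in fs:
                    frac = t[e, f] / norm
                    count[e, f] += frac
                    total[f] += frac
        # M-step: renormalize the expected counts for each foreign word
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t

corpus = [
    ("the house".split(), "das Haus".split()),
    ("the book".split(), "das Buch".split()),
    ("a book".split(), "ein Buch".split()),
]
t = ibm_model1(corpus, iterations=5)
print(round(t["the", "das"], 3), round(t["book", "Buch"], 3))
```

Because "das" co-occurs with "the" in two sentences, EM shifts probability mass toward that pair: t(the | das) is 0.5 after one iteration and 7/11 ≈ 0.636 after two.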
A continuation of Statistical Methods in Natural Language Processing I.
Introduces the notion of a linguistic experiment and its evaluation, and the role of corpora in statistical NLP. Standard NLP tasks (tagging, phrase-structure and dependency parsing) are explained, and methods for them, including generative and discriminative models, are presented.