Exploitation of linguistic tools in semantic extraction - a design

Publication at Faculty of Mathematics and Physics

2008

Abstract

The paper addresses the problem of information extraction from Czech texts on the Web. The method described in the paper exploits existing linguistic tools originally created for a syntactically annotated corpus, the Prague Dependency Treebank (PDT 2.0).

We propose a system that captures the text of web pages, annotates it linguistically with PDT tools, extracts data, and stores the data in an ontology. We report on initial experiments in the domain of traffic accident reports.

The experiments are promising; for example, they make it possible to summarize the number of injured people across reports.
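The four stages named in the abstract (capture, annotate, extract, store) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `annotate` function is a plain whitespace tokenizer standing in for the PDT tools, the `extract_injured` rule, the `injuredCount` predicate, the `TripleStore` class, and the English sample reports are all hypothetical, and none of them reflects the actual system described in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class TripleStore:
    """Toy stand-in for an ontology: a list of (subject, predicate, object) triples."""
    triples: list = field(default_factory=list)

    def add(self, subject, predicate, obj):
        self.triples.append((subject, predicate, obj))

    def objects(self, predicate):
        # Return the object of every triple with the given predicate.
        return [o for (_, p, o) in self.triples if p == predicate]


def annotate(text):
    """Placeholder for linguistic annotation: lowercased whitespace tokens.
    The paper uses PDT tools at this stage; this is only a stand-in."""
    return text.lower().split()


def extract_injured(tokens):
    """Hypothetical rule: a digit token directly followed by 'injured'."""
    for prev, cur in zip(tokens, tokens[1:]):
        if cur.strip(".,") == "injured" and prev.isdigit():
            return int(prev)
    return None


def run_pipeline(reports):
    """Annotate each captured report, extract a count, store it as a triple."""
    store = TripleStore()
    for report_id, text in reports.items():
        count = extract_injured(annotate(text))
        if count is not None:
            store.add(report_id, "injuredCount", count)
    return store


# Invented sample data standing in for captured web-page text.
reports = {
    "report1": "Two cars collided on the highway, 3 injured.",
    "report2": "A truck overturned near the bridge; 2 injured were taken to hospital.",
    "report3": "A minor collision caused no harm.",
}
store = run_pipeline(reports)
total_injured = sum(store.objects("injuredCount"))
```

Once the extracted values sit in a structured store, a summary such as the total number of injured people reduces to a simple query over one predicate, which is the kind of aggregation the abstract mentions.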

Keywords

Exploitation, linguistic tools, semantic extraction, design