
AgentMat - framework for data scraping and semantization

Publication

Abstract

Most of the enormous amount of information on the internet is available only as web pages made for human readers. These pages don't have any common interface for accessing, searching, or browsing the data.

Hence, it's hard to extract semantic data from the web, categorize it, and keep it up to date. For this purpose, we have designed and implemented a system called AgentMat.

This system is designed for efficient extraction of large amounts of data from web pages. AgentMat's processing is based on an XML-based language that describes a given extraction task declaratively.

Thanks to this scraping system, the raw content of irregularly updated, unstructured web pages can be kept categorized and accessed together with its semantic metadata.