Statistics on the Real XML Data

Publication at Faculty of Mathematics and Physics

2006

Abstract

At present, XML is used in almost all spheres of human activity. Its popularity stems from the fact that it is a self-descriptive metaformat that allows users to define the structure of XML data using powerful tools such as DTD or XML Schema.

Consequently, we are witnessing a massive boom of techniques for managing, querying, updating, exchanging, and compressing XML data. On the other hand, most of these techniques have weak spots that degrade their efficiency.

The main reason is probably that most of them treat XML data too generally, supporting every possible feature, even though real data are often much simpler. Where they do restrict the input data, the restrictions are often unnatural.

In this contribution we discuss the level of complexity of real XML collections and their schemas. We summarize and compare the results and findings of existing papers on similar topics together with our own analysis, and we try to identify the reasons for the observed tendencies and their consequences.

Keywords

Statistics, Data