Graph Databases: Their Power and Limitations

Publication at the Faculty of Mathematics and Physics

2015

Abstract

Real-world data can often be represented as graphs. The resulting structures include undirected and directed graphs, multigraphs and hypergraphs, labelled and weighted graphs, and their variants.

Developments in graph modelling also bring new approaches, e.g., the use of constraints. Graphs can be processed in a database in many different ways.

Some graphs can be represented as JSON or XML structures and processed by their native database tools. More generally, a graph database is specified as any storage system that provides index-free adjacency, i.e. an explicit graph structure.
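As a rough illustration of index-free adjacency, the following minimal Python sketch stores each node's outgoing edges as direct object references, so a traversal follows pointers instead of consulting a global index. The `Node`, `Edge`, `connect`, and `neighbours` names and the sample data are assumptions made for this example only; they do not correspond to the API of any of the systems discussed here.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    label: str
    # Each node holds direct references to its outgoing edges
    # (hypothetical layout, chosen only for illustration).
    out_edges: list = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Edge:
    kind: str
    source: Node
    target: Node


def connect(source, kind, target):
    """Create an edge and attach it to the source node's adjacency list."""
    edge = Edge(kind, source, target)
    source.out_edges.append(edge)
    return edge


def neighbours(node, kind):
    """Follow references from the node itself; no global index lookup."""
    return [e.target for e in node.out_edges if e.kind == kind]


# Sample data (invented for this sketch).
alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
connect(alice, "KNOWS", bob)
connect(bob, "KNOWS", carol)
connect(alice, "WORKS_WITH", carol)

# Two-hop traversal: whom do Alice's acquaintances know?
friends_of_friends = [
    n.label for f in neighbours(alice, "KNOWS") for n in neighbours(f, "KNOWS")
]
```

In this layout the cost of each hop depends on the degree of the node being visited rather than on the total size of the graph, which is the property the term usually refers to.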

Graph database technology shares some features with traditional databases, e.g. ACID properties and availability.

Use cases of graph databases such as Neo4j, OrientDB, InfiniteGraph, FlockDB, AllegroGraph, and others show that graph databases are becoming a common means of managing connected data. In the Big Data era, important questions concern scalability for large graphs as well as the scaling of read/write operations.

For example, scaling graph data by distributing it across a network is much more difficult than scaling simpler data models and is still a work in progress. Pattern matching in graphs, which in principle provides an arbitrarily complex identity function, also remains a challenge.
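To give a sense of why pattern matching is costly, here is a minimal sketch of a naive backtracking matcher over directed edges. It is an assumption-laden illustration, not the method of any system named in this abstract: the `find_matches` function, the edge-set representation, and the sample graph are all hypothetical, and the matcher requires distinct pattern variables to bind to distinct graph nodes.

```python
def find_matches(pattern, graph):
    """Yield bindings of pattern variables to graph nodes.

    pattern, graph: sets of (source, target) directed edges.
    A binding is valid when every pattern edge maps onto a graph edge
    and distinct variables map to distinct nodes.
    """
    variables = sorted({v for edge in pattern for v in edge})
    nodes = sorted({n for edge in graph for n in edge})

    def extend(binding):
        if len(binding) == len(variables):
            yield dict(binding)  # copy, since binding is mutated below
            return
        var = variables[len(binding)]
        for node in nodes:
            if node in binding.values():
                continue  # keep the mapping injective
            binding[var] = node
            # Check only pattern edges whose endpoints are both bound.
            consistent = all(
                (binding[s], binding[t]) in graph
                for s, t in pattern
                if s in binding and t in binding
            )
            if consistent:
                yield from extend(binding)
            del binding[var]

    yield from extend({})


# Sample data (invented): a directed triangle a -> b -> c -> a plus c -> d.
graph = {("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")}
triangle = {("x", "y"), ("y", "z"), ("z", "x")}

matches = list(find_matches(triangle, graph))
```

In the worst case the search tries a number of bindings that grows exponentially with the number of pattern variables, which is one reason the problem is hard at scale.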

Mining complete frequent patterns from graph databases is also challenging, since the supporting operations are computationally costly. In this paper, we discuss recent advances and limitations in these areas as well as future directions.

Keywords

graph database, graph storage, graph querying, graph scalability, Big Graphs