Big Data Storage and Management: Challenges and Opportunities

Publication at Faculty of Mathematics and Physics

2017

Abstract

The paper focuses on a currently very popular topic: Big Data. We describe and discuss its characteristics in terms of eleven V's (Volume, Velocity, Variety, Veracity, etc.) and Big Data quality.

These characteristics represent both data and process challenges. We then continue with the problems of Big Data storage and management.

The principles of NoSQL databases are explained, including their categorization. We also briefly describe the Hadoop and MapReduce technologies, as well as their inefficiency for some interactive queries and for applications within the domain of large-scale graph processing and streaming data.

NoSQL databases and Hadoop M/R are designed to take advantage of cloud computing architectures and allow massive computations to be run inexpensively and efficiently. The term Big Data 1.0 was introduced for these technologies.

We continue with some new approaches, currently called Big Data 2.0 processing systems. In particular, their four categories are introduced and discussed: General Purpose Big Data Processing Systems, Big SQL Processing Systems, Big Graph Processing Systems, and Big Stream Processing Systems.

Attention is then devoted to Big Analytics, the main application area for Big Data storage and processing. We argue that enterprises with complex, heterogeneous environments no longer want to adopt a BI access point just for one data source (Hadoop).

More heterogeneous software platforms are needed. Even Hadoop has become a multi-purpose engine for ad hoc analysis.

Finally, we mention some problems with Big Data. We also note that Big Data creates a new type of digital divide.

Access to and knowledge of Big Data technologies give companies and people a competitive edge in today's data-driven world.

Keywords

Big Data, NoSQL databases, MapReduce, Hadoop, Big Data 2.0, Big Analytics