The daily data output of a medical facility is enormous and can reach several gigabytes.
A Hospital Information System (HIS) creates and collects data not only in text form but also from multimedia sources. The volume of the data, its diversity, and the requirement for online processing and analysis make this a Big Data problem.
This article focuses on the classification of data stored in an HIS, the optimization of its processing, and the identification of best practices for its analysis. Textual medical records, in both structured and unstructured form, are the primary source of patient information.
Special attention is therefore given to the differences that arise during content analysis.