The chapter presents the approaches and technologies used to analyze trends in technology development on the basis of the “Subject-Action-Object” network semantic structure. As a source of information about the invention itself, the most important part of a patent is the description of the invention.
In electronic patent databases, every patent begins with the description of the invention, which in turn opens with a title page. This form of the description is standardized: all patents are presented in it and therefore share the same structure.
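Because every patent follows the same layout, a parser can locate sections by fixed headings instead of inferring them document by document. The Python sketch that follows illustrates this idea under stated assumptions. The heading names in `SECTION_HEADINGS` and the function `parse_patent` are hypothetical, since the chapter does not specify the export format of the patent database.

```python
import re

# Hypothetical section headings (assumption): real headings depend on the
# patent office and the database export format, which are not specified here.
SECTION_HEADINGS = ("TITLE", "ABSTRACT", "DESCRIPTION", "CLAIMS")

# A heading is a line containing only one of the known section names.
HEADING_RE = re.compile(
    r"^(%s)[ \t]*$" % "|".join(SECTION_HEADINGS), re.MULTILINE
)


def parse_patent(text: str) -> dict:
    """Split a uniformly structured patent text into named sections.

    Each section runs from the end of its heading line to the start of the
    next heading (or to the end of the text for the last section).
    """
    matches = list(HEADING_RE.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        start = match.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1)] = text[start:end].strip()
    return sections


if __name__ == "__main__":
    sample = "TITLE\nWidget\nABSTRACT\nA widget that spins.\nCLAIMS\n1. A widget.\n"
    print(parse_patent(sample))
    # → {'TITLE': 'Widget', 'ABSTRACT': 'A widget that spins.', 'CLAIMS': '1. A widget.'}
```

Relying on fixed headings only works because the documents share one layout; a patent whose headings deviate from the expected set would simply yield fewer sections.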
It is this block of the patent, the information about the invention, that must be investigated using the “Subject-Action-Object” network semantic structure. To solve this problem, the structure of the patent, Hadoop technologies, Spark MLlib, and clustering methods were studied.
Grid computing technologies were chosen as an efficient means of processing large volumes of text data such as patents. The following algorithms have been developed: an algorithm for parsing a patent document, an algorithm for preprocessing the text documents of a patent sample, a Subject-Action-Object (SAO) extraction algorithm, and an algorithm for forming a patent landscape for a given period.
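To illustrate what an SAO extraction step produces, here is a minimal Python sketch. It is not the chapter's algorithm. It assumes the sentence has already been tokenized and tagged with Universal POS tags by some external tagger (not shown), and it uses a naive positional heuristic: the nearest noun before a verb is taken as the subject, and the nearest noun after it as the object. Practical SAO systems usually rely on dependency parsing instead. The names `extract_sao` and `NOMINAL` are hypothetical.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, Universal POS tag), produced by an external tagger
NOMINAL = {"NOUN", "PROPN"}  # tags treated as candidate subjects/objects (assumption)


def extract_sao(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Naive positional SAO extraction over a POS-tagged sentence.

    For every verb, pick the closest nominal to its left as the subject and
    the closest nominal to its right as the object; skip the verb if either
    is missing.
    """
    triples = []
    for i, (word, tag) in enumerate(tokens):
        if tag != "VERB":
            continue
        subject = next((w for w, t in reversed(tokens[:i]) if t in NOMINAL), None)
        obj = next((w for w, t in tokens[i + 1:] if t in NOMINAL), None)
        if subject is not None and obj is not None:
            triples.append((subject, word, obj))
    return triples


if __name__ == "__main__":
    sentence = [
        ("The", "DET"), ("sensor", "NOUN"), ("measures", "VERB"),
        ("temperature", "NOUN"), ("and", "CCONJ"), ("controller", "NOUN"),
        ("adjusts", "VERB"), ("valve", "NOUN"),
    ]
    print(extract_sao(sentence))
    # → [('sensor', 'measures', 'temperature'), ('controller', 'adjusts', 'valve')]
```

Triples of this form are the units from which a network semantic structure can be assembled, with subjects and objects as nodes and actions as edges.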
The concept and architecture of the automated system have been formulated, and the proposed algorithms have been implemented in software.