Conceptualization of a data integration and storage framework to enable predictive analytics and application of data analytical algorithms in the manufacturing industry

Stefan Philip Kernbauer

Research output: Thesis › Master's Thesis


This master’s thesis addresses methods and best practices for data integration and storage, as well as measures to assess and assure data quality. Furthermore, algorithms for descriptive analytics are described and applied. While the first part is dedicated to the theoretical elaboration of these techniques, the second part looks at their practical application. In collaboration with an enterprise from the automotive industry, a critical review of the implemented IT solution is conducted before an IT architecture that fits the purposes of predictive analytics in the manufacturing industry is presented. Additionally, the data maturity model and descriptive algorithms are applied in practice, and the results are discussed. Building on the methodical approach of the Data Mining Method for Engineers (DMME), which is based on CRISP-DM, the first theoretical emphasis of this thesis is on data integration. Using the PERFoRM guideline for system architecture, a comprehensive overview of the requirements for system architecture and middleware in modern manufacturing processes is given. This chapter concludes with an overview of some of the most widely used middleware solutions. Subsequently, the mechanics of data storage are described. This includes relational (SQL) as well as non-relational (NoSQL) database structures and popular representatives of the respective groups. Besides data integration and storage, data quality assessment and assurance form another focus of this master’s thesis. To this end, the different data quality dimensions are highlighted, with particular attention to sensor data quality. In addition, methods for automated assessment and improvement of data quality are presented. With the introduction of the data maturity model, a method is characterized that facilitates determining the ability to answer business problems through data analytics.
To enable analytics of the available data, the theoretical part concludes with a description of methods applicable to the data at hand. The techniques described in the theoretical part are aligned with the practical part of the master’s thesis and therefore focus on exploratory data analytics (e.g., principal component analysis) and descriptive algorithms such as clustering methods. The practical part of the master’s thesis includes a critical review of the current solution for data integration and storage in the manufacturing company. In addition, a scalable and performant IT infrastructure that allows the implementation of predictive analytics at a large scale is presented. Based on the data available, the data maturity model is applied, giving insights into the ability of the business to implement data analytics. Although the data maturity model indicates a high degree of maturity in most aspects, the data quantity dimension restricts the number of algorithms applicable to the data. Hence, the applied algorithms are exploratory and descriptive in nature. Nevertheless, the outcome of the algorithms is promising and can be used as a proof of concept for future applications of predictive analytics.
Translated title of the contribution: Konzepterstellung für Datenintegration und -speicherung zur prädiktiven Analyse, sowie praktische Anwendung von Datenanalysealgorithmen in der Fertigungsindustrie
Original language: English
Awarding Institution
  • Montanuniversität
  • Biedermann, Hubert, Supervisor (internal)
  • Maier, Hans, Co-Supervisor (internal)
Publication status: Published - 2021

Bibliographical note

embargoed until 02-02-2023


  • Data Integration
  • Data Storage
  • Data Analytics
  • Data Quality
  • Data Conditioning
  • Principal Component Analysis
