Towards integrated Data Analysis Quality: Criteria for the application of Industrial Data Science

From data quality to data analysis quality

The application of Industrial Data Science in the context of connected Smart Products requires modeling and structuring data for their design, development and use. Especially for Smart Products, comprehensive handling of data quality is mandatory because of their interdisciplinary character and the broad range of heterogeneous stakeholders across the entire product lifecycle. The overall goal of data preparation is to provide high-quality data for application and evaluation by users. Established process models for industrial data analysis often treat the specification and assurance of data quality as a single-point activity with a defined conclusion. Providing end-to-end data quality has received little attention in the field of industrial data analytics. In this paper, we (1) structure four distinct phases for ensuring end-to-end data quality along data analytics activities, (2) define a set of criteria and measures for meeting and quantifying data quality requirements based on established criteria, and (3) provide a step-by-step model for establishing and maintaining high Data Quality for Industrial Data Science applications. The quality criteria aim to identify pointwise and continuous actions during the data analysis process. They target a shared responsibility between analyst and user for maintaining data quality during analyses. The developed model provides an actionable approach for assessing and ensuring the requirements of Data Analysis Quality.

This publication is the result of the research work of the Institute of Production Systems at TU Dortmund University and the Institute of Virtual Product Engineering at the Technical University Kaiserslautern.

(1) Four steps for categorizing the criteria of Data Analysis Quality

Industrial Engineering and Industrial Data Science are closely related. Both involve similar steps for fact-based decision-making processes. The first step is to access all necessary data sources so that the provision of data in the following steps is secured. The second step is to analyze the given data to obtain information. The goal of the third step is to gain economic benefit from the obtained information, which requires applying the information in the industrial use case. The fourth step deals with the administration of the peripheral processes and is added to the first three steps so that industrial reality is fully mapped in this model.

Figure 1: Process chain of industrial data analysis

(2) Set of criteria to measure Data Quality in the presented process order

The access layer includes aspects regarding the quality of raw data and the corresponding business processes. This results in the following criteria: accessibility, relevancy, timeliness, uniqueness and validity of data. The analysis step relates to the quality of the data analysis and primarily deals with the generation of knowledge. This leads to the following criteria: accuracy, completeness, free-of-error and value-added. The third layer addresses the application of data in an industrial setting and aims to establish a high quality for the results of data analyses. This leads to the following criteria: cost-effectiveness, concise-representation, consistent-representation, interpretability and understandability. The last layer addresses issues of data administration. It includes the following criteria: security, verifiability and confidentiality.
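The assignment of criteria to layers described above can be written down as a small lookup structure. The following is a minimal sketch in Python, not part of the original publication: the names `Layer`, `CRITERIA` and `layer_of` are hypothetical, and only the layer and criterion names themselves are taken from the text.

```python
from enum import Enum


class Layer(Enum):
    """The four layers of the process chain (enum name is an assumption)."""
    ACCESS = "access"
    ANALYSIS = "analysis"
    APPLICATION = "application"
    ADMINISTRATION = "administration"


# Criteria per layer, copied from the text above.
CRITERIA = {
    Layer.ACCESS: (
        "accessibility", "relevancy", "timeliness", "uniqueness", "validity",
    ),
    Layer.ANALYSIS: (
        "accuracy", "completeness", "free-of-error", "value-added",
    ),
    Layer.APPLICATION: (
        "cost-effectiveness", "concise-representation",
        "consistent-representation", "interpretability", "understandability",
    ),
    Layer.ADMINISTRATION: (
        "security", "verifiability", "confidentiality",
    ),
}


def layer_of(criterion: str) -> Layer:
    """Return the layer a criterion belongs to (hypothetical helper)."""
    for layer, names in CRITERIA.items():
        if criterion in names:
            return layer
    raise KeyError(f"unknown criterion: {criterion}")
```

A call such as `layer_of("timeliness")` returns `Layer.ACCESS`. Keeping the mapping in one place makes it straightforward to check that every criterion has exactly one responsible layer.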

(3) Process model for integrated Data Analysis Quality

The assurance of integrated quality during the steps of industrial data analyses is an end-to-end task. Instead of single-point activities, Data Analysis Quality requires a continuous process that covers the lifecycle of data, information and knowledge. The combination of the four phases of industrial data science projects and the related criteria can be found in this process model.
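To illustrate the difference between a single-point check and a continuous process, the sketch below evaluates two of the criteria on every incoming batch of records instead of once at the start. It is a simplified illustration under stated assumptions, not the measures defined in the paper: the ratio-based formulas for completeness and uniqueness, the threshold of 0.9, and the names `completeness`, `uniqueness` and `monitor` are all hypothetical.

```python
def completeness(records, required):
    """Share of required fields that are present and not None (assumed measure)."""
    total = len(records) * len(required)
    if total == 0:
        return 1.0
    filled = sum(1 for r in records for f in required if r.get(f) is not None)
    return filled / total


def uniqueness(records, key):
    """Share of distinct key values among all records (assumed measure)."""
    if not records:
        return 1.0
    return len({r.get(key) for r in records}) / len(records)


def monitor(batches, required, key, threshold=0.9):
    """Check every batch, returning (batch index, failing criteria) pairs."""
    report = []
    for index, batch in enumerate(batches):
        scores = {
            "completeness": completeness(batch, required),
            "uniqueness": uniqueness(batch, key),
        }
        failing = sorted(name for name, s in scores.items() if s < threshold)
        report.append((index, failing))
    return report


# Illustrative data only: the second batch has a missing value and a duplicate id.
batches = [
    [{"id": 1, "temp": 20.5}, {"id": 2, "temp": 21.0}],
    [{"id": 3, "temp": None}, {"id": 3, "temp": 19.8}],
]
report = monitor(batches, required=["id", "temp"], key="id")
```

In this example the first batch passes both checks, while the second is flagged for completeness and uniqueness. A check performed only on the first batch would have missed the later degradation.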

Figure 2: Process model for integrated data analysis quality

Conclusion and outlook

This paper contributes to the development of integrated Data Analysis Quality for the application of Industrial Data Science. Based on established approaches to categorizing Data Quality, it presents an organizational approach using four layers in the Industrial Data Science process. Overall, the framework, in conjunction with the criteria, enables the realization of a holistic Data Quality strategy such as Total Data Quality Management. However, this requires that suitable measures are initiated to ensure sustainable Data Analysis Quality over the entire product or process lifecycle and that quality aspects are monitored over the long term. Deliberate handling of data quality from start to end supports more efficient and successful analysis projects. Regardless of the criteria or dimensions ultimately chosen, it is essential to address the changing nature of the object of consideration, from available data to novel information to value-added knowledge.
