Understanding Data – From Data Mining to Process Mining

February 12, 2019 Shirin
In Know-how, Process Mining, Research

The first algorithms now classed as Data Mining were developed as early as the 1980s. Process Mining algorithms followed only a few years later. Nevertheless, both techniques, and Process Mining in particular, are still very young disciplines in a business context. Because they are becoming increasingly important in the face of digital transformation, and more complex as technology advances, it is worth taking a closer look at them. Can Process Mining be distinguished from Data Mining at all? Isn’t the one, as the terms suggest, simply a special form of the other? In this article, we compare Data Mining and Process Mining methodically and examine to what extent Process Mining has developed into an almost independent discipline.

Algorithms exceed the limits of conventional statistics

Data Mining is defined by the US-American Data Scientist Usama Fayyad as “the application of specific algorithms for the extraction of patterns from data”. Algorithms, therefore, identify patterns in the data, such as trends or relationships between objects and situations. This is why Data Mining is also known under the synonyms “Data Pattern Recognition”, “Database Exploration” or “Knowledge Discovery in Databases” (KDD).

The origins of Data Mining lie in Statistics, whose procedures and methods form the basis of many Data Mining algorithms. The typical procedure in Statistics – to draw up hypotheses and verify them on the basis of data – has fundamentally changed with Data Mining.

In the 1980s, researchers began to develop Machine Learning algorithms that reversed this approach: instead of testing a hypothesis formulated in advance, they derive patterns directly from the data. A few years later, in the 1990s, the first Process Mining algorithms were developed. Process Mining refers to methods that generate process knowledge from event logs, that is, the records IT systems keep of each step executed in a process. Process Mining algorithms visualize and analyze this process data. In contrast to Data Mining, it took almost two decades for Process Mining to become economically viable.
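To make the idea of an event log more concrete, here is a minimal sketch in Python. The field names (case_id, activity, timestamp) and the sample order process are illustrative assumptions, not the format of any particular IT system or tool. The sketch groups the logged events by case and orders them by time, which yields one sequence of activities per executed process instance.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: one record per executed activity.
# Field names and values are illustrative assumptions.
event_log = [
    {"case_id": "A1", "activity": "Create Order", "timestamp": "2019-01-07 09:00"},
    {"case_id": "A2", "activity": "Create Order", "timestamp": "2019-01-07 09:30"},
    {"case_id": "A1", "activity": "Approve Order", "timestamp": "2019-01-07 11:15"},
    {"case_id": "A1", "activity": "Ship Goods", "timestamp": "2019-01-08 08:45"},
    {"case_id": "A2", "activity": "Reject Order", "timestamp": "2019-01-07 14:00"},
]


def build_traces(log):
    """Group events by case and order them by time: one trace per case."""
    by_case = defaultdict(list)
    for event in log:
        ts = datetime.strptime(event["timestamp"], "%Y-%m-%d %H:%M")
        by_case[event["case_id"]].append((ts, event["activity"]))
    # Sorting the (timestamp, activity) pairs puts each case in time order.
    return {
        case: [activity for _, activity in sorted(events)]
        for case, events in by_case.items()
    }


traces = build_traces(event_log)
for case, activities in traces.items():
    print(case, "->", " > ".join(activities))
```

Each resulting trace is the raw material that the Process Mining methods described later work on.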

The Goal: Explaining the Inexplicable

Data Mining usually pursues one of two goals: either to explain certain circumstances or to make data-based statements about the future. From a business point of view, this means, for example, analyzing the causes behind operating results or producing forecasts based on operational activity. The algorithms used identify relevant patterns in the data, with or without the help of training examples. The human task is then to interpret these results, put them to use, and form theories that are consistent with them. Data Mining has a very broad field of application and draws on methods from Artificial Intelligence as well as from Statistics and Database Research. It is used in a variety of economic and scientific contexts, from forecasts of company development and analyses of socio-demographic trends to support for medical research.

In Process Mining, only data from processes that were actually executed is analyzed. The goal of this analysis varies depending on the process and the company, but the focus is usually on optimizing process performance. Process Mining offers companies the opportunity to gain insights into real process flows and to automatically identify optimization potential and risks.

Preparation is (almost) everything

Before the algorithms can be applied, the relevant data must first be provided and transformed. This step, Data Preparation, requires the most effort in both Data Mining and Process Mining. In Data Mining, a distinction is made between the selection, preprocessing and transformation of data. During selection, the data is either extracted from databases or newly collected. During preprocessing, the data is cleaned (for example, of documentation errors), completed and integrated, meaning that data from different sources is merged. During transformation, the data is brought into a form suited to the analysis. Process Mining, on the other hand, extracts data exclusively from IT systems. The data is then transformed and loaded, for example into a Process Mining tool. Together, these steps form the ETL process (“Extract”, “Transform”, “Load”).
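The preparation steps described above can be sketched in a few lines of Python. The two source systems, their rows and the cleaning rules are hypothetical assumptions chosen for illustration; real preparation pipelines are considerably more involved.

```python
from datetime import datetime

# Hypothetical raw extracts from two source systems (illustrative assumption).
# Each row: (case id, activity name, timestamp as text).
erp_rows = [
    ("1001", "create order ", "2019-02-01 08:00"),
    ("1001", "Approve Order", "2019-02-01 10:30"),
    ("1002", "Create Order", ""),                # missing timestamp
]
warehouse_rows = [
    ("1001", "SHIP GOODS", "2019-02-02 07:15"),
    ("1001", "SHIP GOODS", "2019-02-02 07:15"),  # duplicate record
]


def prepare(*sources):
    """Clean, integrate and transform raw rows into one ordered event table."""
    seen, events = set(), []
    for rows in sources:                         # integration: merge all sources
        for case_id, activity, raw_ts in rows:
            if not raw_ts:                       # cleaning: drop incomplete rows
                continue
            record = (
                case_id,
                activity.strip().title(),        # cleaning: normalize spelling
                datetime.strptime(raw_ts, "%Y-%m-%d %H:%M"),
            )
            if record not in seen:               # cleaning: drop duplicates
                seen.add(record)
                events.append(record)
    # Transformation: order the events by case and then by time.
    return sorted(events, key=lambda e: (e[0], e[2]))


prepared = prepare(erp_rows, warehouse_rows)
for case_id, activity, ts in prepared:
    print(case_id, activity, ts.isoformat(sep=" "))
```

The returned table is what would then be loaded into an analysis tool in the final step of ETL.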

Data Mining Methods: How Algorithms learn

In Data Mining, there are a number of tasks that an algorithm can perform. Among the most important are:
Forecasting, i.e. predicting future values on the basis of past data.
Generalization, the most compact possible description of data using only the most important values.
Pattern recognition, in which problems and relationships between objects are identified.
These tasks can be realized with different techniques, for example clustering (the grouping of similar objects) or classification. In classification, the algorithm is trained with training examples, representative subsets of the overall data, so that it can then classify unknown objects on the basis of their attribute values.
In addition, there are two different scenarios in which the algorithms operate: the data is analyzed either with labeled examples, in so-called supervised learning, or without them, in unsupervised learning. In supervised learning, the algorithm uses training examples to learn how to assign unknown objects or situations to a specific class. Common methods are, for example, rule induction or decision trees. In unsupervised learning, no training examples are given, and the system has to identify conspicuous relationships or patterns in the data on its own. Examples are clustering methods or certain neural networks such as self-organizing maps.
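The difference between the two scenarios can be illustrated with a small Python sketch. It is deliberately simplified and rests on assumptions: the data (processing times in days), the labels "on_time" and "late", and both function names are hypothetical. The supervised part learns a one-level decision tree from labeled examples; the unsupervised part groups the same numbers without any labels, using a one-dimensional k-means with two clusters.

```python
def train_stump(examples):
    """Supervised: learn a one-level decision tree from (value, label) pairs.

    Assumes exactly two distinct labels and one numeric attribute.
    """
    first, second = sorted({label for _, label in examples})
    values = sorted({v for v, _ in examples})
    # Candidate thresholds lie halfway between neighbouring values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        for low, high in ((first, second), (second, first)):
            errors = sum((low if v <= t else high) != y for v, y in examples)
            if best is None or errors < best[0]:
                best = (errors, t, low, high)
    _, t, low, high = best
    return lambda v: low if v <= t else high


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled numbers into two groups (1-D k-means)."""
    centers = [min(values), max(values)]
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


# Hypothetical training examples: processing time in days and a known outcome.
labeled = [(1, "on_time"), (2, "on_time"), (3, "on_time"),
           (8, "late"), (9, "late"), (10, "late")]
classify = train_stump(labeled)
print(classify(4), classify(7))

# The same values without labels: the algorithm has to find the groups itself.
print(two_means([v for v, _ in labeled]))
```

In the supervised case the classes are given in advance and the algorithm learns the boundary between them; in the unsupervised case the groups emerge from the data alone and still need to be interpreted by a person.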

Process Mining: Three steps to an optimized process

Because Data Mining has so many possible applications, its objectives and added value can usually only be defined in context. Process Mining, on the other hand, distinguishes between three methods – Process Discovery, Conformance Checking and Model Enhancement – whose goals and benefits can be specified relatively clearly.
With Process Discovery, all of the process data is visualized in a single model. The goal of this method is to create transparency about how processes are actually executed.
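One simple way to derive such a model from data is a directly-follows graph, which counts how often one activity is immediately followed by another. The article does not name a specific discovery algorithm, so this choice, like the sample traces below, is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical traces: one ordered list of activities per case.
traces = {
    "A1": ["Create Order", "Approve Order", "Ship Goods"],
    "A2": ["Create Order", "Approve Order", "Ship Goods"],
    "A3": ["Create Order", "Reject Order"],
}


def directly_follows(traces):
    """Count how often activity b directly follows activity a across all cases."""
    edges = Counter()
    for activities in traces.values():
        for a, b in zip(activities, activities[1:]):
            edges[(a, b)] += 1
    return edges


dfg = directly_follows(traces)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

The counted edges are the data behind a process map: each pair becomes an arrow, and the count indicates how frequently that path was taken.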

In Conformance Checking, the discovered, data-based process is compared with a reference model. Deviations become visible, enabling companies to detect, for example, undesirable process deviations or compliance violations. Model Enhancement is used to analyze process data for optimization potential. Performance indicators such as cycle times or wait times, as well as specific process variants or deviations, are examined in more detail. The goals include increasing efficiency, identifying potential savings and minimizing compliance risks.
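Both ideas can be sketched briefly in Python. The reference model is represented here as a set of allowed transitions, which is a strong simplification and an assumption of this example; the cases, timestamps and function names are likewise hypothetical.

```python
from datetime import datetime

# Hypothetical reference model: the transitions the process is allowed to take.
reference = {
    ("Create Order", "Approve Order"),
    ("Approve Order", "Ship Goods"),
}

# Hypothetical observed cases with timestamped events.
cases = {
    "A1": [("Create Order", "2019-01-07 09:00"),
           ("Approve Order", "2019-01-07 11:00"),
           ("Ship Goods", "2019-01-08 09:00")],
    "A2": [("Create Order", "2019-01-07 10:00"),
           ("Ship Goods", "2019-01-07 12:00")],   # approval was skipped
}


def deviations(events, allowed):
    """Conformance: return observed transitions the reference does not allow."""
    names = [activity for activity, _ in events]
    return [step for step in zip(names, names[1:]) if step not in allowed]


def cycle_time_hours(events):
    """Enhancement: hours between the first and the last event of a case."""
    stamps = [datetime.strptime(ts, "%Y-%m-%d %H:%M") for _, ts in events]
    return (max(stamps) - min(stamps)).total_seconds() / 3600


report = {
    case_id: (deviations(events, reference), cycle_time_hours(events))
    for case_id, events in cases.items()
}
for case_id, (devs, hours) in report.items():
    print(case_id, "deviations:", devs, "cycle time (h):", hours)
```

In this sketch, case A2 is faster than A1 but skips the approval step, the kind of trade-off between performance and compliance that these two methods make visible.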

Is a methodical assignment possible?

Can these methods be technically assigned to Data Mining? In part, yes. Let’s take a look at some examples. In Process Discovery, the data is generalized using clustering, among other techniques: identical process activities or variants, which form the basis for process visualization, are grouped together. Pattern recognition is also used in Process Discovery and Conformance Checking to identify process weaknesses such as bottlenecks, process loops or unwanted deviations. LANA Process Mining also uses supervised learning: the Automated Root Cause Analysis identifies the causes of detected weaknesses, with an algorithm trained to classify likely causes.
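As a rough illustration of grouping identical variants and spotting process loops, consider the following Python sketch. It is not how LANA or any other tool implements these steps; the traces and function names are hypothetical, and grouping exact duplicates is only the simplest possible form of clustering.

```python
from collections import Counter

# Hypothetical traces: one ordered list of activities per case.
traces = {
    "A1": ["Create", "Check", "Approve"],
    "A2": ["Create", "Check", "Approve"],
    "A3": ["Create", "Check", "Rework", "Check", "Approve"],
}


def variants(traces):
    """Group cases with identical activity sequences and count each variant."""
    counts = Counter(tuple(activities) for activities in traces.values())
    return counts.most_common()


def has_loop(trace):
    """A repeated activity within one case indicates a process loop."""
    return len(set(trace)) < len(trace)


variant_counts = variants(traces)
looping_cases = [case for case, activities in traces.items() if has_loop(activities)]

for variant, n in variant_counts:
    print(n, "x", " > ".join(variant))
print("Cases with loops:", looping_cases)
```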

Process Mining – its own discipline?

However, one of the most important functions of Process Mining, the graphical, data-based visualization of processes, can no longer be methodically assigned to Data Mining. Data-based process visualization gives companies insight into even extremely complex business processes. Companies not only gain a profound understanding of their processes but also create the basis for Continuous Process Improvement. Discovered processes can serve as future reference models, for example to check whether the optimization measures derived from them have been successful.

Process Mining is a disruptive innovation, especially compared with classic process documentation and modeling. While conventional process documentation records processes on the basis of employee surveys, observations or assumptions, Process Mining reconstructs processes from real process data. This not only improves the completeness, objectivity and accuracy of the process recording but is also a very efficient procedure. Although some Data Mining techniques are used in Process Mining, Process Mining is an almost independent discipline, especially with regard to visualization.
