IoT and Industry 4.0 systems have made it easy to acquire large amounts of information about how machines are performing, as well as about production processes in companies. As already explained in this article, it’s always a good idea to plan which data to exchange with the machines.
Modern analysis methods are “data driven” rather than “solution driven”. In the past, you had to start from a hypothesis that would then have to be checked and verified, whilst now you can start with data and then extract various “indicators”.
However, this has resulted in a widespread misconception that if you take any set of data, and feed it to an Industrial Analytics system, then a set of correlations will magically emerge. Even if this is partly the case, accumulating data without a strategy can be a real waste of time.
Data must be extracted and stored according to certain criteria, as this is what makes it usable, especially if it’s going to be used for Machine Learning, Industrial Analytics, or even just for statistics.
Data Scientists know from experience how data should be acquired and organized, as they will have spent significant amounts of time trying to integrate or process it. I usually conduct a careful analysis before configuring data acquisition, but in general I adopt, as a minimum, the “3 Ws” rule, i.e. I ask myself these three fundamental questions:
“What things affect the result that I’m looking to improve?” The information should always be contextualized as much as possible: machine, operator, item, order, raw material, temperature, pressure, force, etc. If there is the slightest suspicion that a certain factor may make a difference to the result of a process, it should be included in the acquisition.
I’ve seen tons of data stored without context, which is typical of IIoT systems, and let’s just say it took an indescribable amount of effort to connect that data with the relevant context afterwards. Let me give you an example: I was asked to process energy consumption data that the customer had already acquired. Looking at the graph, you could see that consumption varied dramatically over time, but the values carried only the date and time of acquisition. What was causing the increases on the graph? Which item was driving consumption up? The data certainly didn’t provide any answers to these questions.
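To make the point about context concrete, here is a minimal sketch of an energy reading stored together with its context. The field names (`machine_id`, `operator_id`, `item_code`, `order_id`) and the sample values are assumptions made up for illustration, not taken from the customer’s system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class EnergyReading:
    """One consumption value plus the context it was produced in (hypothetical fields)."""
    timestamp: datetime
    kwh: float
    machine_id: str
    operator_id: str
    item_code: str
    order_id: str


def consumption_by_item(readings):
    """Sum energy per item; only possible because the item was stored with each value."""
    totals = {}
    for r in readings:
        totals[r.item_code] = totals.get(r.item_code, 0.0) + r.kwh
    return totals


# Illustrative values only.
readings = [
    EnergyReading(datetime(2024, 1, 8, 6, 0, tzinfo=timezone.utc), 12.5, "press-01", "op-7", "item-A", "ord-100"),
    EnergyReading(datetime(2024, 1, 8, 7, 0, tzinfo=timezone.utc), 13.0, "press-01", "op-7", "item-A", "ord-100"),
    EnergyReading(datetime(2024, 1, 8, 8, 0, tzinfo=timezone.utc), 40.0, "press-01", "op-3", "item-B", "ord-101"),
]

print(consumption_by_item(readings))
```

With only the time stamp and the kWh value, the question “which item was increasing consumption?” cannot be answered; with the item stored alongside each value, it becomes a simple grouping.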
“When” includes two questions:
“When did the data come about?” Putting a “time stamp” on when information was generated is essential. One of the reasons for this is synchronization: a time stamp lets you line the data up with factors recorded by other sources.
“When should I sample the value?” Let me explain with an example. Let’s imagine you’re obtaining information on the colour, weight or height of various parts passing by on a conveyor belt. If you have this data available in real time and decide to store it at regular intervals, the acquired values will probably be “dirty” because some values will have been sampled when the part is not at the sensor reading point. In this case, you would need to make sure that you were in sync with the handling system, and acquire the value when the part passes through and is measured, rather than at regular intervals.
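Both “when” questions can be sketched in a few lines. The example below assumes a hypothetical stream of samples with a `part_present` flag coming from the handling system, and a separate, time-sorted log of order changes; all names and values are invented for illustration, and both sources are assumed to share the same clock.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def sample_periodically(stream, every):
    """Naive approach: keep every n-th sample, whether or not a part is at the sensor."""
    return [s for i, s in enumerate(stream) if i % every == 0]


def sample_on_part_arrival(stream):
    """Keep one sample per part: the first one after the presence flag turns on."""
    samples = []
    was_present = False
    for s in stream:
        if s["part_present"] and not was_present:
            samples.append(s)
        was_present = s["part_present"]
    return samples


def value_at(events, when):
    """Return the latest value logged at or before `when` (events sorted by time)."""
    times = [t for t, _ in events]
    i = bisect_right(times, when)
    return events[i - 1][1] if i > 0 else None


# Illustrative data: one sample per second, two parts passing the sensor.
start = datetime(2024, 1, 8, 6, 0, 0)
flags_and_weights = [(False, 0.0), (True, 2.5), (True, 2.5), (False, 0.0),
                     (False, 0.0), (True, 3.0), (False, 0.0)]
stream = [
    {"timestamp": start + timedelta(seconds=i), "part_present": p, "weight_kg": w}
    for i, (p, w) in enumerate(flags_and_weights)
]
order_changes = [(start, "ord-100"), (start + timedelta(seconds=4), "ord-101")]

periodic = [s["weight_kg"] for s in sample_periodically(stream, every=3)]
triggered = [
    (s["weight_kg"], value_at(order_changes, s["timestamp"]))
    for s in sample_on_part_arrival(stream)
]

print(periodic)   # every sample here was taken with no part at the sensor
print(triggered)  # one weight per part, matched to the order active at that time
```

In this toy data the periodic sampler happens to miss both parts entirely, while the triggered sampler yields exactly one clean value per part. The time stamp on each sample is what allows it to be matched to the order log afterwards.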
Before you store large amounts of data, you should really be asking yourself “Why am I acquiring this data?”. If you want to be able to successfully predict an event, maybe using Predictive Analytics techniques, you will not only need to factor in the data that could affect such an event, but you will also need to assign the outcome to it, i.e. the result obtained. Without noting the result obtained, you can’t expect Machine Learning to work successfully, and consequently, prediction or classification will not be so accurate.
In practice, even if you’re storing everything that happens whilst monitoring the process, if you store information without assigning outcomes (whether good or bad), it’s of little use. I’ve seen this mistake made more often than you might think.
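As a rough illustration of why the outcome matters, the sketch below separates records that carry a result from those that don’t. The record fields (`temperature_c`, `pressure_bar`, `outcome`) and the “good”/“bad” labels are hypothetical; the point is only that a supervised model can learn from the labelled records alone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcessRecord:
    """Process factors plus the result obtained (hypothetical fields)."""
    part_id: str
    temperature_c: float
    pressure_bar: float
    outcome: Optional[str] = None  # "good" or "bad"; None if the result was never recorded


def training_set(records):
    """Build (features, labels) from the records that actually have an outcome."""
    features, labels = [], []
    for r in records:
        if r.outcome is None:
            continue  # context without a result cannot be used as a training example
        features.append([r.temperature_c, r.pressure_bar])
        labels.append(1 if r.outcome == "good" else 0)
    return features, labels


# Illustrative values only.
records = [
    ProcessRecord("p-1", 181.0, 5.5, "good"),
    ProcessRecord("p-2", 204.5, 6.0, "bad"),
    ProcessRecord("p-3", 183.0, 5.5),  # monitored, but the outcome was never stored
]

features, labels = training_set(records)
print(len(records), "records stored,", len(labels), "usable for supervised learning")
```

The third record was monitored just as carefully as the other two, yet it contributes nothing to training because nobody noted whether the part turned out good or bad.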
So to sum up: Ask yourself what it is that affects the result(s) you want to analyse or improve, and store the factors that determine it, i.e. the context of the event that’s being observed. Ask yourself whether you should synchronise the data being acquired, and always store the “time stamp” of when the data was acquired. Ask yourself why you need certain information, and record the result along with the context.
That’s some food for thought on how to acquire data so that it is actually usable. Ready for Industrial Analytics?