What Kind of Maintenance Are You?

Equipment malfunctions and breakdowns can cause major problems for organizations, from lost money and time to decreased customer satisfaction. Regardless of your industry, it’s important to be prepared with an equipment-maintenance strategy. Depending on how you classify them, there are generally four types of maintenance strategy to consider: reactive maintenance, preventative maintenance, predictive maintenance and proactive maintenance.

Reactive maintenance, also called “breakdown maintenance,” occurs after a component of the equipment has already failed. The damage is done, and other equipment parts may suffer as well. Organizations that rely on reactive maintenance will likely face extended downtime, disrupted production and dissatisfied customers. Reactive maintenance is an unsustainable maintenance strategy and should be avoided.

One way to avoid the issues caused by reactive maintenance is to implement a preventative maintenance strategy. As its name suggests, preventative maintenance is done on a regular schedule in an effort to prevent breakdowns. However, the complexity of preventative maintenance increases with the size of a company. Frequent scheduled functional checks, adjustments, servicing, repairs, replacement, calibration, rebuilding and testing can quickly contribute to high operations and maintenance (O&M) costs. So, while preventative maintenance is certainly a better strategy than reactive maintenance, it still leaves something to be desired.

Predictive maintenance has become a buzzword for a reason: it can be effective at addressing the limitations of earlier maintenance strategies and can carry a high ROI. We refer to predictive maintenance as “on-line monitoring” because predictive maintenance is about real-time monitoring, analysis and action. By monitoring and forecasting condition indicators and degradation states of equipment, corrective maintenance can be scheduled ahead of time. This saves the time and money otherwise spent on tedious routine tasks. In some cases, the benefits can go even further: failures are not only forecast, but their root causes are found and mitigated. This is called proactive maintenance. We’ll share more information on proactive maintenance in a later article, but for now, we’ll focus on the first step in getting there: achieving predictive maintenance.

How to Achieve Predictive Maintenance

Consider this process:

First, all relevant sensor data, event logs and O&M data, covering a wide range of operating conditions, should be collected. Why? All data regarding breakdowns, planned stops, maintenance tasks, micro-stops, etc., are important in discovering relevant vulnerabilities and will be used in the model training process.

Second, data preprocessing and exploration are required to remove noisy and anomalous data. This improves data quality and the performance of the algorithms applied later. Additionally, this step acts as a sanity check after multiple data sources are merged; it is always worth validating what is seen in the data against what is expected.
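As a rough illustration of the clean-up step, the sketch below drops anomalous sensor readings using a modified z-score based on the median absolute deviation. The function name `drop_outliers`, the 3.5 threshold and the sample values are assumptions made for this example; the right filter depends on your sensors and failure modes.

```python
from statistics import median


def drop_outliers(readings, threshold=3.5):
    """Keep readings whose modified z-score is within the threshold.

    The modified z-score uses the median and the median absolute
    deviation (MAD), so a single extreme spike does not distort the
    statistics used to detect it. The 3.5 cut-off is a common rule of
    thumb, not a universal constant.
    """
    center = median(readings)
    mad = median([abs(x - center) for x in readings])
    if mad == 0:
        # More than half of the readings are identical: nothing to scale by.
        return list(readings)
    return [x for x in readings if 0.6745 * abs(x - center) / mad <= threshold]


# Hypothetical temperature trace with one obvious sensor glitch (95.0).
temperatures = [20.1, 20.3, 19.9, 20.0, 95.0, 20.2]
cleaned = drop_outliers(temperatures)
```

Note that a filter like this should be used with care: in predictive maintenance, an apparent outlier is sometimes the early sign of a fault, which is why the exploration and sanity-check part of this step matters.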

Next, different types of failures should be classified, and new features introduced to better understand the underlying phenomena and to help differentiate between healthy and faulty machine behavior, e.g. by transforming time-domain data to the frequency domain.
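To show what a time-domain to frequency-domain feature might look like, here is a minimal sketch that extracts the dominant frequency of a vibration signal with a naive discrete Fourier transform. The function name `dominant_frequency` and the synthetic signal are assumptions for illustration; in practice a fast implementation such as `numpy.fft.rfft` would normally be used instead of this plain-Python loop.

```python
import cmath
import math


def dominant_frequency(signal, sample_rate_hz):
    """Return the frequency (Hz) with the largest spectral magnitude.

    Assumes a non-empty, evenly sampled signal. The mean is removed
    first so that a constant offset does not dominate the spectrum.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]

    best_bin, best_magnitude = 0, 0.0
    # Only bins up to the Nyquist frequency carry unique information.
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centered)
        )
        if abs(coefficient) > best_magnitude:
            best_bin, best_magnitude = k, abs(coefficient)
    return best_bin * sample_rate_hz / n


# Synthetic vibration trace: a strong 50 Hz component plus a weaker
# 120 Hz component, sampled at 1 kHz for 0.2 seconds.
sample_rate = 1000
vibration = [
    math.sin(2 * math.pi * 50 * i / sample_rate)
    + 0.3 * math.sin(2 * math.pi * 120 * i / sample_rate)
    for i in range(200)
]
peak_hz = dominant_frequency(vibration, sample_rate)
```

A shift in the dominant frequency, or growth in the energy at a particular frequency, is the kind of derived feature that can separate healthy from faulty behavior more clearly than the raw trace.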

In terms of model development, depending on the data you have access to and the specifics of your business, one of the three approaches below (or sometimes a mix of them) is applied to estimate the remaining useful life (RUL) of a machine.

Survival analysis, also known as failure time analysis or time-to-event analysis, originates from biomedicine. The core element is time-of-failure data from similar machines operated in similar conditions to the one you want to analyze. This is the only model that can be used for estimation of the RUL if the complete history is not accessible or if the data set is limited, e.g. minimal temperature, pressure and vibration traces. If the history is comprehensive and collected for a lot of similar devices from healthy operation, through degradation to failure, similarity analysis will help to assess the RUL.
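To make the survival-analysis idea concrete, the sketch below implements the Kaplan-Meier estimator, one common survival-analysis technique (not necessarily the one used in any particular project). It takes time-of-failure records from similar machines and returns the estimated probability that a machine is still running after each observed failure time. The function name, the data layout and the sample fleet are assumptions for this example; machines that were still healthy when observation ended are marked as not failed (censored).

```python
def kaplan_meier(observations):
    """Estimate a survival curve from (time, failed) records.

    observations: list of (time, failed) tuples, where failed is True
    if the machine broke down at that time and False if it was still
    running when observation stopped (a censored record).
    Returns a list of (time, survival_probability) steps.
    """
    at_risk = len(observations)
    survival = 1.0
    curve = []
    for t in sorted({time for time, _ in observations}):
        failures = sum(1 for time, failed in observations if time == t and failed)
        leaving = sum(1 for time, _ in observations if time == t)
        if failures:
            # Fraction of machines still at risk that survive past time t.
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        # Failed and censored machines both leave the at-risk group.
        at_risk -= leaving
    return curve


# Hypothetical fleet: operating hours and whether each unit failed.
fleet = [(100, True), (150, False), (200, True), (250, True), (300, False)]
survival_curve = kaplan_meier(fleet)
```

Reading the curve at a machine’s current age gives a probability of surviving further, from which an expected remaining life can be derived.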

With similarity analysis, a model calculates real-time similarity between the current state of the analyzed machine and previous machines from the dataset. A subset of the most similar machines, at that moment in time, is chosen to calculate the RUL. If failure data is not available from similar machines, but some safety threshold of a key feature is known (e.g. a vibration level in rotating equipment that should not be exceeded), a degradation model can be built.
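A minimal sketch of similarity analysis might look like the following. It compares the health-indicator trace of the current machine with the same-age portion of each run-to-failure history, picks the `k` closest histories and averages their remaining life. The function name, the Euclidean distance, the choice of `k` and the sample data are all assumptions made for illustration; the RUL is expressed in sampling intervals.

```python
import math
from statistics import mean


def similarity_rul(current, histories, k=2):
    """Estimate RUL from the k most similar run-to-failure histories.

    current:   health-indicator values observed so far on the machine.
    histories: list of complete traces, each ending at failure.
    Raises statistics.StatisticsError if no history outlives `current`.
    """
    age = len(current)
    scored = []
    for history in histories:
        if len(history) <= age:
            continue  # this machine failed before reaching the current age
        distance = math.dist(current, history[:age])
        remaining = len(history) - age
        scored.append((distance, remaining))
    scored.sort()
    return mean(remaining for _, remaining in scored[:k])


# Hypothetical vibration-level histories from three similar machines.
histories = [
    [1.0, 1.1, 1.3, 1.6, 2.0, 2.5],
    [1.0, 1.2, 1.5, 1.9, 2.4],
    [1.0, 1.0, 1.0, 1.1, 1.2, 1.3, 1.5, 1.8, 2.2],
]
current_trace = [1.0, 1.1, 1.3]
estimated_rul = similarity_rul(current_trace, histories, k=2)
```

Because the comparison is repeated every time a new reading arrives, the set of most similar machines, and therefore the estimate, can change as the machine ages.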

A degradation model can predict the estimated RUL in real time and will adapt the model’s prediction according to changing environments and operation states. It is also one step away from proactive maintenance. Once relations between all key parameters and the RUL are known, nothing stands in the way of root-cause analysis and attempts to mitigate failures or, at least, to extend the RUL.
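As a simple illustration of a degradation model, the sketch below fits a straight line to a key feature and extrapolates it to the known safety threshold. The linear trend, the function name `linear_degradation_rul` and the sample values are assumptions for this example; real degradation is often better described by an exponential or stochastic model.

```python
def linear_degradation_rul(times, values, threshold):
    """Estimate time remaining until a feature crosses a threshold.

    Fits values = intercept + slope * time by ordinary least squares,
    then extrapolates to the threshold. Returns None when the feature
    is not trending upward, because no crossing can be predicted.
    Assumes at least two distinct time points.
    """
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    slope = sum(
        (t - t_mean) * (v - v_mean) for t, v in zip(times, values)
    ) / sum((t - t_mean) ** 2 for t in times)
    intercept = v_mean - slope * t_mean

    if slope <= 0:
        return None
    crossing_time = (threshold - intercept) / slope
    # A negative result means the threshold has already been reached.
    return max(0.0, crossing_time - times[-1])


# Hypothetical vibration level (mm/s) measured once per day.
days = [0, 1, 2, 3, 4]
vibration_level = [2.0, 2.5, 3.0, 3.5, 4.0]
rul_days = linear_degradation_rul(days, vibration_level, threshold=7.0)
```

Refitting the model each time a new measurement arrives is one way to let the prediction adapt to changing environments and operating states.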

Once the final model or ensemble of models is built and performance is validated, models can be deployed in production. Once in production, models can be continuously integrated with new real-time data.

To learn more about how Cerebre is helping companies with predictive and proactive maintenance, please contact support@cerebre.io.

This article was written by Agnieszka Jach, a Senior Data Scientist at Cerebre who focuses on combining power engineering with data engineering.