Batch Fault Detection

This project uses a combination of machine learning and IoT messaging to monitor the progress of penicillin drug production batches.

The original simulated batch data are from this repository: https://kuleuven.app.box.com/v/batchbenchmark/4/3864480711. The data were originally published as:

Jan Van Impe, Geert Gins, An extensive reference dataset for fault detection and identification in batch processes,
Chemometrics and Intelligent Laboratory Systems, Volume 148, 15 November 2015, Pages 20–31.

The overall project is described in this presentation, which is covered in this short video of the talk.

Machine Learning

Data Download

I downloaded four sets of data to work with from the original repository:

Aligned_set_1_BASE_NOC - a set of "normal" batch data without faults
Aligned_set_1_BASE_fault_1 - a set of data that has a sudden change in feed substrate concentration
Aligned_set_1_BASE_fault_9 - a set of data that has non-functioning pH control
Aligned_set_1_BASE_fault_11 - a set of data that has reactor temperature sensor drift

The data were unzipped and saved in directories in the base (/) folder.

Data Exploration and Pre-Processing

I explore the structure of the HDF5 files, the shape and type of data available, and the range of the variables in a notebook: Batch_Data_Explore_and_prep

The notebook writes data to two places:

10 random runs from each of the four data sets saved as .csv files in the /data folder.
10 random runs with only 10 points per run saved as .csv files in the /datasample folder.

It also outputs a figure showing sample data from four of the normal runs: /docs/normaldata.png.
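The sampling step can be sketched in a few lines of Python. This is only an illustration, not the notebook's code: the helper names `pick_runs` and `thin_run` are hypothetical, and the choice of evenly spaced points is an assumption, since the README does not say how the 10 points per run were selected.

```python
import random


def pick_runs(run_ids, n_runs=10, seed=None):
    """Choose n_runs distinct run identifiers at random (sorted for readability)."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(run_ids), n_runs))


def thin_run(rows, n_points=10):
    """Keep n_points rows spread evenly across one run.

    Assumption: evenly spaced points, with the first and last rows included.
    Runs that are already short enough are returned unchanged.
    """
    if n_points < 2 or len(rows) <= n_points:
        return list(rows)
    step = (len(rows) - 1) / (n_points - 1)
    return [rows[round(i * step)] for i in range(n_points)]
```

Passing a fixed `seed` makes the selection repeatable, which helps when the same runs need to be regenerated for both the /data and /datasample folders.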

Model Training

I trained an XGBoost model on a sample of the data: Batch_data_training. The model was saved in binary format in the root folder: 0002.model. The training progress was copied into a separate file, TrainingRecord, and then plotted to visualize the progress.
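As a rough sketch of how a training record could be turned into plottable numbers, the snippet below parses lines in the style XGBoost prints during training, such as `[0]  train-error:0.25  eval-error:0.30`. The exact contents of TrainingRecord are not shown in this README, so the line format and the function name `parse_training_record` are assumptions.

```python
import re

# Assumed line shape: "[<round>]<whitespace><name>:<value> <name>:<value> ..."
ROUND_LINE = re.compile(r"^\[(\d+)\]\s+(.+)$")


def parse_training_record(text):
    """Return a list of (round_number, {metric_name: value}) tuples."""
    rounds = []
    for line in text.splitlines():
        match = ROUND_LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not a per-round line
        metrics = {}
        for part in match.group(2).split():
            name, sep, value = part.rpartition(":")
            if not sep:
                continue
            try:
                metrics[name] = float(value)
            except ValueError:
                continue  # ignore values that are not plain numbers
        rounds.append((int(match.group(1)), metrics))
    return rounds
```

The resulting list can be passed to any plotting library, with the round number on the x-axis and one series per metric.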

Model Testing

I test the saved model using an R Markdown notebook: Test_R_Script. The notebook plots the results from two runs (one normal, one fault) and saves them as /docs/testresults.png. The final prediction algorithm is then ported to an R script that can be consumed by Azure ML Studio: Azure_Script.

IoT Monitoring

IoT Simulated Device

I built a node-red flow to simulate an IoT device: node-red-iot-device. This flow reads in a random file from the datasample directory, formats the data for sending to an Azure IoT hub, then sends the message.

The message format is documented in the message_format file. The message contains the deviceID, the connection key for the Azure IoT Hub, the measurement data from the batch (including the two prior points), and a batchID that describes which batch the data belong to.
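To make the message layout concrete, here is a minimal Python sketch that assembles such a message. It is not the node-red flow itself and does not reproduce the message_format file: the JSON key `key` and the function name `build_message` are hypothetical, while the `prior1` and `prior2` suffixes follow the naming used in the ASA function signature. Time is only sent for the current point, which also matches that signature.

```python
import json


def build_message(device_id, connection_key, batch_id, readings):
    """Build one JSON message from the three most recent readings.

    readings: three dicts of measurements, ordered oldest first.
    """
    if len(readings) != 3:
        raise ValueError("expected the current point and two prior points")
    prior2, prior1, current = readings

    payload = {"deviceID": device_id, "key": connection_key, "batchID": batch_id}
    payload.update(current)  # current measurements keep their plain names

    # Prior points are flattened into the same message with a suffix.
    for suffix, reading in (("prior1", prior1), ("prior2", prior2)):
        for name, value in reading.items():
            if name != "Time":
                payload[f"{name} {suffix}"] = value
    return json.dumps(payload)
```

Flattening the prior points into one message means every message carries all the inputs the prediction needs, so the downstream job does not have to keep state between messages.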

Azure IoT Hub

I configured an Azure IoT Hub to receive the messages from the simulated device. The hub connection strings and devices are set through both portal.azure.com and the Azure IoT Device Explorer.

Azure Stream Analytics (ASA)

Next, the data from the IoT Hub are consumed by an ASA job. This job reads the data from the IoT Hub messaging endpoint using the "batchdata" consumer group. The job then pulls a prediction from an Azure ML Studio web app and sends the data to both an Azure Blob storage endpoint and a PowerBI streaming endpoint.

The ASA job uses a function to get the prediction with the following signature:

batchcheck (
  Time FLOAT ,
  Fermentation volume FLOAT , Dissolved oxygen concentration FLOAT , Dissolved CO2 concentration FLOAT , Reactor temperature FLOAT , pH FLOAT , Feed rate FLOAT , Feed temperature FLOAT , Agitator power FLOAT , Cooling/heating medium flow rate FLOAT , Cumulative base flow FLOAT , Cumulative acid flow FLOAT ,
  Fermentation volume prior1 FLOAT , Dissolved oxygen concentration prior1 FLOAT , Dissolved CO2 concentration prior1 FLOAT , Reactor temperature prior1 FLOAT , pH prior1 FLOAT , Feed rate prior1 FLOAT , Feed temperature prior1 FLOAT , Agitator power prior1 FLOAT , Cooling/heating medium flow rate prior1 FLOAT , Cumulative base flow prior1 FLOAT , Cumulative acid flow prior1 FLOAT ,
  Fermentation volume prior2 FLOAT , Dissolved oxygen concentration prior2 FLOAT , Dissolved CO2 concentration prior2 FLOAT , Reactor temperature prior2 FLOAT , pH prior2 FLOAT , Feed rate prior2 FLOAT , Feed temperature prior2 FLOAT , Agitator power prior2 FLOAT , Cooling/heating medium flow rate prior2 FLOAT , Cumulative base flow prior2 FLOAT , Cumulative acid flow prior2 FLOAT
) RETURNS FLOAT
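The signature follows a regular pattern: Time, then the 11 current measurements, then the same 11 measurements for each of the two prior points. As a small illustration, the Python snippet below regenerates the 34 argument names in that order. The constant and function names are hypothetical; the measurement names are copied from the signature.

```python
# The 11 measurements that appear for the current point and both prior points.
BASE_MEASUREMENTS = [
    "Fermentation volume",
    "Dissolved oxygen concentration",
    "Dissolved CO2 concentration",
    "Reactor temperature",
    "pH",
    "Feed rate",
    "Feed temperature",
    "Agitator power",
    "Cooling/heating medium flow rate",
    "Cumulative base flow",
    "Cumulative acid flow",
]


def feature_names():
    """Return the argument names in signature order: Time, current, prior1, prior2."""
    names = ["Time"] + list(BASE_MEASUREMENTS)
    for suffix in ("prior1", "prior2"):
        names += [f"{base} {suffix}" for base in BASE_MEASUREMENTS]
    return names
```

Generating the names from one list helps keep the device message, the ASA query, and the model inputs in the same order.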

The function is consumed in the ASA query with the script in ASAQuery. Note that the outputs are cast to floats, as some of the messages coming from the input stream are strings (in scientific notation).
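The casting issue is easy to reproduce outside ASA. The Python sketch below is not the ASA query; it only illustrates the same idea of coercing every measurement to a float whether it arrives as a number or as a string in scientific notation. The function names are hypothetical.

```python
def to_float(value):
    """Convert a number or a numeric string such as '1.25E-03' to a float."""
    return float(value)


def cast_measurements(message, fields):
    """Return only the requested fields from a message, all cast to float."""
    return {name: to_float(message[name]) for name in fields}
```

For example, `cast_measurements({"pH": "5.0E+00", "Feed rate": 0.04}, ["pH", "Feed rate"])` yields plain floats for both fields, so downstream code never has to branch on the input type.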

Azure Machine Learning Studio

The ASA job acquires its predictions from a trained model in Azure ML Studio, deployed as a web app. The ML Studio experiment has a simple layout that combines the pre-trained model with the input stream using the R script Azure_Script.

The trained model and the XGBoost libraries are contained in the xgboost_w_model_.zip archive. This was loaded to the Azure ML Studio as a dataset and then connected to the R script.

Azure Blob Storage

The data with predictions are stored in Azure Blob storage as .csv files. The files are partitioned by date in a nested folder structure.
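A date-partitioned layout of this kind can be sketched as follows. The year/month/day nesting, the `batchdata` prefix, and the function name `blob_path` are assumptions for illustration; the README does not state the actual path pattern configured in the ASA output.

```python
from datetime import date


def blob_path(day, filename, prefix="batchdata"):
    """Build a nested path of the form <prefix>/YYYY/MM/DD/<filename>."""
    return f"{prefix}/{day:%Y/%m/%d}/{filename}"
```

With this layout, all files for one day sit under a single folder, which keeps later reads by date range simple.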

PowerBI Stream

Finally, the data and predictions are monitored with a PowerBI dashboard.

Node-Red Stack

An alternative to using the Azure stack is to use a hybrid on-site Python/Node-Red setup. This utilizes a slightly modified version of the IoT node-red flow. Instead of sending a key to the Azure IoT server, it sends a timestamp along with the device name.

There are two different versions of the node-red flow (running on the Raspberry Pi):

node-red-iot-device that is manually triggered to send data to the listening MQTT server on the Windows host machine
node-red-gpio-device that does the same thing, but also has a GPIO pushbutton to trigger the flow and a GPIO LED that blinks every time there is an output message sent.

Server Requirements

Python 3.5
Flask server (conda install flask from an administrator Git Bash)

Source

There are two files that work with the flask server:

Test_Flask_Server that walks through setting up the prediction server
Flask_server that runs the flask server in the background.

In order to run the server, open a Git Bash in the /src directory and run python Flask_server.py. This runs the server on localhost using Port 5000.
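Once the server is running, a client can request a prediction over HTTP. The sketch below uses only the Python standard library. The route `/predict` and the JSON request and response shapes are assumptions; check Flask_server for the actual endpoint and payload.

```python
import json
import urllib.request


def build_prediction_request(features, host="localhost", port=5000, route="/predict"):
    """Create a POST request carrying the measurements as a JSON body.

    The route and body layout are hypothetical placeholders.
    """
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}{route}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_prediction(features):
    """Send the request to the running server and decode the JSON reply."""
    request = build_prediction_request(features)
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it possible to inspect the URL and body without a live server.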

Node-Red consumption

The node-red flow now consumes the flask endpoint, getting predictions for each of the data points in the flow. The flow then sends the data to dashboard plots where an end user can monitor the batch process. The server flow is node-red-server-plots.

Establishing Connection between Raspberry Pi and a Local Server

Instructions on how to establish the connection between the Raspberry Pi and a local Windows server are detailed in this Readme.
