Short title

Machine learning using synthesized patient health records

Author

Gregory Dritschler

URLs

Github repo

https://github.ibm.com/IBM/summit-health-machine-learning

Summary

This notebook explores how to train a machine learning model to predict type 2 diabetes using synthesized patient health records. Because the data is synthesized, we can learn how to build a model without the privacy concerns that come with real patient health records.

Technologies

Machine Learning

Description

This project is part of a series of code patterns about a fictional health care company called Summit Health. The company stores electronic health records in a database on a z/OS server. Before you run the notebook, the synthesized health records must be created and loaded into this database. Another project, https://github.com/IBM/summit-health-synthea, provides the steps for doing this: the records are generated with a tool called Synthea, then transformed and loaded into the database.

When the reader has completed this Code Pattern, they will understand how to:

Prepare data using Apache Spark.
Visualize data relationships using PixieDust.
Train a machine learning model and publish it in the Watson Machine Learning (WML) repository.
Deploy the model as a web service and use it to make predictions.
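
The data preparation step can be pictured with a minimal sketch. It uses plain Python rather than Apache Spark so that it runs anywhere, and it assumes the health records arrive as long-format observation rows (patient, observation name, value). The row layout, the feature names `bmi` and `systolic_bp`, and the helper `build_feature_rows` are all hypothetical and are not taken from the notebook.

```python
from collections import defaultdict

# Hypothetical long-format observation rows: (patient_id, observation_name, value).
# These rows are made up for illustration and do not come from the project's database.
OBSERVATIONS = [
    ("p1", "bmi", 31.2),
    ("p1", "systolic_bp", 142.0),
    ("p2", "bmi", 22.4),
    ("p2", "systolic_bp", 118.0),
    ("p3", "bmi", 27.9),  # p3 has no systolic_bp reading
]

# Hypothetical set of patients with a type 2 diabetes diagnosis on record.
DIAGNOSED = {"p1"}

# Hypothetical feature columns, in the order the model would expect them.
FEATURES = ["bmi", "systolic_bp"]


def build_feature_rows(observations, diagnosed, features):
    """Pivot long-format observations into one labeled row per patient.

    Patients missing any requested feature are dropped, because most
    classifiers cannot train on incomplete rows.
    """
    by_patient = defaultdict(dict)
    for patient_id, name, value in observations:
        # A later reading for the same observation overwrites an earlier one.
        by_patient[patient_id][name] = value

    rows = []
    for patient_id in sorted(by_patient):
        values = by_patient[patient_id]
        if all(name in values for name in features):
            rows.append({
                "patient_id": patient_id,
                "features": [values[name] for name in features],
                "label": 1 if patient_id in diagnosed else 0,
            })
    return rows


rows = build_feature_rows(OBSERVATIONS, DIAGNOSED, FEATURES)
for row in rows:
    print(row)
```

In Spark the same reshaping would usually be expressed as a group-and-pivot over a DataFrame. The point here is only the shape of the result: one row per patient, a fixed-order feature vector, and a binary label.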

Flow

Log in to IBM Watson Studio
Load the provided notebook into Watson Studio
Load data in the notebook
Transform the data with Apache Spark
Create charts with PixieDust
Publish and deploy the model with Watson Machine Learning
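
Once the model is deployed as a web service, a client makes predictions by sending the feature values as JSON. The sketch below only builds such a request body with the standard library. The `fields`/`values` layout is an assumption, the exact body depends on the Watson Machine Learning API version the notebook uses, and both the helper `build_scoring_payload` and the field names are hypothetical.

```python
import json


def build_scoring_payload(fields, rows):
    """Return a JSON string holding feature names and one list of values per row.

    The "fields"/"values" layout is an assumed request shape; check the
    notebook for the body expected by the deployed scoring endpoint.
    """
    for row in rows:
        if len(row) != len(fields):
            raise ValueError(
                "each row needs %d values, got %d" % (len(fields), len(row))
            )
    return json.dumps({
        "fields": list(fields),
        "values": [list(row) for row in rows],
    })


# Hypothetical feature names and a single made-up patient row.
payload = build_scoring_payload(["bmi", "systolic_bp"], [[31.2, 142.0]])
print(payload)
```

Validating the row length before sending catches a mismatch between the feature order used in training and the values supplied at prediction time, which is otherwise easy to miss.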

Instructions

Find the detailed steps for this pattern in the README file. The steps show you how to:

Sign up for IBM Watson Studio
Create a project
Create a Watson Machine Learning instance
Add the notebook to your project
Run the notebook

Components and services

Apache Spark
Watson Machine Learning

Runtimes

Python
