This is a re-implementation of the GRU-D model with Python3 + Keras2 + Tensorflow.
Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, and Yan Liu. "Recurrent Neural Networks for Multivariate Time Series with Missing Values", Scientific Reports (SREP), 8(1):6085, 2018.
An earlier version is available on arXiv (arXiv preprint arXiv:1606.01865).
- Python3.6
- For dependencies, see requirements.txt
- Python3 packages:
  - h5py==2.10.0
  - Keras==2.2.0
  - numpy==1.14.0
  - scikit-learn==0.19.1
  - tensorboard==1.7.0
  - tensorflow==1.7.0
- We use [WD] to represent the working directory (i.e., the working path in which all related data/log/model/result/evaluation files are stored).
- We assume the data are saved in the folder [WD]/data/${dataset_name}. Please see data_handler.py for more information.
  - In [WD]/data/${dataset_name}/data.npz, there are input, masking, timestamp, and label_${label_name}. Each of them has the shape (n_samples, ...).
  - In [WD]/data/${dataset_name}/fold.npz, there are fold_${label_name}, mean_${label_name}, and std_${label_name}. Each of them has the shape (k_fold, 3, ...), where the axis of size 3 corresponds to the train/validation/test sets in k-fold cross validation.
- Our GRU models take (x, masking, timestamp) as the inputs. Please refer to models.py and nn_utils/grud_layers.py. Run.ipynb serves as an example script for model training and evaluation.
- To demonstrate the data file format, a (fake) sample dataset is provided at [WD]/data/sample. The sample data are generated by Generate-sample-data.ipynb.
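The file layout described above can be illustrated with a small sketch that writes a toy data.npz and fold.npz and reads them back. Only the key names and the leading axes, (n_samples, ...) and (k_fold, 3, ...), come from the description above. Everything else is an assumption made for illustration: the function name write_toy_dataset, the dataset name "toy", the label name "mortality", the trailing dimensions, storing sample indices in the fold array, and computing mean/std per feature over observed values. See data_handler.py and Generate-sample-data.ipynb for the layout the code actually expects.

```python
import os
import tempfile

import numpy as np


def masked_stats(x, m):
    """Per-feature mean and std computed over observed entries (m == 1) only."""
    count = np.maximum(m.sum(axis=(0, 1)), 1)
    mean = (x * m).sum(axis=(0, 1)) / count
    var = (((x - mean) ** 2) * m).sum(axis=(0, 1)) / count
    return mean, np.sqrt(var)


def write_toy_dataset(wd, dataset_name="toy", label_name="mortality",
                      n_samples=12, n_steps=5, n_features=3, k_fold=3, seed=0):
    """Write data.npz and fold.npz under [wd]/data/[dataset_name] (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    data_dir = os.path.join(wd, "data", dataset_name)
    os.makedirs(data_dir, exist_ok=True)

    # Assumed trailing dimensions: (n_steps, n_features) for input and masking.
    masking = (rng.rand(n_samples, n_steps, n_features) > 0.3).astype(np.float64)
    x = rng.randn(n_samples, n_steps, n_features) * masking  # unobserved entries set to 0
    timestamp = np.tile(np.arange(n_steps, dtype=np.float64), (n_samples, 1))
    label = rng.randint(0, 2, size=(n_samples, 1))

    np.savez(os.path.join(data_dir, "data.npz"), **{
        "input": x,
        "masking": masking,
        "timestamp": timestamp,
        "label_" + label_name: label,
    })

    # Assumption: each fold stores sample indices for train/validation/test.
    # These are random equal-sized splits, not a proper k-fold partition.
    n_split = n_samples // 3
    fold = np.zeros((k_fold, 3, n_split), dtype=np.int64)
    mean = np.zeros((k_fold, 3, n_features))
    std = np.zeros((k_fold, 3, n_features))
    for k in range(k_fold):
        fold[k] = rng.permutation(n_samples)[:3 * n_split].reshape(3, n_split)
        for s in range(3):
            idx = fold[k, s]
            mean[k, s], std[k, s] = masked_stats(x[idx], masking[idx])

    np.savez(os.path.join(data_dir, "fold.npz"), **{
        "fold_" + label_name: fold,
        "mean_" + label_name: mean,
        "std_" + label_name: std,
    })
    return data_dir


if __name__ == "__main__":
    out_dir = write_toy_dataset(tempfile.mkdtemp())
    for fname in ("data.npz", "fold.npz"):
        archive = np.load(os.path.join(out_dir, fname))
        for key in archive.files:
            print(fname, key, archive[key].shape)
```

Running the script prints every key and its shape, which is a quick way to compare a prepared dataset against the sample data shipped in [WD]/data/sample.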
Running on MIMIC-III
The following steps will help you run mortality prediction experiments on the MIMIC-III dataset, using the time series from the first 48 hours after each patient's admission. We rely on an (older version of the) benchmarking codebase to extract and preprocess the time series data from the MIMIC-III dataset, and we provide the necessary scripts to convert the data for our GRU-D models.
- Make sure you have the (older version of the) benchmarking codebase and have set up the database connection as described in its Requirements section.
- Follow steps 1-6 in the Select admissions and all features section, i.e., execute:
  - All 11 scripts named [#]_***.ipynb for 0 <= [#] <= 9.
  - Some scripts (e.g., 8_processing.ipynb) may take hours or a couple of days to complete.
- Follow steps 1, 2, and 4 in the Generate 17 processed features, 17 raw features and 140 raw features section, i.e., execute:
  - run_necessary_sqls.ipynb
  - 10_get_17-features-processed.ipynb
  - 10_get_99plus-features-raw.ipynb
- Follow step 3 in the Generate time series section with X=48 (hours), i.e., execute:
  - 11_get_time_series_sample_99plus-features-raw_48hrs.ipynb
- Now you should have extracted the necessary data files from the benchmarking codebase. Please set the directories in Prepare-MIMIC-III-data.ipynb and execute it to prepare the data for GRU-D.
- Execute Run.ipynb and check the results!
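Because the first extraction step involves many numbered notebooks, it can help to list them in order before running anything. The sketch below is a hypothetical helper, not part of this repository or of the benchmarking codebase. It assumes that the numeric filename prefix reflects the intended execution order and that notebooks sharing a prefix can be ordered by name; check both against the benchmarking codebase's own instructions. It only builds jupyter nbconvert command lines and does not execute anything.

```python
import re
from pathlib import Path


def ordered_notebooks(folder, lo=0, hi=9):
    """List notebooks named '<number>_<rest>.ipynb' with lo <= number <= hi.

    Results are sorted by the numeric prefix, then by file name (an assumed
    tie-break for notebooks that share a prefix).
    """
    pattern = re.compile(r"^(\d+)_.*\.ipynb$")
    found = []
    for path in Path(folder).iterdir():
        match = pattern.match(path.name)
        if match and lo <= int(match.group(1)) <= hi:
            found.append((int(match.group(1)), path.name, path))
    return [path for _, _, path in sorted(found)]


def nbconvert_command(notebook):
    """Build a command that executes a notebook in place with nbconvert."""
    return ["jupyter", "nbconvert", "--to", "notebook",
            "--execute", "--inplace", str(notebook)]
```

For example, calling ordered_notebooks on the folder that holds the extraction notebooks and passing each result to nbconvert_command yields command lines that can be reviewed and then run one at a time, which is safer than batch execution given that some notebooks may take hours or days.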