Releases: ewancarr/NEWS2-COVID-19
Re-trained models, multiple endpoints, calibration and net benefit
This release provides re-trained models incorporating various improvements:
- Re-train on entire KCH sample
  Our initial models were trained on all data available at the time (n=439). As more data were collected, the imbalance between the sizes of the training and validation samples grew. In the revised models we have therefore re-trained using all KCH data available at the time of writing (n=1276). This version has no temporal external validation; external validation will be conducted at other sites.
- Use admission date as index
  We previously used symptom onset as the index date (the date from which endpoints were calculated). To improve the clinical utility of these models, and for consistency across sites, we have switched the index date to:
  - Hospital admission for patients with community-acquired COVID infection;
  - Symptom onset for nosocomial patients (in-hospital acquired COVID infection).
- Include 3-day endpoint
  In addition to the 14-day ICU/death endpoint, we now also consider a 3-day endpoint.
- Sensitivity analyses excluding nosocomial patients
  To investigate whether discrimination and calibration differ between community-acquired and nosocomial infection, the models are re-run after excluding nosocomial patients.
- Better assessment of calibration, net benefit
  Please refer to `README.md` and `replicate.py` for details.
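
As a rough illustration of the index-date rule and the 3-day and 14-day endpoints described in this release, the sketch below derives both endpoints for a single patient. It is not the code in `replicate.py`: the helper names `index_date` and `reached_endpoint`, the argument names, and the choice of an inclusive window that ignores events before the index date are all assumptions made for the example.

```python
from datetime import date
from typing import Optional


def index_date(admission: date, symptom_onset: date, nosocomial: bool) -> date:
    """Pick the date from which endpoints are counted (hypothetical helper)."""
    # Nosocomial (in-hospital acquired) infection: index at symptom onset.
    # Community-acquired infection: index at hospital admission.
    return symptom_onset if nosocomial else admission


def reached_endpoint(index: date, event: Optional[date], horizon_days: int) -> bool:
    """True if ICU transfer or death fell within horizon_days of the index date."""
    if event is None:  # no ICU transfer or death recorded
        return False
    # Assumption: the window is inclusive and events before the index date
    # do not count towards the endpoint.
    return 0 <= (event - index).days <= horizon_days


# Community-acquired case: admitted 20 March, transferred to ICU on 25 March.
idx = index_date(date(2020, 3, 20), date(2020, 3, 15), nosocomial=False)
event = date(2020, 3, 25)
endpoint_3d = reached_endpoint(idx, event, horizon_days=3)
endpoint_14d = reached_endpoint(idx, event, horizon_days=14)
print(idx, endpoint_3d, endpoint_14d)
```

In this example the event occurs five days after admission, so it counts towards the 14-day endpoint but not the 3-day one. Follow-up time and censoring are not handled here.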
Updated pre-trained models, thresholds model
This release contains two key updates:
1) Updated pre-trained models eb712df
The latest data extract (28th May 2020) for the training sample contained less missing data (e.g. the percentage missing fell from 27% to 9% for `albumin` and from 24% to 6% for `estimatedgfr`). We have therefore updated the pre-trained models in this repository to incorporate the latest data extract.
This shouldn't require any changes to your code, besides a `git pull` of the latest models.
2) Threshold model 215bb18
The latest version of `replicate.py` includes an additional model based on binary (thresholded) versions of variables from the `FINAL` model. The thresholds were derived from a decision tree model tuned on our training sample. They are:
- NEWS2 > 5.10
- CRP >= 173.60
- Albumin <= 31.10
- Estimated GFR <= 31.60
- Neutrophils > 8.77
- Age (left as continuous)
We have provided a pre-trained model for use with these thresholds (see `clf_THRESHOLD.joblib`). The provided code will derive the binary items, impute, and test using the pre-trained model. Please note: you will need to have the un-transformed `crp` column in your validation dataset.
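
For orientation, deriving the binary items from the thresholds listed above might look roughly like the sketch below. This is an illustration only, not the code shipped in `replicate.py`. The column names `crp`, `albumin` and `estimatedgfr` appear in these notes, while `news2`, `neutrophils` and `age`, the `THRESHOLDS` table and the `binarise` helper are assumptions made for the example. Missing values are left as `None` so that they can be imputed afterwards.

```python
import operator

# Comparison operators used by the thresholds in these release notes.
OPS = {">": operator.gt, ">=": operator.ge, "<=": operator.le}

# Thresholds from the release notes; the keys are assumed column names.
THRESHOLDS = {
    "news2": (">", 5.10),
    "crp": (">=", 173.60),  # applied to the un-transformed CRP value
    "albumin": ("<=", 31.10),
    "estimatedgfr": ("<=", 31.60),
    "neutrophils": (">", 8.77),
}


def binarise(row: dict) -> dict:
    """Return binary items for one patient; age is kept continuous."""
    out = {"age": row["age"]}
    for name, (op, cut) in THRESHOLDS.items():
        value = row[name]
        # Keep missing values missing so they can be imputed later.
        out[name] = None if value is None else int(OPS[op](value, cut))
    return out


patient = {
    "news2": 6,
    "crp": 173.6,
    "albumin": 35.0,
    "estimatedgfr": None,
    "neutrophils": 8.77,
    "age": 71,
}
items = binarise(patient)
print(items)
```

Note that the strict and non-strict inequalities matter at the boundary: a CRP of exactly 173.60 is flagged, while a neutrophil count of exactly 8.77 is not.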
Please get in touch or raise an issue if you have any problems.