Releases: ewancarr/NEWS2-COVID-19

Re-trained models, multiple endpoints, calibration and net benefit

14 Jul 10:24

This release provides re-trained models incorporating various improvements:

  1. Re-train on entire KCH sample

    Our initial models were trained on all data available at the time (n=439). As more data were collected, the imbalance in size between the training and validation samples grew. In the revised models, therefore, we have re-trained using all KCH data available at the time of writing (n=1276). There is no temporal external validation in this version; external validation will be conducted at other sites.

  2. Use admission date as index

    We had previously used symptom onset as index date (the date from which endpoints were calculated). To improve the clinical utility of these models, and for consistency across sites, we have switched index date to be:

    • Hospital admission for patients with community-acquired COVID infection;
    • Symptom onset for nosocomial patients (in-hospital acquired COVID infection).

  3. Include 3-day endpoint

    In addition to the 14-day ICU/death endpoint, we now consider a 3-day endpoint.

  4. Sensitivity analyses excluding nosocomial patients

    To investigate whether discrimination and calibration differ between community-acquired and nosocomial infection, the models are re-run after excluding nosocomial patients.

  5. Better assessment of calibration, net benefit

Please refer to README.md and replicate.py for details.
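
As a rough illustration of items 2 and 3 above, the sketch below derives an index date and 3-day/14-day endpoint flags with pandas. It is not taken from replicate.py. The column names (`nosocomial`, `admission_date`, `symptom_onset`, `event_date`), the function name, and the choice to count an event on day 3 (or day 14) as falling within the window are all assumptions made for the example, and the toy rows are invented.

```python
import pandas as pd


def add_index_and_endpoints(df: pd.DataFrame) -> pd.DataFrame:
    """Add an index date and 3-/14-day endpoint flags.

    Hypothetical columns: 'nosocomial' (bool), 'admission_date',
    'symptom_onset', and 'event_date' (date of ICU/death, NaT if none).
    """
    out = df.copy()
    # Index date: admission for community-acquired infection,
    # symptom onset for nosocomial patients.
    out["index_date"] = out["admission_date"].where(
        ~out["nosocomial"], out["symptom_onset"]
    )
    days = (out["event_date"] - out["index_date"]).dt.days
    for horizon in (3, 14):
        # Assumption: the window is inclusive of the final day.
        # Missing event dates give NaN days, which evaluate to False.
        out[f"event_{horizon}d"] = days.between(0, horizon)
    return out


# Invented toy rows, for illustration only.
toy = pd.DataFrame({
    "nosocomial": [False, True, False],
    "admission_date": pd.to_datetime(["2020-04-01", "2020-03-20", "2020-04-01"]),
    "symptom_onset": pd.to_datetime(["2020-03-28", "2020-04-01", "2020-03-30"]),
    "event_date": pd.to_datetime(["2020-04-03", "2020-04-11", None]),
})
result = add_index_and_endpoints(toy)
print(result[["index_date", "event_3d", "event_14d"]])
```

For the sensitivity analysis in item 4, the same frame could be filtered with `result[~result["nosocomial"]]` before fitting; again, this is only one possible way to express it.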

Updated pre-trained models, thresholds model

29 May 08:53

This release contains two key updates:

1) Updated pre-trained models eb712df

The latest data extract (28th May 2020) for the training sample contains less missing data (e.g. the percentage missing fell from 27% to 9% for albumin and from 24% to 6% for estimatedgfr). We have therefore updated the pre-trained models in this repository to incorporate the latest data extract.

This shouldn't require any changes to your code, besides a git pull of the latest models.

2) Threshold model 215bb18

The latest version of replicate.py includes an additional model based on binary versions of the variables in the FINAL model. The thresholds were derived from a decision tree model tuned on our training sample. They are:

NEWS2 > 5.10
CRP >= 173.60
Albumin <= 31.10
Estimated GFR <= 31.60
Neutrophils > 8.77
Age (left as continuous)

We have provided a pre-trained model for use with these thresholds (see clf_THRESHOLD.joblib). The provided code will derive the binary items, impute, and test using the pre-trained model. Please note: You will need to have the un-transformed crp column in your validation dataset.
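
For readers who want to see what deriving the binary items might look like, here is a minimal sketch using the thresholds listed above. It is not the code in replicate.py. Apart from `crp` and `estimatedgfr`, the column names (`news2`, `albumin`, `neutrophils`, `age`), the `_bin` suffix, and the function name are assumptions, as is the choice to leave missing values missing so that they can be imputed afterwards. The toy rows are invented.

```python
import operator

import pandas as pd

# Cut-offs as listed in these release notes; column names are partly assumed.
THRESHOLDS = {
    "news2": (operator.gt, 5.10),
    "crp": (operator.ge, 173.60),          # un-transformed CRP
    "albumin": (operator.le, 31.10),
    "estimatedgfr": (operator.le, 31.60),
    "neutrophils": (operator.gt, 8.77),
}


def derive_binary_items(df: pd.DataFrame) -> pd.DataFrame:
    """Return 0/1 indicators for each threshold, plus continuous age."""
    out = pd.DataFrame(index=df.index)
    for col, (compare, cutoff) in THRESHOLDS.items():
        flag = compare(df[col], cutoff).astype(float)
        # Assumption: keep missing inputs as NaN rather than coding them 0.
        out[f"{col}_bin"] = flag.where(df[col].notna())
    out["age"] = df["age"]  # age is left as continuous
    return out


# Invented toy rows, for illustration only.
toy = pd.DataFrame({
    "news2": [6.0, 5.0],
    "crp": [173.6, 100.0],
    "albumin": [31.1, 40.0],
    "estimatedgfr": [float("nan"), 31.6],
    "neutrophils": [8.77, 9.0],
    "age": [70, 50],
})
binary = derive_binary_items(toy)
print(binary)
```

The column names and order that clf_THRESHOLD.joblib actually expects are defined by replicate.py, so treat the names above as placeholders.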


Please get in touch or raise an issue if you have any problems.