Fix RAPID temperature units (#201)
* Upload one-off script
* Update RAPID parser & output file
pipliggins committed Nov 15, 2023
1 parent 3061496 commit 44e9262
Showing 3 changed files with 27 additions and 7 deletions.
6 changes: 3 additions & 3 deletions isaric/parsers/isaric-rapid.toml
@@ -877,7 +877,7 @@
{ ccm_a_fio2_lborres = { ">" = 0.21 } },
{ ccm_a_fio2_lborres = { "<=" = 1 } },
], apply = { function = "isNotNull" } },
-{ field = "ccm_a_fio2b_lborres", if = { daily_fio2b_lborres = { ">" = 21 } }, apply = { function = "isNotNull" } },
+{ field = "ccm_a_fio2b_lborres", if = { ccm_a_fio2b_lborres = { ">" = 21 } }, apply = { function = "isNotNull" } },
{ field = "ccm_a_fio2c_lborres", apply = { function = "isNotNull" } },
]

@@ -1812,13 +1812,13 @@
name = "temperature_celsius"
phase = "admission"
date = { ref = "admissionDateHierarchy" }
-value = { field = "temp_vsorres" } # there is no source unit field, but a mix of celsius and farenheit in the data.
+value = { field = "temp_vsorres_new" }

[[observation]]
name = "temperature_celsius"
phase = "study"
date = { field = "daily_date" }
-value = { field = "daily_temp_vsorres" } # there is no source unit field
+value = { field = "daily_temp_vsorres_new" }
context = ['Most abnormal reading']

[[observation]]
20 changes: 20 additions & 0 deletions isaric/parsers/isaric-rapid/fix-temperature-units.py
@@ -0,0 +1,20 @@
# Corrects temperatures recorded in Fahrenheit to Celsius based on max human internal temperature.

import pandas as pd

def convert_temperature_units(value):
    if value <= 50:

abhidg Nov 16, 2023

Contributor

Is this limit documented in the source? If units are ambiguous, then I would advise dropping columns entirely, thoughts @sadiekelly? Like here, 50 F ~ 10 Celsius which is very low!

pipliggins Nov 16, 2023

Author Collaborator

The column is full of mixed temperatures - they should all be in Celsius according to the CRF but there are a load that are very clearly Fahrenheit, between 95 and 105. There's actually a very clear split - highest assumed C temp is 41.1, then the next highest temperature is 95.8 looking at the temp_vsorres column, very similar pattern in daily_temp_vsorres.

I could change this to only convert if the temperature is above 90? But we'd actually end up with identical data.

sadiekelly Nov 16, 2023

Collaborator

@pipliggins sounds ok given the clear split in the data! maybe it's best to update the conversion so it's clear that if a value of 51 did exist it wouldn't be converted though

pipliggins Nov 17, 2023

Author Collaborator

@sadiekelly do you mean raise the temperature limit? At the moment anything above 50 is converted.

sadiekelly Nov 17, 2023

Collaborator

Hi @pipliggins, yes, just for audit trail purposes really. Changing the conversion will not affect the data, but if anyone were to look back at the parser for the conversion, it might be confusing why 50 was selected, whereas 90 makes more sense considering Fahrenheit ranges. Then, if the same conversion were used on another parser, it wouldn't incorrectly convert 51 F to C.

pipliggins Nov 17, 2023

Author Collaborator

Got it, all done!

        return value
    elif value > 50:
        return (value - 32) * 5/9

# import data
df = pd.read_csv("ISARIC RAPID/ISARICCOVID19RAPIDFo_DATA_2022-07-06_0932.csv")

# create new columns with the converted data
df['temp_vsorres_new'] = df.apply(lambda x: convert_temperature_units(x.temp_vsorres), axis=1)
df['daily_temp_vsorres_new'] = df.apply(lambda x: convert_temperature_units(x.daily_temp_vsorres), axis=1)

# save the new file
df2 = df.convert_dtypes()
df2.to_csv("ISARIC RAPID/ISARICCOVID19RAPIDFo_DATA_2022-07-06_0932_temperaturefix.csv", index=False)
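
The review thread on this file converges on raising the cutoff from 50 to 90 so that the intent is clearer to later readers. The follow-up change itself is not shown in this commit, so the block below is only a sketch of what such a revision might look like. The constant name `FAHRENHEIT_THRESHOLD`, the explicit missing-value check, and the toy DataFrame are assumptions added for illustration; only the values 41.1 and 95.8 come from the thread.

```python
import pandas as pd

# Hypothetical cutoff suggested in the review thread (assumption, not the committed code).
FAHRENHEIT_THRESHOLD = 90


def convert_temperature_units(value, threshold=FAHRENHEIT_THRESHOLD):
    """Convert readings above the threshold from Fahrenheit to Celsius.

    Readings at or below the threshold are assumed to be Celsius already
    and are returned unchanged. Missing values pass through untouched.
    """
    if pd.isna(value):
        return value
    if value > threshold:
        return (value - 32) * 5 / 9
    return value


# Toy data for illustration only; this is not the RAPID dataset.
df = pd.DataFrame({"temp_vsorres": [36.6, 41.1, 51.0, 95.8, None]})
df["temp_vsorres_new"] = df["temp_vsorres"].apply(convert_temperature_units)
print(df)
```

With a cutoff of 90, a stray reading such as 51 is left alone rather than being treated as Fahrenheit, which is the audit-trail concern raised in the thread. For the data described there (nothing between 41.1 and 95.8), either cutoff would give the same result.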
8 changes: 4 additions & 4 deletions output/ISARIC RAPID/adtl-output.md
@@ -1,10 +1,10 @@
->adtl isaric-rapid.toml ISARICCOVID19RAPIDFo_DATA_2022-07-06_0932.csv --include-defs isaric-rapid.json
+>adtl isaric-rapid.toml ISARICCOVID19RAPIDFo_DATA_2022-07-06_0932_temperaturefix.csv --include-defs isaric-rapid.json
|table |valid |total |percentage_valid|
|---------------|-------|-------|----------------|
|subject |5546 |8061 |68.800397% |
|visit |4911 |8061 |60.922962% |
-|observation |419391 |436586 |96.061486% |
+|observation |416311 |430313 |96.746089% |

## subject

@@ -19,5 +19,5 @@

## observation

-* 14373: data must contain ['phase', 'date', 'name'] properties
-* 2822: data must be valid exactly by one definition (0 matches found)
+* 13853: data must contain ['phase', 'date', 'name'] properties
+* 149: data must be valid exactly by one definition (0 matches found)
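
As a quick sanity check on the observation figures in this file, the error counts should account for every invalid row, and the percentage should follow from valid / total. The sketch below uses only the numbers shown in the diff; the helper name `percentage_valid` and the `runs` dictionary are hypothetical and not part of adtl.

```python
# Observation figures copied from the adtl-output.md diff.
runs = {
    "before": {"valid": 419391, "total": 436586, "errors": [14373, 2822]},
    "after": {"valid": 416311, "total": 430313, "errors": [13853, 149]},
}


def percentage_valid(valid, total):
    """Share of rows that passed validation, as a percentage."""
    return 100 * valid / total


for label, run in runs.items():
    invalid = run["total"] - run["valid"]
    # The listed error counts should sum to the number of invalid rows.
    assert invalid == sum(run["errors"])
    pct = percentage_valid(run["valid"], run["total"])
    print(f"{label}: {pct:.6f}% valid, {invalid} invalid")
```

Most of the improvement comes from the "valid exactly by one definition" errors dropping from 2822 to 149, which is consistent with out-of-range Fahrenheit readings no longer failing validation.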
