-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
conformed cleaner to new parser json structure + unit test updates #163
Conversation
"Case Metadata": { | ||
"county": "hays" | ||
}, | ||
"Defendant Information": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 we should use snake_case
on these keys.
if isinstance(data, dict): | ||
# Remove 'judicial officer' if it exists in this dictionary | ||
if "judicial officer" in data: | ||
del data["judicial officer"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔧 we should add a doc comment for this method. I can't recall why we need to delete the JO information.
"parsing_date": dt.datetime.today().strftime("%Y-%m-%d"), | ||
"html_hash": input_dict["html_hash"], | ||
"Case Metadata": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔧 just a reminder here to update the casing of these keys.
"Charge Information": [], | ||
"Case Details": { | ||
"earliest_charge_date": "", | ||
"has_evidence_of_representation": False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should just omit case details and charge information from this initializing record, then append the values later. I wonder if there's a chance to get accidental false negatives here.
elif SECTION == "disposition": | ||
print(f'printing row: {row}') | ||
if len(row) >= 2: | ||
if row[1] in ["Disposition", "Disposition:", "Amended Disposition"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the colon be there in "Disposition:"
?
@@ -8,11 +8,16 @@ | |||
import tempfile | |||
from bs4 import BeautifulSoup | |||
|
|||
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) | |||
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..','..'))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was curious about these lines and ran some questions through Claude.ai -- it suggested we use pythonpath
setting in pyproject.toml
to accomplish the same thing (modifying the way Python resolves module paths).
It seems like there are also a few other ways (like using setup.py
and a config file for our test suite).
I assume this is fine for now, but we should try to figure out the discrepancies between how these modules are getting resolved, because right now it seems to be OS-dependent.
|
||
@patch("os.listdir", return_value=["case1.json", "case2.json"]) | ||
def test_process_single_case(self): | ||
county = "hays" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good that we were able to avoid the @patch
-ing happening for this test 👍
|
||
def get_database_container(self): | ||
#This loads the environment for interacting with CosmosDB #Dan: Should this be moved to the .env file? | ||
load_dotenv() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, just looking at the docs and it seems like you may want to use dotenv_values(".env")
here instead of loading the env, just to make this a little more isolated.
So the following lines could read..
env = dotenv_values(".env") # I assume `env` isn't a reserved word
URL = env["URL"]
URL = env("URL")
KEY = env("KEY")
DATA_BASE_NAME = env("DATA_BASE_NAME")
CONTAINER_NAME_CLEANED = env("CONTAINER_NAME_CLEANED")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also just move load_dotenv()
to the top of this module, I think.
I agree across the board on these updates. Thanks for reading through and giving comments. I'm going to leave a note to come back to this PR and make tickets for these. |
After the new updates to the parser and cleaner separately, I conformed the cleaner to properly access fields in the new structure of json coming from the parser module.
Also, I fixed some unit tests to respond to the new structure differences. I also consolidated test and mock files and fixed mapping issues with some unit tests. Also commented out certain unit tests if they were too long or needed configuration.