-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parser cleanup #156
Parser cleanup #156
Conversation
src/parser/__init__.py
Outdated
logger.info("Logger configured") | ||
return logger | ||
|
||
def get_class_and_method(self, logger, county: str, test = False) -> Tuple[Optional[object], Optional[callable]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a small thing and nothing to do on this, but I think we should note that get_class_and_method should return an object (the class) and callable (the method)--ie. not optional--or else I'd consider it an error.
return logger | ||
|
||
def get_class_and_method(self, logger, county: str, test = False) -> Tuple[Optional[object], Optional[callable]]: | ||
if test: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, nothing to do here, but a small note. I think in the next evolution of this code, instead of passing a test field, we'd instead pass all of the fields via unit test that the method might need. For example, if the test parser only parses a specific mock HTML stored in the test files folder, instead of putting the HTML folder path-designating logic within the code pivoting based on whether test is true, we'd instead pass the method the HTML folder path entirely. This reason being, when we test, we want to test the method logic itself rather than test only the test version. LMK if that doesn't make sense.
src/parser/__init__.py
Outdated
logger.info(f"Unexpected error: {e}") | ||
return None, None | ||
|
||
def get_directories(self, county: str, logger, parse_single_file: bool = False) -> Tuple[str, str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like parse_single_file is meant for testing purposes. We may move away from using "test" as a field or "parse_single_file" as a second field and more towards passing the method the URL. I think overall, knowing which directories to use or not use may be the only piece the pivots based on whether we're using a mock for testing or in production.
|
||
parser_instance, parser_function = self.get_class_and_method(county=county, logger=logger, test=test) | ||
|
||
if parser_instance is not None and parser_function is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is probably a carry over from the scraper structure, but Matt had a great point on my code there that ultimately I think that parser_instance and parser_function should be not None by the time it gets here and it probably isn't necessary to check this. Ultimately, this will error out anyway if you try to run None with the fields you're passing it.
for charge_name, severity in CHARGE_SEVERITY.items(): | ||
if charge_name in charge: | ||
return severity | ||
return float('inf') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is cool! I had to look this up.
bondsman_rows = bondsman_rows[::-1] | ||
if bondsman_rows[0][0] != "Bondsman": | ||
bondsman_rows = [] | ||
def get_case_metadata(self, county: str, case_number: str, case_soup: BeautifulSoup, logger) -> Dict[str, str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that this was broken out and placed in this bucket. Awesome.
"location": "Unknown" | ||
} | ||
|
||
def parse_defendant_rows(self, defendant_rows: List[List[str]], logger) -> Dict[str, str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a comment. This is the hays-specific parser, and I imagine that this data will be different from one county to the next. I imagine this would be very much county-specific, but as we get into different counties, we'll probably think of creating a helpers functions that works with other functions that don't vary much from county to county. TBD.
logger.info(f"Error getting disposition information: {e}") | ||
return dispositions | ||
|
||
def parser_hays(self, county: str, case_number: str, logger, case_soup: BeautifulSoup) -> Dict[str, Dict]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love the formatting of this method and how it's all come together section by section.
@@ -1,16 +1,20 @@ | |||
import unittest, sys, os, json, warnings, requests, logging | |||
from unittest.mock import patch, MagicMock, mock_open |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think overall for the unit test additions, I will have to look deeper into using mock library and how it's used here. But otherwise, it's okay to push to main once we can figure out why that one unit test isn't passing.
Great work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work overall. Thanks so much for your work. I've left a few comments below. The only change I'd request is that we fix that one unit test from not passing (looks like there is some folder not being found?), then push this branch again to the repo, we'll confirm that the unit test is passing, then we can approve and merge this PR. Amazing! Thank you!
- Added type specification to methods - Added logging functionality - Added error handling - Refactored into smaller, more readble methods
2b5fff5
to
e439b98
Compare
parser: break this module into smaller, more readable methods #106
parser: update logic to not require 'test' as a parameter and instead pass everything needed for a test within the unit test #111
parser: add type specification to methods #110
parser: add logging #109
parser: add error handling #108