Parser cleanup #156

aarosan · 2024-09-25T21:26:16Z

parser: break this module into smaller, more readable methods #106

parser: update logic to not require 'test' as a parameter and instead pass everything needed for a test within the unit test #111

parser: add type specification to methods #110

parser: add logging #109

parser: add error handling #108

nicolassaw · 2024-09-29T19:16:58Z

src/parser/__init__.py

+        logger.info("Logger configured")
+        return logger
+
+    def get_class_and_method(self, logger, county: str, test = False) -> Tuple[Optional[object], Optional[callable]]:


Just a small thing and nothing to do on this, but I think we should note that get_class_and_method should return an object (the class) and callable (the method)--ie. not optional--or else I'd consider it an error.

nicolassaw · 2024-09-29T19:21:09Z

src/parser/__init__.py

+        return logger
+
+    def get_class_and_method(self, logger, county: str, test = False) -> Tuple[Optional[object], Optional[callable]]:
+        if test:


Again, nothing to do here, but a small note. I think in the next evolution of this code, instead of passing a test field, we'd instead pass all of the fields via unit test that the method might need. For example, if the test parser only parses a specific mock HTML stored in the test files folder, instead of putting the HTML folder path-designating logic within the code pivoting based on whether test is true, we'd instead pass the method the HTML folder path entirely. This reason being, when we test, we want to test the method logic itself rather than test only the test version. LMK if that doesn't make sense.

nicolassaw · 2024-09-29T19:25:57Z

src/parser/__init__.py

+            logger.info(f"Unexpected error: {e}")
+        return None, None
+
+    def get_directories(self, county: str, logger, parse_single_file: bool = False) -> Tuple[str, str]:


It looks like parse_single_file is meant for testing purposes. We may move away from using "test" as a field or "parse_single_file" as a second field and more towards passing the method the URL. I think overall, knowing which directories to use or not use may be the only piece the pivots based on whether we're using a mock for testing or in production.

nicolassaw · 2024-09-29T19:38:02Z

src/parser/__init__.py

+
+                    parser_instance, parser_function = self.get_class_and_method(county=county, logger=logger, test=test)   
+
+                    if parser_instance is not None and parser_function is not None:


I think this is probably a carry over from the scraper structure, but Matt had a great point on my code there that ultimately I think that parser_instance and parser_function should be not None by the time it gets here and it probably isn't necessary to check this. Ultimately, this will error out anyway if you try to run None with the fields you're passing it.

nicolassaw · 2024-09-29T20:01:02Z

src/parser/hays.py

+            for charge_name, severity in CHARGE_SEVERITY.items():
+                if charge_name in charge:
+                    return severity
+            return float('inf')


This is cool! I had to look this up.

nicolassaw · 2024-09-29T20:01:50Z

src/parser/hays.py

-                bondsman_rows = bondsman_rows[::-1]
-                if bondsman_rows[0][0] != "Bondsman":
-                    bondsman_rows = []
+    def get_case_metadata(self, county: str, case_number: str, case_soup: BeautifulSoup, logger) -> Dict[str, str]:


I like that this was broken out and placed in this bucket. Awesome.

nicolassaw · 2024-09-29T20:03:33Z

src/parser/hays.py

+                "location": "Unknown"
+            }   
+
+    def parse_defendant_rows(self, defendant_rows: List[List[str]], logger) -> Dict[str, str]:


Just a comment. This is the hays-specific parser, and I imagine that this data will be different from one county to the next. I imagine this would be very much county-specific, but as we get into different counties, we'll probably think of creating a helpers functions that works with other functions that don't vary much from county to county. TBD.

nicolassaw · 2024-09-29T20:05:58Z

src/parser/hays.py

+            logger.info(f"Error getting disposition information: {e}")
+            return dispositions
+
+    def parser_hays(self, county: str, case_number: str, logger, case_soup: BeautifulSoup) -> Dict[str, Dict]:


Love the formatting of this method and how it's all come together section by section.

nicolassaw · 2024-09-29T20:09:04Z

src/tester/test_unittest.py

@@ -1,16 +1,20 @@
 import unittest, sys, os, json, warnings, requests, logging
+from unittest.mock import patch, MagicMock, mock_open


I think overall for the unit test additions, I will have to look deeper into using mock library and how it's used here. But otherwise, it's okay to push to main once we can figure out why that one unit test isn't passing.

Great work!

nicolassaw

Great work overall. Thanks so much for your work. I've left a few comments below. The only change I'd request is that we fix that one unit test from not passing (looks like there is some folder not being found?), then push this branch again to the repo, we'll confirm that the unit test is passing, then we can approve and merge this PR. Amazing! Thank you!

- Added type specification to methods - Added logging functionality - Added error handling - Refactored into smaller, more readble methods

nicolassaw reviewed Sep 29, 2024

View reviewed changes

nicolassaw requested changes Sep 29, 2024

View reviewed changes

aarosan added 2 commits October 6, 2024 13:43

Updated Parser Code

ce2f6b9

- Added type specification to methods - Added logging functionality - Added error handling - Refactored into smaller, more readble methods

Parser Unittests

e439b98

newswim force-pushed the parser-cleanup branch from 2b5fff5 to e439b98 Compare October 6, 2024 18:45

newswim added the parser Improving the parsing capabilities label Oct 6, 2024

newswim approved these changes Oct 6, 2024

View reviewed changes

nicolassaw approved these changes Oct 6, 2024

View reviewed changes

newswim merged commit 1257d86 into main Oct 6, 2024
1 of 2 checks passed

newswim deleted the parser-cleanup branch October 6, 2024 18:53

This was referenced Nov 9, 2024

parser: develop unit tests to map onto individual methods in parser module #107

Closed

parser: add error handling #108

Closed

parser: add logging #109

Closed

parser: add type specification to methods #110

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parser cleanup #156

Parser cleanup #156

aarosan commented Sep 25, 2024

nicolassaw Sep 29, 2024

nicolassaw Sep 29, 2024

nicolassaw Sep 29, 2024

nicolassaw Sep 29, 2024

nicolassaw Sep 29, 2024

nicolassaw Sep 29, 2024

nicolassaw Sep 29, 2024

nicolassaw Sep 29, 2024

nicolassaw Sep 29, 2024

nicolassaw left a comment


		parser_instance, parser_function = self.get_class_and_method(county=county, logger=logger, test=test)

		if parser_instance is not None and parser_function is not None:

		@@ -1,16 +1,20 @@
		import unittest, sys, os, json, warnings, requests, logging
		from unittest.mock import patch, MagicMock, mock_open

Parser cleanup #156

Parser cleanup #156

Conversation

aarosan commented Sep 25, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nicolassaw left a comment

Choose a reason for hiding this comment