v1.4.0

@gordonje gordonje released this 23 Aug 18:18
  • Added zipping up and archiving of cleaned CSVs and error logs.
    • Added RawDataVersion.clean_zip_archive FileField.
    • Renamed RawDataVersion.zip_file_archive to RawDataVersion.download_zip_archive.
  • Smaller clean data files (removed unnecessary quote characters).
  • Improvements to tracking models
    • Replaced RawDataCommand model with datetime fields and related properties
      • Added to RawDataVersion instances
        • .update_start_datetime and .update_finish_datetime to store version's most recent update start and finish datetimes.
        • .update_completed returns True if most recent update to version started and finished.
        • .update_stalled returns True if most recent update to version started but did not finish.
        • .download_start_datetime and .download_finish_datetime to store version's most recent download start and finish datetimes.
        • .download_completed returns True if most recent download of version started and finished.
        • .download_stalled returns True if most recent download of version started but did not finish.
        • .completed() QuerySet method to RawDataVersion to get all versions where the update completed.
      • Added to RawDataFile instances
        • .clean_start_datetime and .clean_finish_datetime to store raw file's most recent clean start and finish datetimes.
        • .load_start_datetime and .load_finish_datetime to store raw file's most recent load start and finish datetimes.
    • Expanded file size tracking
      • Renamed .size to .expected_size on RawDataVersion instances.
      • Added .download_zip_size to RawDataVersion instances.
      • Added .clean_zip_size to RawDataVersion instances.
      • Added methods to get a pretty version (e.g., 723M) of each file size field
        • Added to RawDataVersion instances
          • .pretty_expected_size()
          • .pretty_download_size()
          • .pretty_clean_size()
        • Added to RawDataFile instances
          • .pretty_download_file_size()
          • .pretty_clean_file_size()
      • Raise CommandError if completed download file size is not the same as expected size.
      • Added RawDataVersion properties to calculate file and record counts:
        • .download_file_count
        • .download_record_count
        • .clean_file_count
        • .clean_record_count
        • .error_file_count
        • .error_count
  • Added extractcalaccessrawfiles management command for unzipping and extracting raw data files from the downloaded CAL-ACCESS database export.
    • Start and finish times stored in .start_extract_datetime and .finish_extract_datetime on RawDataVersion instances.
  • Bug fixes.
    • In downloadcalaccessrawdata, skip download if the size of the local zip file is equal to or bigger than the expected zip file size.
    • Because the server hosting the ZIP doesn’t always provide the most up-to-date resource (as we have documented at https://github.com/california-civic-data-coalition/django-calaccess-raw-data/issues/1487), a CommandError will be raised under any of the following conditions:
      • If downloadcalaccessrawdata is not called from the command line (presumably, then, it was called by updatecalaccessrawdata), and the RawDataVersion instance of the download command doesn't match the most recently started update.
      • If the ETag in the initial HEAD request made by downloadcalaccessrawdata does not match the ETag in the subsequent GET request.
      • If the actual size of the ZIP does not match the value of the Content-Length in the HEAD response.
    • If downloadcalaccessrawdata raises any of the above errors, updatecalaccessrawdata will wait five minutes and try again.
    • When archiving zips and files, open in binary ('rb') mode.
    • In cleancalaccessrawfile, fixed skipping of empty lines for Python 3.5.
  • Support for Django 1.10.
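
The .update_completed and .update_stalled properties listed above are derived from the start and finish datetimes. A minimal sketch of that idea follows, using a plain dataclass rather than the actual Django model. The class name VersionTimestamps is hypothetical, and the sketch assumes the finish datetime is empty whenever the most recent update has not finished; the release's real implementation may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class VersionTimestamps:
    """Hypothetical stand-in for the update datetime fields on RawDataVersion."""

    update_start_datetime: Optional[datetime] = None
    update_finish_datetime: Optional[datetime] = None

    @property
    def update_completed(self) -> bool:
        # Started and finished: both timestamps are recorded.
        return (
            self.update_start_datetime is not None
            and self.update_finish_datetime is not None
        )

    @property
    def update_stalled(self) -> bool:
        # Started but did not finish: only the start timestamp is recorded.
        # Assumption: the finish field is empty while an update is unfinished.
        return (
            self.update_start_datetime is not None
            and self.update_finish_datetime is None
        )


# Arbitrary illustrative timestamps, not taken from any real update.
never_started = VersionTimestamps()
stalled = VersionTimestamps(update_start_datetime=datetime(2020, 1, 1, 12, 0))
completed = VersionTimestamps(
    update_start_datetime=datetime(2020, 1, 1, 12, 0),
    update_finish_datetime=datetime(2020, 1, 1, 12, 30),
)
```

The same pattern would apply to the download, clean, and load datetime pairs. Note that a version that never started is neither completed nor stalled under this sketch.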