Releases: palewire/django-calaccess-raw-data
v1.4.7
v1.4.6
v1.4.5
v1.4.1
v1.4.0
- Added zipping up and archiving of cleaned CSVs and error logs.
  - Added `RawDataVersion.clean_zip_archive` FileField.
  - Renamed `RawDataVersion.zip_file_archive` to `RawDataVersion.download_zip_archive`.
- Smaller clean data files (removed unnecessary quote characters).
- Improvements to tracking models
  - Replaced `RawDataCommand` model with datetime fields and related properties
    - Added to `RawDataVersion` instances
      - `.update_start_datetime` and `.update_finish_datetime` to store the version's most recent update start and finish datetimes.
      - `.update_completed` returns `True` if the most recent update to the version started and finished.
      - `.update_stalled` returns `True` if the most recent update to the version started but did not finish.
      - `.download_start_datetime` and `.download_finish_datetime` to store the version's most recent download start and finish datetimes.
      - `.download_completed` returns `True` if the most recent download of the version started and finished.
      - `.download_stalled` returns `True` if the most recent download of the version started but did not finish.
      - `.completed()` QuerySet method for `RawDataVersion` to get all versions where the update completed.
    - Added to `RawDataFile` instances
      - `.clean_start_datetime` and `.clean_finish_datetime` to store the raw file's most recent clean start and finish datetimes.
      - `.load_start_datetime` and `.load_finish_datetime` to store the raw file's most recent load start and finish datetimes.
  - Expanded file size tracking
    - Renamed `.size` to `.expected_size` on `RawDataVersion` instances.
    - Added `.download_zip_size` to `RawDataVersion` instances.
    - Added `.clean_zip_size` to `RawDataVersion` instances.
    - Added methods to get a pretty version (e.g., `723M`) of each file size field
      - Added to `RawDataVersion` instances
        - `.pretty_expected_size()`
        - `.pretty_download_size()`
        - `.pretty_clean_size()`
      - Added to `RawDataFile` instances
        - `.pretty_download_file_size()`
        - `.pretty_clean_file_size()`
    - Raise `CommandError` if the completed download file size is not the same as the expected size.
    - Added `RawDataVersion` properties to calculate file and record counts:
      - `.download_file_count`
      - `.download_record_count`
      - `.clean_file_count`
      - `.clean_record_count`
      - `.error_file_count`
      - `.error_count`
- Added `extractcalaccessrawfiles` management command for unzipping and extracting raw data files from downloaded CAL-ACCESS database export.
  - Start and finish times stored in `.start_extract_datetime` and `.finish_extract_datetime` on `RawDataVersion` instances.
- Bug fixes.
  - In `downloadcalaccessrawdata`, skip download if the size of the local zip file is equal to or bigger than the expected zip file size.
  - Because the server hosting the ZIP doesn’t always provide the most up-to-date resource (as we have documented in <https://github.com/california-civic-data-coalition/django-calaccess-raw-data/issues/1487>), a `CommandError` will be raised under any of the following conditions:
    - If `downloadcalaccessrawdata` is not called from the command-line (presumably, then, it was called by `updatecalaccessrawdata`), and the `RawDataVersion` instance of the download command doesn't match the most recently started update.
    - If the `ETag` in the initial HEAD request made by `downloadcalaccessrawdata` does not match the `ETag` in the subsequent GET request.
    - If the actual size of the ZIP does not match the value of the `Content-Length` in the HEAD response.
  - If `downloadcalaccessrawdata` raises any of the above errors, `updatecalaccessrawdata` will wait five minutes and try again.
  - When archiving zips and files, open in binary (`'rb'`) mode.
  - In `cleancalaccessrawfile`, fixed skipping of empty lines for Python 3.5.
- Support for Django 1.10.
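
The download checks listed under the v1.4.0 bug fixes can be illustrated with a minimal sketch. This is not the package's actual code: `DownloadError` stands in for Django's `CommandError`, and `check_download`, `download_with_retry`, the header dictionaries and the `attempts` limit are hypothetical names chosen for illustration. Only the ETag comparison, the `Content-Length` comparison and the five-minute wait come from the notes above.

```python
import time


class DownloadError(Exception):
    """Stand-in for Django's CommandError so the sketch has no dependencies."""


def check_download(head_headers, get_headers, actual_size):
    """Raise DownloadError if the HEAD and GET responses disagree.

    head_headers / get_headers: dicts of response headers (hypothetical inputs).
    actual_size: size in bytes of the ZIP that was written to disk.
    """
    if head_headers.get("ETag") != get_headers.get("ETag"):
        raise DownloadError("ETag changed between the HEAD and GET requests.")
    expected_size = int(head_headers["Content-Length"])
    if actual_size != expected_size:
        raise DownloadError(
            "Downloaded %d bytes, expected %d." % (actual_size, expected_size)
        )


def download_with_retry(download, attempts=3, wait_seconds=300, sleep=time.sleep):
    """Call download(); on DownloadError, wait and try again.

    The 300-second default mirrors the five-minute wait in the notes;
    the attempts limit is an assumption added to keep the loop finite.
    """
    for attempt in range(1, attempts + 1):
        try:
            return download()
        except DownloadError:
            if attempt == attempts:
                raise
            sleep(wait_seconds)
```

Passing `sleep` as a parameter keeps the retry loop testable without actually waiting five minutes.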
v1.3.0
v1.2.0
- Enhancements to tracking models
  - Zero-pad datetime parts of the archive directory for better sorting.
  - Calculate and store `load_columns_count` and `load_records_count` in the `RawDataFile` model.
  - Added `error_count` and `error_log_archive` fields to `RawDataFile` in order to track bad line parses during the `cleancalaccessrawfile` command.
  - Added `download_file_size` and `clean_file_size` fields to the `RawDataFile` model.
- Enhancements to CAL-ACCESS models
  - Added "inactive" models group for CAL-ACCESS tables that are empty or apparently no longer in use.
  - Added a `CalAccessMetaClass` to automatically configure meta attributes common to all models.
  - Added a custom admin for every model.
  - Model verbose names are prefixed with model groups.
  - Edits to model docstrings.
- Enhancements to management commands
  - Added standard logging to the `header`, `log` and `success` methods.
  - Added a `logger.info` call at the end of the `updatecalaccessrawdata` command to allow sending of emails when finished.
  - Edits to command docstrings.
- More tests
  - Test to confirm that any field included in a model's `UNIQUE_KEY` attribute actually exists on the model.
  - Test to confirm that every model has a custom admin.
  - Added `flake8_docstrings` plugin to the testing routine.
  - New unittest modules providing 100% coverage to most of the app's components.
- Bug fixes
  - Fixed numbers in `clean_records_count` for the `RawDataFile` model.
  - Fixed line numbers logged in errors.csv files.
  - `reportcalaccessrawdata` now writes output to the data directory instead of `REPO_DIR`.
- Distribution now packaged in `wheel` format
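
To see why zero-padding the archive directory name helps with sorting, here is a small sketch. The `archive_dir_name` helper and the exact name layout are assumptions, not the app's real naming scheme; the point is only that zero-padded names sort chronologically when compared as plain strings.

```python
from datetime import datetime


def archive_dir_name(dt, zero_pad=True):
    """Build a directory name from a datetime (the layout is an assumption)."""
    if zero_pad:
        return dt.strftime("%Y-%m-%d_%H-%M-%S")
    # Without padding, single-digit parts produce names of uneven width.
    return "{0.year}-{0.month}-{0.day}_{0.hour}-{0.minute}-{0.second}".format(dt)


versions = [datetime(2016, 9, 5, 8, 3, 0), datetime(2016, 10, 1, 14, 30, 0)]

padded = sorted(archive_dir_name(v) for v in versions)
unpadded = sorted(archive_dir_name(v, zero_pad=False) for v in versions)

print(padded)    # chronological: the September version sorts first
print(unpadded)  # not chronological: "2016-10-..." sorts before "2016-9-..."
```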
v1.1.0
- When `--noinput` is invoked for `updatecalaccessrawdata`, exit if previously updated to the currently available version.
- Enforce lowercase `UNIQUE_KEY` settings on models.
- Removed unnecessary `pretty_amount` model methods as part of driving common.py models file test coverage up to 100%.
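
As an illustration of the lowercase `UNIQUE_KEY` rule, a check along these lines could flag offending models. It is a sketch under assumptions: `UNIQUE_KEY` is taken to be a tuple of field names, and `find_non_lowercase_keys`, `GoodModel` and `BadModel` are hypothetical stand-ins rather than anything shipped with the app.

```python
def find_non_lowercase_keys(models):
    """Return (model name, field name) pairs whose UNIQUE_KEY entry is not lowercase."""
    problems = []
    for model in models:
        # Assumes UNIQUE_KEY is a tuple of field names; models without one are skipped.
        for field_name in getattr(model, "UNIQUE_KEY", ()):
            if field_name != field_name.lower():
                problems.append((model.__name__, field_name))
    return problems


# Hypothetical stand-ins; real models would be Django model classes.
class GoodModel:
    UNIQUE_KEY = ("filing_id", "amend_id")


class BadModel:
    UNIQUE_KEY = ("FILING_ID", "amend_id")


print(find_non_lowercase_keys([GoodModel, BadModel]))
```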
v1.0.2
v1.0.0
- Enhanced resume behavior
  - Allow previously interrupted updates to resume at any stage of the process: downloading, cleaning or loading.
  - Users will be prompted to resume (if possible). Users may decline and restart the entire update.
  - Removed `--resume-download` option from `updatecalaccessrawdata` and `downloadcalaccessrawdata` in favor of prompting the user to resume.
  - Removed `--database` option from all commands. Multi-database users are encouraged to use Django's database routers.
- Raw data file archiving
  - Added `CALACCESS_STORE_ARCHIVE` setting. When enabled, management commands will save each version of the downloaded .zip file, the extracted .tsv files and cleaned .csv files to the Django project's `MEDIA_ROOT`.
  - Added FileFields to `RawDataVersion` and `RawDataFile` in order to link the database records with the archived files they reference.
- Completed documentation of all 80 raw data models and 1,467 fields
  - Defined hundreds of choices for 182 look-up fields.
  - Published expanded Django project documentation. Added re-directs from old app-specific documentation.
  - Integrated references to official documents and filing forms into data models. PDFs on DocumentCloud.
- Expanded unit testing of data model documentation
  - Wider scope of choice field testing.
  - Verify that each model has a `UNIQUE_KEY` attribute set.
  - Verify that each model has a document reference.
  - Verify that each choice field has a document reference.
  - Verify that each model with a `form_type` or `form_id` field (with a few exceptions) is linked to filing forms.
- Introduced `reportcalaccessrawdata` command, which generates a report outlining the number and proportion of files and records cleaned and loaded.
- Model Re-modeling:
  - Moved `BallotMeasuresCd` from `other.py` to `campaign.py`. Same with admin.
  - Moved remaining models in `other.py` to `common.py`. Removed `other.py`. Same with admins.
  - Re-ordered models into related groups.
- Bug fixes
  - Truncate time portions of raw datetime values (#1457).
  - Strip newlines when loading into MySQL.
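
For readers affected by the removal of the `--database` option in v1.0.0, a Django database router might look roughly like the sketch below. The method names follow Django's router interface, but everything else is an assumption for illustration: the `CalAccessRouter` class name, the `calaccess_raw` app label and the `calaccess` database alias are not taken from these notes.

```python
class CalAccessRouter:
    """Send every model in one app to a dedicated database alias."""

    app_label = "calaccess_raw"  # assumed app label
    db_alias = "calaccess"       # hypothetical alias that must exist in DATABASES

    def db_for_read(self, model, **hints):
        if model._meta.app_label == self.app_label:
            return self.db_alias
        return None  # no opinion; Django falls back to other routers or "default"

    def db_for_write(self, model, **hints):
        return self.db_for_read(model, **hints)

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if app_label == self.app_label:
            # Only create this app's tables in its dedicated database.
            return db == self.db_alias
        if db == self.db_alias:
            return False  # keep other apps' tables out of the raw-data database
        return None
```

A router like this is enabled by listing its dotted path in the project's `DATABASE_ROUTERS` setting.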