
Releases: UNC-Libraries/MARC-record-set-wrangler

ruby 3 compatibility

23 Oct 15:42
f1d5293
  • Update dependencies for Ruby 3 compatibility
  • Defer evaluation of #ac_change? until it is needed
  • UNC WCM workflows/collections do not add AC/noAC fields
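The deferred evaluation can be pictured as lazy memoization. Below is a minimal sketch under stated assumptions: `RecordComparison` and its `computations` counter are hypothetical, and the "AC change" rule is a simplified stand-in. Only the method name `#ac_change?` comes from these notes. The check runs on the first call and is cached afterwards, so a workflow that never asks for it (for example, one that adds no AC/noAC fields) never pays for it.

```ruby
# Hypothetical comparison wrapper; only the name #ac_change? is from the release notes.
class RecordComparison
  attr_reader :computations # counts how often the AC check actually ran

  def initialize(incoming, existing)
    @incoming = incoming
    @existing = existing
    @computations = 0
  end

  # Evaluated lazily on the first call, then cached. `defined?` is used rather
  # than `||=` so that a cached `false` result is not recomputed on every call.
  def ac_change?
    return @ac_change if defined?(@ac_change)

    @computations += 1
    # Assumed rule, for illustration only: the AC flag differs between records.
    @ac_change = @incoming[:ac] != @existing[:ac]
  end
end

cmp = RecordComparison.new({ ac: true }, { ac: false })
cmp.computations # => 0, nothing evaluated yet
cmp.ac_change?   # => true, computed now
cmp.ac_change?   # => true, served from the cache
cmp.computations # => 1
```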

Travis uses ruby 2.5

16 Jul 11:46
9ebfe81
Merge pull request #23 from UNC-Libraries/fix-travis

Travis uses ruby 2.5

Bump rake to 12.3.3

02 Mar 13:06
e0038a0

Handle comparing non-utf8 strings/fields

26 Feb 16:30
69f3f65
  • Fix crash when detecting changes to a field whose contents are not valid UTF-8. When the normalization routine encounters such a field, it now tries to transcode the contents from MARC-8 to UTF-8 and then continues with normalization. If transcoding also fails, it skips normalization and uses the un-normalized string for comparison.
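The fallback chain above can be sketched as follows. This is an illustration, not the project's code. The method name `normalize_for_comparison` and the injected `marc8_to_utf8` callable are hypothetical. In practice the transcoder could wrap a MARC-8 decoder such as the one bundled with the ruby-marc gem, but that wiring is an assumption. The normalization steps (NFC, whitespace collapsing, downcasing) are also assumed.

```ruby
# Sketch of the normalization fallback. The MARC-8 transcoder is injected as a
# callable (hypothetical parameter); it should return a UTF-8 string or raise.
def normalize_for_comparison(str, marc8_to_utf8)
  candidate = str.dup.force_encoding(Encoding::UTF_8)

  unless candidate.valid_encoding?
    begin
      candidate = marc8_to_utf8.call(str)
    rescue StandardError
      return str # transcoding failed: compare the un-normalized string
    end
    # Also fall back if the transcoder did not produce valid UTF-8.
    unless candidate.is_a?(String) &&
           candidate.encoding == Encoding::UTF_8 &&
           candidate.valid_encoding?
      return str
    end
  end

  # Illustrative normalization steps (assumed): Unicode NFC, collapse
  # whitespace runs, trim, and downcase.
  candidate.unicode_normalize(:nfc).gsub(/\s+/, " ").strip.downcase
end
```

Injecting the transcoder keeps the sketch free of gem dependencies and makes each branch (valid UTF-8, successful transcoding, total failure) easy to exercise separately.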

Add subcollections

19 Aug 17:50
3641b01
  • Incoming records can be grouped into subcollections based on whether specified fields match the subcollection's pattern.
  • Subcollections can have ID affixes that are added to the institution/workflow/collection affix chain
  • Incoming files are allowed to have duplicate IDs when the records are in separate subcollections
  • Subcollections can define parameters (e.g. "provider_param: SPIE"). Specs that add fields can reference those parameters (e.g. "value: 'Content provider: provider_param.'"). Fields are created for each record using the parameter values of the record's subcollection.
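The three subcollection behaviours (pattern matching, affix chaining, and parameter substitution) can be sketched as below. The struct, its key names (`match_field`, `pattern`, `id_affix`, `params`), and the two sample subcollections are hypothetical and not taken from the project's config.yaml. Records are modeled as a plain Hash of tag to field strings rather than MARC objects.

```ruby
# Hypothetical subcollection definition; key names are illustrative only.
Subcollection = Struct.new(:name, :match_field, :pattern, :id_affix, :params)

SUBCOLLECTIONS = [
  Subcollection.new("spie", "773", /SPIE/, "spie", { "provider_param" => "SPIE" }),
  Subcollection.new("ieee", "773", /IEEE/, "ieee", { "provider_param" => "IEEE" })
].freeze

# A record is a Hash of tag => array of field strings. Returns the first
# subcollection whose pattern matches the specified field, or nil.
def assign_subcollection(record, subcollections = SUBCOLLECTIONS)
  subcollections.find do |sc|
    Array(record[sc.match_field]).any? { |value| value.match?(sc.pattern) }
  end
end

# Append the subcollection's affix to the institution/workflow/collection chain.
def id_affix_chain(base_affixes, subcollection)
  subcollection ? base_affixes + [subcollection.id_affix] : base_affixes
end

# Replace each parameter name in a field-value template with this
# subcollection's value (block form avoids backslash escapes in replacements).
def fill_params(template, subcollection)
  subcollection.params.reduce(template) { |text, (key, val)| text.gsub(key) { val } }
end

rec = { "773" => ["SPIE digital library"] }
sc = assign_subcollection(rec)
id_affix_chain(%w[unc wcm coll], sc)                # => ["unc", "wcm", "coll", "spie"]
fill_params("Content provider: provider_param.", sc) # => "Content provider: SPIE."
```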

Performance improvements

26 Jul 11:45
28f7101
  • Stops writing each existing record to an individual .mrc file on disk. Instead, caches each record's file and its start/stop byte offsets in that file. This allows quick retrieval without all of the disk writing, and still without holding everything in memory. This seems to speed up processing by about 30%.
  • Hashes marc.to_s for each record early on. When comparing incoming and existing records, first compares the hash values. Only when the hash values differ do we need to do more elaborate comparisons (e.g. omitting fields from comparison per specs, normalization, etc.) to determine whether the record has changed. This might speed up processing of a set by roughly another 30%.
  • Allows conditionally adding MARC fields with parameters, using MarcEdit#add_conditional_field_with_parameters. See config.yaml for an example spec. The example adds a 590 to records when a 773 containing certain values is found. The spec also maps the 773 value into a parameter to include in the 590.
  • Fixes gh-7. When dupe records exist in the existing set, they are no longer reported as being in the incoming set.
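The offset cache and the hash-first comparison can be sketched together. The names `CachedRecord`, `index_records`, `fetch_raw`, and `record_changed?` are hypothetical. As a simplification, this sketch digests each record's raw transmission-format bytes (split on the MARC record terminator, 0x1D) instead of `marc.to_s`. The detailed comparison is left to a caller-supplied block.

```ruby
require "digest"

RECORD_TERMINATOR = "\x1D".b # MARC record terminator

# Hypothetical cache entry: where a record's bytes live on disk, plus a digest
# used for a cheap first-pass comparison.
CachedRecord = Struct.new(:path, :offset, :length, :digest)

# Index a file of MARC records by byte offset without keeping records in memory.
def index_records(path)
  entries = []
  offset = 0
  File.open(path, "rb") do |f|
    f.each_line(RECORD_TERMINATOR) do |raw|
      entries << CachedRecord.new(path, offset, raw.bytesize, Digest::SHA256.hexdigest(raw))
      offset += raw.bytesize
    end
  end
  entries
end

# Re-read a single record's bytes on demand.
def fetch_raw(entry)
  File.binread(entry.path, entry.length, entry.offset)
end

# Equal digests are treated as "unchanged", so the expensive comparison
# (field omission, normalization, ...) passed as a block only runs otherwise.
def record_changed?(incoming, existing)
  return false if incoming.digest == existing.digest

  yield(fetch_raw(incoming), fetch_raw(existing))
end
```

Most records in a routine reload are unchanged, so the digest check lets the comparison skip the slow path for them. Only the cached offsets are needed to re-read a record when the slow path does run.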

Process WCM holdings, plus

09 May 16:08
9f679d8
  • Processes holdings information output by OCLC WorldShare Collection Manager in the 996 field. Processes "fulltext" 996s only and ignores 996s for other formats.
  • Begins a significant restructuring of the code, to support eventual refactoring for greater testability
  • Some changes made to the UNC/example config: ignores 6XXs where the vocabulary is not specified (i2 = 4), or where i2 = 7 and $2 is not one of the vocabularies we care about and retain locally
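The 6XX rule in the example config can be expressed as a small predicate. This sketch is illustrative. `SimpleField` is a hypothetical stand-in for a MARC data field, and `RETAINED_VOCABS` is a placeholder. The actual list of locally retained vocabularies lives in the project's config and is not reproduced here.

```ruby
# Hypothetical stand-in for a MARC data field; subfields are [code, value] pairs.
SimpleField = Struct.new(:tag, :ind2, :subfields)

# Placeholder list of $2 vocabularies retained locally (assumed values).
RETAINED_VOCABS = %w[lcgft rbgenr].freeze

# True when a 6XX field should be ignored under the rule described above.
def ignore_6xx?(field, retained = RETAINED_VOCABS)
  return false unless field.tag.start_with?("6")

  case field.ind2
  when "4" # source not specified
    true
  when "7" # source given in $2; ignore unless it is a retained vocabulary
    source = field.subfields.find { |code, _| code == "2" }&.last
    !retained.include?(source)
  else
    false
  end
end
```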