Options

kspurgin edited this page Apr 5, 2018 · 26 revisions

An option cannot be repeated in the same config.

An option may be included in multiple configs, even if you are going to use those configs together. In this case you probably want to know how the option settings from the different config levels will interact.

For each option below you’ll find:

  • a description of what the option means/does

  • option type

  • dependencies (other options that must be set if you use this one)

  • option format - how to set the option, eventually with examples of behavior for each

In general, the examples assume that:

  • options not shown are not set

  • input/output fields not shown are not affected

affix type

Option type

simple value

Dependencies
Required by

use id affix - needs to know whether to add id affix value as a prefix or a suffix

Format/examples

There are only two possible valid values:

  affix type: 'prefix'

OR

  affix type: 'suffix'

For examples, see the use id affix option

clean ids

List of regular expression find/replace operations to do on the specified ID values, to clean/modify your IDs.

Option type

list

Dependencies
Requires

main id - needs to know what main id to clean

Tip
If you have specified overlay merged records: true, make sure you have specified merge id (usually 019a or 035z) in addition to main id. Not doing so won’t cause the script to fail, but your overlays might not work as expected.

Format/examples

  • Specify a list of find/replace operations to be carried out on your main id and/or merge id fields/subfields

  • Find/replace values must be specified as regular expressions — I have not tested any advanced lookahead/lookbehind or capture replacements

  • The find/replace operations are carried out in order, which sometimes makes a difference. This can get tricky if you specify clean ids options in both workflow and collection configs that will be combined.


To clean up OCLC numbers in 001

---
institution:
  main id: '001'
workflows:
  WCM:
    clean ids:
      - find: '^o(c[mn]|n)'
        replace: ''
      - find: ' *$'
        replace: ''
      - find: '\\$'
        replace: ''

Input:

=001  ocm55796742\

Output:

=001  55796742
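
The ordered find/replace behavior can be sketched in Ruby roughly as follows. This is an illustration only, not the script's actual code: the `clean_id` helper and the `CLEAN_IDS_CONFIG` constant are hypothetical names, and the sketch assumes each `find` value is compiled as a regular expression and applied with `String#gsub`.

```ruby
require 'yaml'

# Same rules as the workflow config above. The single-quoted heredoc keeps
# the backslashes exactly as they would appear in the YAML file.
CLEAN_IDS_CONFIG = YAML.safe_load(<<~'YAML')
  clean ids:
    - find: '^o(c[mn]|n)'
      replace: ''
    - find: ' *$'
      replace: ''
    - find: '\\$'
      replace: ''
YAML

# Hypothetical helper (an assumption, not the script's real method): run each
# rule in list order, feeding the result of one rule into the next.
def clean_id(id, rules)
  rules.reduce(id) do |value, rule|
    value.gsub(Regexp.new(rule['find']), rule['replace'])
  end
end

rules = CLEAN_IDS_CONFIG['clean ids']
puts clean_id('ocm55796742\\', rules) # prints 55796742
```

Order matters here: with these rules, an ID such as `ocm55796742 \` (space before the backslash) keeps its trailing space, because the space rule runs before the backslash is removed. Reversing the rule list would strip both.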

To clean up OCLC numbers in 001 and 035z

---
institution:
  main id: '001'
  merge id: '035z'
workflows:
  WCM:
    clean ids:
      - find: '^o(c[mn]|n)'
        replace: ''
      - find: ' *$'
        replace: ''
      - find: '\\$'
        replace: ''
      - find: '\(OCoLC\)'
        replace: 'OCLC'

Input:

=001  ocm55796742\
=035  \\$a(OCoLC)55796742$z(OCoLC)224254918$z(OCoLC)882239529$z(OCoLC)922980225

Output:

=001  55796742
=035  \\$a(OCoLC)55796742$zOCLC224254918$zOCLC882239529$zOCLC922980225
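
To show why only the $z values change in the 035 above, here is a minimal Ruby sketch. It assumes the rules are applied only to the subfield named in merge id. The field is modeled as plain `[code, value]` pairs, and `ID_RULES`, `FIELD_035`, `apply_rules`, and `clean_subfields` are hypothetical names rather than anything from the script.

```ruby
# Rules from the config above, written as Ruby regex literals.
ID_RULES = [
  [/^o(c[mn]|n)/, ''],
  [/ *$/, ''],
  [/\\$/, ''],
  [/\(OCoLC\)/, 'OCLC']
].freeze

# Hypothetical stand-in for a MARC 035 field: ordered [code, value] pairs.
FIELD_035 = [
  ['a', '(OCoLC)55796742'],
  ['z', '(OCoLC)224254918'],
  ['z', '(OCoLC)882239529'],
  ['z', '(OCoLC)922980225']
].freeze

# Apply every rule, in order, to a single ID value.
def apply_rules(value)
  ID_RULES.reduce(value) { |v, (find, replace)| v.gsub(find, replace) }
end

# Clean only the subfields whose code matches; leave the rest untouched.
def clean_subfields(subfields, code)
  subfields.map { |c, v| c == code ? [c, apply_rules(v)] : [c, v] }
end

cleaned = clean_subfields(FIELD_035, 'z')
puts cleaned.map { |c, v| "$#{c}#{v}" }.join
# prints $a(OCoLC)55796742$zOCLC224254918$zOCLC882239529$zOCLC922980225
```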

format flag MARC spec

Specifies exactly what MARC field(s) should be added to indicate the record format.

Option type

list

Dependencies
Required by

write format flag to recs: true

Format/examples

For an example, see the write format flag to recs option

id affix value

Option type

simple value (exception to the config combination rules for this type --- augments rather than replaces parent config values)

Dependencies
Required by

use id affix - needs to know what to add as a prefix or a suffix

Format/examples

  id affix value: 'wcm'

For examples, see the use id affix option

main id

The main/default overlay match point in your system. Tells script which field/subfield to edit if you are adding id affix or cleaning the ids.

Caution
The script currently does all of its internal identification/comparison of records based on 001, on the assumptions that: (a) batch record sets without an 001 value are rare; and (b) in most cases, the combination of 001/019a replicates the identification of merged records possible based on 035a/035z comparison. The script will not currently work for you if (a) you are working with incoming or existing record sets that lack an 001 field AND (b) you want to compare those record sets.
Note
This tool was originally designed to handle MARC batches delivered by OCLC WorldShare Collection Manager, where these assumptions are safe. If I run into cases where they do not hold (as I expect I will), then I’ll have to revisit the role of the main id option.
Option type

simple value

Dependencies
Required by

use id affix - needs to know what main id to add affix to

Required by

clean ids - needs to know what main id to clean

Format/examples

If main id is a MARC control field (i.e. in the 001-009 range), specify only the MARC tag:

---
institution:
  main id: '001'

If main id is a MARC variable field (i.e. in the 010-999 range), specify the MARC tag and subfield delimiter:

---
institution:
  main id: '035a'
Note
main id doesn’t have to be in your institution config, but in most cases it makes sense there.

For main id use examples, see any of the options that depend on main id.
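
As a rough illustration of the two formats, the Ruby sketch below splits a main id value into tag and subfield. The `parse_id_spec` helper and its error messages are hypothetical. The sketch assumes the value is always a three-character tag followed by at most one subfield character, which the script itself may or may not enforce.

```ruby
# Hypothetical parser: the first three characters are the MARC tag and an
# optional fourth character is the subfield delimiter.
def parse_id_spec(spec)
  tag = spec[0, 3]
  subfield = spec[3] # nil when only a tag was given
  control = tag.between?('001', '009')

  if control && subfield
    raise ArgumentError, "control field #{tag} takes no subfield"
  elsif !control && subfield.nil?
    raise ArgumentError, "variable field #{tag} needs a subfield"
  end

  { tag: tag, subfield: subfield, control_field: control }
end

p parse_id_spec('001')
p parse_id_spec('035a')
```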

overlay merged records

If false, the script only tries to match new to existing records on the main id (internally 001 is used as matchpoint — see main id option documentation for reason behind that)

If true, the script will also try to match new to existing records on merge id (internally, 019a)

Option type

boolean

Dependencies
Requires

use existing record set: true - Script will fail if you ask it to figure out overlays, if you don’t give it an existing file to use

Required by

manipulate 019 for overlay: true - The only reason to manipulate 019s for overlay is if you are overlaying on merged records, so script fails if you aren’t doing that

Required by

flag overlay type: true - Overlay type will always be on Main ID if you are not overlaying on merged records

Format/examples

There are only two possible valid values:

  overlay merged records: true

OR

  overlay merged records: false

For examples of this option, see the other options that require this one.
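
The overlay-related dependencies described on this page can be summarized as a small check. This Ruby sketch is not the script's validation code: `OVERLAY_REQUIRES` and `missing_dependencies` are hypothetical names, and the sketch treats an unset or false dependency as missing.

```ruby
# Option => options it requires, taken from the Dependencies sections
# of the overlay-related options on this page.
OVERLAY_REQUIRES = {
  'overlay merged records' => ['use existing record set'],
  'manipulate 019 for overlay' => ['overlay merged records'],
  'flag overlay type' => ['overlay merged records', 'overlay type flag spec']
}.freeze

# Hypothetical check: return one message per unmet dependency of any
# option that is set to true in the combined config hash.
def missing_dependencies(config)
  OVERLAY_REQUIRES.each_with_object([]) do |(option, needed), problems|
    next unless config[option] == true

    needed.each do |dep|
      value = config[dep]
      problems << "#{option} requires #{dep}" if value.nil? || value == false
    end
  end
end

config = { 'overlay merged records' => true, 'manipulate 019 for overlay' => true }
puts missing_dependencies(config)
# prints overlay merged records requires use existing record set
```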

manipulate 019 for overlay

If true, ensures that any 019a in an incoming record that matches the main id (001) of an existing record is moved to the beginning of the 019 field, so that the overlay succeeds

Useful if your system can’t handle matching on subsequent 019s for whatever reason

Option type

boolean

Dependencies
Requires

overlay merged records: true - The only reason to manipulate 019s for overlay is if you are overlaying on merged records, so script fails if you aren’t doing that

Format/examples

There are only two possible valid values:

  manipulate 019 for overlay: true

OR

  manipulate 019 for overlay: false

institution:
  main id: '001'
  merge id: '019a'
workflows:
  WCM:
    clean ids:
      - find: '^o(c[mn]|n)'
        replace: ''
      - find: ' *$'
        replace: ''
      - find: '\\$'
        replace: ''
    use existing record set: true
    overlay merged records: true
    manipulate 019 for overlay: true

Existing record:

=001  ocn964614984
=019  \\$a964922865

Incoming record:

=001  ocn972505257
=019  \\$a964922865$a965145436$a964614984$a966396032$a967710583$a971074464$a972608176

Output record:

=001  972505257
=019  \\$a964614984$a964922865$a965145436$a966396032$a967710583$a971074464$a972608176
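
The reordering shown above can be sketched in Ruby like this. The `promote_matching_019a` helper is hypothetical and works on plain arrays of strings. It assumes IDs have already been cleaned, and that only the first matching 019a needs to move.

```ruby
# Hypothetical helper: move the first 019$a value that matches an existing
# record's main id to the front, keeping the other values in their order.
def promote_matching_019a(values, existing_ids)
  match = values.find { |v| existing_ids.include?(v) }
  return values.dup if match.nil?

  [match] + values.reject { |v| v == match }
end

incoming_019a = %w[964922865 965145436 964614984 966396032 967710583 971074464 972608176]
existing_main_ids = ['964614984'] # existing 001 after clean ids has run

puts promote_matching_019a(incoming_019a, existing_main_ids).join(' ')
```

When no 019a matches an existing main id, the list is returned unchanged.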

flag overlay type

If true, writes a new field into any incoming MARC records expected to overlay, specifying whether the record is expected to overlay on main ID or merge ID.

Option type

boolean

Dependencies
Requires

overlay merged records: true - Overlay type will always be on Main ID if you are not overlaying on merged records

Requires

overlay type flag spec - Script must know how you want the MARC field 'flag' written into the record.

Format/examples

There are only two possible valid values:

  flag overlay type: true

OR

  flag overlay type: false

For examples of this option, see the other options that require this one.

overlay type flag spec

Specifies exactly what MARC field(s) should be added to indicate the overlay type of each incoming record expected to overlay.

Option type

list

Dependencies
Required by

flag overlay type: true - Script must know how you want the MARC field 'flag' written into the record.

use existing record set

If you want to compare an incoming record set against an earlier version of the same set, this should be true.

If true, the script will:

  • verify that at least one existing record file exists

  • ingest all records from existing record file(s) and prepare them to be compared with incoming records

Option type

boolean

Dependencies
Requires

At least one .mrc file in the existing_marc directory

Required by

overlay merged records - Script will fail if you ask it to figure out overlays, if you don’t give it an existing file to use

Format/examples

There are only two possible valid values:

  use existing record set: true

OR

  use existing record set: false

For examples of this option, see the other options that require this one.

use id affix

If true, will add some value as a prefix or suffix to the IDs you’ve specified.

Option type

boolean

Dependencies
Requires

affix type - needs to know whether to add prefix or suffix

Requires

id affix value - needs to know what affix to add

Format/examples


Add prefix to 001s

---
institution:
  main id: '001'
workflows:
  WCM:
    use id affix: true
    affix type: 'prefix'
    id affix value: 'wcm'

Input:

=001  55796742

Output:

=001  wcm55796742

Add suffix to 001 and 019a, using workflow- and collection-level specs

---
institution:
  main id: '001'
  merge id: '019a'
workflows:
  WCM:
    use id affix: true
    affix type: 'suffix'
    id affix value: 'wcm'
collections:
  SpringerLink:
    id affix value: 'SPR'

Input:

=001  55796742
=019  \\$a224254918$a882239529$a922980225

Output:

=001  55796742wcmSPR
=019  \\$a224254918wcmSPR$a882239529wcmSPR$a922980225wcmSPR
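
A minimal Ruby sketch of the behavior in the example above, where the collection-level id affix value is appended to the workflow-level value rather than replacing it. `combined_affix` and `add_affix` are hypothetical helpers. The concatenation order (workflow first, then collection) is taken from the suffix output shown here; whether a combined prefix is ordered the same way is an assumption.

```ruby
# Hypothetical: join affix values from each config level, in level order,
# skipping levels that do not set one.
def combined_affix(*levels)
  levels.compact.join
end

# Hypothetical: attach the affix to an ID according to affix type.
def add_affix(id, affix, type)
  case type
  when 'prefix' then affix + id
  when 'suffix' then id + affix
  else raise ArgumentError, "affix type must be 'prefix' or 'suffix'"
  end
end

affix = combined_affix('wcm', 'SPR') # workflow value, then collection value
puts add_affix('55796742', affix, 'suffix') # prints 55796742wcmSPR
puts add_affix('55796742', 'wcm', 'prefix') # prints wcm55796742
```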

write format flag to recs

Warning

This option was quickly added to meet UNC-specific e-resource MARC processing needs. Its behavior is not currently configurable to do anything different than exactly what we need it to do. Eventually I want to improve this and make it more customizable, but it’s relatively low on the priority list.

If true, will add a field (specified in format flag MARC spec) indicating the format of the resource described by each record.

The format is determined by the enhanced-marc ruby gem, which analyzes LDR and other fixed field values to determine the format described by the record, plus some custom logic I wrote in.

For now, the format values output are hard-coded to meet UNC Chapel Hill-specific needs, and assume that online-ness will be checked by setting warn about non-e-resource records to true.

Option type

boolean

Dependencies
Requires

format flag MARC spec - needs to know what MARC field to write the format into

Requires (not a hard requirement, but you should set it to true)

warn about non-e-resource records - The logic to set format currently assumes that all records being passed through represent e-resources. If you are not checking that this is true via this option, you can expect the format flags written to the records to be incorrect in non-e records.

Format/examples


Write format flag to records

---
institution:
  format flag MARC spec:
    - tag: '990'
      i1: '8'
      i2: '9'
      subfields:
        - delimiter: 'a'
          value: '[FORMAT]'
workflows:
  WCM:
    write format flag to recs: true

Input:

MARC record describing ebook

Output:

=990  89$aBK:ebook
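
To make the mapping from spec to output line concrete, here is a Ruby sketch that renders the spec above as a MARC mnemonic line. It assumes `[FORMAT]` acts as a placeholder that is swapped for the detected format value, which is what the example output suggests. `FORMAT_FLAG_SPEC` and `render_flag` are hypothetical names, and the real script writes a MARC field rather than a string.

```ruby
# The format flag MARC spec from the config above, as a Ruby hash.
FORMAT_FLAG_SPEC = {
  'tag' => '990',
  'i1' => '8',
  'i2' => '9',
  'subfields' => [{ 'delimiter' => 'a', 'value' => '[FORMAT]' }]
}.freeze

# Hypothetical renderer: build the mnemonic line, replacing the [FORMAT]
# placeholder (assumed behavior) with the given format value.
def render_flag(spec, format)
  subfields = spec['subfields'].map do |sf|
    "$#{sf['delimiter']}#{sf['value'].sub('[FORMAT]', format)}"
  end
  "=#{spec['tag']}  #{spec['i1']}#{spec['i2']}#{subfields.join}"
end

puts render_flag(FORMAT_FLAG_SPEC, 'BK:ebook') # prints =990  89$aBK:ebook
```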