PicaReader provides classes for reading Pica+ records encoded in PicaXML and PicaPlain.
PicaReader is copyright (c) 2012 by Herzog August Bibliothek Wolfenbüttel and released under the terms of the GNU General Public License v3.
You can install PicaReader via Composer.
composer require hab/picareader
All readers adhere to the same interface. You open the reader with a string of input data by calling
Reader::open()
and can call Reader::read()
to read the next record in the input data. If the
input does not contain (anymore) records Reader::read()
returns FALSE
. Otherwise it returns
either a record object created with PicaRecord’s Record::factory()
function.
$reader = new \HAB\Pica\Reader\PicaXmlReader()
$reader->open(file_get_contents('http://unapi.gbv.de?id=opac-de-23:ppn:635012286&format=picaxml'));
$record = $reader->read();
$reader->close();
To filter out records or fields you can attach a filter to the reader via Reader::setFilter()
. A
filter is any valid PHP callback that takes an associative array representing the record as argument
and returns a possibly modified array or FALSE
if the entire record should be skipped.
The array representation of a record is defined as follows:
RECORD := array('fields' => array(FIELD, …)) FIELD := array('tag' => TAG, 'occurrence' => OCCURRENCE, 'subfields' => array(SUBFIELD, …)) SUBFIELD := array('code' => CODE, 'value' => VALUE)
Where TAG
, OCCURRENCE
, CODE
, and VALUE
are the respective properties of a Pica+ field or
subfield.
For example, if your source delivers malformed PicaXML records like so:
<?xml version="1.0" encoding="UTF-8"?>
<record xmlns="info:srw/schema/5/picaXML-v1.0">
<datafield tag="">
</datafield>
<datafield tag="001A">
<subfield code="0">0001:14-09-10</subfield>
</datafield>
…
</record>
You can attach a filter function to remove these fields with an invalid tag:
$reader = new PicaXmlReader();
$reader->setFilter(function (array $r) {
return array('fields' => array_filter($r['fields'],
function (array $f) {
return isset($f['tag']) && \HAB\Pica\Record\Field::isValidFieldTag($f['tag']);
}));
});
$record = $reader->read(…);
$reader->close();
Large parts of this package would not have been possible without studying the source of Pica::Record, an open source Perl library for handling Pica+ records by Jakob Voß, and the practical knowledge of our library’s catalogers.