Form Generation and Data Dictionary Model
Repository providers need purpose-built web forms or wizards to collect deposit streams from various audiences and of various data types. Different drop-boxes will require their own sets of metadata fields and pick lists. Yet each deposit needs to comply with repository standards in terms of metadata schema, data dictionary and encoding practices. Repository managers need to quickly create new web forms for short-term projects with special metadata needs. Data flowing from these deposit forms must be ready for ingest.
Using the same Eclipse Modelling Framework as the workbench crosswalks, we can define shared data dictionaries and support rapid composition and deployment of new deposit forms.
A data dictionary is a mapping of user-recognizable fields and best practices to a particular metadata encoding. It maps and describes entities like "faculty author" as a certain set of elements within something like MODS XML. It includes specific instructions for the encoding of elements and general usage guidelines. A model for a data dictionary could define each entity as a block of metadata that is mapped to recognizable input fields.
An example block of metadata:
- label: Faculty Author
- usage: Use this block to record the name and affiliations of authors at the university.
- inputs: first name, last name, researcher id, department
- elements:
Each input is tied to some part of the element encoding, in the same way that columns of delimited data are tied to the elements of a crosswalk mapping. Controlled vocabularies also come into play for each input/element combination. Each dictionary entity is like a crosswalk in miniature, mapping a single semantic unit of metadata. The data dictionary can share the same EMF model as the crosswalks. This gives crosswalk creators the option of plugging delimited data into predefined data dictionary blocks, rather than configuring their own granular MODS elements.
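To make the input-to-element idea concrete, here is a minimal Python sketch of a dictionary block. It is not the EMF model itself. The class names (`Input`, `MetadataBlock`), the `encode` method, the MODS-style element paths, and the department vocabulary are all assumptions made for illustration; the actual element mapping for "Faculty Author" is not specified here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Input:
    name: str                              # user-facing input, e.g. "first name"
    element_path: str                      # hypothetical path into the target encoding
    vocabulary: Optional[list] = None      # controlled vocabulary, if any


@dataclass
class MetadataBlock:
    label: str
    usage: str
    inputs: list = field(default_factory=list)

    def encode(self, values: dict) -> dict:
        """Map supplied input values to element paths, enforcing vocabularies."""
        encoded = {}
        for inp in self.inputs:
            if inp.name not in values:
                continue  # inputs without a value are simply left out
            value = values[inp.name]
            if inp.vocabulary is not None and value not in inp.vocabulary:
                raise ValueError(
                    f"{value!r} is not in the vocabulary for {inp.name!r}"
                )
            encoded[inp.element_path] = value
        return encoded


# Illustrative block; the paths and vocabulary terms are placeholders.
faculty_author = MetadataBlock(
    label="Faculty Author",
    usage="Use this block to record the name and affiliations of authors "
          "at the university.",
    inputs=[
        Input("first name", "name/namePart[@type='given']"),
        Input("last name", "name/namePart[@type='family']"),
        Input("researcher id", "name/nameIdentifier"),
        Input("department", "name/affiliation",
              vocabulary=["Biology", "History"]),
    ],
)

# A row of delimited data plugs into the block by input name.
row = {"first name": "Ada", "last name": "Lovelace", "department": "History"}
print(faculty_author.encode(row))
```

The same `encode` call could serve both a crosswalk (values from delimited columns) and a deposit form (values from submitted fields), which is the appeal of sharing one model.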
Terminology side note: What is the best word for the elements in a data dictionary? I need one that does not conflict with other terms in this space, which rules out many:
- "element" b/c XML
- "entity" b/c XML (and preposterous)
- "field" b/c web form
The best I have so far is "block" or "metadata block". This gives the sense of building with blocks, which is what people can do when they make crosswalks and forms.
A Deposit Form is perhaps a composition of data dictionary blocks, with some surrounding layout hints and descriptive text. Let's assume that the layout support is relatively minimal, say an ordered list of text blocks and metadata blocks. Within each metadata block (referencing the dictionary) we have a set of input fields. If we follow the crosswalk model, then inputs will require specific data types. However, even plain text inputs need specifics for form rendering, such as the width of the form field and its height for multi-line input.
Technical side note: XForms may supply a ready model for this trick of form composition, or it may be a bad fit. There are many ways to render a template into a form, and the right solution depends mostly on how specific the templates really have to be. For instance, in addition to the data type, we may have enough rendering specifics by adding a preferred size hint to text inputs and a granularity hint to date inputs. Pick lists are already built into the data dictionary via controlled vocabularies.
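As a rough illustration of how little rendering information might be needed, the Python sketch below turns an input description into an HTML form control using only a data type, a size hint, a date granularity hint, and an optional vocabulary. The names `InputSpec` and `render_input`, the hint values ("short", "long", "multiline", "year", "month", "day"), and the choice of plain HTML rather than XForms are all assumptions for the sake of the example.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional

# Hypothetical hint vocabularies; none of these values come from a spec.
TEXT_SIZES = {"short": 20, "long": 60}
DATE_TYPES = {"year": "number", "month": "month", "day": "date"}


@dataclass
class InputSpec:
    name: str
    data_type: str = "text"              # "text" or "date"
    size_hint: Optional[str] = None      # "short", "long" or "multiline"
    granularity: Optional[str] = None    # "year", "month" or "day"
    vocabulary: Optional[list] = None    # controlled vocabulary -> pick list


def render_input(spec: InputSpec) -> str:
    """Render one input as a label plus a single HTML form control."""
    field_id = escape(spec.name.replace(" ", "_"))
    label = f'<label for="{field_id}">{escape(spec.name)}</label>'
    if spec.vocabulary:
        # A controlled vocabulary always wins and becomes a pick list.
        options = "".join(
            f"<option>{escape(term)}</option>" for term in spec.vocabulary
        )
        control = f'<select id="{field_id}" name="{field_id}">{options}</select>'
    elif spec.data_type == "date":
        html_type = DATE_TYPES.get(spec.granularity, "date")
        control = f'<input type="{html_type}" id="{field_id}" name="{field_id}">'
    elif spec.size_hint == "multiline":
        control = (
            f'<textarea id="{field_id}" name="{field_id}" '
            f'rows="4" cols="60"></textarea>'
        )
    else:
        size = TEXT_SIZES.get(spec.size_hint, 30)
        control = (
            f'<input type="text" id="{field_id}" name="{field_id}" '
            f'size="{size}">'
        )
    return label + control


print(render_input(InputSpec("first name", size_hint="short")))
print(render_input(InputSpec("conference date", data_type="date",
                             granularity="month")))
```

If templates turn out to need more than these few hints, that would be a sign that a fuller form language is the better fit.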
Example: Faculty Poster Deposit Form
- divs (an ordered list)
  - "Welcome to the deposit form. Here is a link to policies. Please deposit your work and we will provide access to it forever."
  - reference to "faculty author" in data dictionary
  - reference to a file upload block
  - reference to "type of scholarly work" in data dictionary
  - reference to "conference entry" in data dictionary
  - "Thank You"
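The example form above could be expressed as data along these lines. This Python sketch assumes a form is nothing more than an ordered list of text divs and block references, and that resolving it means looking each reference up in the dictionary. `TextDiv`, `BlockRef`, `DepositForm`, `resolve`, and the input names for every block other than "faculty author" are hypothetical, and the file upload block is treated as a dictionary entry purely for simplicity.

```python
from dataclasses import dataclass


@dataclass
class TextDiv:
    text: str            # descriptive text shown as-is


@dataclass
class BlockRef:
    block_id: str        # key of a block in the data dictionary


@dataclass
class DepositForm:
    title: str
    divs: list           # ordered list of TextDiv and BlockRef


# Stand-in dictionary: block id -> input names (mostly placeholders).
DICTIONARY = {
    "faculty author": ["first name", "last name", "researcher id", "department"],
    "file upload": ["file"],
    "type of scholarly work": ["type"],
    "conference entry": ["conference name", "conference date"],
}


def resolve(form: DepositForm, dictionary: dict) -> list:
    """Expand a form into ordered (kind, label, inputs) tuples."""
    resolved = []
    for div in form.divs:
        if isinstance(div, TextDiv):
            resolved.append(("text", div.text, []))
        elif div.block_id in dictionary:
            resolved.append(("block", div.block_id, dictionary[div.block_id]))
        else:
            # A form may only reference blocks the dictionary defines.
            raise KeyError(f"unknown dictionary block: {div.block_id}")
    return resolved


poster_form = DepositForm(
    title="Faculty Poster Deposit Form",
    divs=[
        TextDiv("Welcome to the deposit form."),
        BlockRef("faculty author"),
        BlockRef("file upload"),
        BlockRef("type of scholarly work"),
        BlockRef("conference entry"),
        TextDiv("Thank You"),
    ],
)

for kind, label, inputs in resolve(poster_form, DICTIONARY):
    print(kind, label, inputs)
```

Because the form only holds references, a change to a block in the dictionary would flow through to every form that uses it.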
There are so many different options you could put into a form definition that I won't even try to list them here. Simple is probably best. If we rely on the data dictionary for encoding best practices, then we may want to rely on it for form best practices as well. Perhaps blocks in the dictionary can come with a default input form mapping.