Interoperability can be viewed through the narrow lens of a single transaction between a producer and a consumer of data, with intermediate processing steps acting as both consumers and producers.
Interoperability reduces the overall cost of using software. The end user needs to assimilate less information about a data source to consume it, as some aspects of it are "well known" and can be dealt with by reusable software.
This is typically extended with the "publish-find-bind" model, which introduces a third _catalog_ component whose function is to help clients identify services they can interoperate with.
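As a minimal sketch of publish-find-bind, assuming a hypothetical catalogue endpoint and record structure (the URL, `q` parameter and `accessURL` field are illustrative, not taken from any specific standard), a client might find and then bind to a service like this:

```python
import requests

# Hypothetical catalogue endpoint; a real deployment might expose
# OGC API - Records, CSW, or a bespoke search API instead.
CATALOGUE_URL = "https://catalogue.example.org/search"

def find_services(keyword: str) -> list[dict]:
    """'Find': query the catalogue for service records matching a keyword."""
    response = requests.get(CATALOGUE_URL, params={"q": keyword}, timeout=30)
    response.raise_for_status()
    return response.json().get("records", [])

def bind_and_fetch(record: dict) -> bytes:
    """'Bind': use the access URL advertised in the record to retrieve data."""
    return requests.get(record["accessURL"], timeout=30).content

if __name__ == "__main__":
    # 'Publish' has already happened: providers registered their records.
    for record in find_services("river gauge observations"):
        print(record.get("title"), "->", record.get("accessURL"))
```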
So a community of practice can publish information that increases interoperability, to realise some benefit "at scale".
The problem is: what information is needed, and in what form?
What a user needs to know about a data source is quite complex. It may be described in a "data product specification" of some sort, and such documents are very involved. For observational data they will include a lot of specific information about the science, processing, interpretation and so on. Such information can be arbitrarily detailed, and is only standardised in a few cases where the interoperability imperative means sufficient resources can be applied.
So interoperability is typically a partial solution: software components are designed to deal with data access and transfer, while the end user works out how to assimilate the data. The level of detail standardised determines how much effort is left to the user relative to the software developer.
The size, informational complexity and sophistication of the community of practice determine where the cost-benefit ratio balances between investment in standards and software, and complexity for the end user.
If the community is one with a single data provider and many users, then a single custom API is sufficient - developers just need some interoperability in the API definition, and can do all the data interpretation needed to present simple information to a user.
In more complex domains, however, there will be many data providers and many clients, including data archiving, data processing and decision support systems. A simple API cannot be described in sufficient detail to handle all cases without becoming overwhelmingly complex.
At this point it is necessary to break down the different aspects of interoperability, and who the stakeholders are:
The following is a crude breakdown of the things that are typically standardised to improve interoperability:
Aspect | Stakeholders | Function | Example
---|---|---|---
API | Client software developers | Allow clients to access specific data or functions. | OGC API, NGSI
API model | Developers of APIs and tools to create clients | Standardise the way APIs are described. | OpenAPI
Format | Schema definers | Define structure according to a meta-model. | XML, CSV, JSON, RDF encodings
Schema | Meta-model aware clients | Map elements of a data model to an encoding meta-model. Often confused or conflated with the data model (see the sketch below). | FOAF, NGSI data models
Data model | Data publishers and consumers | Define objects, their properties and relationships. | DCAT, OWL, SKOS, PROF
Vocabularies | Data modellers, schema developers | Define reusable elements for data model interoperability. | Dublin Core, Schema.org
Taxonomies | Data specifiers, data processors | Provide definitions and relationships between coded content values. | OpenAPI, OGC API, NGSI
Profiles | Data specifiers, data processors | Define how more general specifications can be used in specific contexts. | GeoDCAT
Data products | Data specifiers, data processors | Define arbitrary details of data, including quality, structure, content rules etc. | Statistical reporting, census, national mapping products
Data access | Data producers and consumers | Specify data access terms and mechanisms, such as licences and service availability. | CC-BY and other licences
This breakdown is not exhaustive, nor are the stakeholders necessarily different individuals; it aims to show that interoperability is a multi-faceted problem, and that in any given circumstance different stakeholders will have different motivations, capabilities and cost-benefit equations for standardising different aspects of it.
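To make the distinction between a data model and a schema concrete (they are often conflated, as noted in the table), the sketch below takes one illustrative data-model concept - an observation with a property and a result - and encodes it against two different meta-models, JSON and RDF. The class and property names are assumptions for illustration only, not drawn from any particular standard.

```python
import json
from rdflib import Graph, Literal, Namespace, RDF

# One data-model concept: an observation has a property and a numeric result.
# (The class and property names below are illustrative, not from a standard.)
observation = {"id": "obs-001", "property": "air_temperature", "result": 21.4}

# Schema 1: a JSON encoding of the data model.
json_doc = json.dumps(observation, indent=2)

# Schema 2: an RDF encoding of the *same* data model.
EX = Namespace("https://example.org/vocab/")
g = Graph()
subject = EX[observation["id"]]
g.add((subject, RDF.type, EX.Observation))
g.add((subject, EX.observedProperty, Literal(observation["property"])))
g.add((subject, EX.result, Literal(observation["result"])))

print(json_doc)
print(g.serialize(format="turtle"))
```

The data model (an observation, its property, its result) is the same in both cases; only the schema that binds it to a particular meta-model changes.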
With a more detailed breakdown it is possible to think about total costs and benefits at different scales, over different time periods.
We have an "interoperability backlog" - the gap between the long-term optimum level of interoperability for a domain and the level currently achieved - evidenced by the many activities looking to improve interoperability.
This shortfall exists because short-term project design choices make sense locally: they do not factor in costs or benefits beyond the project participants or the project timescale. The same applies to software development - products have markets, and post-sale concerns are weighted lower than immediate value at sale time.
This is the "local optimum" problem - the best solution at one scale does not necessarily work best at other scales.
If a particular transaction can be improved by the use of a common schema, the benefit that accrues depends on how many transactions are involved, how many users there are, how often a user performs the transaction, the initial cost of evaluating the use of the data in a given way, and how much each subsequent use costs.
For example, if data contains a set of values, and those values are standardised, a user may only need to establish their meaning once, decide how to treat each value, and then repeat this at low marginal cost for subsequent transactions. If these values are not standardised, a great deal of effort may be required per transaction to interpret them.
Costs to develop and promulgate a standard are amortised over the total value achieved by making transactions more efficient. If transactions are few, or the benefit marginal, interoperability is less important than if many users, many data sources or many transactions are involved.
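As a rough back-of-envelope sketch (the variable names and numbers below are illustrative, not taken from any real costing), the amortisation argument can be expressed as the total per-transaction saving across users and transactions, minus the one-off standardisation and evaluation costs:

```python
def net_benefit(users: int,
                transactions_per_user: int,
                saving_per_transaction: float,
                evaluation_cost_per_user: float,
                standardisation_cost: float) -> float:
    """Illustrative amortisation: total transaction savings minus one-off costs."""
    total_saving = users * transactions_per_user * saving_per_transaction
    one_off_costs = standardisation_cost + users * evaluation_cost_per_user
    return total_saving - one_off_costs

# Few users and transactions: the standard does not pay for itself.
print(net_benefit(users=3, transactions_per_user=10,
                  saving_per_transaction=5.0,
                  evaluation_cost_per_user=50.0,
                  standardisation_cost=10_000.0))   # -10000.0

# Many users and repeated transactions: the same cost is easily amortised.
print(net_benefit(users=500, transactions_per_user=200,
                  saving_per_transaction=5.0,
                  evaluation_cost_per_user=50.0,
                  standardisation_cost=10_000.0))   # 465000.0
```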
The more flexible a system needs to be, the more the mechanisms for incrementally improving interoperability - such as publishing data schemas - matter, rather than specific instances of those artefacts.
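One such mechanism, sketched below under the assumption of a hypothetical schema URL and a JSON Schema encoding (using the jsonschema library), is for a provider to publish a machine-readable schema that clients fetch and validate against, rather than hard-coding structural assumptions into each client:

```python
import requests
from jsonschema import validate, ValidationError

# Hypothetical location where the data provider publishes its JSON Schema.
SCHEMA_URL = "https://data.example.org/products/river-gauge/schema.json"

def fetch_schema() -> dict:
    """Retrieve the currently published schema rather than bundling a copy."""
    response = requests.get(SCHEMA_URL, timeout=30)
    response.raise_for_status()
    return response.json()

def check_record(record: dict, schema: dict) -> bool:
    """Validate a single record against the published schema."""
    try:
        validate(instance=record, schema=schema)
        return True
    except ValidationError as err:
        print(f"Record rejected: {err.message}")
        return False

if __name__ == "__main__":
    schema = fetch_schema()
    check_record({"site": "A123", "level_m": 2.4}, schema)
```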
A "one-size-fits all" reference architecture for interoperability will not easily yield insight into where the interoperability gaps, drivers and opportunities arise. Conversely, using this more detailed breakdown of concerns it is possible to look at any situation and identify where specific problems and opportunities lie, and what technical approach and skills may be most relevant.