
The High-Quality Data Modelling Framework

Wouldn’t it be good if humans and machines could interpret data consistently, with no ambiguity?

It doesn't take long to realise that this isn't what we typically experience with data and data-based systems today. Even simple things that we take for granted when expressed digitally (names, dates, phone numbers and so on) aren't represented or interpreted consistently by humans and machines. Most digital information that we interact with daily is only meaningful to humans (websites, corporate documents, CAD drawings, spreadsheets, etc.). It can take a huge amount of (largely human) effort to keep track of apparently simple records about things that we care about, and the more involved these records become, the harder it is to find, associate and use them. This is despite the significant computational power at our collective disposal. What has led to this apparent mismatch?

The massive growth in the availability of data has provided rich territory for analytic techniques that try to compensate for ambiguity in the data, and processing techniques have evolved that are designed around this characteristic of nearly all data. Internet search engines are a good example of such systems. However, the lack of sufficient structure and the inherent uncertainty in much of the data available today come at a heavy price. They limit the utility of the data that we create and present a significant management challenge: how do you manage something if you don't really know what it represents? Many information systems would benefit from reduced inconsistency in the data that they store and process. Some even strive to meet requirements that can only be satisfied by addressing this consistency challenge. The theory on which Magma Core is constructed provides an approach to addressing it.

At a more technical level there is an information-technology challenge. We have all experienced the difficulty of maintaining or using systems that set out to manage data to a high degree of confidence (such as corporate systems that deal with our personal details, pay, expenses and so on). For those who have worked on these systems, it is a common occurrence that once such a system is adopted it becomes both complex and costly to accommodate new requirements. A familiar story is that the original designs (including data models) become increasingly hard to comprehend and modify. One way to work around (or postpone) these issues is to avoid adding structure as much as possible; this is tempting in the short term but fundamentally limiting (and unacceptable when there are formal expectations of these systems). Another way is to provide a structural framework from the outset, accommodating the representational* patterns needed to make unambiguous representation possible for anything important enough to warrant a formal record.

Implementing an information system that adopts the latter approach may sound like a worthwhile endeavour. This code release is the result of a prototype system that aimed to do just that. While it succeeded in its goal of being driven by consistent, dynamically extensible data, we learnt the hard way that there are few tools to enable such systems to be created, managed and built upon.

One of the challenges in tackling the subject of corporate 'knowledge' records is that "Knowledge Stores" have been fashionable for some time without much, if any, thought being given to what the notion of "Knowledge" actually is. Since all of these systems store data in databases, it can be difficult for customers, users, developers and researchers to recognise the differences between them and what they collectively fail to do.

Ideally we need a system that:

  • Is based on a deep understanding of how people, places, things, and information relate to each other in the real world and how those relationships can change over time.
  • Has an internal model that allows these complex relationships to be recorded, i.e. it 'knows' how things relate to each other in general.
  • Can be extended when new things and relationships need to be recorded.
  • Records knowledge from multiple sources in a central place, allowing analysts to see and use each other's knowledge within a specific domain.
  • Can share data more easily with other systems and organisations.
  • Can show how knowledge was derived, i.e. its direct or indirect links to source intelligence and applied general knowledge, such as what was inferred rather than stated explicitly in intelligence.
  • Can apply security and policy constraints.
  • Allows knowledge to be viewed, searched, summarised, and visualised in meaningful ways - graphically as well as textually.
  • Records who entered what and when, who viewed or updated it, and when they did so. (A minimal sketch of this kind of time-bounded, provenance-tracked record follows this list.)
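
As a rough illustration of some of these points, the hypothetical sketch below (plain Java, not the Magma Core data model or API; all class and identifier names are invented for illustration) shows one way a system might record a relationship that only holds for a bounded period, together with who asserted it, when, and from which source record.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// A minimal, hypothetical sketch (not the Magma Core API) of recording a
// relationship that only holds for a bounded period, together with basic
// provenance: who asserted it, when, and from which source record.
public final class TemporalRecordSketch {

    // A time-bounded assertion that one identified thing stands in a
    // relationship to another, e.g. "person P was a member of organisation O".
    record TemporalAssertion(
            String subjectId,
            String relationType,
            String objectId,
            Instant validFrom,
            Instant validTo,      // null means "still current"
            String assertedBy,    // provenance: who recorded the assertion
            Instant assertedAt,   // provenance: when it was recorded
            String sourceId) {    // provenance: identifier of the source record
    }

    public static void main(String[] args) {
        List<TemporalAssertion> store = new ArrayList<>();

        // Record that person:42 was a member of org:7 between two dates,
        // entered by analyst-a and derived from source doc:2021-003.
        store.add(new TemporalAssertion(
                "person:42", "member_of", "org:7",
                Instant.parse("2019-01-01T00:00:00Z"),
                Instant.parse("2020-06-30T00:00:00Z"),
                "analyst-a", Instant.now(), "doc:2021-003"));

        // Query: which assertions about person:42 were valid at a given time?
        Instant asOf = Instant.parse("2019-07-01T00:00:00Z");
        store.stream()
                .filter(a -> a.subjectId().equals("person:42"))
                .filter(a -> !a.validFrom().isAfter(asOf)
                        && (a.validTo() == null || a.validTo().isAfter(asOf)))
                .forEach(a -> System.out.println(
                        a.subjectId() + " " + a.relationType() + " " + a.objectId()
                        + " (asserted by " + a.assertedBy()
                        + " from " + a.sourceId() + ")"));
    }
}
```

A model along these lines makes it possible to answer questions such as "what did we believe held at a given time, and on what basis?", which is the kind of capability the requirements above are driving at.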

Contents

  1. Data Integration
  2. Introducing Magma Core
  3. Implementation Considerations

* This is primarily about representing all the data within a system through a consistent approach to the model with which the data in Magma Core complies. While this has a bearing on the choice of format, it is the underlying patterns that are of prime concern; most data formats incur some loss if the goal is to comply with the theory on which Magma Core is based. Care has been taken within Magma Core to address this point, but there is still room for improvement. See the Issues List for an overview of the topics identified as possible improvements. Suggestions for additional issues to be discussed and worked on by those using Magma Core are welcome.