Object life cycles (Archive and Delete)

amplifi edited this page Jun 29, 2017 · 1 revision

Introduction

Entities maintained in the Cadasta platform database fall into three categories with respect to their life cycles. Most domain entities (parties, spatial units, relationships, resources, etc.) have their creation, modification and possible deletion tracked by a comprehensive history mechanism. The organisations and projects that contain these domain entities do not have their history tracked in this way, but they can be archived to remove them from the view of most platform users. Archived projects and organisations can be completely purged from the platform once their data has been exported. User accounts require a degree of special treatment to reconcile two competing requirements: maintaining the edit history of entities, and allowing users to withdraw their permission for their details to be held within the platform.

Domain entities

Note: What's said in this section isn't yet true. Eventually, the models for all domain entities will be fully bitemporal, meaning that the history of all additions, updates and deletions for all entities of these types will be recorded. The exact timeline for development of this functionality isn't yet decided, but in the interim, we will log all changes (probably using django-audit-log) so that we can later reconstruct a fully bitemporal database from the logged history.

"Domain entities" refers to parties, spatial units, relationships (of all types) and resources. The full history of all entities of each of these types is maintained -- the models representing these entities are all bitemporal. What this means is that "updating" and "deleting" entities of these types does not update or delete database table rows. Instead, they mark the current rows as no longer relevant and (for updates) create new rows to reflect the new state of the entities. This has implications for querying and maintaining database tables representing these entities, for the representation and maintenance of foreign key relationships between tables of this type, and also for the maintenance of a temporalised form of entity integrity. All of these considerations will be managed by a django-bitemporal layer, and code using these entities will call the usual Django ORM methods to create, update and delete entities.

The upshot of all this is that as far as a "normal" user of the platform API is concerned, domain entities can be created, updated and deleted as if the platform database was a simple non-temporal database. Simple queries against the platform database return the current state of knowledge of the current state of the world. The temporal and history features of the database are entirely transparent to a naive user. This means that the terms create, update and delete can be applied to domain entities without any real ambiguity. More sophisticated users of the platform API will be able to ask for a view of the database "as of" a particular date, either in terms of times "in the real world" (e.g. "Who owned this parcel on 1 June 2016?") or "in the database" (e.g. "How many parcels had we registered within this project by 10 May 2017?"). They will also be able to perform retrospective updates to data to correct errors (e.g. "We thought that Stephen Achebe's date of birth was 4 February 1982; in fact it is 4 February 1984.") and the bitemporal database layer will record both the updated data and the period during which we believed the erroneous data.
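The two kinds of "as of" query can be illustrated with a minimal sketch in plain Python. This is not the django-bitemporal layer itself: the OwnershipRow class, the owner_as_of function, the column names and the sample data are all hypothetical, chosen only to show how one set of rows can answer both a "real world" question and an "in the database" question.

```python
from dataclasses import dataclass
from datetime import date

FOREVER = date.max  # stand-in for "no end date"


@dataclass(frozen=True)
class OwnershipRow:
    parcel: str
    owner: str
    valid_from: date     # real-world time: the fact holds from this date
    valid_to: date       # real-world time: ...until this date (exclusive)
    recorded_from: date  # database time: we believed this row from this date
    recorded_to: date    # database time: ...until this date (exclusive)


def owner_as_of(rows, parcel, valid_on, known_on):
    """Return the owner of `parcel` on `valid_on`, as the database stood on `known_on`."""
    for row in rows:
        if (row.parcel == parcel
                and row.valid_from <= valid_on < row.valid_to
                and row.recorded_from <= known_on < row.recorded_to):
            return row.owner
    return None


# Hypothetical history: on 10 Jan 2015 we record that "Owner A" has held parcel
# P1 since 1 Jan 2015. On 1 Sep 2016 we learn that the parcel actually passed to
# "Owner B" on 1 Mar 2016, so the old belief is closed off and two rows replace it.
ROWS = [
    OwnershipRow("P1", "Owner A", date(2015, 1, 1), FOREVER, date(2015, 1, 10), date(2016, 9, 1)),
    OwnershipRow("P1", "Owner A", date(2015, 1, 1), date(2016, 3, 1), date(2016, 9, 1), FOREVER),
    OwnershipRow("P1", "Owner B", date(2016, 3, 1), FOREVER, date(2016, 9, 1), FOREVER),
]

# "Who owned this parcel on 1 June 2016?" according to current knowledge:
current_answer = owner_as_of(ROWS, "P1", date(2016, 6, 1), date(2017, 6, 29))
# The same question, answered as the database stood on 1 July 2016:
earlier_answer = owner_as_of(ROWS, "P1", date(2016, 6, 1), date(2016, 7, 1))
print(current_answer, "/", earlier_answer)  # Owner B / Owner A
```

Note that the retrospective correction never overwrites the first row. It only closes that row's database-time interval, which is what lets the second query reproduce the belief we held before the correction.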

All of this makes it a little complicated to talk about the "life cycle" of domain entities. From the perspective of a naive user, the life cycle of a domain entity is:

  1. The entity is created.

  2. The entity is possibly updated, perhaps multiple times.

  3. The entity is at some point possibly deleted.

In reality, in terms of rows in the relevant database table, the life cycle is really:

  1. The entity is created: a database row is created for the entity and marked as valid for all time.

  2. The entity is possibly updated, perhaps multiple times: on each update, the currently active database row for the entity is marked as no longer being active and a new row is created with the new entity attributes, marked as valid for all time.

  3. The entity is at some point possibly deleted: on deletion, the currently active database row for the entity is marked as no longer being active -- at this point, there are no longer any active rows in the database for the entity.

By recording the timestamp at which each of these events occurs, the full history of the entity's status in the database can be captured.
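The row-level life cycle above can be sketched with an in-memory table. This is a simplified illustration that tracks only one time axis (when a row was active in the database), not the full bitemporal model; the VersionRow and HistoryTable names and the sample attributes are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class VersionRow:
    entity_id: int
    attributes: dict
    active_from: datetime
    active_to: Optional[datetime] = None  # None means "still active"


class HistoryTable:
    def __init__(self):
        self.rows = []

    def _active(self, entity_id):
        for row in self.rows:
            if row.entity_id == entity_id and row.active_to is None:
                return row
        return None

    def create(self, entity_id, attributes, now):
        if self._active(entity_id) is not None:
            raise ValueError("entity already exists")
        self.rows.append(VersionRow(entity_id, dict(attributes), now))

    def update(self, entity_id, changes, now):
        current = self._active(entity_id)
        if current is None:
            raise KeyError(entity_id)
        current.active_to = now  # retire the old row instead of overwriting it
        self.rows.append(VersionRow(entity_id, {**current.attributes, **changes}, now))

    def delete(self, entity_id, now):
        current = self._active(entity_id)
        if current is None:
            raise KeyError(entity_id)
        current.active_to = now  # no active row remains, but nothing is removed

    def current(self, entity_id):
        row = self._active(entity_id)
        return None if row is None else row.attributes

    def as_of(self, entity_id, when):
        for row in self.rows:
            if (row.entity_id == entity_id and row.active_from <= when
                    and (row.active_to is None or when < row.active_to)):
                return row.attributes
        return None


table = HistoryTable()
table.create(1, {"name": "Parcel 7", "area_m2": 400}, datetime(2017, 1, 1))
table.update(1, {"area_m2": 420}, datetime(2017, 2, 1))
table.delete(1, datetime(2017, 3, 1))

print(table.current(1))                      # None
print(len(table.rows))                       # 2
print(table.as_of(1, datetime(2017, 1, 15)))  # {'name': 'Parcel 7', 'area_m2': 400}
```

After the delete, a naive query finds nothing, yet both versions of the entity are still stored and remain reachable through as_of.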

It probably makes sense to talk of updates and deletion of these types of entities as updates with history and deletion with history. From the perspective of a naive user, they are just updates and deletion, but the underlying history mechanism preserves all previous versions of entities, and these can be accessed using the "as of" queries described above.

Organisations and projects

The database representations of organisations and projects have the following characteristics:

  1. Organisation and project attributes (e.g. contact details, geographical extents, logo images, etc.) can be updated at will by users with sufficient privileges, and these updates are made as "simple updates" without recording history information.

  2. Ordinarily, organisations and projects cannot be deleted from the platform by any mechanism exposed to users via the platform web app. These organisational entities may only be deleted by Cadasta personnel by running management scripts directly on the Cadasta servers.

  3. Organisations and projects can be archived by users with sufficient privileges. This removes the organisation or project, and all the domain entities it contains, from the view of "normal" users (users with sufficient privileges can elect to see archived projects). Archived projects or organisations may be unarchived, rendering them once again visible to "normal" users of the platform. Archived projects or organisations may subsequently be purged, i.e. completely removed from the platform, by submitting a request to Cadasta. Before a project is purged, all data included in the project must be exported and provided to a responsible person in the project's managing organisation. Positive confirmation will be required from representatives of the organisation owning the data in question before projects are purged from the platform.
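The archive, unarchive and purge rules above amount to a small state machine, sketched below for a project. The Project class, its field names and the State enum are hypothetical and do not reflect the platform's actual models; in particular, a real purge would remove the data rather than just set a flag.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    ARCHIVED = "archived"
    PURGED = "purged"


@dataclass
class Project:
    name: str
    state: State = State.ACTIVE
    data_exported: bool = False    # export delivered to the managing organisation
    owner_confirmed: bool = False  # positive confirmation from the data owner

    def archive(self):
        if self.state is not State.ACTIVE:
            raise ValueError("only active projects can be archived")
        self.state = State.ARCHIVED

    def unarchive(self):
        if self.state is not State.ARCHIVED:
            raise ValueError("only archived projects can be unarchived")
        self.state = State.ACTIVE

    def purge(self):
        if self.state is not State.ARCHIVED:
            raise ValueError("only archived projects can be purged")
        if not (self.data_exported and self.owner_confirmed):
            raise ValueError("export and owner confirmation are required before purging")
        self.state = State.PURGED  # stands in for removing the data entirely

    def visible_to(self, can_see_archived):
        if self.state is State.ACTIVE:
            return True
        if self.state is State.ARCHIVED:
            return can_see_archived
        return False


project = Project("Hypothetical pilot project")
project.archive()
print(project.visible_to(can_see_archived=False))  # False
print(project.visible_to(can_see_archived=True))   # True

try:
    project.purge()  # refused: nothing exported or confirmed yet
except ValueError as exc:
    print(exc)

project.data_exported = True
project.owner_confirmed = True
project.purge()
print(project.state)  # State.PURGED
```

Making purge reachable only from the archived state keeps the destructive step behind two deliberate actions, which matches the requirement that purging happens on request and after confirmation.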

User accounts

For auditing purposes, we need to record the platform user responsible for each creation, update and deletion of platform database entities. This poses a problem if a user wants their account to be deleted from the platform (a possibility that we should allow on privacy grounds), since deleting the account would leave dangling references to non-existent user accounts in the audit history. Two approaches to resolving this problem are possible:

  1. Use ON DELETE SET NULL for all foreign keys referring to user accounts. If a user account referred to by an audit record is deleted, the corresponding foreign key reference will just be set to NULL. This approach has the disadvantage of folding all deleted users into one: there is no way to tell, for example, that one set of updates was made by one deleted user and another set by a different deleted user.

  2. When a user account is to be deleted, simply anonymise the account information: set any identifying information to null or empty values, and set the user name to something like #deleted_user_00012, where the # character is not ordinarily permitted in user names. Each deleted user account can then retain a unique identity. The foreign key relationships from audit records to user accounts are unchanged -- the user account at the end of the foreign key relationship is just anonymised if the account is deleted.
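The difference between the two approaches can be seen in a small sketch using Python's built-in sqlite3 module. The table names, column names and sample users are hypothetical, and deriving the anonymised user name from the account's primary key is an assumption made for illustration. (In Django, the first approach corresponds to declaring the foreign key with on_delete=models.SET_NULL.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        username TEXT NOT NULL UNIQUE,
        email TEXT
    );
    -- Approach 1: the audit reference is nulled when the account is deleted.
    CREATE TABLE audit_set_null (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id) ON DELETE SET NULL,
        action TEXT NOT NULL
    );
    -- Approach 2: the reference is kept and the account row is anonymised.
    CREATE TABLE audit_anonymised (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        action TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    (1, "alice", "alice@example.org"), (2, "bob", "bob@example.org"),
    (3, "carol", "carol@example.org"), (4, "dave", "dave@example.org"),
])
conn.executemany("INSERT INTO audit_set_null (user_id, action) VALUES (?, ?)",
                 [(1, "create"), (1, "update"), (2, "delete")])
conn.executemany("INSERT INTO audit_anonymised (user_id, action) VALUES (?, ?)",
                 [(3, "create"), (3, "update"), (4, "delete")])

# Approach 1: delete the accounts outright; the audit rows lose their author.
conn.execute("DELETE FROM users WHERE id IN (1, 2)")
orphaned = conn.execute(
    "SELECT COUNT(*) FROM audit_set_null WHERE user_id IS NULL").fetchone()[0]
distinct_authors = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM audit_set_null").fetchone()[0]


def anonymise_user(connection, user_id):
    """Approach 2: blank identifying fields and keep the row as a placeholder."""
    connection.execute(
        "UPDATE users SET username = ?, email = NULL WHERE id = ?",
        (f"#deleted_user_{user_id:05d}", user_id))


anonymise_user(conn, 3)
anonymise_user(conn, 4)
per_user = conn.execute("""
    SELECT u.username, COUNT(*) FROM audit_anonymised a
    JOIN users u ON u.id = a.user_id
    GROUP BY u.username ORDER BY u.username
""").fetchall()

print(orphaned, distinct_authors)  # 3 0
print(per_user)  # [('#deleted_user_00003', 2), ('#deleted_user_00004', 1)]
```

With the first approach, the three audit records can no longer be attributed to two different people. With the second, the edits remain grouped per deleted account while no identifying details are retained.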
