Commit

consistent use of FederatedEGA & CentralEGA
blankdots committed Nov 28, 2023
1 parent ee72864 commit 228dac6
Showing 8 changed files with 37 additions and 36 deletions.
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/new-question.md
@@ -1,6 +1,6 @@
---
name: Support Issue
about: Ask for support on running and/or developing LocalEGA
about: Ask for support on running and/or developing FederatedEGA
labels: Support

---
30 changes: 15 additions & 15 deletions docs/connection.md
@@ -1,17 +1,17 @@
Interfacing with CEGA ⇌ SDA
===========================

All Local EGA instances are connected to Central EGA using
All `FederatedEGA` instances are connected to `CentralEGA` using
[RabbitMQ](http://www.rabbitmq.com), a Message Broker, that allows the
components to send and receive messages, which are queued, not lost, and
resent on network failure or connection problems.

The RabbitMQ message brokers of each SDA instance are the **only**
components with the necessary credentials to connect to Central EGA
components with the necessary credentials to connect to CentralEGA
message broker.

We call `CEGAMQ` and `LocalMQ` (Local Message Broker, sometimes known as `sda-mq`),
the RabbitMQ message brokers of, respectively, `Central EGA` and `SDA`/`LocalEGA`.
the RabbitMQ message brokers of, respectively, `CentralEGA` and `SDA`/`FederatedEGA`.

Local Message Broker
--------------------
@@ -49,7 +49,7 @@ Variable | Description
> would need to be set up to send and receive messages between other
> services.
Central EGA connection
CentralEGA connection
----------------------

`CEGAMQ` declares a `vhost` for each SDA instance. It also creates the
@@ -102,7 +102,7 @@ Service will wait for messages to arrive.

> NOTE:
> More information can be found also at
> [localEGA](https://localega.readthedocs.io/en/latest/amqp.html#message-interface-api-cega-connect-lega).
> [localEGA repository](https://localega.readthedocs.io/en/latest/amqp.html#message-interface-api-cega-connect-lega) - repository that provides functionality for `FederatedEGA` use case.
`CEGAMQ` receives notifications from `LocalMQ` using a *shovel*.
Everything that is published to its `to_cega` exchange gets forwarded to
@@ -118,30 +118,30 @@ files.inbox | For inbox file operations
files.verified | For files ready to request accessionID

Note that we do not need at the moment a queue to store the completed
message, nor the errors, as we forward them to Central EGA.
message, nor the errors, as we forward them to `CentralEGA`.

![RabbitMQ setup](./static/CEGA-LEGA.png)

Connecting SDA to Central EGA
Connecting SDA to CentralEGA
-----------------------------

Central EGA only has to prepare a user/password pair along with a
`CentralEGA` only has to prepare a user/password pair along with a
`vhost` in their RabbitMQ.

When Central EGA has communicated these details to the given Local EGA
instance, the latter can contact Central EGA using the federated queue
When `CentralEGA` has communicated these details to the given `FederatedEGA`
instance, the latter can contact `CentralEGA` using the federated queue
and the shovel mechanism in their local broker.

CentralEGA should then see 2 incoming connections from that new LocalEGA
`CentralEGA` should then see 2 incoming connections from that new `FederatedEGA`
instance, on the given `vhost`.

The exchanges and routing keys will be the same as all the other
LocalEGA instances, since the clustering is done per `vhost`.
FederatedEGA instances, since the clustering is done per `vhost`.
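
As a hedged illustration of the connection details above, the sketch below assembles an AMQP URI from the user/password pair and `vhost` that `CentralEGA` hands out. The function name, host, port, and credentials are hypothetical placeholders, not values taken from the SDA configuration. The only point shown is that each component should be percent-encoded, since a password or `vhost` may contain characters such as `/`.

```python
from urllib.parse import quote


def build_cega_uri(user: str, password: str, host: str, vhost: str,
                   port: int = 5671, tls: bool = True) -> str:
    """Assemble an AMQP(S) URI for a (hypothetical) CEGAMQ endpoint."""
    scheme = "amqps" if tls else "amqp"

    def enc(value: str) -> str:
        # safe="" forces "/" and "@" to be percent-encoded as well
        return quote(value, safe="")

    return f"{scheme}://{enc(user)}:{enc(password)}@{host}:{port}/{enc(vhost)}"


# Placeholder values for illustration only
print(build_cega_uri("fega-node", "s3cr3t/pw", "cega.example.org", "/node1"))
```

A URI of this shape is what a shovel or federation upstream definition in the local broker would typically point at; the exact configuration keys are not shown here because they depend on the deployment.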

### Message Format

It is necessary to agree on the format of the messages exchanged between
Central EGA and any Local EGAs. Central EGA's messages are
`CentralEGA` and any `FederatedEGA`s. `CentralEGA`'s messages are
JSON-formatted.

The JSON schemas can be found in:
@@ -200,14 +200,14 @@ of messages:
- `type=cancel`: an ingestion cancellation
- `type=accession`: contains an accession id
- `type=mapping`: contains a dataset to accession ids mapping
- `type=heartbeat`: A means to check if the Local EGA instance is
- `type=heartbeat`: A means to check if the `FederatedEGA` instance is
"alive"
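
The message types listed above suggest a simple dispatch on a `type` field. The following is a minimal sketch, assuming the type is carried as a top-level `"type"` key in the JSON body. The hunk is truncated above the list, so only the visible types are included and others may exist. `VISIBLE_TYPES` and `classify` are hypothetical names, not part of the SDA code base.

```python
import json

# Only the types visible in the hunk above; the full list may contain more.
VISIBLE_TYPES = ("cancel", "accession", "mapping", "heartbeat")


def classify(raw: bytes) -> str:
    """Return the message type, or 'unknown' if it is missing or not listed."""
    try:
        message = json.loads(raw)
    except ValueError:
        # Covers malformed JSON as well as undecodable bytes
        return "unknown"
    if not isinstance(message, dict):
        return "unknown"
    msg_type = message.get("type")
    return msg_type if msg_type in VISIBLE_TYPES else "unknown"
```

Returning a sentinel instead of raising keeps a consumer loop alive when an unexpected message arrives; a real service would more likely validate against the JSON schemas mentioned earlier.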

> IMPORTANT:
> The `encrypted_checksums` key is optional. If the key is not present the
> sha256 checksum will be calculated by `Ingest` service.
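
The fallback described in the note can be sketched as follows. This is not the `Ingest` service's actual code. It assumes, for illustration only, that `encrypted_checksums` is a list of objects with `type` and `value` keys; all function names are hypothetical.

```python
import hashlib
from typing import Optional


def sha256_from_message(message: dict) -> Optional[str]:
    """Return the sha256 value from `encrypted_checksums`, if present."""
    for entry in message.get("encrypted_checksums", []):
        if entry.get("type") == "sha256":
            return entry.get("value")
    return None


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the sha256 of a file in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def resolve_checksum(message: dict, path: str) -> str:
    """Prefer the checksum supplied in the message, else calculate it."""
    return sha256_from_message(message) or sha256_of_file(path)
```
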
The message received from Central EGA to start ingestion at a Federated EGA node.
The message received from `CentralEGA` to start ingestion at a Federated EGA node.
Processed by the `ingest` service.

```javascript
1 change: 1 addition & 0 deletions docs/dictionary/wordlist.txt
@@ -74,6 +74,7 @@ egas
endcoordinate
envs
exportrequests
federatedega
fega
fileid
filepath
2 changes: 1 addition & 1 deletion docs/guides/deploy-k8s.md
@@ -38,7 +38,7 @@ This chart deploys a pre-configured database ([PostgreSQL](https://www.postgresq

### sda-mq - RabbitMQ component for Sensitive Data Archive (SDA) installation

This chart deploys a pre-configured message broker ([RabbitMQ](https://www.rabbitmq.com/)) designed to work with the [European Genome-Phenome Archive](https://ega-archive.org/) federated messaging interface between Central EGA and Local/Federated EGAs.
This chart deploys a pre-configured message broker ([RabbitMQ](https://www.rabbitmq.com/)) designed to work with the [European Genome-Phenome Archive](https://ega-archive.org/) federated messaging interface between `CentralEGA` and Local/Federated EGAs.

### sda-svc - Components for Sensitive Data Archive (SDA) installation

2 changes: 1 addition & 1 deletion docs/guides/troubleshooting.md
@@ -25,7 +25,7 @@ Next step is to make sure that the remote connections (CEGA RabbitMQ) are workin

## End-to-end testing

NOTE: This guide assumes that there exists a test instance account with Central EGA. Make sure that the account is approved and added to the submitters group.
NOTE: This guide assumes that there exists a test instance account with `CentralEGA`. Make sure that the account is approved and added to the submitters group.

### Upload file(s)

22 changes: 11 additions & 11 deletions docs/index.md
@@ -4,7 +4,7 @@ NeIC Sensitive Data Archive

The NeIC Sensitive Data Archive (SDA) is an encrypted data archive, implemented for storage of sensitive data. It is implemented as a modular microservice system that can be deployed in different configurations depending on the service needs.

The modular architecture of SDA supports both stand alone deployment of an archive, and the use case of deploying a Federated node in the [Federated European Genome-phenome Archive network (FEGA)](https://ega-archive.org/federated), serving discoverable sensitive datasets in the main [EGA web portal](https://ega-archive.org).
The modular architecture of SDA supports both stand alone deployment of an archive, and the use case of deploying a Federated node in the [Federated European Genome-phenome Archive network](https://ega-archive.org/federated), serving discoverable sensitive datasets in the main [EGA web portal](https://ega-archive.org).

> NOTE:
> Throughout this documentation, we can refer to [Central
@@ -32,7 +32,7 @@ Service/component | Description | Archive sub-process
-------:|:------------|:-----------------------------
Database | A Postgres database with appropriate schema, stores the file header, the accession id, file path and checksums as well as other relevant information. | Submission, Ingestion and Data Retrieval
MQ | A RabbitMQ message broker with appropriate accounts, exchanges, queues and bindings. We use a federated queue to get messages from CentralEGA's broker and shovels to send answers back.| Submission and Ingestion
Inbox | Upload service for incoming data, acting as a dropbox. Uses credentials from Central EGA. | Submission
Inbox | Upload service for incoming data, acting as a dropbox. Uses credentials from `CentralEGA`. | Submission
Intercept | Relays messages between the queue provided from the federated service and local queues. | Submission and Ingestion
[Ingest](services/ingest.md) | Splits the Crypt4GH header and moves it to the database. The remainder of the file is sent to the storage backend (archive). No cryptographic tasks are done. | Ingestion
[Verify](services/verify.md) | Using the archive crypt4gh secret key, this service can decrypt the stored files and checksum them against the embedded checksum for the unencrypted file. | Ingestion
@@ -62,32 +62,32 @@ This operations handbook is organized in four main parts, that each has it's  own
The overall data workflow consists of three parts:

- The user logs onto the Local EGA's inbox and uploads the encrypted
files. They then go to the Central EGA's interface to prepare a
- The user logs onto the `FederatedEGA`'s inbox and uploads the encrypted
files. They then go to the `CentralEGA`'s interface to prepare a
submission;
- Upon submission completion, the files are ingested into the archive
and become searchable by the Central EGA's engine;
and become searchable by the `CentralEGA`'s engine;
- Once the file has been successfully archived, it can be accessed by
researchers in accordance with permissions given by the
corresponding Data Access Committee.

------------------------------------------------------------------------

Central EGA contains a database of users with permissions to upload to a
specific Sensitive Data Archive. The Central EGA ID is used to
`CentralEGA` contains a database of users with permissions to upload to a
specific Sensitive Data Archive. The `CentralEGA` ID is used to
authenticate the user against either their EGA password or a private
key.

For every uploaded file, Central EGA receives a notification that the
For every uploaded file, `CentralEGA` receives a notification that the
file is present in an SDA's inbox. The uploaded file must be encrypted
in the [Crypt4GH file format](https://samtools.github.io/hts-specs/crypt4gh.pdf) using that SDA's public Crypt4GH key. The file is
checksummed and presented in the Central EGA's interface in order for
checksummed and presented in the `CentralEGA`'s interface in order for
the user to double-check that it was properly uploaded.

More details about the process are in [Data Submission](submission.md#data-submission).

When a submission is ready, Central EGA triggers an ingestion process on
the user-chosen SDA instance. Central EGA's interface is updated with
When a submission is ready, `CentralEGA` triggers an ingestion process on
the user-chosen SDA instance. `CentralEGA`'s interface is updated with
progress notifications indicating whether the ingestion was successful or
whether there was an error.

6 changes: 3 additions & 3 deletions docs/structure.md
@@ -9,11 +9,11 @@ Deployment related choices

### Federated vs stand-alone

In a Federated setup, the Local EGA archive node set up locally needs to exchange status updates with the Central EGA in a synchronized manner to orchestrate two parallel processes:
In a Federated setup, the `FederatedEGA` archive node set up locally needs to exchange status updates with the `CentralEGA` in a synchronized manner to orchestrate two parallel processes:

1. The multi-step process of uploading and safely archiving encrypted files holding both sensitive phenome and genome data.

2. The process of the Submitter annotating the archived data in an online portal at Central EGA, resulting in assigned accession numbers for items such as DataSet, Study, Files etc.
2. The process of the Submitter annotating the archived data in an online portal at `CentralEGA`, resulting in assigned accession numbers for items such as DataSet, Study, Files etc.


In a stand-alone setup, the deployed service has less remote synchronisation to worry about, but on the other hand needs more components to also handle annotations/meta-data locally, as well as to deal with identifiers etc.
@@ -66,7 +66,7 @@ Additional components

### Authentication of users

In a Federated setup, a data submitter will usually be required to have a user profile with the Central EGA services as well as a user identity trusted by the Federated EGA node services. The [Life Science AAI](https://lifescience-ri.eu/) login identity (a.k.a. ELIXIR AAI identity) is primarily used for the latter. Integration with both authentication services will likely need to be incorporated into a Federated EGA node's upload and download mechanisms.
In a Federated setup, a data submitter will usually be required to have a user profile with the `CentralEGA` services as well as a user identity trusted by the Federated EGA node services. The [Life Science AAI](https://lifescience-ri.eu/) login identity (a.k.a. ELIXIR AAI identity) is primarily used for the latter. Integration with both authentication services will likely need to be incorporated into a Federated EGA node's upload and download mechanisms.

### Authorizing access to datasets

8 changes: 4 additions & 4 deletions docs/submission.md
@@ -4,7 +4,7 @@ Data Submission
Ingestion Procedure
-------------------

For a given LocalEGA, Central EGA selects the associated `vhost` and
For a given `FederatedEGA` node, `CentralEGA` selects the associated `vhost` and
drops, in the `files` queue, one message per file to ingest.

Structure of the message and its contents are described in
@@ -55,21 +55,21 @@ that the integrated checksum is valid.

At this stage, the associated decryption key is retrieved. If decryption
completes and the checksum is valid, a message of completion is sent to
Central EGA: Ingestion completed.
`CentralEGA`: Ingestion completed.

> **Important**
> If a file disappears or is overwritten in the inbox before ingestion is completed, ingestion may not be possible.
If any of the above steps generates an error, we exit the workflow and
log the error. In case the error is related to a misuse from the user,
such as submitting the wrong checksum or tampering with the encrypted
file, the error is forwarded to Central EGA in order to be displayed in
file, the error is forwarded to `CentralEGA` in order to be displayed in
the Submission Interface.

Submission Inbox
----------------

Central EGA contains a database of users, with IDs and passwords. We
`CentralEGA` contains a database of users, with IDs and passwords. We
have developed several solutions allowing user authentication against
CentralEGA user database:


