All relevant information is stored in a PostgreSQL database.
The database is the single source of truth and contains a pronounced data model which ensures consistency of values and references inside the database. Thus the database is independent of validations provided by ORM. Constraints and triggers are in place to make sure the data is inherently consistent, regardless of how it is accessed.
Table "public.shards"
Column | Type | Collation | Nullable | Default
-------------+--------------------------+-----------+----------+----------------------------------
id | bigint | | not null | generated by default as identity
name | citext | | not null |
qualifier | citext | | not null | ''::citext
description | text | | |
created_at | timestamp with time zone | | not null | now()
updated_at | timestamp with time zone | | not null | now()
categories | bigint[] | | not null | '{}'::bigint[]
archived_at | timestamp with time zone | | |
Check constraints:
"shards_name_check" CHECK (name ~ '^[A-Za-z0-9_\-.]{1,100}$'::text)
"shards_qualifier_check" CHECK (qualifier ~ '^[A-Za-z0-9_\-.]{0,100}$'::citext)
Shards are generally called by their name. The name is defined in shard.yml
and also
applied when used as a dependency both in dependencies:
mapping as well as the folder
name in ./lib
.
Because of shard's decentralized design, name clashes can't be ruled out.
Name clashes are often caused by mirrors of a shard's repository. This is usually not that problematic because they can be viewed as alternate repositories of the same shard.
Example: kemal
- Main repo location:
github:kemalcr/kemal
. This is considered the canonical repo. - Old repo location:
github:sdogruyol/kemal
- There are also development forks like
github:straight-shoota/kemal
They all reference the same shard. Mirrors could be considered as individual instances but unless they have separate releases, they are considered the same shard, just provided at a different location. When a mirror releases independently, it must be considered a fork and a separate shard.
Shardbox doesn't take care of this automatically. Mirrors need to be listed as mirrors belonging to the canonical repository in the catalog. Shardbox only every synchronizes releases from the canonical repository. To separate a mirror to form its own shard, it simply needs to be removed from the mirrors list.
There might also be unrelated shards which share the same name. This is obviously not ideal, but can't really be avoided when there is no centralized registry that assigns names.
This problem is approached as follows:
- Shards are generally identified by their name as specified in
shard.yml
(e.g.kemal
). - When there are multiple shards of the same name (related or unrelated) in the database, an
additional qualifier is used to tell them apart (e.g.
kemalcr
orstraight-shoota
). This qualifier is interpreted as#{qualifier}'s version of #{name}
- Qualifiers can be omitted when there is no ambiguity.
- Name and qualifier combined form a slug which could look like
kemal
(no qualifier) orkemal~straight-shoota
. - Avoids
/
as delimiter for easier use in HTTP routes and to distinguish from<org>/<project>
scheme typically used for source code hosting.
Shardbox automatically assigns qualifiers derived from the URL. Qualifiers are only assigned when a shard name has already been taken (first come – first serve). There is currently no mechanism for modifying qualifiers (but might be implemented in the future).
Table "public.repos"
Column | Type | Collation | Nullable | Default
----------------+--------------------------+-----------+----------+----------------------------------
id | bigint | | not null | generated by default as identity
shard_id | bigint | | |
resolver | repo_resolver | | not null |
url | citext | | not null |
role | repo_role | | not null | 'canonical'::repo_role
metadata | jsonb | | not null | '{}'::jsonb
sync_failed_at | timestamp with time zone | | |
synced_at | timestamp with time zone | | |
created_at | timestamp with time zone | | not null | now()
updated_at | timestamp with time zone | | not null | now()
Check constraints:
"repos_obsolete_role_shard_id_null" CHECK (role <> 'obsolete'::repo_role OR shard_id IS NULL)
"repos_resolvers_service_url" CHECK (NOT (resolver = ANY (ARRAY['github'::repo_resolver, 'gitlab'::repo_resolver, 'bitbucket'::repo_resolver])) OR url ~ '^[A-Za-z0-9_\-.]{1,100}/[A-Za-z0-9_\-.]{1,100}$'::citext AND url !~~ '%.git'::citext)
"repos_shard_id_null_role" CHECK (shard_id IS NOT NULL OR role = 'canonical'::repo_role OR role = 'obsolete'::repo_role)
NOTES:
role
specifies the role of this repo for the shard (defaults tocanonical
). Other values aremirror
,legacy
andobsolete
. Thus, multiple repositories can be linked to the same shard. This is important for example to preserve continuity when a repository is transferred to a different location (for examplegithub:sdogruyol/kemal
togithub:kemalcr/kemal
) and to map mirrors of the same shard.
Table "public.releases"
Column | Type | Collation | Nullable | Default
---------------+--------------------------+-----------+----------+----------------------------------
id | bigint | | not null | generated by default as identity
shard_id | bigint | | not null |
version | character varying | | not null |
revision_info | jsonb | | not null |
spec | jsonb | | not null |
position | integer | | not null |
latest | boolean | | |
released_at | timestamp with time zone | | not null |
yanked_at | timestamp with time zone | | |
created_at | timestamp with time zone | | not null | now()
updated_at | timestamp with time zone | | not null | now()
Check constraints:
"releases_latest_check" CHECK (latest <> false)
"releases_version_check" CHECK (version::text ~ '^[0-9]+(\.[0-9a-zA-Z]+)*(-[0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*)?$'::text OR version::text = 'HEAD'::text)
NOTES:
- Releases are bound to a shard (
shard_id
), not an individual repo because repo locations may change. We consider each shard to have a unique release history determined by the releases provided by the canonical repo. position
is a utility column used to sort versions because PostgreSQL doesn't provide proper comparison operator for version strings. There is asemver
extension, but it requires versions to follow SEMVER, which is not enforced by shards. So we need to implement sort order externally usingService::OrderReleases
. As a benefit, result sorting is simple integer comparison.- If a release has been deleted from the repo (i.e. the tag was removed) it is marked as
yanked
. This procedure needs refinement. Yanked releases should still be addressable. - When a tag is changed to point to a different commit, it is simply updated. This also needs refinement.
Table "public.dependencies"
Column | Type | Collation | Nullable | Default
------------+--------------------------+-----------+----------+---------
release_id | bigint | | not null |
name | citext | | not null |
repo_id | bigint | | |
spec | jsonb | | not null |
scope | dependency_scope | | not null |
created_at | timestamp with time zone | | not null | now()
updated_at | timestamp with time zone | | not null | now()
NOTES:
repo_id
points to the repo referenced as dependency. This is onlyNULL
if it cannot be resolved every (path
scheme). In all other cases,repo_id
points to a repository record, but might not be actually resolvable (when the repository is not available).- The dependent shard is available through joining
releases
onrelease_id
. - Scope is either
runtime
ordependency
.