Default to a "latest" version? #191

Open
dougiesquire opened this issue Aug 30, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@dougiesquire
Collaborator

The ACCESS-NRI Intake catalog is versioned in the same way as the access-nri-intake package (see /g/data/xp65/public/apps/access-nri-intake-catalog).

The default version is equal to the version of access-nri-intake being used, though the version can be specified when the catalog is loaded, e.g.

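import intake

# Default: the catalog version matches the installed access-nri-intake version
cat = intake.cat.access_nri
cat.metadata["version"]

# Explicitly request a specific catalog version
cat_another_version = intake.cat.access_nri(version="v0.1.0")
cat_another_version.metadata["version"]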

This default behaviour ensures that the catalog version and the version of access-nri-intake being used are compatible. However, it also means that users with old versions of access-nri-intake installed won't use the most recent version of the catalog by default.

We should consider instead defaulting to a "latest" version of the catalog that symlinks to the most recent released version. To manage compatibility issues, we could have "latest" versions for each major release (e.g. latest_v1) and ensure we increment the major version when changes are introduced to access-nri-intake that break compatibility with old catalogs.
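As a rough illustration of the per-major "latest" idea, the default version name could be derived from the installed package version. This is only a sketch: the function name `default_catalog_alias` is hypothetical, and the `latest_v<major>` naming simply follows the `latest_v1` example above rather than anything that exists in access-nri-intake.

```python
def default_catalog_alias(package_version: str) -> str:
    """Return the per-major "latest" alias for a package version string.

    Hypothetical helper: assumes versions look like "v0.1.3" or "0.1.3".
    """
    major = package_version.lstrip("v").split(".")[0]
    if not major.isdigit():
        raise ValueError(f"Cannot parse major version from {package_version!r}")
    return f"latest_v{major}"


print(default_catalog_alias("v0.1.3"))  # latest_v0
print(default_catalog_alias("1.4.2"))   # latest_v1
```

With this convention, a user on any 0.x release of the package would resolve to the same `latest_v0` symlink, and only a major version bump would move them to a different alias.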

@dougiesquire added the "enhancement" label Aug 30, 2024
@marc-white
Collaborator

This sounds like an eminently sensible idea. I can think of a few ways to do this in practice:

  1. We could store a catalog version (separate to the access-nri-intake-catalog version) within the package. The package would then load the newest catalog with that catalog version by default (and catalogs would be created with the current catalog version). It could also auto-reject any attempt to load a catalog with an incompatible version number.
  2. We set up a convention where we only increment the major version number of access-nri-intake-catalog when we make a change that will invalidate previous catalogs, and select which catalogs to open based on that.
  3. The catalog itself could store which version of access-nri-intake-catalog it is known to be compatible with/was generated with. We could then have each version of access-nri-intake-catalog know which earlier versions of itself generate compatible catalogs.
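Option 2 could be sketched roughly as follows, assuming catalog versions are named `vX.Y.Z` and that sharing a major version number means "compatible". The helper names (`parse_version`, `newest_compatible`) and the example version names are illustrative, not part of the package.

```python
import re

# Assumed naming scheme for catalog versions: "v<major>.<minor>.<patch>"
VERSION_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse_version(name):
    """Return (major, minor, patch) as ints, or None if the name does not match."""
    match = VERSION_RE.match(name)
    return tuple(int(part) for part in match.groups()) if match else None


def newest_compatible(catalog_names, package_version):
    """Pick the newest catalog that shares a major version with the package."""
    package = parse_version(package_version)
    if package is None:
        raise ValueError(f"Unrecognised package version: {package_version!r}")
    candidates = [
        version
        for version in map(parse_version, catalog_names)
        if version is not None and version[0] == package[0]
    ]
    if not candidates:
        return None
    # Tuples of ints compare numerically, so v0.0.10 sorts after v0.0.9
    return "v" + ".".join(str(part) for part in max(candidates))


names = ["v0.0.10", "v0.0.8", "v0.0.9", "v0.1.0", "v0.1.1", "v0.1.2", "v0.1.3"]
print(newest_compatible(names, "v0.1.0"))  # v0.1.3
print(newest_compatible(names, "v1.0.0"))  # None
```

Comparing parsed integer tuples rather than raw strings matters here, since a plain string sort would place `v0.0.10` before `v0.0.9`.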

There are a few questions around any implementation:

  • How many catalog versions currently exist? (Gadi is down, else I would just check for myself.) And of those, are there any that are currently incompatible with the up-to-date access-nri-intake-catalog?
  • What is the purpose of opening an out-of-date catalog? Will we ever hit the point where there's something removed from the catalog that people will want to have access to (and if that's the case, why would we pull it from the catalog)?

@dougiesquire
Collaborator Author

> We could store a catalog version (separate to the access-nri-intake-catalog version) within the package.

The default catalog version is defined here. As you can see it is set to the access-nri-intake version. This is done when the catalog is built.

As above, I propose changing the default catalog version to "latest", and adding a "latest" entry in /g/data/xp65/public/apps/access-nri-intake-catalog that is a symlink to the most recent catalog version. This would mean that a user always gets the latest version of the catalog by default (regardless of the version of access-nri-intake they are using), but they could also open a specific version if they wanted to.

To answer your questions:

> How many catalog versions currently exist? (Gadi is down, else I would just check for myself.) And of those, are there any that are currently incompatible with the up-to-date access-nri-intake-catalog?

$ ls -alh /g/data/xp65/public/apps/access-nri-intake-catalog
total 36K
drwxrwsr-x+ 9 ds0092 xp65_w 4.0K May  7 08:56 .
drwxr-sr-x  8 rb5533 xp65   4.0K Aug 28 15:07 ..
drwxrwsr-x+ 3 ds0092 xp65_w 4.0K Sep 29  2023 v0.0.10
drwxrwsr-x+ 3 ds0092 xp65_w 4.0K Jul 10  2023 v0.0.8
drwxrwsr-x+ 3 ds0092 xp65_w 4.0K Jul 20  2023 v0.0.9
drwxrwsr-x+ 3 ds0092 xp65_w 4.0K Nov 29  2023 v0.1.0
drwxrwsr-x+ 3 ds0092 xp65_w 4.0K Mar  4  2024 v0.1.1
drwxrwsr-x+ 3 ds0092 xp65_w 4.0K Mar 28 14:12 v0.1.2
drwxrwsr-x+ 3 ds0092 xp65_w 4.0K Aug 29 16:54 v0.1.3

but we could probably ditch the v0.0 ones, as they were really test releases. I think all catalog versions should be compatible with the latest access-nri-intake.

> What is the purpose of opening an out-of-date catalog? Will we ever hit the point where there's something removed from the catalog that people will want to have access to (and if that's the case, why would we pull it from the catalog)?

Reproducibility is one reason. It means users can be confident they are opening the same data (even if new data has since been added to the catalog). Also, if there's an issue with the latest catalog version it's convenient to be able to quickly switch to an earlier, working version.

@marc-white
Collaborator

> > We could store a catalog version (separate to the access-nri-intake-catalog version) within the package.

> The default catalog version is defined here. As you can see it is set to the access-nri-intake version. This is done when the catalog is built.

See, this is why I shouldn't blast through suggestions quickly when I'm tired...

Do you envisage updating the code so it can intelligently work out if the symlink to 'latest' needs to be updated when a new catalog version is generated, or do you expect that to be a manual step that someone needs to jump into the filesystem to do?

@dougiesquire
Collaborator Author

> Do you envisage updating the code so it can intelligently work out if the symlink to 'latest' needs to be updated when a new catalog version is generated, or do you expect that to be a manual step that someone needs to jump into the filesystem to do?

We could just remove the old symlink and create the new one as the final step in the build process?

@marc-white
Collaborator

> > Do you envisage updating the code so it can intelligently work out if the symlink to 'latest' needs to be updated when a new catalog version is generated, or do you expect that to be a manual step that someone needs to jump into the filesystem to do?

> We could just remove the old symlink and create the new one as the final step in the build process?

That's viable, although you'd need to be certain you were always creating a new catalog version, and not repairing an older catalog (or providing a small update to an existing catalog that's behind the 'latest' for whatever reason).
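One way to guard against that case is for the build step to move the symlink only when the version just built is strictly newer than the current target. The sketch below is a minimal illustration under several assumptions: the function name `update_latest` is hypothetical, catalog directories are named `vX.Y.Z`, the link is called `latest`, and the filesystem is POSIX (so renaming a temporary symlink over the old one replaces it in a single step, with no window in which `latest` is missing).

```python
import os
import tempfile
from pathlib import Path


def version_key(name):
    """Turn "v0.1.3" into (0, 1, 3) so versions compare numerically."""
    return tuple(int(part) for part in name.lstrip("v").split("."))


def update_latest(base_dir, built_version, link_name="latest"):
    """Point base_dir/link_name at built_version, but only if it is newer.

    Returns True if the symlink was created or moved, False if it was left alone
    (for example, when an older catalog was rebuilt or repaired).
    """
    base = Path(base_dir)
    link = base / link_name
    if link.is_symlink():
        current = Path(os.readlink(link)).name
        if version_key(built_version) <= version_key(current):
            return False
    # Create the new symlink under a temporary name, then rename it into place
    tmp = base / f".{link_name}.tmp"
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    os.symlink(built_version, tmp)  # relative target, resolved inside base_dir
    os.replace(tmp, link)
    return True


# Demonstration in a throwaway directory rather than the real catalog location
with tempfile.TemporaryDirectory() as base:
    for version in ("v0.1.2", "v0.1.3"):
        os.mkdir(os.path.join(base, version))
    print(update_latest(base, "v0.1.3"))  # True
    print(update_latest(base, "v0.1.2"))  # False
    print(os.readlink(os.path.join(base, "latest")))  # v0.1.3
```

Rebuilding `v0.1.2` after `v0.1.3` exists leaves `latest` untouched, which covers the repair and small-update cases without a manual filesystem step.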
