This repository has been archived by the owner on Nov 23, 2023. It is now read-only.

Allow slashes in dataset titles and s3 urls #1928

Open
34 tasks
billgeo opened this issue Aug 10, 2022 · 0 comments
Labels
user story Something valuable for the user

Comments

billgeo (Contributor) commented Aug 10, 2022

User Story

So that end users of the data can more easily browse and download groups of datasets with the S3 API, I want to be able to add slashes (/) to the dataset title and therefore to the S3 prefix.
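To illustrate why slashes help with browsing, here is a minimal sketch of how S3 list requests group keys when given a prefix and a delimiter. It runs on a plain list of keys rather than calling S3, the function name `list_one_level` is hypothetical, and the Wellington key is made up for the example.

```python
def list_one_level(keys, prefix="", delimiter="/"):
    """Mimic how an S3 list request groups keys by prefix and delimiter.

    Returns (common_prefixes, contents): the "folders" directly under
    `prefix`, and the keys that sit directly under `prefix`.
    """
    common_prefixes = set()
    contents = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Everything up to and including the next delimiter is one group.
            common_prefixes.add(prefix + head + delimiter)
        else:
            contents.append(key)
    return sorted(common_prefixes), sorted(contents)


keys = [
    "Aerial_Imagery/Auckland/2020_RGB_Survey_03/collection.json",
    "Aerial_Imagery/Wellington/2021_RGB_Survey_01/collection.json",  # made-up key
    "Aerial_Imagery-Auckland-2020_RGB_Survey_03/collection.json",
]
regions, _ = list_one_level(keys, prefix="Aerial_Imagery/")
print(regions)
```

With slash-separated titles, one request with the prefix `Aerial_Imagery/` returns one group per region. With hyphen-separated titles, every dataset appears as its own top-level group.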

Acceptance Criteria

  • Given a dataset title with a slash (/) in it, when a new dataset is created with this dataset title, then the dataset title is accepted and the dataset is created.
  • Given a dataset is created with a slash in the title, when the data is copied to the geostore, then the files are copied to S3 objects with the slashes preserved in the S3 prefix/key. For example, the user could specify:

Dataset title: Aerial_Imagery/Auckland/2020_RGB_Survey_03
S3 Prefix: s3://linz-geostore/Aerial_Imagery/Auckland/2020_RGB_Survey_03/collection.json

Dataset title: Aerial_Imagery-Auckland-2020_RGB_Survey_03
S3 Prefix: s3://linz-geostore/Aerial_Imagery-Auckland-2020_RGB_Survey_03/collection.json
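The two acceptance criteria could be sketched roughly as follows. This is not the geostore implementation: the pattern `TITLE_PATTERN`, the functions `is_valid_title` and `object_key`, and the rule that a title is a set of non-empty segments joined by single slashes are all assumptions made for illustration. The title-to-key mapping simply follows the examples above.

```python
import re

# Assumed rule: one or more segments of letters, digits, "_" or "-",
# separated by single slashes (no leading, trailing or doubled slash).
TITLE_PATTERN = re.compile(r"[A-Za-z0-9_-]+(?:/[A-Za-z0-9_-]+)*")


def is_valid_title(title: str) -> bool:
    """Return True if the whole title matches the assumed pattern."""
    return TITLE_PATTERN.fullmatch(title) is not None


def object_key(title: str, filename: str) -> str:
    """Build the S3 key for a file in a dataset, keeping slashes as-is."""
    if not is_valid_title(title):
        raise ValueError(f"invalid dataset title: {title!r}")
    return f"{title}/{filename}"


print(object_key("Aerial_Imagery/Auckland/2020_RGB_Survey_03", "collection.json"))
```

Rejecting empty segments is a design choice in this sketch: a title such as `a//b` would otherwise produce a key with an empty "folder" level, which is awkward to browse.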

Additional context

Might want to check how pystac will update the catalog.
Might need to check every other place where we store or use the dataset title in the geostore, to see whether slashes will cause problems there.

Discussion with data managers on how they want to name, organise and access their data (particularly in the aerial imagery area) highlights that we should consider adding slashes (/) and other useful characters, such as the full stop (.), to the allowed characters for a dataset title and therefore its S3 URL/path.

S3 limitations on characters in object keys: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html
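As a rough check against those limitations, the sketch below flags characters outside the set that the linked AWS page describes as safe, and checks the 1,024-byte key length limit. The character set is transcribed from memory of that page and should be verified against it. The function names are hypothetical.

```python
import string

# Characters described as safe in the AWS object key guidelines
# (assumption: verify this list against the linked page before relying on it).
S3_SAFE_CHARACTERS = frozenset(string.ascii_letters + string.digits + "/!-_.*'()")

# S3 object keys can be at most 1,024 bytes long when UTF-8 encoded.
MAX_KEY_BYTES = 1024


def unsafe_characters(key: str) -> list:
    """Return the sorted list of characters in `key` outside the safe set."""
    return sorted(set(key) - S3_SAFE_CHARACTERS)


def key_within_length_limit(key: str) -> bool:
    """Return True if the UTF-8 encoded key fits the S3 length limit."""
    return len(key.encode("utf-8")) <= MAX_KEY_BYTES


key = "Aerial_Imagery/Auckland/2020_RGB_Survey_03/collection.json"
print(unsafe_characters(key), key_within_length_limit(key))
```

Because the length limit applies to the whole key, longer slash-separated titles leave less room for file names within a dataset.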

See Confluence page here and Slack discussion here.

Tasks

  • ...
  • ...

Definition of Ready

  • This story is ready to work on
    • Independent (story is independent of all other tasks)
    • Negotiable (team can decide how to design and implement)
    • Valuable (from a user perspective)
    • Estimate value applied (agreed by team)
    • Small (so as to fit within an iteration)
    • Testable (in principle, even if there isn't a test for it yet)
    • Environments are ready to meet definition of done
    • Resources required to implement will be ready
    • Everyone understands and agrees with the tasks to complete the story
    • Release value (e.g. Iteration 3) applied
    • Sprint value (e.g. Aug 1 - Aug 15) applied

Definition of Done

  • This story is done:
    • Acceptance criteria completed
    • Automated tests are passing
    • Code is peer reviewed and pushed to master
    • Deployed successfully to test environment
    • Checked against CODING guidelines
    • Relevant new tasks are added to backlog and communicated to the team
    • Important decisions recorded in the issue ticket
    • Readme/Changelog/Diagrams are updated
    • Product Owner has approved acceptance criteria as complete
    • Meets non-functional requirements:
      • Scalability (data): Can scale to 300TB of data and 100,000,000 files and ability to
        increase 10% every year
      • Scalability (users): Can scale to 100 concurrent users
      • Cost: Data can be stored at < 0.5 NZD per GB per year
      • Performance: A large dataset (500 GB and 50,000 files - e.g. Akl aerial imagery) can be
        validated, imported and stored within 24 hours
      • Accessibility: Can be used from LINZ networks and the public internet
      • Availability: System available 24 hours a day, 7 days a week; this does not include
        maintenance windows < 4 hours and does not include operational support
      • Recoverability: RPO of fully imported datasets < 4 hours, RTO of a single 3 TB dataset
        < 12 hours
billgeo added the "user story" label (Something valuable for the user) on Aug 10, 2022
billgeo changed the title from "feat: slashes and fullstops allowed in dataset titles and paths" to "feat: allow slashes and fullstops in dataset titles and s3 urls" on Aug 15, 2022
billgeo changed the title from "feat: allow slashes and fullstops in dataset titles and s3 urls" to "Allow slashes and fullstops in dataset titles and s3 urls" on Aug 15, 2022
billgeo changed the title from "Allow slashes and fullstops in dataset titles and s3 urls" to "Allow slashes and other characters in dataset titles and s3 urls" on Aug 17, 2022
billgeo changed the title from "Allow slashes and other characters in dataset titles and s3 urls" to "Allow slashes in dataset titles and s3 urls" on Aug 30, 2022
billgeo added the "needs refinement" label (Needs to be discussed by the team) on Aug 30, 2022
billgeo removed the "needs refinement" label (Needs to be discussed by the team) on Sep 5, 2022