Skip to content
This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Updating of artifacts and artifact data #83

Merged

Conversation

MorpheusXAUT
Copy link
Contributor

@MorpheusXAUT MorpheusXAUT commented Oct 5, 2022

TL;DR

This PR extends datacatalog to support the updating of artifacts and overwriting of stored artifact data as well as some minor cleanup/housekeeping.

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

The new UpdateArtifact endpoints allows for an existing artifact to be updated. At the moment, only the underlying ArtifactData can be overwritten.
While updating, datacatalog will create any new data keys, overwrite any existing ones with the new data provided and remove no longer provided elements from the underlying blob storage. To facilitate a full cleanup, deletion of artifact data was implemented as well.

This PR also fixes the mock generation for the repository interfaces as they seem to have never been re-generated using the latest mockery version.
Additionally, the datacatalog project was moved to Go 1.18 and the Dockerfile setup was adapted to be in line with flyteadmin. Some minor cleanup/housekeeping was performed to make the linter happy.

As this uses a currently unmerged flyteidl version, the PR is created as a draft (for review) until a new version of flyteidl is available.

Tracking Issue

flyteorg/flyte#2867

Follow-up issue

NA

@codecov
Copy link

codecov bot commented Oct 5, 2022

Codecov Report

Merging #83 (995f5e3) into master (42c55c9) will increase coverage by 0.84%.
The diff coverage is 71.72%.

@@            Coverage Diff             @@
##           master      #83      +/-   ##
==========================================
+ Coverage   68.70%   69.54%   +0.84%     
==========================================
  Files          31       31              
  Lines        1262     1376     +114     
==========================================
+ Hits          867      957      +90     
- Misses        325      341      +16     
- Partials       70       78       +8     
Flag Coverage Δ
unittests 68.45% <70.07%> (+0.82%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/manager/impl/artifact_data_store.go 68.00% <50.00%> (-3.43%) ⬇️
pkg/manager/impl/artifact_manager.go 70.23% <68.57%> (+3.38%) ⬆️
pkg/repositories/gormimpl/artifact.go 84.37% <83.33%> (-0.63%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@welcome
Copy link

welcome bot commented Oct 5, 2022

Thank you for opening this pull request! 🙌
These tips will help get your PR across the finish line: - Most of the repos have a PR template; if not, fill it out to the best of your knowledge. - Sign off your commits (Reference: DCO Guide).

Nick Müller added 8 commits November 2, 2022 16:05
Currently only supports replacement of associated ArtifactData
Uses unreleased flyteidl update for new datacatalog endpoint

Signed-off-by: Nick Müller <[email protected]>
Updated Dockerfile to be in line with flyteadmin
Updated GitHub automation to use Go 1.18

Signed-off-by: Nick Müller <[email protected]>
@MorpheusXAUT MorpheusXAUT force-pushed the cache-eviction-update-artifact branch from 637a9dd to f3391f1 Compare November 3, 2022 13:48
@MorpheusXAUT MorpheusXAUT marked this pull request as ready for review November 3, 2022 13:52
Nick Müller added 2 commits November 4, 2022 15:26
Prevents DB entries without underlying blob storage data in case of a partial update failure

Signed-off-by: Nick Müller <[email protected]>
logger.Debugf(ctx, "Get artifact by id %v", request.GetArtifactId())
artifactKey := transformers.ToArtifactKey(datasetID, request.GetArtifactId())

key := queryHandle.GetArtifactId()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason we changed this from using a switch over to len(queryHandle.GetArtifcatId()) from request.QueryHandle.(type). Was it required because of the addition of UpdateArtifactRequest handling?

Copy link
Contributor Author

@MorpheusXAUT MorpheusXAUT Nov 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hamersaw Yes, exactly. Unfortunately the type generated for the UpdateArtifactRequest is a datacatalog.UpdateArtifactRequest_ArtifactId (instead of a datacatalog.GetArtifactRequest_ArtifactId), however both implement GetArtifactId() string.
As the oneof interface is not exported in the generated protobuf definitions, we can't use a shared interface, so this new artifactQueryHandle interface was the easiest I came up with (which allows both to be passed as a single method parameter).

@hamersaw hamersaw changed the title Updating of artifacts and artifact data #minor Updating of artifacts and artifact data Nov 7, 2022
@hamersaw hamersaw merged commit b682495 into flyteorg:master Nov 8, 2022
@welcome
Copy link

welcome bot commented Nov 8, 2022

Congrats on merging your first pull request! 🎉

@MorpheusXAUT MorpheusXAUT deleted the cache-eviction-update-artifact branch November 8, 2022 15:55
eapolinario pushed a commit that referenced this pull request Sep 6, 2023
* Fixed repo interface mocks not being generated

Signed-off-by: Nick Müller <[email protected]>

* Implemented updating of artifacts
Currently only supports replacement of associated ArtifactData
Uses unreleased flyteidl update for new datacatalog endpoint

Signed-off-by: Nick Müller <[email protected]>

* Updated project to Go 1.18
Updated Dockerfile to be in line with flyteadmin
Updated GitHub automation to use Go 1.18

Signed-off-by: Nick Müller <[email protected]>

* Fixed artifact data upsert in Artifact.Update

Signed-off-by: Nick Müller <[email protected]>

* Removed golang_dockerfile from boilerplate update

Signed-off-by: Nick Müller <[email protected]>

* UpdateArtifact returns ID of artifact updated

Signed-off-by: Nick Müller <[email protected]>

* Updated to latest version of flytepropeller

Signed-off-by: Nick Müller <[email protected]>

* Updated to latest released version of flyteidl

Signed-off-by: Nick Müller <[email protected]>

* Updated to latest released flytestdlib version

Signed-off-by: Nick Müller <[email protected]>

* Missing artifact data is now removed after DB models have been updated
Prevents DB entries without underlying blob storage data in case of a partial update failure

Signed-off-by: Nick Müller <[email protected]>

Signed-off-by: Nick Müller <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants