Garbage collect dead artifacts #294

Closed
ravenac95 opened this issue Oct 4, 2023 · 4 comments
Labels
c:data Gathering data (e.g. indexing)

Comments

@ravenac95
Member

Which area(s) are affected? (leave empty if unsure)

No response

To Reproduce

At some point the pull-requests job was creating new "repo" artifacts. It very possibly created duplicate GIT_REPOSITORY artifacts for repositories that were already in the database, because string comparisons weren't normalizing to lower case when we found those names in the responses from GitHub. This may not have actually happened, but we should write a script to clean up and fix any relations that need fixing.

The correct artifacts are the ones that are related to a project. Also, there are cases where pull-requests returns an unexpected artifact from a different repository for a given pull request. Those shouldn't pose a problem in this scenario.
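
A minimal sketch of how such a cleanup script might plan its work, under stated assumptions: the `Artifact` shape, its `project_ids` field, and the `plan_merges` helper are all hypothetical, since the real schema is not shown in this issue. It groups artifacts by type and lower-cased name, treats the single project-linked artifact in each group as the one to keep, and skips groups where that choice is ambiguous.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    # Hypothetical shape; the real schema is not shown in this issue.
    id: int
    type: str
    name: str
    project_ids: frozenset = frozenset()


def plan_merges(artifacts):
    """Map each duplicate artifact id to the id of the artifact that replaces it."""
    groups = defaultdict(list)
    for artifact in artifacts:
        # Group on type plus lower-cased name so "Org/Repo" and "org/repo" collide.
        groups[(artifact.type, artifact.name.lower())].append(artifact)

    merges = {}
    for group in groups.values():
        if len(group) < 2:
            continue  # No duplicates for this name.
        linked = [a for a in group if a.project_ids]
        if len(linked) != 1:
            # Zero or several project-linked candidates: leave for manual review.
            continue
        canonical = linked[0]
        for artifact in group:
            if artifact.id != canonical.id:
                merges[artifact.id] = canonical.id
    return merges
```

Relations that point at a key of the returned mapping would be repointed to the corresponding value before the duplicate is deleted. Producing a plan first, rather than deleting directly, keeps the script easy to dry-run.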

Describe the Bug

See above.

Expected Behavior

Names for repos should match regardless of case. Same goes for things like addresses.
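
One way to get this behavior is to normalize at the point of lookup, so a second spelling of the same name never creates a new artifact. The sketch below is illustrative only: `normalize_key`, `find_existing`, `register`, and the in-memory `index` dict are hypothetical stand-ins for whatever lookup the indexer actually performs.

```python
def normalize_key(value: str) -> str:
    """Return the form used for comparisons: lower-cased."""
    return value.lower()


def find_existing(index: dict, name: str):
    """Look up an artifact id by name, ignoring case. Returns None if absent."""
    return index.get(normalize_key(name))


def register(index: dict, name: str, artifact_id: int) -> int:
    """Insert only if no case-insensitive match exists; return the id that wins."""
    return index.setdefault(normalize_key(name), artifact_id)
```

With this in place, registering `"Org/Repo"` and then `"org/repo"` yields the same artifact id both times. The same helper would apply to addresses, assuming they are plain strings whose case carries no meaning.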

@ravenac95 ravenac95 self-assigned this Oct 5, 2023
@ravenac95
Member Author

Currently running this to check

@ryscheng ryscheng added this to OSO Oct 5, 2023
@github-project-automation github-project-automation bot moved this to Backlog in OSO Oct 5, 2023
@ryscheng ryscheng added the c:data Gathering data (e.g. indexing) label Oct 6, 2023
@ryscheng ryscheng changed the title Clean up pull-requests artifacts Garbage collect dead artifacts Oct 9, 2023
@ryscheng
Member

ryscheng commented Oct 9, 2023

In the future, we're going to index everything from GitHub, npm, etc., and we should clean up invalid artifacts at that time.

In the medium term, since everything is seeded from an oss-directory project, we can remove any artifacts without an associated project.
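
As a rough illustration of that medium-term rule, the sketch below picks out artifacts that no project references. It assumes, hypothetically, that project membership is available as `(project_id, artifact_id)` pairs; the function name and data layout are not taken from the codebase.

```python
def find_orphans(artifact_ids, project_links):
    """Return sorted ids of artifacts that no project references.

    project_links is an iterable of (project_id, artifact_id) pairs.
    """
    # Collect every artifact id that appears in at least one project link.
    linked = {artifact_id for _, artifact_id in project_links}
    return sorted(set(artifact_ids) - linked)
```

The returned ids are candidates for deletion; reviewing the list before deleting guards against removing artifacts whose project link is simply missing.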

@ryscheng
Member

@ravenac95 is this still relevant?

@github-project-automation github-project-automation bot moved this from Backlog to Done in OSO Mar 25, 2024
@ravenac95
Member Author

OBE

2 participants