-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(proposal): Build init Log
instead of magic "init" Step
#789
base: main
Are you sure you want to change the base?
Conversation
Log
instead of magic "init" Step
Log
instead of magic "init" Step
Target team revisited this proposal during our biweekly review. The team needs more time to dig into the details. Look for more conversations and feedback in the upcoming week. |
|
||
Sqlite sorts with `NULLS FIRST` by default, and Postgres sorts with `NULLS LAST` (see: [How Are NULLs Sorted by Default?](https://learnsql.com/blog/how-to-order-rows-with-nulls/)). | ||
So this query is inconsistent between databases. And, the service logs returned by this query are not sorted. It would be nice to have this method return the logs sorted in a consistent manner. | ||
I would like to see the Build Log, then Step Logs sorted by step_id, then Service Logs sorted by service_id. Any suggestions on how to do that with gorm? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know if GORM supports multiple ORDER BY
statements? i.e.
err = e.client.
Table(constants.TableLog).
Where("build_id = ?", b.GetID()).
Order("step_id ASC").
Order("service_id ASC").
I could see a scenario where instead we have to use code like:
err = e.client.
Table(constants.TableLog).
Where("build_id = ?", b.GetID()).
Order("step_id ASC, service_id ASC").
Based off my brief reviews, I'm seeing docs that suggest both Postgres and Sqlite support this:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://gorm.io/docs/query.html#Order
Yep it supports multiple order statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this? I believe this will sort the logs as [init, steps..., services...] and the sort order will be consistent for both sqlite and postgres.
err = e.client.
Table(constants.TableLog).
Where("build_id = ?", b.GetID()).
Order("(step_id IS NULL AND service_id IS NULL) DESC"). // the build init log
Order("step_id ASC NULLS LAST").
Order("service_id ASC").
If that proves problematic, then we could add a computed column:
type Log struct {
ID sql.NullInt64 `sql:"id"`
BuildID sql.NullInt64 `sql:"build_id"`
RepoID sql.NullInt64 `sql:"repo_id"`
ServiceID sql.NullInt64 `sql:"service_id"`
StepID sql.NullInt64 `sql:"step_id"`
Data []byte `sql:"data"`
IsInit sql.Bool `sql:"is_init" gorm:"->"`
}
CREATE TABLE
IF NOT EXISTS
logs (
id SERIAL PRIMARY KEY,
build_id INTEGER,
repo_id INTEGER,
service_id INTEGER,
step_id INTEGER,
data BYTEA,
is_init BOOLEAN GENERATED ALWAYS AS (step_id IS NULL AND service_id IS NULL) STORED,
UNIQUE(step_id),
UNIQUE(service_id)
);
which works in both postgres and sqlite. Postgres only supports STORED
, but in sqlite a STORED
column can't be added to an existing table, so the sqlite migration would have to add the column as VIRTUAL
instead.
That would lead to this query:
err = e.client.
Table(constants.TableLog).
Where("build_id = ?", b.GetID()).
Order("is_init DESC").
Order("step_id ASC NULLS LAST").
Order("service_id ASC").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After further thought, I think we do need an is_init
boolean column. Instead of delegating that calculation to the database, we'll just calculate it when creating the objects, that way it will be accessible in log.Validate
.
So:
CREATE TABLE
IF NOT EXISTS
logs (
id SERIAL PRIMARY KEY,
build_id INTEGER,
repo_id INTEGER,
service_id INTEGER,
step_id INTEGER,
is_init BOOLEAN,
data BYTEA,
UNIQUE(step_id),
UNIQUE(service_id),
CHECK(is_init IS NOT FALSE),
UNIQUE(build_id, is_init) -- only one init log per build
);
Then log.Validate
will ensure that one of is_init
, step_id
, and service_id
are not NULL
.
And we'll only ever set is_init
to TRUE
if both step_id
and service_id
are NULL
. And, we never set is_init
to FALSE
to allow the composite unique constraint to ensure that each build has only one init log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. I just asked someone on the DB team at work, and he said this should work:
CREATE TABLE
IF NOT EXISTS
logs (
id SERIAL PRIMARY KEY,
build_id INTEGER,
repo_id INTEGER,
service_id INTEGER,
step_id INTEGER,
data BYTEA,
UNIQUE(build_id, step_id, service_id)
);
Though, that might require a newer version of Postgres (as it might need NULLS NOT DISTINCT
.
I think we might be able to use a partial index as well (which I didn't know existed):
CREATE TABLE
IF NOT EXISTS
logs (
id SERIAL PRIMARY KEY,
build_id INTEGER, -- maybe add NOT NULL constraint
repo_id INTEGER,
service_id INTEGER,
step_id INTEGER,
data BYTEA,
UNIQUE(step_id),
UNIQUE(service_id)
);
CREATE UNIQUE INDEX
IF NOT EXISTS
logs_build_init_log
ON
logs (build_id)
WHERE
step_id IS NULL AND service_id IS NULL;
Also, when upgrading, we need to ensure that: | ||
- the queue is empty, or | ||
- all queued builds get re-compiled (or more particularly `item.Pipeline` which is a `types.pipeline.Build` and includes the injected init step) before execution in the worker starts. | ||
|
||
To make the worker backwards compatible, could we trigger the rebuild in `server.queue.redis.Pop()` after the item is created? | ||
Or perhaps we'll just need to include queued build migrations in the migration script. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does your company have a requirement that prevents you from temporarily bringing the server offline?
Asking because if you bring the API down, any running builds on the workers should fail rather quickly.
Then, you could upgrade both the server and workers at the same time and bring them back online.
Assuming you're running Docker containers, hopefully the amount of time to upgrade is relatively short.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the proposal, I tried to think through any backwards compatibility gotchas.
No policy prevents me from bringing them down.
I generally strive for 0-downtime upgrades as we have developers across many time zones (India, EU, US) which means, for most DevOps services, downtime always impacts someone. I am running Vela in Kubernetes: so I do any migrations to add new tables/columns, and then do rolling restarts that gradually start pods with the new version and then stop the old pods once enough of the pods are stable. However, the Vela install is still fairly "PoC" stage, so just shutting it down will have very little impact.
So, I guess the upgrade procedure would be something like:
- scale server deployment down to 0
- purge the queue or wait till the workers have pulled all builds from the queue and failed
- scale worker deployment down to 0
- run database migrations
- run helm upgrade to update the deployments to the new version of Vela
I'm just not completely sure on how to purge the queue or make sure it is empty. So, the upgrade guide should have a note about that, even if it is not part of the migration script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I can help out with this. ideally whatever solution we go with should be backwards compatible, and i'd prefer to have old and new builds look similar to avoid confusion. i can mock up some ideas |
this pre-supposes that it will be merged once approved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - approving a second time because my Trailer says I haven't reviewed yet? 🤷
@cognifloyd We're laying out the upcoming roadmap and wanted to take this into consideration. It's been a little time since we last discussed. Is this still something you want? If so, maybe recap where we're at. |
@cognifloyd We haven't heard anything on this and are rapidly approaching the v0.21 release. We'll move you to v0.22 for October. Please let us know if you think October is doable and/or if you want this pushed out further. |
@cognifloyd Just posting an update during our monthly review of public PRs. During our last Committers Sync you mentioned other top priorities so this didn't have your attention. We're tentatively looking at an early November release for 0.22 so let us know if you think this might fit in that release. If not, we always have 0.23. |
@cognifloyd With v0.22 rc1 available, the next opportunity for this enhancement will be v0.23 (January 2024). Please let us know if we should expect these updates in that release. |
Probably not. Still digging through a backlog of things I need to do. |
@cognifloyd It's been a couple months. Is this still a feature you're working on? If so, maybe share an updated timeline for when we should expect to see an update. |
Status:
This is a successor to:
InitStep
resource #771