Platforms connection in health-check #325
pkg/health/types.go
Outdated
// but later not registered nothing will happen.
var indicatorNames = [...]string{
	StorageIndicatorName,
	"kubernetes" + PlatformIndicatorSuffix,
SM must not care about different platform types in the code. These should be a matter of configuration.
If we don't mention the indicator names here somehow, they cannot be set through the configuration.
Describe in the Approach section the meaning of
Any documentation for this feature?
These properties are part of a different change, which introduces the async health check.
Instead of using separate health indicators for each platform type, it would be simpler to have one indicator for all platforms and to list in its details the state of each platform together with its type.
Let the SM wrappers decide which platform status is fatal. This decision may not be based on the platform type.
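A minimal sketch of the single-indicator idea suggested above. The `Platform` and `PlatformHealth` types, the `platformsDetails` function and the "UP"/"DOWN" strings are hypothetical stand-ins for illustration, not the actual SM or go-health types.

```go
package main

import (
	"fmt"
	"time"
)

// Platform is a hypothetical stand-in for the SM platform entity.
type Platform struct {
	Name       string
	Type       string
	Active     bool
	LastActive time.Time
}

// PlatformHealth is a hypothetical per-platform entry in the indicator details.
type PlatformHealth struct {
	Status string
	Type   string
	Since  *time.Time // set only when the platform is inactive
}

// platformsDetails builds one details map for all platforms, keyed by
// platform name. The platform type is carried as data instead of being
// encoded in an indicator name.
func platformsDetails(platforms []Platform) map[string]PlatformHealth {
	details := make(map[string]PlatformHealth, len(platforms))
	for _, p := range platforms {
		entry := PlatformHealth{Status: "UP", Type: p.Type}
		if !p.Active {
			entry.Status = "DOWN"
			since := p.LastActive
			entry.Since = &since
		}
		details[p.Name] = entry
	}
	return details
}

func main() {
	details := platformsDetails([]Platform{
		{Name: "k8s-1", Type: "kubernetes", Active: true},
		{Name: "cf-1", Type: "cloudfoundry", Active: false, LastActive: time.Now().Add(-time.Hour)},
	})
	for name, d := range details {
		fmt.Println(name, d.Type, d.Status)
	}
}
```

With the type available as a field, a wrapper could decide which entries are fatal by inspecting the details rather than by matching indicator names.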
@@ -17,10 +17,12 @@
package healthcheck

import (
	"context"
	h "github.com/InVisionApp/go-health"
The `h` alias is too short for such a wide scope
@@ -56,7 +58,7 @@ func (c *controller) healthCheck(r *web.Request) (*web.Response, error) {
	return util.NewJSONResponse(status, healthResult)
}

-func (c *controller) aggregate(overallState map[string]h.State) *health.Health {
+func (c *controller) aggregate(ctx context.Context, overallState map[string]h.State) *health.Health {
"overall" suggests the state is already aggregated, better rename it
@@ -70,6 +72,9 @@ func (c *controller) aggregate(overallState map[string]h.State) *health.Health {
	details := make(map[string]interface{})
	for name, state := range overallState {
		state.Status = convertStatus(state.Status)
		if strings.Contains(name, health.PlatformIndicatorSuffix) && !web.IsAuthorized(ctx) {
avoid parsing strings as it is error prone, better store this data in a structured way
why not remove the details for all the health indicators in case the request is not authorised?
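One way to read the two comments above, sketched under assumptions: the decision to hide details is driven by a field on the indicator result instead of a substring match on its name. `IndicatorState`, its `Sensitive` flag and `redact` are hypothetical names; marking every indicator as sensitive would give the "remove details for all indicators" variant.

```go
package main

import "fmt"

// IndicatorState is a hypothetical result of one health indicator.
type IndicatorState struct {
	Status    string
	Details   map[string]interface{}
	Sensitive bool // hypothetical flag, set when the indicator is registered
}

// redact returns a copy of states in which the details of sensitive
// indicators are dropped when the caller is not authorized.
func redact(states map[string]IndicatorState, authorized bool) map[string]IndicatorState {
	out := make(map[string]IndicatorState, len(states))
	for name, s := range states {
		// s is a copy of the map value, so the input map is not modified.
		if s.Sensitive && !authorized {
			s.Details = nil
		}
		out[name] = s
	}
	return out
}

func main() {
	states := map[string]IndicatorState{
		"storage":   {Status: "UP", Details: map[string]interface{}{"ping": "ok"}},
		"platforms": {Status: "DOWN", Details: map[string]interface{}{"cf-1": "DOWN"}, Sensitive: true},
	}
	for name, s := range redact(states, false) {
		fmt.Println(name, s.Status, s.Details)
	}
}
```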
@@ -70,6 +72,9 @@ func (c *controller) aggregate(overallState map[string]h.State) *health.Health {
	details := make(map[string]interface{})
	for name, state := range overallState {
		state.Status = convertStatus(state.Status)
why remap the status?
If we decide to adopt the library’s statuses “ok” and “failed”, the change has to be propagated everywhere, since all the components check for UP to determine whether it has started correctly.
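The body of `convertStatus` is not shown in this conversation. The following is only a sketch of what such a remapping might look like, assuming the library reports "ok" and "failed" as stated above and the components expect "UP"; the "DOWN" and "UNKNOWN" strings are assumptions.

```go
package main

import "fmt"

// convertStatus maps the library's status strings to the values the
// other components check for. This is a hypothetical reconstruction,
// not the implementation from the pull request.
func convertStatus(status string) string {
	switch status {
	case "ok":
		return "UP"
	case "failed":
		return "DOWN"
	default:
		// Assumption: anything unexpected is reported as unknown.
		return "UNKNOWN"
	}
}

func main() {
	for _, s := range []string{"ok", "failed", ""} {
		fmt.Printf("%q -> %s\n", s, convertStatus(s))
	}
}
```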
@@ -70,6 +72,9 @@ func (c *controller) aggregate(overallState map[string]h.State) *health.Health {
	details := make(map[string]interface{})
	for name, state := range overallState {
why loop a second time over `overallState`? can't we do it in one loop?
	return nil, fmt.Errorf("could not fetch platforms health from storage: %v", err)
}

details := make(map[string]interface{})
map[string]*health.Health
	details[platform.Name] = health.New().WithStatus(health.StatusUp)
} else {
	details[platform.Name] = health.New().WithStatus(health.StatusDown).WithDetail("since", platform.LastActive)
	err = fmt.Errorf("there is inactive %s platforms", pi.platformType)
Better include the total number of inactive platforms, e.g. "there are %d inactive %s platforms"
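The suggestion could be sketched as follows: count the inactive platforms while iterating and build the error once at the end. `Platform` and `inactiveError` are hypothetical names used only for this illustration.

```go
package main

import "fmt"

// Platform is a hypothetical, reduced stand-in for the SM platform entity.
type Platform struct {
	Name   string
	Active bool
}

// inactiveError returns nil when all platforms are active, otherwise an
// error that states how many platforms of the given type are inactive.
func inactiveError(platformType string, platforms []Platform) error {
	inactive := 0
	for _, p := range platforms {
		if !p.Active {
			inactive++
		}
	}
	if inactive == 0 {
		return nil
	}
	return fmt.Errorf("there are %d inactive %s platforms", inactive, platformType)
}

func main() {
	err := inactiveError("kubernetes", []Platform{
		{Name: "k8s-1", Active: true},
		{Name: "k8s-2"},
		{Name: "k8s-3"},
	})
	fmt.Println(err)
}
```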
// default settings again, but this defaults could be overridden only via application.yml,
// env variables and pflags won't have any effect. If an indicator is specified in this list
// but later not registered nothing will happen.
var indicatorNames = [...]string{
what about indicators outside Peripli? who will add them here?
They will maintain a separate list of indicator names and extend the default settings with them while configuring their own settings. Another approach is to export this array. Let's discuss which is better.
BEGIN;

ALTER TABLE platforms ADD COLUMN active boolean NOT NULL DEFAULT '0';
ALTER TABLE platforms ADD COLUMN last_active TIMESTAMP NOT NULL DEFAULT '1970-01-01 00:00:00+00';
that's strange, better use NULL if the platform was never active
If we use NULL, then the Platform struct needs sql.NullString instead of time.Time so that row scanning handles null values properly, and we have to deal with casting. To me this seems like an unnecessary complication, but I am okay with both solutions.
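For the NULL variant, `sql.NullString` is not the only option: the standard `database/sql` package has provided `sql.NullTime` since Go 1.13, which avoids the string casting. The sketch below assumes that type is available to the project; `platformRow` and `describe` are hypothetical, and whether the project's row-scanning helpers accept the type is not verified here.

```go
package main

import (
	"database/sql"
	"fmt"
	"time"
)

// platformRow is a hypothetical scan target for a row of the platforms table.
type platformRow struct {
	Active     bool
	LastActive sql.NullTime // Valid is false when last_active is NULL
}

// describe renders the last-active information for a health detail.
func describe(row platformRow) string {
	if !row.LastActive.Valid {
		return "never active"
	}
	return "last active at " + row.LastActive.Time.UTC().Format(time.RFC3339)
}

func main() {
	var row platformRow

	// Scan is the method database/sql invokes for the column value;
	// nil stands in for SQL NULL here.
	if err := row.LastActive.Scan(nil); err != nil {
		panic(err)
	}
	fmt.Println(describe(row))

	// A non-NULL timestamp column arrives as a time.Time value.
	ts := time.Date(2019, time.September, 1, 12, 0, 0, 0, time.UTC)
	if err := row.LastActive.Scan(ts); err != nil {
		panic(err)
	}
	fmt.Println(describe(row))
}
```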
storage/postgres/notificator.go
Outdated
if err := c.updatePlatform(platform.ID, func(p *types.Platform) {
	p.Active = false
	p.LastActive = time.Now()
}); err != nil {
this `if` is hard to read, break into multiple statements
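One possible shape for the requested split, as a sketch: name the update closure, call `updatePlatform`, then check the error in a separate statement. The `store` type and `markInactive` are hypothetical in-memory stand-ins so that the example runs on its own. Only the `updatePlatform(id, func)` call shape is taken from the diff.

```go
package main

import (
	"fmt"
	"time"
)

// Platform is a hypothetical, reduced stand-in for types.Platform.
type Platform struct {
	ID         string
	Active     bool
	LastActive time.Time
}

// store is a hypothetical in-memory replacement for the storage-backed type.
type store struct {
	platforms map[string]*Platform
}

// updatePlatform applies update to the platform with the given ID.
func (s *store) updatePlatform(id string, update func(p *Platform)) error {
	p, ok := s.platforms[id]
	if !ok {
		return fmt.Errorf("platform %s not found", id)
	}
	update(p)
	return nil
}

// markInactive separates the closure, the call and the error check
// into three statements.
func (s *store) markInactive(id string) error {
	deactivate := func(p *Platform) {
		p.Active = false
		p.LastActive = time.Now()
	}

	err := s.updatePlatform(id, deactivate)
	if err != nil {
		return fmt.Errorf("could not mark platform %s as inactive: %v", id, err)
	}
	return nil
}

func main() {
	s := &store{platforms: map[string]*Platform{"p1": {ID: "p1", Active: true}}}
	fmt.Println(s.markInactive("p1"), s.platforms["p1"].Active)
	fmt.Println(s.markInactive("missing"))
}
```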
Motivation
The connection to platforms should be included in the health check, so that it is easy to tell whether state is being propagated to a platform.
Approach
A new indicator for all platforms registered in SM is introduced.
The health-check response for such a configuration looks like this: