Remote output health improvements #4185

Open
michel-laterman opened this issue Dec 6, 2024 · 1 comment
Labels
enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

Comments

@michel-laterman
Contributor

Describe the enhancement:

Currently, remote output health is reported (when updateState is called) in the policy self-monitor:

func reportOutputHealth(ctx context.Context, bulker bulk.Bulk, zlog zerolog.Logger) {
	// pinging logic
	bulkerMap := bulker.GetBulkerMap()

This creates a document in the primary ES instance with the output health status:

func CreateOutputHealth(ctx context.Context, bulker bulk.Bulk, doc model.OutputHealth) error {
	return createOutputHealth(ctx, bulker, FleetOutputHealth, doc)
}

func createOutputHealth(ctx context.Context, bulker bulk.Bulk, index string, doc model.OutputHealth) error {
	if doc.Timestamp == "" {
		doc.Timestamp = time.Now().UTC().Format(time.RFC3339)
	}
	doc.DataStream = &model.DataStream{
		Dataset:   "fleet_server.output_health",
		Type:      "logs",
		Namespace: "default",
	}
	body, err := json.Marshal(doc)
	if err != nil {
		return err
	}
	id, err := uuid.NewV4()
	if err != nil {
		return err
	}
	_, err = bulker.Create(ctx, index, id.String(), body, bulk.WithRefresh())
	return err
}

However, the policy self-monitor may not be a good place for these updates, because the output bulker health signal is not actually used by the monitor.
Additionally, gathering a reference to all bulkers may cause concurrency issues, as seen in #4170.

We may want to have remote bulkers start a heartbeat goroutine that would use the primary bulker to write their status directly; this would address both issues.
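
A rough sketch of what such a heartbeat could look like, to make the idea concrete. This is not fleet-server code: `OutputHealth`, `healthWriter`, `startHeartbeat`, `memWriter`, and `collectHeartbeats` are hypothetical names, the state strings are placeholders, and the writer interface stands in for the primary bulker (in practice it might wrap the `CreateOutputHealth` call quoted above). Each remote output owns one goroutine, so nothing needs to collect references to all bulkers.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// OutputHealth is a simplified, hypothetical stand-in for model.OutputHealth.
type OutputHealth struct {
	Output, State, Message, Timestamp string
}

// healthWriter is a hypothetical abstraction over the primary bulker.
type healthWriter interface {
	WriteHealth(ctx context.Context, doc OutputHealth) error
}

// startHeartbeat launches one goroutine for a single remote output. On every
// tick it pings the output and writes the result through w. The returned
// channel is closed once the goroutine has exited after ctx is cancelled.
func startHeartbeat(ctx context.Context, name string, interval time.Duration,
	ping func(context.Context) error, w healthWriter) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				doc := OutputHealth{
					Output:    name,
					State:     "HEALTHY", // placeholder state value
					Timestamp: time.Now().UTC().Format(time.RFC3339),
				}
				if err := ping(ctx); err != nil {
					doc.State = "DEGRADED" // placeholder state value
					doc.Message = err.Error()
				}
				if err := w.WriteHealth(ctx, doc); err != nil {
					fmt.Printf("output %s: writing health failed: %v\n", name, err)
				}
			}
		}
	}()
	return done
}

// memWriter is an in-memory healthWriter used only for this demo.
type memWriter struct {
	mu   sync.Mutex
	docs []OutputHealth
}

func (m *memWriter) WriteHealth(_ context.Context, doc OutputHealth) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.docs = append(m.docs, doc)
	return nil
}

// snapshot returns a copy of the documents written so far.
func (m *memWriter) snapshot() []OutputHealth {
	m.mu.Lock()
	defer m.mu.Unlock()
	return append([]OutputHealth(nil), m.docs...)
}

// collectHeartbeats runs a heartbeat until at least n documents have been
// written, stops it, and returns everything that was written.
func collectHeartbeats(n int, healthy bool) []OutputHealth {
	ping := func(context.Context) error {
		if healthy {
			return nil
		}
		return errors.New("remote output unreachable")
	}
	ctx, cancel := context.WithCancel(context.Background())
	w := &memWriter{}
	done := startHeartbeat(ctx, "remote-1", 5*time.Millisecond, ping, w)
	for len(w.snapshot()) < n {
		time.Sleep(time.Millisecond)
	}
	cancel()
	<-done // wait for the goroutine to exit before reading the final state
	return w.snapshot()
}

func main() {
	for _, doc := range collectHeartbeats(2, true) {
		fmt.Printf("%s %s %s\n", doc.Timestamp, doc.Output, doc.State)
	}
}
```

Tying the goroutine's lifetime to a context owned by the remote bulker would mean that removing an output also stops its heartbeat, and the policy self-monitor would no longer need to touch the bulker map at all.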

@michel-laterman michel-laterman added enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team labels Dec 6, 2024
@cmacknz
Member

cmacknz commented Dec 6, 2024

We may want to have remote bulkers start a heartbeat goroutine that would use the primary bulker to write their status directly; this would address both issues.

This is also the first alternative I thought of when I first saw what the code was doing. I don't think we'd have to worry about the number of goroutines, because there aren't going to be 1000s of remote outputs unless there is some crazy bug somewhere.
