perf: Infer system failures from metrics such as scheduled-task execution time and per-host success/failure counts #3227

Open
jsonwan opened this issue Sep 25, 2024 · 0 comments
Labels
backlog (initial requirement state, awaiting product evaluation) kind/enhancement (feature improvement)

Comments

@jsonwan
Collaborator

jsonwan commented Sep 25, 2024

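When a data center fails, a large number of Agents lose connectivity, so the affected tasks do not finish for a long time. This shows up as symptoms such as "the number of running scheduled tasks keeps growing", "scheduled-task execution time increases across the board", and "the number of hosts on which scheduled tasks fail increases". Inferring from these symptoms that the underlying system may be in a failure state would help users notice the failure promptly and respond to it.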
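
To make the idea concrete, here is a rough sketch of how the three symptoms above could be combined into one "possible system failure" signal. It is only an illustration, not a proposed implementation: the `Sample` shape, the `infer_fault` function, and all thresholds (`duration_factor`, `failure_factor`, `min_signals`) are assumptions made up for this example and do not exist in the project.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Sample:
    """One observation window of scheduled-task metrics (hypothetical shape)."""
    running_tasks: int        # scheduled tasks still running at the end of the window
    median_duration_s: float  # median task execution time within the window, in seconds
    failed_hosts: int         # hosts on which task execution failed within the window


def is_strictly_increasing(values):
    """Return True if every value is larger than the one before it."""
    return all(b > a for a, b in zip(values, values[1:]))


def infer_fault(baseline, recent, duration_factor=2.0, failure_factor=3.0, min_signals=2):
    """Compare recent windows against a healthy baseline.

    Returns (suspected, signals). All thresholds are illustrative assumptions.
    """
    signals = []

    # Symptom 1: the number of running scheduled tasks keeps growing.
    if len(recent) >= 3 and is_strictly_increasing([s.running_tasks for s in recent]):
        signals.append("running_tasks_growing")

    # Symptom 2: execution time is well above the baseline median.
    base_duration = median(s.median_duration_s for s in baseline)
    if recent[-1].median_duration_s > duration_factor * base_duration:
        signals.append("duration_increased")

    # Symptom 3: far more hosts fail than in the baseline.
    base_failed = median(s.failed_hosts for s in baseline)
    if recent[-1].failed_hosts > failure_factor * max(base_failed, 1):
        signals.append("failed_hosts_increased")

    # Require several symptoms at once to reduce false alarms.
    return len(signals) >= min_signals, signals


baseline = [Sample(5, 30.0, 1), Sample(6, 32.0, 0), Sample(5, 29.0, 2)]
recent = [Sample(8, 45.0, 3), Sample(15, 70.0, 12), Sample(27, 95.0, 40)]
suspected, signals = infer_fault(baseline, recent)
```

Requiring at least two symptoms before raising the signal is one possible design choice: a single slow task or a single bad host should not be reported as a system-level failure.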

@jsonwan jsonwan added kind/enhancement (feature improvement) backlog (initial requirement state, awaiting product evaluation) labels Sep 25, 2024