-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wait replication-sync returns valid exit-code while CH connection error #231
Conversation
638d891
to
788584f
Compare
788584f
to
1f7f848
Compare
except requests.exceptions.ConnectionError: | ||
logging.error("Connection error while running query.") | ||
sys.exit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we should exit here? Is reraise the exception + logging.error or raise new exception with custom msg not enough?
Also maybe for requests.exceptions.ConnectionError
add retries? Because we can have some short network issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we should exit here? Is reraise the exception + logging.error or raise new exception with custom msg not enough?
That is good point - what do you think about fix problem globally here - https://github.com/yandex/ch-tools/blob/main/ch_tools/chadmin/cli/chadmin_group.py#L54
Just adding raise
in except branch. I afraid it is initial problem. And not only for wait replication-sync
.
Also maybe for
requests.exceptions.ConnectionError
add retries? Because we can have some short network issues
It is already here - https://github.com/yandex/ch-tools/blob/main/ch_tools/common/clickhouse/client/clickhouse_client.py#L165
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we need to review all such emergency exists, we got a lot of them:
./ch_tools/s3_credentials/main.py: sys.exit(1)
./ch_tools/s3_credentials/main.py: sys.exit(1)
./ch_tools/s3_credentials/main.py: sys.exit(0)
./ch_tools/monrun_checks/main.py: sys.exit(1)
./ch_tools/monrun_checks/main.py: sys.exit(1)
./ch_tools/chadmin/cli/wait_group.py: sys.exit(1)
./ch_tools/chadmin/cli/wait_group.py: sys.exit(1)
./ch_tools/chadmin/cli/wait_group.py: sys.exit(0)
./ch_tools/chadmin/cli/wait_group.py: sys.exit(1)
./ch_tools/chadmin/cli/wait_group.py: sys.exit(0)
./ch_tools/chadmin/cli/wait_group.py: sys.exit(1)
./ch_tools/chadmin/cli/zookeeper_group.py: sys.exit(1)
Let's deal with it in the separate pr
It is already here
Thanks, missed it.
@MedvedewEM could you look into? https://github.com/yandex/ch-tools/actions/runs/10849461150/job/30108748612 |
It is weird. Second retry was helpful. Lets watch for this potential flap in coming PRs. |
In case of stopped CH
list_table_replicas(ctx)
query returns also ConnectionErrors.Let's except these exceptions once outside for-loop.