Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry install event on certain errors #229

Draft
wants to merge 50 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
50 commits
Select commit Hold shift + click to select a range
96b5d9a
Add docstring for attributes.
yhaliaw Jan 17, 2024
11f7b11
Refactor configuration to charm state
yhaliaw Jan 17, 2024
cd1e656
Merge branch 'main' into refactor/config-charm-state
yhaliaw Jan 18, 2024
bd05772
Merge branch 'main' into refactor/config-charm-state
yhaliaw Jan 18, 2024
a33580e
Fix linting
yhaliaw Jan 18, 2024
1e4b837
Merge branch 'main' into refactor/config-charm-state
yhaliaw Jan 18, 2024
6841425
Fix bugs
yhaliaw Jan 19, 2024
0a87ba1
[no ci] fmt
yhaliaw Jan 19, 2024
948a4ed
[no ci] Fix test_charm_state unit tests
yhaliaw Jan 22, 2024
d1f0f4b
Fix naming conflict
yhaliaw Jan 23, 2024
58d4427
WIP
yhaliaw Jan 23, 2024
bb1c35b
Merge branch 'main' into refactor/config-charm-state
yhaliaw Jan 29, 2024
e3589de
Fix merge of event timer
yhaliaw Jan 29, 2024
1bdc973
Fix linting issues
yhaliaw Jan 29, 2024
90dd4b9
Fix unit test for charm.py
yhaliaw Jan 29, 2024
060f740
Merge branch 'main' into refactor/config-charm-state
yhaliaw Jan 30, 2024
7ce1cbe
Fix merge problems
yhaliaw Jan 31, 2024
131c214
Merge branch 'main' into refactor/config-charm-state
yhaliaw Jan 31, 2024
44a39a9
Fix firewall unit test
yhaliaw Jan 31, 2024
56c003a
Remove factory.py
yhaliaw Jan 31, 2024
b241824
Merge branch 'main' into refactor/config-charm-state
yhaliaw Feb 1, 2024
292fa0c
Fix import error from merge with main
yhaliaw Feb 1, 2024
6c7d404
Fix unit test issue from merge with main
yhaliaw Feb 1, 2024
953c3e8
Merge branch 'main' into refactor/config-charm-state
yhaliaw Feb 19, 2024
39210b6
Fix merge
yhaliaw Feb 19, 2024
381f0f5
Fix licenses
yhaliaw Feb 19, 2024
9428325
Update error message
yhaliaw Feb 19, 2024
48f4044
Fix unit test and lints
yhaliaw Feb 19, 2024
f101cef
Merge branch 'main' into refactor/config-charm-state
yhaliaw Feb 20, 2024
9ca2fd7
Fix issue from merge
yhaliaw Feb 20, 2024
a11f67f
Merge branch 'main' into refactor/config-charm-state
yhaliaw Feb 21, 2024
b1ea589
Fix one integration test
yhaliaw Feb 21, 2024
e52fc83
Fix integration test
yhaliaw Feb 21, 2024
dfa3da4
Increase the timeout of token change test
yhaliaw Feb 21, 2024
72dffed
Refactor according to feedback
yhaliaw Feb 22, 2024
5f3b667
Fix import
yhaliaw Feb 22, 2024
c46562f
[no ci] Apply suggestions from code review
yhaliaw Feb 22, 2024
7ee3d51
Add reasoning of reconcile_interval validation
yhaliaw Feb 22, 2024
59d400e
Fix lint and unit test
yhaliaw Feb 23, 2024
0c4275f
Fix unit test
yhaliaw Feb 23, 2024
948d3c4
Revert CharmState in RunnerManagerConfig
yhaliaw Feb 23, 2024
e6b9b71
[no ci] Apply suggestions from code review
yhaliaw Feb 24, 2024
e79f5f1
Fix unit test typing
yhaliaw Feb 24, 2024
6b2a3af
Fix usage of denylist charm config
yhaliaw Feb 24, 2024
ae1d99f
Enter error state to retry install event
yhaliaw Feb 24, 2024
802859b
Add comment on why start event does not need retry
yhaliaw Feb 24, 2024
5edd9a3
Merge branch 'main' into bug/install-event-retry
yhaliaw Feb 27, 2024
987a985
Error handling in install and upgrade hook
yhaliaw Feb 27, 2024
05a8cc1
Merge branch 'main' into bug/install-event-retry
yhaliaw Mar 1, 2024
922c7f7
Fix rename of variable
yhaliaw Mar 1, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions src-docs/charm.py.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ Charm for creating and managing GitHub self-hosted runner instances.

---

<a href="../src/charm.py#L77"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="../src/charm.py#L79"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

## <kbd>function</kbd> `catch_charm_errors`

Expand All @@ -38,7 +38,7 @@ Catch common errors in charm.

---

<a href="../src/charm.py#L112"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="../src/charm.py#L120"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

## <kbd>function</kbd> `catch_action_errors`

Expand Down Expand Up @@ -67,7 +67,7 @@ Catch common errors in actions.
## <kbd>class</kbd> `GithubRunnerCharm`
Charm for managing GitHub self-hosted runners.

<a href="../src/charm.py#L157"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="../src/charm.py#L165"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

### <kbd>function</kbd> `__init__`

Expand Down
47 changes: 40 additions & 7 deletions src-docs/charm_state.py.md
Original file line number Diff line number Diff line change
Expand Up @@ -164,7 +164,7 @@ The charm state.

---

<a href="../src/charm_state.py#L573"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="../src/charm_state.py#L605"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

### <kbd>classmethod</kbd> `from_charm`

Expand Down Expand Up @@ -252,6 +252,39 @@ Return a string representing the path.
Path to the GitHub entity.


---

## <kbd>class</kbd> `HttpProxyEnvVar`
HTTP proxy environment variables.

Validation is not performed. Intend for propagating the environment variable as is to services used by the charm.

Used for access the proxy env vars where the state should not be accessed. The state can throw errors based on configuration. Code that should be performed regardless of configuration issues should use this class to access the proxy, e.g., install dependencies.



**Attributes:**

- <b>`http_proxy`</b>: HTTP proxy environment variable.
- <b>`https_proxy`</b>: HTTPS proxy environment variable.
- <b>`no_proxy`</b>: No proxy environment variable.




---

<a href="../src/charm_state.py#L408"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

### <kbd>classmethod</kbd> `from_env`

```python
from_env() → HttpProxyEnvVar
```

Initialize the HTTP proxy environment variable from env.


---

## <kbd>class</kbd> `ProxyConfig`
Expand All @@ -261,8 +294,8 @@ Proxy configuration.

**Attributes:**

- <b>`http`</b>: HTTP proxy address.
- <b>`https`</b>: HTTPS proxy address.
- <b>`http_proxy`</b>: HTTP proxy address.
- <b>`https_proxy`</b>: HTTPS proxy address.
- <b>`no_proxy`</b>: Comma-separated list of hosts that should not be proxied.
- <b>`use_aproxy`</b>: Whether aproxy should be used for the runners.

Expand All @@ -277,7 +310,7 @@ Return the aproxy address.

---

<a href="../src/charm_state.py#L440"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="../src/charm_state.py#L472"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

### <kbd>classmethod</kbd> `check_fields`

Expand All @@ -300,7 +333,7 @@ Validate the proxy configuration.

---

<a href="../src/charm_state.py#L402"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="../src/charm_state.py#L433"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

### <kbd>classmethod</kbd> `from_charm`

Expand Down Expand Up @@ -413,7 +446,7 @@ SSH connection information for debug workflow.

---

<a href="../src/charm_state.py#L520"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="../src/charm_state.py#L552"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

### <kbd>classmethod</kbd> `from_charm`

Expand Down Expand Up @@ -441,7 +474,7 @@ Raised when given machine charm architecture is unsupported.

- <b>`arch`</b>: The current machine architecture.

<a href="../src/charm_state.py#L477"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="../src/charm_state.py#L509"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

### <kbd>function</kbd> `__init__`

Expand Down
9 changes: 9 additions & 0 deletions src-docs/errors.py.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,15 @@ Errors used by the charm.



---

## <kbd>class</kbd> `CharmInstallError`
Represents failure in charm installation.





---

## <kbd>class</kbd> `ConfigurationError`
Expand Down
60 changes: 32 additions & 28 deletions src/charm.py
Original file line number Diff line number Diff line change
Expand Up @@ -38,12 +38,14 @@
CharmConfigInvalidError,
CharmState,
GithubPath,
HttpProxyEnvVar,
ProxyConfig,
RunnerStorage,
VirtualMachineResources,
parse_github_path,
)
from errors import (
CharmInstallError,
ConfigurationError,
LogrotateSetupError,
MissingRunnerBinaryError,
Expand Down Expand Up @@ -90,6 +92,11 @@ def func_with_catch_errors(self, event: EventT) -> None:
func(self, event)
except ConfigurationError as err:
logger.exception("Issue with charm configuration")

if isinstance(event, InstallEvent):
# Enter error state to let juju retry the install event.
raise

self.unit.status = BlockedStatus(str(err))
except TokenError as err:
logger.exception("Issue with GitHub token")
Expand All @@ -105,6 +112,7 @@ def func_with_catch_errors(self, event: EventT) -> None:
self.unit.status = BlockedStatus(
"Unauthorized OpenStack connection. Check credentials."
)
# Intentionally not catching `CharmInstallError` to let juju retry install related issues.

return func_with_catch_errors

Expand Down Expand Up @@ -342,24 +350,23 @@ def _on_install(self, _event: InstallEvent) -> None:
Args:
event: Event of installing the charm.
"""
state = self._setup_state()

self.unit.status = MaintenanceStatus("Installing packages")
try:
# The `_start_services`, `_install_deps` includes retry.
self._install_deps()
self._start_services(state.charm_config.token, state.proxy_config)
metrics.setup_logrotate()
except (LogrotateSetupError, SubprocessError) as err:
logger.exception(err)
if isinstance(err, LogrotateSetupError):
msg = "Failed to setup logrotate"
else:
msg = "Failed to install dependencies"
self.unit.status = BlockedStatus(msg)
return
# Retry install by going into error state.
raise CharmInstallError(msg) from err

self._refresh_firewall(state)
# Above this line, retry are done by entering error state.
# Below this line, retry is performed by other events, e.g., config_change.

state = self._setup_state()
runner_manager = self._get_runner_manager(state)

self.unit.status = MaintenanceStatus("Building runner image")
Expand Down Expand Up @@ -391,6 +398,8 @@ def _on_start(self, _event: StartEvent) -> None:
Args:
event: Event of starting the charm.
"""
# Start event does not require retry on failure currently as it only perform the same
# actions as the reconcile event. Therefore the reconcile event serves as a retry.
state = self._setup_state()
runner_manager = self._get_runner_manager(state)

Expand Down Expand Up @@ -457,13 +466,9 @@ def _on_upgrade_charm(self, _event: UpgradeCharmEvent) -> None:
Args:
event: Event of charm upgrade.
"""
state = self._setup_state()

logger.info("Reinstalling dependencies...")
try:
# The `_start_services`, `_install_deps` includes retry.
self._install_deps()
self._start_services(state.charm_config.token, state.proxy_config)
metrics.setup_logrotate()
except (LogrotateSetupError, SubprocessError) as err:
logger.exception(err)
Expand All @@ -472,12 +477,14 @@ def _on_upgrade_charm(self, _event: UpgradeCharmEvent) -> None:
msg = "Failed to setup logrotate"
else:
msg = "Failed to install dependencies"
self.unit.status = BlockedStatus(msg)
return
# Retry the install by going into error state.
raise CharmInstallError(msg) from err

# Above this line, retry are done by entering error state.
# Below this line, retry is performed by other events, e.g., config_change.

state = self._setup_state()
self._refresh_firewall(state)
logger.info("Flushing the runners...")
state = self._setup_state()
runner_manager = self._get_runner_manager(state)
if not runner_manager:
return
Expand All @@ -498,10 +505,12 @@ def _on_config_changed(self, _event: ConfigChangedEvent) -> None:
"""
state = self._setup_state()
self._set_reconcile_timer()
self._refresh_firewall(state)

prev_config_for_flush: dict[str, str] = {}
if state.charm_config.token != self._stored.token:
prev_config_for_flush["token"] = str(self._stored.token)
# The runners should be flushed at some point after `_start_services` is called.
self._start_services(state.charm_config.token, state.proxy_config)
self._stored.token = None
if self.config["path"] != self._stored.path:
Expand All @@ -516,9 +525,6 @@ def _on_config_changed(self, _event: ConfigChangedEvent) -> None:
# it may be the case that the prev token has expired, so we need to use force flush
prev_runner_manager.flush(FlushMode.FORCE_FLUSH_WAIT_REPO_CHECK)

state = self._setup_state()
self._refresh_firewall(state)

if state.charm_config.openstack_clouds_yaml:
# Test out openstack integration and then go
# into BlockedStatus as it is not supported yet
Expand Down Expand Up @@ -585,7 +591,7 @@ def _check_and_update_dependencies(
runner_bin_updated = True

self.unit.status = MaintenanceStatus("Checking for service updates")
service_updated = self._install_repo_policy_compliance(proxy_config)
service_updated = self._install_repo_policy_compliance(HttpProxyEnvVar.from_env())

if service_updated or runner_bin_updated:
logger.info(
Expand Down Expand Up @@ -762,25 +768,25 @@ def _reconcile_runners(
self.unit.status = ActiveStatus()
return {"delta": {"virtual-machines": delta_virtual_machines}}

def _install_repo_policy_compliance(self, proxy_config: ProxyConfig) -> bool:
"""Install latest version of repo_policy_compliance service.
def _install_repo_policy_compliance(self, proxies: HttpProxyEnvVar) -> bool:
"""Install or upgrade to latest version of repo_policy_compliance service.

Args:
proxy_config: Proxy configuration.
proxies: HTTP proxy environment variables.

Returns:
Whether version install is changed. Going from not installed to
installed will return True.
"""
# Prepare environment variables for pip subprocess
env = {}
if http_proxy := proxy_config.http:
if http_proxy := proxies.http_proxy:
env["HTTP_PROXY"] = http_proxy
env["http_proxy"] = http_proxy
if https_proxy := proxy_config.https:
if https_proxy := proxies.https_proxy:
env["HTTPS_PROXY"] = https_proxy
env["https_proxy"] = https_proxy
if no_proxy := proxy_config.no_proxy:
if no_proxy := proxies.no_proxy:
env["NO_PROXY"] = no_proxy
env["no_proxy"] = no_proxy

Expand Down Expand Up @@ -828,15 +834,13 @@ def _enable_kernel_modules(self) -> None:
@retry(tries=5, delay=5, max_delay=60, backoff=2, local_logger=logger)
def _install_deps(self) -> None:
"""Install dependencies."""
state = self._setup_state()

logger.info("Installing charm dependencies.")
# Snap and Apt will use any proxies configured in the Juju model.
# Binding for snap, apt, and lxd init commands are not available so subprocess.run used.
# Install dependencies used by repo-policy-compliance and the firewall
self._apt_install(["gunicorn", "python3-pip", "nftables"])
# Install repo-policy-compliance package
self._install_repo_policy_compliance(state.proxy_config)
self._install_repo_policy_compliance(HttpProxyEnvVar.from_env())
execute_command(
["/usr/bin/apt-get", "remove", "-qy", "lxd", "lxd-client"], check_exit=False
)
Expand Down
Loading
Loading