Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sentrykube): introduce additional possibilities to override config #63

Open
wants to merge 23 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 15 commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
237b761
feat(sentrykube): introduce additional possibilities to override comm…
Litarnus Nov 20, 2024
2d5eff4
remove unused function
Litarnus Nov 20, 2024
1ad39eb
rename variable
Litarnus Nov 20, 2024
acdc150
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 20, 2024
8c1aecf
test: add tests for new overrides
Litarnus Nov 21, 2024
8292130
Merge remote-tracking branch 'origin/martinl/additional-config-overri…
Litarnus Nov 21, 2024
6581a8e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 21, 2024
6404cca
test: fix test fixture name
Litarnus Nov 21, 2024
4ae8efa
Merge remote-tracking branch 'origin/martinl/additional-config-overri…
Litarnus Nov 21, 2024
86ed1e8
test: fix test fixture name
Litarnus Nov 21, 2024
b997eab
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 21, 2024
0718dbe
test: fix test fixture name
Litarnus Nov 21, 2024
a470810
Merge remote-tracking branch 'origin/martinl/additional-config-overri…
Litarnus Nov 21, 2024
fb0bb07
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 21, 2024
3b9f8d7
test: remove unused variable in tests
Litarnus Nov 21, 2024
ad9c539
docs: update documentation to reflect the proper order of merging
Litarnus Nov 21, 2024
832e046
ref: move logic outside function, add tests
Litarnus Nov 25, 2024
1033493
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 25, 2024
06f43fc
ref: update test
Litarnus Nov 25, 2024
f0b3551
ref: update test
Litarnus Nov 25, 2024
1dea511
ref: update type hint
Litarnus Nov 25, 2024
ac39e53
ref: import and update type hint
Litarnus Nov 25, 2024
6b44492
ref: add more typehints
Litarnus Nov 25, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions libsentrykube/kube.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
get_service_values,
get_service_value_overrides,
get_tools_managed_service_value_overrides,
get_hierarchical_value_overrides,
)
from libsentrykube.utils import (
deep_merge_dict,
Expand Down Expand Up @@ -111,6 +112,13 @@ def _consolidate_variables(
overrides are managed manually and they can contain comments. Tools cannot
preserve comments.
4. overridden by the cluster file. Which is likely going to be replaced by 2 and 3.
5. overridden by creating a hierarchical structure. Adding an intermediate directory
in 'region_override' with a '_values.yaml' file allows to have a common config
between a set of regions. This is useful if regions in a service are different, but
subset of them are similar.
It works like point 1 and 2 but with an additional layer in between
6. overridden by a common '_values.yaml' within a region folder that has a shared config
between all clusters in the region. Values in point 2 will still override those values
Copy link
Contributor

@fpacifici fpacifici Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the order is right.

  1. Cluster file will be deprecated for service specific override, we are moving to region override only. It still exists because we are in the middle of this process. But if a service has region overrides there should be nothing in the cluster file for that service.
  2. managed override (point 3 above) has to be the last override file (assuming the cluster file goes away).
  3. It seems that the common _values.yaml file within a region (point 6) should be before point 2 (region override). If it is shaerd between clusetrs in the same region, the cluster specific file (these - https://github.com/getsentry/ops/tree/master/k8s/services/relay-pop/region_overrides/us) should be more specific than you point 6.
  4. your point 5 should be before 2 as well.

In summary This should be the order from more generic to more specific
1 The per service value file (this is not region or cluster specific. It is service specific)
5 common override between subset of regions of the same service
6 common override between clusters in the same region (more specific than 5)
3 cluster specific override
4 managed override, which is basically equivalent to 3 but it is a separate file as it is managed by tools and cannot contain comments.

4 the cluster file is still there for historic reasons, but if you made changes to any of the above the cluster file is supposed to be empty for the specific service.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed comment. You're right that the order isn't correct, I misunderstood the existing comment and thought it's more of a "we have this kind of overrides" instead of them being the order from most to least specific. I will update it accordingly


TODO: write the minimum components of a yaml parser to remove step 3 and
patch the regional override preserving comments.
Expand All @@ -131,6 +139,11 @@ def _consolidate_variables(
)
deep_merge_dict(service_values, managed_values)

hierarchical_values = get_hierarchical_value_overrides(
service_name, customer_name, cluster_name, external
)
deep_merge_dict(service_values, hierarchical_values)

# Service data overrides from clusters/
customer_values, _ = get_service_data(
customer_name,
Expand Down
122 changes: 117 additions & 5 deletions libsentrykube/service.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
from collections import OrderedDict
from libsentrykube.config import Config
from libsentrykube.customer import load_customer_data
from libsentrykube.utils import workspace_root
from libsentrykube.utils import workspace_root, deep_merge_dict

_services = OrderedDict()

Expand Down Expand Up @@ -100,9 +100,21 @@ def get_service_value_overrides(
external: bool = False,
) -> dict:
"""
For the given service, return the values specified in the corresponding _values.yaml.
Loads service configuration with regional and cluster-specific overrides.

If "external=True" is specified, treat the service name as the full service path.
This function implements a two-level configuration system where common regional settings
can be shared across all clusters, with cluster-specific overrides on top.

Directory Structure:
region_overrides/
└── {region_name}/
├── _values.yaml # Common settings for all clusters in this region
├── cluster1.yaml # Cluster-specific overrides
└── cluster2.yaml # Cluster-specific overrides

Override Precedence (highest to lowest):
1. {cluster_name}.yaml # Cluster-specific settings
2. _values.yaml # Common regional settings
"""
try:
service_override_file = (
Expand All @@ -112,9 +124,109 @@ def get_service_value_overrides(

with open(service_override_file, "rb") as f:
values = yaml.safe_load(f)

# Try to load the config only after {cluster_name}.yaml was successfully loaded because the common override
# only makes sense if there is a cluster config
common_override_values = get_common_regional_override(
service_name, region_name, external
)

if common_override_values:
deep_merge_dict(common_override_values, values)
return common_override_values

return values
except FileNotFoundError:
values = {}
return values
return {}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each of these methods (get_service_values, get_service_value_overrides, etc.) loads a level of overrides only. Then these are layered in the right order in _consolidate_variables. This is intentional so that we separate the logic that loads one level of override from the one that defines the order and make the order apparent.

Please do not merge different levels of overrides in the same method and create a new one instead. The merging should only happen in _consolidate_variables only. If you find a lot of duplication feel free to factor that logic out.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, those are valid points. I will split it up and move the logic


def get_common_regional_override(
service_name: str, region_name: str, external: bool = False
) -> dict:
"""
Helper function to load common regional configuration values.

Looks for a '_values.yaml' file in the region's override directory that contains
settings shared across all clusters in that region.
"""
try:
common_service_override_file = (
get_service_value_override_path(service_name, region_name, external)
/ "_values.yaml"
)

with open(common_service_override_file, "rb") as f:
common_override_values = yaml.safe_load(f)

return common_override_values
except FileNotFoundError:
return {}


def get_hierarchical_value_overrides(
service_name: str,
region_name: str,
cluster_name: str = "default",
external: bool = False,
) -> dict:
"""
Enables hierarchical configuration overrides with shared base values.

This function extends the standard region_overrides system by adding support for
shared base configurations. This helps reduce duplication across region-specific
configurations.

Directory Structure:
region_overrides/
└── common_shared_config/ # Arbitrary name for the shared config group
├── _values.yaml # Base values for this group
└── {region_name}/ # Region-specific overrides
└── {cluster_name}.yaml # Cluster-specific overrides
Comment on lines +191 to +196
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I follow the reasoning here.

  1. Which scenario do you want to use the hierarchical value overrides in? Is it about single tenant in order to have some common config for all single tenants vs config common to all saas regions? While I hate the idea that ST should be different I guess that's the world we live into. But if this is not targeting that scenario, could you please elaborate ?
  2. If you want to provide a value file that is common to a group of regions why do you have a cluster file here as well? I think in that case the standard cluster specific file (/region_override/my_cluster.yaml) should apply rather than having a separate one. In other word I would like this cross regions defaults to apply within the same hierarchy structure we use for all the other services rather than having two parallel hierarchies.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which scenario do you want to use the hierarchical value overrides in? Is it about single tenant in order to have some common config for all single tenants vs config common to all saas regions? While I hate the idea that ST should be different I guess that's the world we live into. But if this is not targeting that scenario, could you please elaborate ?

Yes it's for single tenants. I agree that it's not great that the config diverges

If you want to provide a value file that is common to a group of regions why do you have a cluster file here as well? I think in that case the standard cluster specific file (/region_override/my_cluster.yaml) should apply rather than having a separate one. In other word I would like this cross regions defaults to apply within the same hierarchy structure we use for all the other services rather than having two parallel hierarchies.

Consider the following structure:

region_overrides/
├── customer1/
├── customer2/
├── customer3/
├── customer4/
└── customer5/

customer1 and customer2 have similar config and customer3, customer4 and customer5 have also similar config with enough differences that they can't share a common config with customer1 and customer2.
As far as I know there is no support for this type of structure which results in a lot of copied config in either the first two or the last 3 customer configs.
Even if we would use /region_override/my_cluster.yaml, we can still only share common configs for either of those groups and the other one needs to overwrite it per cluster file.
With my changes we could split it into the following structure where each group can have a single file with a common config:

region_overrides/
├── first-group/
│   ├── customer1/
│   └── customer2/
└── second-group/
    ├── customer3/
    ├── customer4/
    └── customer5/

Additionally to reducing code duplication we would also see differences between customers more easily.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be fine in having groups.
Though the override systems must enforce that a given region/cluster must be defined only once.
Either us/default.yaml is in a group or it is in the root region_override. It is critical we do not allow both at the same time.

Also please psot this PR in #team-sre-squad-monkeys as this will be an important change that impacts all services and those who manage all services may have opinions.


Override Precedence (highest to lowest):
1. region_name/cluster_name.yaml
2. common_shared_config/_values.yaml
3. Top-level configuration
"""
if external:
service_regions_path = workspace_root() / service_name
else:
service_regions_path = get_service_path(service_name)

service_regions_path = service_regions_path / "region_overrides"

if not service_regions_path.exists():
return {}

for override_group in service_regions_path.iterdir():
if not override_group.is_dir():
continue

try:
service_override_file = (
get_service_value_override_path(
service_name, override_group.name, external
)
/ "_values.yaml"
)

with open(service_override_file, "rb") as f:
base_values = yaml.safe_load(f)
except FileNotFoundError:
base_values = {}

region_path = f"{override_group.name}/{region_name}"
region_values = get_service_value_overrides(
service_name, region_path, cluster_name, external
)

if not region_values:
continue

deep_merge_dict(base_values, region_values)
return base_values

return {}


def get_tools_managed_service_value_overrides(
Expand Down
78 changes: 78 additions & 0 deletions libsentrykube/tests/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,12 @@ def set_workspaceroot() -> Iterator[None]:
},
}

COMMON_SHARED_CONFIG = {
"config": {"foo": "123", "baz": "test", "settings": {"abc": 10, "def": "test"}}
}

CLUSTER_OVERRIDE_CONFIG = {"config": {"foo": "not-foo", "settings": {"abc": 20}}}


@pytest.fixture
def config_structure() -> Generator[str, None, None]:
Expand Down Expand Up @@ -114,6 +120,78 @@ def config_structure() -> Generator[str, None, None]:
yield temp_dir


@pytest.fixture
def hierarchical_override_structure() -> Generator[str, None, None]:
with tempfile.TemporaryDirectory() as temp_dir:
k8s = Path(temp_dir) / "k8s"
os.makedirs(k8s / "clusters")
clusters = k8s / "clusters"
with open(clusters / "cluster1.yaml", "w") as f:
f.write(safe_dump(CLUSTER_1))
with open(clusters / "cluster2.yaml", "w") as f:
f.write(safe_dump(CLUSTER_2))

service_dir = k8s / "services" / "my_service"
region_overrides = service_dir / "region_overrides"
os.makedirs(region_overrides)

os.makedirs(k8s / "services" / "another_service")
common_config = region_overrides / "common_shared_config"
os.makedirs(common_config)

with open(common_config / "_values.yaml", "w") as f:
f.write(safe_dump(COMMON_SHARED_CONFIG))

region_dir = common_config / "customer1"
os.makedirs(region_dir)

with open(region_dir / "cluster1.yaml", "w") as f:
f.write(safe_dump(CLUSTER_OVERRIDE_CONFIG))

os.makedirs(Path(temp_dir) / "cli_config")
with open(Path(temp_dir) / "cli_config" / "configuration.yaml", "w") as f:
f.write(safe_dump(CONFIGURATION))

os.environ["SENTRY_KUBE_CONFIG_FILE"] = str(
Path(temp_dir) / "cli_config" / "configuration.yaml"
)

yield temp_dir


@pytest.fixture
def regional_cluster_specific_override_structure() -> Generator[str, None, None]:
with tempfile.TemporaryDirectory() as temp_dir:
k8s = Path(temp_dir) / "k8s"
os.makedirs(k8s / "clusters")
clusters = k8s / "clusters"
with open(clusters / "cluster1.yaml", "w") as f:
f.write(safe_dump(CLUSTER_1))
with open(clusters / "cluster2.yaml", "w") as f:
f.write(safe_dump(CLUSTER_2))

service_dir = k8s / "services" / "my_service"
region_overrides = service_dir / "region_overrides"
os.makedirs(region_overrides)

os.makedirs(k8s / "services" / "another_service")

region_dir = region_overrides / "customer1"
os.makedirs(region_dir)

with open(region_dir / "_values.yaml", "w") as f:
f.write(safe_dump(COMMON_SHARED_CONFIG))

with open(region_dir / "cluster1.yaml", "w") as f:
f.write(safe_dump(CLUSTER_OVERRIDE_CONFIG))

os.makedirs(Path(temp_dir) / "cli_config")
with open(Path(temp_dir) / "cli_config" / "configuration.yaml", "w") as f:
f.write(safe_dump(CONFIGURATION))

yield temp_dir


@pytest.fixture
def initialized_config_structure(config_structure: str) -> Generator[str, None, None]:
directory = config_structure
Expand Down
37 changes: 37 additions & 0 deletions libsentrykube/tests/test_service.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
from libsentrykube.context import init_cluster_context
import os
from libsentrykube.service import (
get_hierarchical_value_overrides,
get_service_data,
get_service_values,
get_service_value_overrides,
Expand Down Expand Up @@ -63,6 +64,10 @@
}
}

expected_hierarchical_and_regional_cluster_values = {
"config": {"foo": "not-foo", "baz": "test", "settings": {"abc": 20, "def": "test"}}
}


def test_get_service_values_not_external():
region = "saas"
Expand Down Expand Up @@ -186,3 +191,35 @@ def test_write_managed_file(config_structure) -> None:
assert path.is_file()

set_workspace_root_start(start_workspace_root)


def test_get_hierarchical_value_overrides(hierarchical_override_structure: str) -> None:
set_workspace_root_start(hierarchical_override_structure)
os.environ["SENTRY_KUBE_CONFIG_FILE"] = str(
workspace_root() / "cli_config/configuration.yaml"
)
init_cluster_context("customer1", "cluster1")

returned = get_hierarchical_value_overrides(
service_name="my_service", region_name="customer1", cluster_name="cluster1"
)

assert returned == expected_hierarchical_and_regional_cluster_values


def test_regional_cluster_value_overrides(
regional_cluster_specific_override_structure: str,
) -> None:
set_workspace_root_start(regional_cluster_specific_override_structure)
os.environ["SENTRY_KUBE_CONFIG_FILE"] = str(
workspace_root() / "cli_config/configuration.yaml"
)
init_cluster_context("customer1", "cluster1")

returned = get_service_value_overrides(
service_name="my_service",
region_name="customer1",
cluster_name="cluster1",
)

assert returned == expected_hierarchical_and_regional_cluster_values
Loading