KEP-1769/KEP-3570/KEP693: Adding Windows Kubelet Manager implementation details #4738

Open
wants to merge 5 commits into
base: master

jsturtevant
Contributor

  • One-line PR description:
    Adding Windows Kubelet Manager implementation details
  • Issue link:

Related PR:
CRI changes: kubernetes/kubernetes#124285
Kubelet changes: kubernetes/kubernetes#125296

/sig node
/sig windows

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. sig/windows Categorizes an issue or PR as relevant to SIG Windows. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 18, 2024
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 18, 2024
Signed-off-by: James Sturtevant <[email protected]>
@jsturtevant
Contributor Author

/cc @marosset @aravindhp @knabben @kiashok
For sig-windows

/cc @haircommander @mrunalp
For sig-node

keps/sig-node/3570-cpumanager/README.md (outdated, resolved)
keps/sig-node/3570-cpumanager/README.md (outdated, resolved)
keps/sig-node/1769-memory-manager/README.md (outdated, resolved)
Signed-off-by: James Sturtevant <[email protected]>
keps/sig-node/3570-cpumanager/README.md (outdated, resolved)

Since the Kubelet APIs expect a distinct ProcessorId, the ID will be calculated by:
`(group * 64) + processorid`, resulting in unique processor IDs from `group 0` as `1-64` and
Contributor

Does processor numbering start at 1 in Windows? Or is this just an example?

Contributor Author

It is a bitmap, so it doesn't really have a number per se. The plan is to translate each processor in the bitmap to a unique ID that we can then translate back to a bitmap. I started to implement this and my logic might be off a bit. I will update the logic here once I get it working.

Contributor Author

I've updated the logic. I've also started the IDs from zero to avoid having to add +1 and -1 everywhere.

Contributor

This looks much better now. One suggestion I have is to clarify variable names. You have GROUP_AFFINITY.Mask in one spot and groupaffinity.Mask in another.
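
For readers following this thread, here is a minimal Go sketch of the zero-based translation being discussed: each set bit in a processor group's affinity bitmask becomes a unique ID of the form `(group * 64) + index`, and a list of IDs can be folded back into per-group bitmasks. The names `processorsPerGroup`, `uniqueID`, `idsFromMask`, and `masksFromIDs` are hypothetical and are not taken from the KEP or the linked kubelet PR. The sketch assumes 64 processors per group, as in the formula quoted above.

```go
package main

import "fmt"

// processorsPerGroup is an assumption taken from the formula in the thread.
const processorsPerGroup = 64

// uniqueID maps a (group, bit index) pair to a zero-based unique ID.
func uniqueID(group, index int) int {
	return group*processorsPerGroup + index
}

// idsFromMask expands one group's affinity bitmask into unique IDs,
// in ascending order.
func idsFromMask(group int, mask uint64) []int {
	var ids []int
	for i := 0; i < processorsPerGroup; i++ {
		if mask&(uint64(1)<<uint(i)) != 0 {
			ids = append(ids, uniqueID(group, i))
		}
	}
	return ids
}

// masksFromIDs folds unique IDs back into one bitmask per group.
func masksFromIDs(ids []int) map[int]uint64 {
	masks := make(map[int]uint64)
	for _, id := range ids {
		group := id / processorsPerGroup
		index := id % processorsPerGroup
		masks[group] |= uint64(1) << uint(index)
	}
	return masks
}

func main() {
	// Hypothetical input: processors 0 and 2 of group 1.
	ids := idsFromMask(1, 0b101)
	fmt.Println(ids)
	fmt.Println(masksFromIDs(ids))
}
```

Starting at zero keeps the forward and reverse mappings as plain multiplication, division, and modulo, which is the simplification mentioned in the reply above.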

## Windows considerations

Topology manager is already enabled on Windows in order to support the device manager. The same configuration options
and PRR apply to Windows. The CPU manager and Memory Manager can independently be enabled to support advanced configuration
Contributor

what is PRR in this case?

Contributor Author

I meant the same Production Readiness Review answers, in the sense that these features can be set to None to disable them if a rollback is required due to errors. I updated the text to clarify.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 28, 2024
Signed-off-by: James Sturtevant <[email protected]>
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 28, 2024
Comment on lines 787 to 789
In order to support multiple NUMA nodes and be able to apply the NUMA affinity to job objects, the container runtime will be expected to mimic
the behavior of [PROC_THREAD_ATTRIBUTE_PREFERRED_NODE](https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-updateprocthreadattribute)
by finding the associated CPUs for the NUMA nodes that are passed via the CRI API and setting the preferred affinity for the job object.

Do we know that setting PROC_THREAD_ATTRIBUTE_PREFERRED_NODE is exactly equivalent to setting the job object's affinity to the CPUs on that node? Or could there be underlying differences?

Contributor Author

I've updated the wording here. We don't know for certain whether there are underlying differences, but we will use the same general idea to work around the single NUMA node limitation of this API.
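
The "same general idea" can be sketched in Go as follows: given the NUMA nodes requested through the CRI, look up the processors that belong to each node and union them into one affinity mask per processor group. This is only an illustration under stated assumptions. The `GroupAffinity` type loosely mirrors the Windows `GROUP_AFFINITY` structure, and `affinityForNodes` and the `topology` map are hypothetical names. A real runtime would populate the topology from the operating system and then apply the result to the job object, and, as noted in this thread, the result is not known to be exactly equivalent to `PROC_THREAD_ATTRIBUTE_PREFERRED_NODE`.

```go
package main

import (
	"fmt"
	"sort"
)

// GroupAffinity loosely mirrors the Windows GROUP_AFFINITY structure:
// a processor group number plus a 64-bit mask of processors in that group.
type GroupAffinity struct {
	Group uint16
	Mask  uint64
}

// affinityForNodes unions the processor masks of the requested NUMA nodes
// and returns one GroupAffinity per processor group, sorted by group.
// The topology map (NUMA node -> affinities) is a hypothetical input.
func affinityForNodes(nodes []int, topology map[int][]GroupAffinity) ([]GroupAffinity, error) {
	merged := make(map[uint16]uint64)
	for _, n := range nodes {
		affinities, ok := topology[n]
		if !ok {
			return nil, fmt.Errorf("unknown NUMA node %d", n)
		}
		for _, a := range affinities {
			merged[a.Group] |= a.Mask
		}
	}
	out := make([]GroupAffinity, 0, len(merged))
	for g, m := range merged {
		out = append(out, GroupAffinity{Group: g, Mask: m})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Group < out[j].Group })
	return out, nil
}

func main() {
	// Made-up topology: nodes 0 and 1 share group 0, node 2 is in group 1.
	topology := map[int][]GroupAffinity{
		0: {{Group: 0, Mask: 0x0F}},
		1: {{Group: 0, Mask: 0xF0}},
		2: {{Group: 1, Mask: 0x0F}},
	}
	affinity, err := affinityForNodes([]int{0, 1, 2}, topology)
	if err != nil {
		panic(err)
	}
	for _, a := range affinity {
		fmt.Printf("group %d mask %#x\n", a.Group, a.Mask)
	}
}
```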

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jsturtevant, marosset
Once this PR has been reviewed and has the lgtm label, please assign mrunalp for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: James Sturtevant <[email protected]>
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 2, 2024
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: Needs Reviewer
Status: No status
8 participants