How can I patch and build the open-gpu-kernel-modules extension to support the arm64 LX2160a platform? #529

Open
asymingt opened this issue Nov 23, 2024 · 2 comments

Comments

@asymingt

asymingt commented Nov 23, 2024

I've managed to set up a Talos cluster with both amd64 and arm64 worker nodes. I have no issues running amd64 GPU jobs using the nonfree / production NVIDIA driver extension. There have been some sharp edges, but all in all I've had a pretty clean experience along the way, even though my Kubernetes knowledge is limited. Thank you!

The arm64 node is a Honeycomb LX2k board based around the LX2160a SOM, and it requires a patch to open-gpu-kernel-modules to function. This patch appears not to have made it into their master branch, so I don't think it is present in either the LTS or the production variant of the Talos published extensions. A related issue is given here showing the OSS modules working with this patch. Before switching to Talos I was running containerized GPU images on this platform in Ubuntu 22.04 on an Ampere card without issues.

I checked out this repo thinking I might be able to apply a patch to the driver build script, but on closer inspection it appears that this repo actually stitches together prebuilt and signed artifacts from the container registry images ghcr.io/siderolabs/nvidia-open-gpu-kernel-modules-*. Could you nudge me in the right direction to patch, build and sign my own OSS modules to produce an updated Talos extension? Or is there an official process whereby Sidero Labs can supply a prebuilt image with the patch applied, so that the drivers support this platform?

@asymingt asymingt changed the title How can I patch and build the open-gpu-kernel-modules extension to support eh arm64 LX2160 platform? How can I patch and build the open-gpu-kernel-modules extension to support the arm64 LX2160 platform? Nov 23, 2024
@asymingt asymingt changed the title How can I patch and build the open-gpu-kernel-modules extension to support the arm64 LX2160 platform? How can I patch and build the open-gpu-kernel-modules extension to support the arm64 LX2160a platform? Nov 23, 2024
@smira
Member

smira commented Nov 25, 2024

The best way is to submit a PR to the pkgs repository with the patch.

@asymingt
Author

> The best way is to submit a PR to the pkgs repository with the patch.

Oh, thank you! I wasn't aware of the pkgs repo. I'll try my hand at patching and building there 👍
