This repository has been archived by the owner on Oct 22, 2024. It is now read-only.

LVM: join different regions into single volume group #1045

Open
pohly opened this issue Oct 25, 2021 · 8 comments

@pohly
Contributor

pohly commented Oct 25, 2021

It may be useful to combine all PMEM in a system into a single volume, either for apps which need one very large memory-mapped address range or for cases where managing more than one volume is too cumbersome.

This cannot be done across CPU sockets by ipmctl (https://docs.pmem.io/ipmctl-user-guide/provisioning/concepts, "Regions cannot be created across CPU sockets."). It can be done by LVM, which is how PMEM-CSI could support this. The downside is that region affinity gets lost.
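As a rough sketch of the LVM approach, the snippet below only builds the `pvcreate` and `vgcreate` command lines that would join one block device per region into a single volume group. It does not run them. The device paths, the volume group name, and the helper `join_regions_commands` are all hypothetical and are not taken from PMEM-CSI.

```python
from typing import List


def join_regions_commands(devices: List[str], vg_name: str) -> List[List[str]]:
    """Return the LVM command lines that would merge `devices` into one volume group.

    Nothing is executed here; the caller decides whether and how to run them.
    """
    if not devices:
        raise ValueError("need at least one device")
    return [
        ["pvcreate", *devices],           # initialize each device as an LVM physical volume
        ["vgcreate", vg_name, *devices],  # create one volume group spanning all of them
    ]


# Hypothetical example: one PMEM block device per region (one region per CPU socket).
commands = join_regions_commands(["/dev/pmem0", "/dev/pmem1"], "pmem-all")
for cmd in commands:
    print(" ".join(cmd))
```

A logical volume created in such a group may be placed on either device or on both, which is why region affinity is no longer guaranteed.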

@pohly
Contributor Author

pohly commented Oct 25, 2021

I can see two ways of solving this (not mutually exclusive, both make sense depending on usage patterns):

  • add a parameter that influences how PMEM-CSI sets up volume group(s) when it starts on a node: one volume group per region (current behavior, default) vs. merge into one volume group
  • during node provisioning, set up volume group(s) and then let PMEM-CSI manage those which have a certain configurable tag

The first option may be simpler to use (just create regions) while the second may also be useful for other use cases (managing non-PMEM storage).
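To make the two options more concrete, here is a minimal sketch. `plan_volume_groups` models the first option (a switch between one volume group per region and one merged group), and `select_tagged` models the second (only manage volume groups that carry a given tag). The function names, the volume group names, the `merge` flag, and the tag `pmem-csi` are assumptions for illustration only, not existing PMEM-CSI parameters.

```python
from typing import Dict, List


def plan_volume_groups(region_devices: Dict[str, List[str]], merge: bool) -> Dict[str, List[str]]:
    """Map volume group names to the devices they would contain.

    merge=False: one volume group per region (the behavior described as current).
    merge=True:  a single volume group containing the devices of all regions.
    """
    if merge:
        all_devices = [dev for region in sorted(region_devices) for dev in region_devices[region]]
        return {"pmem-merged": all_devices}
    return {f"pmem-{region}": list(devs) for region, devs in sorted(region_devices.items())}


def select_tagged(vg_tags: Dict[str, List[str]], tag: str) -> List[str]:
    """Return the names of pre-provisioned volume groups that carry `tag`."""
    return sorted(name for name, tags in vg_tags.items() if tag in tags)


regions = {"region0": ["/dev/pmem0"], "region1": ["/dev/pmem1"]}
per_region = plan_volume_groups(regions, merge=False)
merged = plan_volume_groups(regions, merge=True)
managed = select_tagged({"vg-a": ["pmem-csi"], "vg-b": [], "vg-c": ["other", "pmem-csi"]}, "pmem-csi")
```

With the tag-based variant the driver never needs to know how a group was assembled, which is what would make it usable for non-PMEM storage as well.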

@tigerhu2008

Hi Patrick, one customer is asking us whether PMEM-CSI can support cross-region LVM, because they want a bigger volume. Other customers may have the same requirement. I think this is a useful feature.

@pohly
Contributor Author

pohly commented Nov 4, 2021

@tigerhu2008: which of the two proposed solutions will be easier to use for your customer?

@tigerhu2008

Will check with customer.

I have a question about the implementation. Say there are two CPUs, each with 1TB of PMem: how will the resource be reported to the resource manager? As 2TB only, as 2 * 1TB, or both? When a user asks for more than 1TB of PMem, the scheduler needs to know the available PMem capacity.

@pohly
Contributor Author

pohly commented Nov 6, 2021

Once the two 1TB regions are joined via LVM into one volume group (regardless of how that is set up), PMEM-CSI will manage that volume group, so the reported size will be 2TB.
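The arithmetic behind that answer can be sketched as follows. The helpers `vg_capacities` and `fits` are hypothetical, sizes are idealized (LVM metadata overhead is ignored), and the code only models which single volume sizes would be satisfiable, not how PMEM-CSI actually publishes capacity.

```python
from typing import List

TIB = 1 << 40  # bytes in one TiB


def vg_capacities(region_sizes: List[int], merged: bool) -> List[int]:
    """Capacity of each managed volume group, in bytes.

    merged=True:  all regions form one volume group, so there is a single, summed capacity.
    merged=False: each region is its own volume group.
    """
    return [sum(region_sizes)] if merged else list(region_sizes)


def fits(request: int, capacities: List[int]) -> bool:
    """A volume must come from a single volume group, so one group must be large enough."""
    return any(cap >= request for cap in capacities)


regions = [1 * TIB, 1 * TIB]  # two sockets with 1 TiB each
merged_caps = vg_capacities(regions, merged=True)
split_caps = vg_capacities(regions, merged=False)
request = 3 * TIB // 2  # a 1.5 TiB volume
print(fits(request, merged_caps), fits(request, split_caps))
```

Under these assumptions a 1.5 TiB request only fits in the merged layout.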

Another, more complicated alternative would be to manage individual physical volumes (one per region), and create one volume group per Kubernetes PVC as desired. That would be a way to specify that a PVC may span different regions or must not span different regions (when NUMA is important).
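A minimal sketch of how such a per-PVC decision could look, assuming a hypothetical `may_span` flag on the request and a map of free space per region. Nothing here reflects existing PMEM-CSI code; the placement policy (best fit within one region, then spill over starting with the region that has the most free space) is just one possible choice.

```python
from typing import Dict, Optional


def place_volume(free: Dict[str, int], request: int, may_span: bool) -> Optional[Dict[str, int]]:
    """Decide how much space to take from each region for one volume.

    Returns a mapping region -> amount, or None if the request cannot be satisfied.
    A single region is always preferred because it keeps NUMA locality.
    """
    fitting = [region for region in free if free[region] >= request]
    if fitting:
        # Best fit: the region with the least free space that still holds the volume.
        best = min(fitting, key=lambda region: (free[region], region))
        return {best: request}
    if not may_span or sum(free.values()) < request:
        return None
    # Spanning is allowed and needed: fill from the region with the most free space first.
    allocation: Dict[str, int] = {}
    remaining = request
    for region in sorted(free, key=lambda r: (-free[r], r)):
        take = min(free[region], remaining)
        if take > 0:
            allocation[region] = take
            remaining -= take
        if remaining == 0:
            break
    return allocation


free_gib = {"region0": 1024, "region1": 1024}  # free space per region, in GiB
small = place_volume(free_gib, 512, may_span=False)
pinned = place_volume(free_gib, 1536, may_span=False)
spanning = place_volume(free_gib, 1536, may_span=True)
```

The regions in the returned mapping would then be the physical volumes that make up the volume group for that one PVC.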

@pohly
Contributor Author

pohly commented Mar 10, 2022

I've been told that a future Linux kernel feature might support namespaces that cross regions.

@tigerhu2008

tigerhu2008 commented Apr 7, 2022 via email

@pohly
Contributor Author

pohly commented Apr 7, 2022

Sorry, I don't know if or when that enhanced Linux kernel support might get included.
