API simplification: context types, context options, createContext() #302
This is considered a v2 feature per https://www.w3.org/2022/12/15-webmachinelearning-minutes.html
Following the proposals from #322, here is the adapted proposal. Web IDL:

enum MLContextType {
  "cpu",    // script-controlled context
  "webgpu", // managed by the user agent
  // Later other context types may be defined, even using multiple devices, e.g. "cpu+npu" etc.
  // Note: in fact all these context types could be separate interface classes as well...
};

enum MLPowerPreference { // a hint
  "default",
  "low-power"
};

dictionary MLContextOptions { // not a hint
  MLContextType contextType = "cpu";
  MLPowerPreference powerPreference = "default";
  GPUDevice? gpuDevice = null;
};

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface ML {
  Promise<MLContext> createContext(optional MLContextOptions options = {});
  [Exposed=(DedicatedWorker)]
  MLContext createContextSync(optional MLContextOptions options = {});

  // Internal slots
  // [[boolean managed]]                   // `true` if the user agent controls the context (not really needed)
  // [[MLContextType contextType]]
  // [[MLPowerPreference powerPreference]]
  // [[implementation]]                    // perhaps "adapter" would be better

  // Further methods (and eventually properties) will follow.
};
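A rough usage sketch of this shape, assuming the ML interface is exposed as navigator.ml (error handling omitted; this is illustrative, not normative):

// "cpu": script-controlled context.
const cpuContext = await navigator.ml.createContext({ contextType: "cpu" });

// "webgpu": context managed by the user agent, backed by an explicitly created GPUDevice.
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();
const gpuContext = await navigator.ml.createContext({
  contextType: "webgpu",
  powerPreference: "low-power",
  gpuDevice: device,
});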
Related to #322. @zolkis I'm hesitant to fold [...]. Also note that from the implementation standpoint, a CPU-based and a GPU-based implementation differ in significant ways. Giving the impression at the API level that they are alike could be a bit misleading.
@wchao1115 OK, that is fine. In the spec we need to distinguish between GPU, CPU, etc. and write separate algorithms anyway, so I can see why separate declarations make sense. It's a design choice and you presented a rationale, which can be logged in the explainer, and then I can close this issue. One note about future extensions, though: with that design we'll need to add new adapter objects for the various future accelerators, whereas with the design proposed here we'd just extend the enum.
Picking up from the discussion in #322: maybe a good way to reboot this discussion is to enumerate use cases for specifying (or not specifying) devices:
IMHO, the last use case is the only one where something like the current MLDeviceType is plausible, but it's not a good fit as-is. I'm not sure how to express (4) well (pass the [...]?). I'd prefer to remove the deviceType option now (as was proposed in #322), consider additional enums/values (per (3)) as use cases are confirmed, and explore how to express (4) separately, to avoid baking in the current design.
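For illustration, a minimal sketch of context creation without a deviceType option, reusing only the powerPreference and gpuDevice options already shown in the Web IDL above (one possible shape, not a settled design):

// The UA picks the device(s); the script only provides a hint.
const context = await navigator.ml.createContext({ powerPreference: "low-power" });

// Explicit WebGPU interop: the script supplies the GPUDevice it wants to share.
const adapter = await navigator.gpu.requestAdapter();
const gpuDevice = await adapter.requestDevice();
const gpuContext = await navigator.ml.createContext({ gpuDevice });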
@inexorabletash, thanks for revisiting this issue!
+1 to exploring this use case. I know there is interest from platform vendors in scheduling workloads across multiple devices.
Before we have an alternative solution, I'm concerned that removing the deviceType option would prevent us from testing and comparing CPU and GPU execution in the current Chromium prototype. We may also want to prototype NPU execution by extending deviceType with an "npu" value in the near future. In #322, we also discussed whether to introduce a "system" device type, which would allow use case (1) and the "all devices" case of (4). For the other cases of (4), maybe we can introduce combinations similar to MLComputeUnits, something like "cpu_and_gpu" or "cpu_and_npu"? Another idea is to change deviceType to bit values so that developers can combine them, for example [...].
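To make the bit-value idea concrete, a hypothetical illustration (MLDeviceFlags and its numeric values are invented for this sketch and defined in no spec):

// Hypothetical bit flags for combinable device types.
const MLDeviceFlags = { CPU: 1, GPU: 2, NPU: 4 };

// A developer could then express a combination such as "cpu_and_npu":
const context = await navigator.ml.createContext({
  deviceType: MLDeviceFlags.CPU | MLDeviceFlags.NPU, // hypothetical option shape
});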
Fair point. We may want to flag that in the spec (with an [...]).
Now we just have to pick a neutral term. 😉
Thanks for explaining further @wchao1115 - to ensure I'm understanding, let me attempt to restate the proposal: if you want WebNN to use the GPU, you need to pass a GPUDevice, and otherwise WebNN should avoid using the system's GPU; the alternative would be surprising behavior. Re: (3) and "pairing up" an NPU with a programmable fallback device - again, thank you for walking through this. Our current understanding is that DML either uses the GPU or the NPU and doesn't contain a CPU fallback path. In contrast, TFLite and CoreML will fall back to the CPU if a GPU or NPU delegate can't cover an op. How do you envision a WebNN implementation with a DML backend supporting this fallback? For DML, would the fallback become the UA's responsibility? (Also, please don't read my comments/questions as arguing for an alternative approach - just trying to understand!)
Thank you @inexorabletash for your comment. Yes, re: #322. Falling back to the CPU could be highly expensive when it involves breaking up fully pipelined GPU resources and drawing down a sizeable amount of data from GPU memory to DDR system memory with lower memory bandwidth. This overhead could be worse than just running the whole graph on the CPU in many cases, especially those that pull data across the memory bus. This cost is far less prominent in a system with a unified memory architecture, where all the memory is shared and equally accessible to all integrated processors. DML today doesn't support CPU fallback on its own and still relies on the framework to do so, although it is technically possible through a WARP device. For WebNN, there is a dedicated CPU backend, as we know, in addition to the DML backend. The pairing between an NPU and an integrated processor, whether an integrated GPU or a CPU, should avoid the high cost of pipeline switching and memory readback while still providing a decent runtime experience to users when fallback is needed.
BTW, @RafaelCintron just gave me some relevant bits of info: WebGPU currently still lacks support for discrete GPUs. I think this brings up a good point about the inherent risk of taking a dependency on an external spec, i.e. there is a chance that #322 could be a logically correct but physically wrong design, given that any limitation or constraint on the GPU device lifetime on the WebGPU side would also impose the same limitation on WebNN. That is an argument for keeping the current design with the "gpu" device type and not taking up #322. We could also add a new device type "npu" along with a new fallback device type with the supported values { "gpu", "none" }, where "none" would mean no fallback is attempted, i.e. compilation just fails.
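As a hedged sketch of that last suggestion (the fallbackDeviceType option name is made up here; only the "npu", "gpu" and "none" values come from the comment above):

// Request NPU execution with no fallback: compilation just fails if an op
// cannot run on the NPU; "gpu" instead of "none" would allow GPU fallback.
const npuContext = await navigator.ml.createContext({
  deviceType: "npu",
  fallbackDeviceType: "none",
});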
Lifted from #298 for brevity.
Proposal
Provide a single context type as a mapped descriptor for the combination of resources used in the context, e.g. a valid combination of device(s). (Somewhat analogous to the adapter plus device(s) concept in WebGPU.)
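Purely as an illustration of the mapped-descriptor idea ("cpu+npu" is the hypothetical future value mentioned in the Web IDL comment above, not a defined context type):

// A single context type string stands for a valid combination of devices.
const context = await navigator.ml.createContext({ contextType: "cpu+npu" });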
Rationale for change
createContext()
Context-based graph execution methods for different threading models. #257
Define graph execution methods used in different threading models. #255
Add support for device selection #162
Related to #303.