General Security Questions #241

quidity · 2021-12-22T10:02:40Z

I'm a security reviewer from Google Chrome and have some general questions - they almost certainly have good answers and it's very possible I missed finding them in the spec. I also have some specific suggestions which I will open as distinct issues shortly.

I think I disagree with the answer given in 2.9 in https://github.com/webmachinelearning/webnn/blob/main/security-privacy.md. This is introducing a new scripting language, built up in js then executed in a number of different interpreters (“cpu”, “gpu” etc.). The complexity of the operations is such that the likelihood of errors that ultimately result in out of bounds accesses is high, and malicious sites will have significant control over operations. This could be addressed more directly in the security/privacy explainer.
Operations such as split/slice/squeeze that change the shape of tensors mid-calculation may lead to incorrect assumptions in later operations - for instance if eliding bounds checks this could lead to out of bounds accesses. It would be good for their to be operation level metadata that might be consumed by implementors to help prevent such problems.
The universe of operations is likely to vary in future - how will consumers discover which operations are available (short of enumerating them through failures to instantiate)? How will operations be deprecated (for instance if they turn out to be badly implemented?)
It feels like .build and .compute should be asynchronous in all cases?
New side channels will be made available from shared resources (cpu/gpu). Timeable things should be out of process so incur at least some ipc to achieve anything. Probably not a massive worry when compared with already sharing a cpu between processes running renderers.
Verify: Sites must delegate permission to host/run models.
Verify: No serialization or caching yet - although this is likely in future.
Control over how a model is run - (selecting cpu/gpu/tpu say) - is this too much power for the consuming site - it will for instance make it possible to more directly target a flawed implementation. It's not clear why this is required.

Sub issues:

anssiko · 2021-12-22T11:10:04Z

@quidity thank you for this security review, much appreciated. We've tentatively scheduled this topic for discussion on our next meeting 13 January 2022 - 15:00-16:00 UTC. I acknowledge this may not be a reasonable meeting time for you, so we'll follow up in this issue afterwards.

anssiko · 2022-01-27T16:13:20Z

We discussed these security review questions on WebML WG Teleconference – 27 January 2022:
https://www.w3.org/2022/01/27-webmachinelearning-minutes.html#t01

The WG will follow up with clarifying questions and proposed changes. This security review is tracked as part of wide review #239 of the WebNN spec.

On behalf of the group, I want to thank @quidity and Chrome Security team for this security review!

anssiko · 2022-01-31T16:15:04Z

@quidity, as shared earlier, the group discussed your Security Review feedback and here's a summary of the group's thinking and proposed next steps. If your think we're going to the right directions, we'd be happy to know that, and even more so if our proposed next steps need course correction!

The group also had some questions to better understand some part of the feedback, in bold below.

Thanks for your review once again!

I think I disagree with the answer given in 2.9 in https://github.com/webmachinelearning/webnn/blob/main/security-privacy.md. This is introducing a new scripting language, built up in js then executed in a number of different interpreters (“cpu”, “gpu” etc.). The complexity of the operations is such that the likelihood of errors that ultimately result in out of bounds accesses is high, and malicious sites will have significant control over operations. This could be addressed more directly in the security/privacy explainer.

The group agreed with out-of-bounds access concern and proposed to address this in the test suite and also note OOB in the Security Considerations.

The group did not agree with the notion that the API introduces a new scripting language and wants to understand what aspects of the API are of concern?

Operations such as split/slice/squeeze that change the shape of tensors mid-calculation may lead to incorrect assumptions in later operations - for instance if eliding bounds checks this could lead to out of bounds accesses. It would be good for their to be operation level metadata that might be consumed by implementors to help prevent such problems.

The group will review each operator identifying OOB access issues and add appropriate guidance to implementers for mitigations.

The group was not sure what "operation level metadata" exactly means in this context?

The universe of operations is likely to vary in future - how will consumers discover which operations are available (short of enumerating them through failures to instantiate)? How will operations be deprecated (for instance if they turn out to be badly implemented?)

The group proposed to document such design principles in the Security Considerations.

To make possible deprecation easier, the high-level functions can be decomposed into smaller operations as described in the explainer. That is, possibly problematic ops could be deprecated and replaced with a polyfill implementation (likely trading some performance for security).

It feels like .build and .compute should be asynchronous in all cases?

This design consideration is being discussed in #230 and your feedback is taken into consideration.

New side channels will be made available from shared resources (cpu/gpu). Timeable things should be out of process so incur at least some ipc to achieve anything. Probably not a massive worry when compared with already sharing a cpu between processes running renderers.

The group agrees this is an area worth a clarification, any relevant guidance to implementers to be documented in the Security Considerations.

The group would like to understand how running timeable things out of process works as a mitigation?

Verify: Sites must delegate permission to host/run models.

Correct. The permissions policy integration in place means the top-level must delegate permissions for an iframe to run a model using this API.

Verify: No serialization or caching yet - although this is likely in future.

Correct. No such API affordances exposed, out of scope for the spec. That said, the group is investigating if implementation-level caching warrants guidance to implementers.

Control over how a model is run - (selecting cpu/gpu/tpu say) - is this too much power for the consuming site - it will for instance make it possible to more directly target a flawed implementation. It's not clear why this is required.

The device selection is a hint only (device selection left to the implementation) and there's no device enumeration API. This partially mitigates the concern.

Sub issues:

Guidelines/philosophy for new operations, including security principles #242

op metadata that helps avoid implementation mistakes #243

A conformance suite with disallowed intra-op examples would be helpful for hardening. #244

Please find the group's comments in these sub issues.

Fix #241

anssiko · 2022-02-03T10:37:08Z

I started a draft PR to incorporate this feedback into the Security Considerations of the specification: #251

This PR, once reviewed and merged, will help tick the first box in the Security wide review we're tracking as part of #239

To tick the second Security wide review box, we'll explicitly request an "official" (used to be Web Security IG) Security wide review to get even more eye balls scrutinise the security aspects of the spec.

Fix #241

anssiko · 2022-02-17T11:45:15Z

@quidity, we'd appreciate if you could help answer the questions the group had with respect to your review questions, highlighted in bold in #241 (comment), also copied below for your convenience:

The group did not agree with the notion that the API introduces a new scripting language and wants to understand what aspects of the API are of concern?

The group was not sure what "operation level metadata" exactly means in this context?

(Would addressing #243 satisfy all "operation level metadata" requirements?)

The group would like to understand how running timeable things out of process works as a mitigation?

Thank you for your help in hardening the security of this API!

quidity · 2022-02-17T18:06:35Z

The group did not agree with the notion that the API introduces a new scripting language and wants to understand what aspects of the API are of concern?

My security concern is that the network is compiled into a program that is entirely under an attackers control. While not as powerful as javascript it is likely powerful enough to make some exploits easier as a result. It argues for careful implementation, and avoiding introducing too much control of control-flow at the API side. Hopefully these concerns can be considered as new operations are introduced.

The group was not sure what "operation level metadata" exactly means in this context?

(Would addressing #243 satisfy all "operation level metadata" requirements?)

Yes, that would be a helpful step.

The group would like to understand how running timeable things out of process works as a mitigation?

This is a concern of lower importance - mainly adding an ipc step will confuse any accidental high precision timers. Realistically there is not a lot to be done here!

Thank you for your help in hardening the security of this API!

A pleasure, thanks for the detailed response.

anssiko · 2022-03-03T09:43:24Z

PR #251 has been updated to address the following:

"Notion that the API introduces a new scripting language"
"Running timeable things out of process"

The remaining work from this issue #241 is to decide how to address:

op metadata that helps avoid implementation mistakes #243

Our last discussion https://www.w3.org/2022/02/24-webmachinelearning-minutes.html#t06 on that topic was inconclusive. I propose we track #243 separately and mark this "General Security Questions" issue resolved when PR #251 is reviewed and landed.

Thanks for your contributions everyone. We're making strong progress with this important security review topic.

@quidity feel free to chime in #243 to help us move to a direction aligned with your expectations.

anssiko · 2022-03-24T16:32:23Z

I'm reopening this auto-closed issue to circle back to @quidity to confirm he's happy with the general direction security considerations are taking: https://www.w3.org/TR/webnn/#security (as of writing this gh-pages deploy is not yet complete and the hosted doc not updated yet, but that should be fixed soon)

Our next step is to request security part of wide review tracked in #239

quidity · 2022-03-24T16:44:48Z

Thanks - this extra thinking is helpful! I don't think there's more to do here.

anssiko · 2022-03-24T17:06:37Z

@quidity thanks for the confirmation!

anssiko · 2022-03-25T08:41:08Z

Our next step is to request security part of wide review tracked in #239

Fyi: Submitted security review request (part of wide review) w3c/security-request#22 and updated #239 accordingly.

anssiko added the security-tracker Group bringing to attention of security, or tracked by the security Group but not needing response. label Dec 22, 2021

w3cbot mentioned this issue Dec 22, 2021

General Security Questions w3c/security-review#136

Open

anssiko added a commit that referenced this issue Feb 3, 2022

Update Security Considerations per review feedback

5914d8f

Fix #241

anssiko mentioned this issue Feb 3, 2022

Update Security Considerations per review feedback #251

Merged

anssiko added a commit that referenced this issue Feb 17, 2022

Update Security Considerations per review feedback

7c32d48

Fix #241

anssiko closed this as completed in #251 Mar 24, 2022

anssiko mentioned this issue Mar 25, 2022

Web Neural Network API 2022-03-25 > 2022-06-30 w3c/security-request#22

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

General Security Questions #241

General Security Questions #241

quidity commented Dec 22, 2021 •

edited

Loading

anssiko commented Dec 22, 2021

anssiko commented Jan 27, 2022

anssiko commented Jan 31, 2022

anssiko commented Feb 3, 2022 •

edited

Loading

anssiko commented Feb 17, 2022

quidity commented Feb 17, 2022

anssiko commented Mar 3, 2022

anssiko commented Mar 24, 2022

quidity commented Mar 24, 2022

anssiko commented Mar 24, 2022

anssiko commented Mar 25, 2022

General Security Questions #241

General Security Questions #241

Comments

quidity commented Dec 22, 2021 • edited Loading

anssiko commented Dec 22, 2021

anssiko commented Jan 27, 2022

anssiko commented Jan 31, 2022

anssiko commented Feb 3, 2022 • edited Loading

anssiko commented Feb 17, 2022

quidity commented Feb 17, 2022

anssiko commented Mar 3, 2022

anssiko commented Mar 24, 2022

quidity commented Mar 24, 2022

anssiko commented Mar 24, 2022

anssiko commented Mar 25, 2022

quidity commented Dec 22, 2021 •

edited

Loading

anssiko commented Feb 3, 2022 •

edited

Loading