Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

General Security Questions #241

Closed
quidity opened this issue Dec 22, 2021 · 11 comments · Fixed by #251
Closed

General Security Questions #241

quidity opened this issue Dec 22, 2021 · 11 comments · Fixed by #251
Labels
security-tracker Group bringing to attention of security, or tracked by the security Group but not needing response.

Comments

@quidity
Copy link

quidity commented Dec 22, 2021

I'm a security reviewer from Google Chrome and have some general questions - they almost certainly have good answers and it's very possible I missed finding them in the spec. I also have some specific suggestions which I will open as distinct issues shortly.

  1. I think I disagree with the answer given in 2.9 in https://github.com/webmachinelearning/webnn/blob/main/security-privacy.md. This is introducing a new scripting language, built up in js then executed in a number of different interpreters (“cpu”, “gpu” etc.). The complexity of the operations is such that the likelihood of errors that ultimately result in out of bounds accesses is high, and malicious sites will have significant control over operations. This could be addressed more directly in the security/privacy explainer.
  2. Operations such as split/slice/squeeze that change the shape of tensors mid-calculation may lead to incorrect assumptions in later operations - for instance if eliding bounds checks this could lead to out of bounds accesses. It would be good for their to be operation level metadata that might be consumed by implementors to help prevent such problems.
  3. The universe of operations is likely to vary in future - how will consumers discover which operations are available (short of enumerating them through failures to instantiate)? How will operations be deprecated (for instance if they turn out to be badly implemented?)
  4. It feels like .build and .compute should be asynchronous in all cases?
  5. New side channels will be made available from shared resources (cpu/gpu). Timeable things should be out of process so incur at least some ipc to achieve anything. Probably not a massive worry when compared with already sharing a cpu between processes running renderers.
  6. Verify: Sites must delegate permission to host/run models.
  7. Verify: No serialization or caching yet - although this is likely in future.
  8. Control over how a model is run - (selecting cpu/gpu/tpu say) - is this too much power for the consuming site - it will for instance make it possible to more directly target a flawed implementation. It's not clear why this is required.

Sub issues:

@anssiko anssiko added the security-tracker Group bringing to attention of security, or tracked by the security Group but not needing response. label Dec 22, 2021
@anssiko
Copy link
Member

anssiko commented Dec 22, 2021

@quidity thank you for this security review, much appreciated. We've tentatively scheduled this topic for discussion on our next meeting 13 January 2022 - 15:00-16:00 UTC. I acknowledge this may not be a reasonable meeting time for you, so we'll follow up in this issue afterwards.

@anssiko
Copy link
Member

anssiko commented Jan 27, 2022

We discussed these security review questions on WebML WG Teleconference – 27 January 2022:
https://www.w3.org/2022/01/27-webmachinelearning-minutes.html#t01

The WG will follow up with clarifying questions and proposed changes. This security review is tracked as part of wide review #239 of the WebNN spec.

On behalf of the group, I want to thank @quidity and Chrome Security team for this security review!

@anssiko
Copy link
Member

anssiko commented Jan 31, 2022

@quidity, as shared earlier, the group discussed your Security Review feedback and here's a summary of the group's thinking and proposed next steps. If your think we're going to the right directions, we'd be happy to know that, and even more so if our proposed next steps need course correction!

The group also had some questions to better understand some part of the feedback, in bold below.

Thanks for your review once again!

  1. I think I disagree with the answer given in 2.9 in https://github.com/webmachinelearning/webnn/blob/main/security-privacy.md. This is introducing a new scripting language, built up in js then executed in a number of different interpreters (“cpu”, “gpu” etc.). The complexity of the operations is such that the likelihood of errors that ultimately result in out of bounds accesses is high, and malicious sites will have significant control over operations. This could be addressed more directly in the security/privacy explainer.

The group agreed with out-of-bounds access concern and proposed to address this in the test suite and also note OOB in the Security Considerations.

The group did not agree with the notion that the API introduces a new scripting language and wants to understand what aspects of the API are of concern?

  1. Operations such as split/slice/squeeze that change the shape of tensors mid-calculation may lead to incorrect assumptions in later operations - for instance if eliding bounds checks this could lead to out of bounds accesses. It would be good for their to be operation level metadata that might be consumed by implementors to help prevent such problems.

The group will review each operator identifying OOB access issues and add appropriate guidance to implementers for mitigations.

The group was not sure what "operation level metadata" exactly means in this context?

  1. The universe of operations is likely to vary in future - how will consumers discover which operations are available (short of enumerating them through failures to instantiate)? How will operations be deprecated (for instance if they turn out to be badly implemented?)

The group proposed to document such design principles in the Security Considerations.

To make possible deprecation easier, the high-level functions can be decomposed into smaller operations as described in the explainer. That is, possibly problematic ops could be deprecated and replaced with a polyfill implementation (likely trading some performance for security).

  1. It feels like .build and .compute should be asynchronous in all cases?

This design consideration is being discussed in #230 and your feedback is taken into consideration.

  1. New side channels will be made available from shared resources (cpu/gpu). Timeable things should be out of process so incur at least some ipc to achieve anything. Probably not a massive worry when compared with already sharing a cpu between processes running renderers.

The group agrees this is an area worth a clarification, any relevant guidance to implementers to be documented in the Security Considerations.

The group would like to understand how running timeable things out of process works as a mitigation?

  1. Verify: Sites must delegate permission to host/run models.

Correct. The permissions policy integration in place means the top-level must delegate permissions for an iframe to run a model using this API.

  1. Verify: No serialization or caching yet - although this is likely in future.

Correct. No such API affordances exposed, out of scope for the spec. That said, the group is investigating if implementation-level caching warrants guidance to implementers.

  1. Control over how a model is run - (selecting cpu/gpu/tpu say) - is this too much power for the consuming site - it will for instance make it possible to more directly target a flawed implementation. It's not clear why this is required.

The device selection is a hint only (device selection left to the implementation) and there's no device enumeration API. This partially mitigates the concern.

Sub issues:

Please find the group's comments in these sub issues.

@anssiko
Copy link
Member

anssiko commented Feb 3, 2022

I started a draft PR to incorporate this feedback into the Security Considerations of the specification: #251

This PR, once reviewed and merged, will help tick the first box in the Security wide review we're tracking as part of #239

To tick the second Security wide review box, we'll explicitly request an "official" (used to be Web Security IG) Security wide review to get even more eye balls scrutinise the security aspects of the spec.

@anssiko
Copy link
Member

anssiko commented Feb 17, 2022

@quidity, we'd appreciate if you could help answer the questions the group had with respect to your review questions, highlighted in bold in #241 (comment), also copied below for your convenience:

The group did not agree with the notion that the API introduces a new scripting language and wants to understand what aspects of the API are of concern?

The group was not sure what "operation level metadata" exactly means in this context?

(Would addressing #243 satisfy all "operation level metadata" requirements?)

The group would like to understand how running timeable things out of process works as a mitigation?

Thank you for your help in hardening the security of this API!

@quidity
Copy link
Author

quidity commented Feb 17, 2022

The group did not agree with the notion that the API introduces a new scripting language and wants to understand what aspects of the API are of concern?

My security concern is that the network is compiled into a program that is entirely under an attackers control. While not as powerful as javascript it is likely powerful enough to make some exploits easier as a result. It argues for careful implementation, and avoiding introducing too much control of control-flow at the API side. Hopefully these concerns can be considered as new operations are introduced.

The group was not sure what "operation level metadata" exactly means in this context?

(Would addressing #243 satisfy all "operation level metadata" requirements?)

Yes, that would be a helpful step.

The group would like to understand how running timeable things out of process works as a mitigation?

This is a concern of lower importance - mainly adding an ipc step will confuse any accidental high precision timers. Realistically there is not a lot to be done here!

Thank you for your help in hardening the security of this API!

A pleasure, thanks for the detailed response.

@anssiko
Copy link
Member

anssiko commented Mar 3, 2022

PR #251 has been updated to address the following:

  • "Notion that the API introduces a new scripting language"
  • "Running timeable things out of process"

The remaining work from this issue #241 is to decide how to address:

Our last discussion https://www.w3.org/2022/02/24-webmachinelearning-minutes.html#t06 on that topic was inconclusive. I propose we track #243 separately and mark this "General Security Questions" issue resolved when PR #251 is reviewed and landed.

Thanks for your contributions everyone. We're making strong progress with this important security review topic.

@quidity feel free to chime in #243 to help us move to a direction aligned with your expectations.

@anssiko
Copy link
Member

anssiko commented Mar 24, 2022

I'm reopening this auto-closed issue to circle back to @quidity to confirm he's happy with the general direction security considerations are taking: https://www.w3.org/TR/webnn/#security (as of writing this gh-pages deploy is not yet complete and the hosted doc not updated yet, but that should be fixed soon)

Our next step is to request security part of wide review tracked in #239

@quidity
Copy link
Author

quidity commented Mar 24, 2022

Thanks - this extra thinking is helpful! I don't think there's more to do here.

@anssiko
Copy link
Member

anssiko commented Mar 24, 2022

@quidity thanks for the confirmation!

@anssiko
Copy link
Member

anssiko commented Mar 25, 2022

Our next step is to request security part of wide review tracked in #239

Fyi: Submitted security review request (part of wide review) w3c/security-request#22 and updated #239 accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
security-tracker Group bringing to attention of security, or tracked by the security Group but not needing response.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants