Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add appProtocol to agent service to allow agent to work with istio #5240

Merged

Conversation

noahjax
Copy link
Contributor

@noahjax noahjax commented Apr 17, 2024

Why are the changes needed?

If you try to use an agent on a cluster with istio, flytepropeller is unable to connect to the agent. You will either need to remove the agent from flytepropeller's configmap or flytepropeller will fail on startup.

Connecting to flyteconsole is also broken in istio because missing appProtocol prevents some http ports from working correctly.

What changes were proposed in this pull request?

Specify an appProtocol that istio can use to determine how to proxy requests (see here for more details on how this works).

Update: I also modified the flyteadmin and flyteconsole services so that they will work with istio as well. More extensive changes are necessary to get everything working, but I will split those into a separate PR as they are likely more controversial

How was this patch tested?

Tested on my flyte deployment with istio enabled for the whole cluster.

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

@noahjax noahjax marked this pull request as ready for review April 17, 2024 00:07
@dosubot dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. enhancement New feature or request labels Apr 17, 2024
@ddl-ebrown
Copy link
Contributor

I think you just need to run the local tools to update the generated files and such:

HELM_SKIP_INSTALL=true make helm

@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Apr 17, 2024
@@ -40,6 +40,7 @@ spec:
- name: agent-grpc
port: 8000
protocol: TCP
appProtocol: TCP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙌

@noahjax noahjax marked this pull request as draft April 17, 2024 18:00
@noahjax noahjax marked this pull request as ready for review April 19, 2024 21:43
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Apr 19, 2024
@noahjax noahjax force-pushed the noahjax.add-appProtocol-to-agent branch from 14660e5 to bbcefcd Compare April 26, 2024 17:20
Copy link

codecov bot commented Apr 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 60.99%. Comparing base (37fe0a3) to head (8590993).
Report is 119 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5240   +/-   ##
=======================================
  Coverage   60.99%   60.99%           
=======================================
  Files         794      794           
  Lines       51475    51475           
=======================================
  Hits        31398    31398           
  Misses      17185    17185           
  Partials     2892     2892           
Flag Coverage Δ
unittests-datacatalog 69.31% <ø> (ø)
unittests-flyteadmin 58.73% <ø> (ø)
unittests-flytecopilot 17.79% <ø> (ø)
unittests-flytectl 68.03% <ø> (ø)
unittests-flyteidl 79.04% <ø> (ø)
unittests-flyteplugins 61.85% <ø> (ø)
unittests-flytepropeller 57.30% <ø> (ø)
unittests-flytestdlib 65.59% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

pingsutw
pingsutw previously approved these changes May 6, 2024
@pingsutw
Copy link
Member

pingsutw commented May 6, 2024

@noahjax could you run make helm locally, and push it again?

@Future-Outlier
Copy link
Member

@noahjax could you run make helm locally, and push it again?

@noahjax Maybe we need to try make helm;make helm_update;

@ddl-ebrown
Copy link
Contributor

I'll take this one over too and try to update tomorrow. Thanks!

@pingsutw
Copy link
Member

@ddl-ebrown could you run make helm to resolve the test failure? thanks!

@ddl-ebrown ddl-ebrown force-pushed the noahjax.add-appProtocol-to-agent branch from bbcefcd to 70dc772 Compare June 25, 2024 21:56
 * flyteadmin http port
 * flyteadmin grpc port
 * flyteconsole grpc port

   This is necessary because the ingress may be configured in a way that
   it sends TLS traffic to internal Flyte services. Istio will use port
   names to determine traffic - and may therefore assume the appProtocol
   of http, even though traffic from ingress -> flyteadmin is actually
   https. This misconfiguration prevents any traffic from flowing
   through the ingress to the service.

   Flyteadmin http and grcp ports *are* accessible using `http` and
   `grpc` values for appProtocol respectively within the cluster, but as
   soon as traffic travels between the ingress and the service those settings
   will not work. The most "compatible" setting is `tcp` which works for
   any network stream.

 - Adds the nginx.ingress.kubernetes.io/service-upstream: "true"

   Nginx Controller using endpoints instead of Services kubernetes/ingress-nginx#257
   kubernetes/ingress-nginx@main/docs/user-guide/nginx-configuration/annotations.md#service-upstream

Signed-off-by: noahjax <[email protected]>
Signed-off-by: ddl-ebrown <[email protected]>
@ddl-ebrown ddl-ebrown force-pushed the noahjax.add-appProtocol-to-agent branch from 70dc772 to 8590993 Compare June 25, 2024 22:02
@pingsutw
Copy link
Member

Thank you @noahjax @ddl-ebrown

@pingsutw pingsutw merged commit 5c70453 into flyteorg:master Jun 25, 2024
55 checks passed
@ddl-ebrown ddl-ebrown deleted the noahjax.add-appProtocol-to-agent branch June 25, 2024 23:10
robert-ulbrich-mercedes-benz pushed a commit to robert-ulbrich-mercedes-benz/flyte that referenced this pull request Jul 2, 2024
vlibov pushed a commit to vlibov/flyte that referenced this pull request Aug 16, 2024
Signed-off-by: noahjax <[email protected]>
Signed-off-by: ddl-ebrown <[email protected]>
Signed-off-by: Vladyslav Libov <[email protected]>
@fg91
Copy link
Member

fg91 commented Sep 25, 2024

@noahjax I have a question about this change in charts/flyte-core/templates/admin/service.yaml:

    - name: grpc
      port: 81
      protocol: TCP
      # intentionally set to TCP instead of grpc
      appProtocol: TCP

Could you please explain why the appProtocol needs to be set to TCP for you instead of grpc?

When upgrading flyte-core from 1.12.0 (which does not have this change) to 1.13.0 my istio setup (described here) stops working. It does work again if I remove the app protocol again or set it to grpc instead of TCP. I've tried for istio version 1.17.2 as well as the newest 1.23.2.

Could you please share which istio version you use and how your virtual service for flyteadmin looks like? Thank you 🙏

@noahjax
Copy link
Contributor Author

noahjax commented Sep 25, 2024

@fg91 Apologies in advance for the unsatisfying answer. I submitted these changes as part of my work to get Flyte + Istio running at my previous employer, but I have since left the company and as a result I no longer have access to any of the specific configuration details.

Maybe @ddl-ebrown would be able to share some additional information here

@fg91
Copy link
Member

fg91 commented Oct 7, 2024

Thank you @noahjax 🙏

@ddl-ebrown could you please take a look at my questions here? 🙇

@ddl-ebrown
Copy link
Contributor

ddl-ebrown commented Oct 8, 2024

Hi @fg91 - sorry for the late reply. These are my notes regarding why tcp is necessary here

  • Adds the nginx.ingress.kubernetes.io/service-upstream: "true"

    Nginx Controller using endpoints instead of Services kubernetes/ingress-nginx#257
    https://github.com/kubernetes/ingress-nginx/blob/main/docs/user-guide/nginx-configuration/annotations.md#service-upstream

  • Adds appProtocol values of tcp on services:

    • flyteadmin http port
    • flyteadmin grpc port
    • flyteconsole grpc port

    This is necessary because the ingress may be configured in a way that
    it sends TLS traffic to internal Flyte services. Istio will use port
    names to determine traffic - and may therefore assume the appProtocol
    of http, even though traffic from ingress -> flyteadmin is actually
    https. This misconfiguration prevents any traffic from flowing
    through the ingress to the service.

    Flyteadmin http and grcp ports are accessible using http and
    grpc values for appProtocol respectively within the cluster (i.e.
    from a user workspace), but as soon as traffic travels between the
    ingress and the service those settings will not work. The most
    "compatible" setting is tcp which works for any network stream.

We're currently running on Istio 1.23.0, but this configuration also worked for us with Istio 1.21.2. I did try to get grpc working (as that seemed most sensible to me based on what the traffic should be), but in an environment with an ingress this simply doesn't work (one way to make it work is to separate internal service endpoints from external service endpoints -- but that's a bunch of additional work / complexity that was not preferred for our deployment).

tcp should be the most compatible appProtocol because it tells Istio to not try and intercept / observe the traffic -- https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/. I'm surprised it's not working in your deployment, but Istio is also a complex beast :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request size:M This PR changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants