Azure DevOps OIDC #550

Closed
wants to merge 4 commits into from

Conversation

@davidcorrigan714 davidcorrigan714 commented Apr 26, 2024

Description

This change allows AWS connections from Azure DevOps to use OIDC authentication to AWS instead of stored access tokens. Microsoft also calls this "Workload Identity Federation".

Motivation

Using long-lived credentials to authenticate to AWS is highly discouraged and incurs the manual overhead of managing those credentials. This process instead uses short-lived OIDC tokens that Azure DevOps generates for each run. AWS validates each token against a configured OIDC IdP and returns temporary credentials for a role.
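
As a rough illustration of the token exchange described above, the following Python sketch builds the form-encoded body of an STS `AssumeRoleWithWebIdentity` request. It is not the code from this PR: the function name `build_assume_role_request` and the sample role ARN, session name, and token are assumptions made for the example. Only the STS action name, API version, and parameter names are taken from the public STS API.

```python
from urllib.parse import urlencode

# Global STS endpoint; regional endpoints also exist.
STS_ENDPOINT = "https://sts.amazonaws.com/"


def build_assume_role_request(role_arn: str, session_name: str,
                              oidc_token: str,
                              duration_seconds: int = 3600) -> str:
    """Return the form-encoded body for an AssumeRoleWithWebIdentity call.

    The request is unsigned: the OIDC token itself proves the caller's
    identity, so no long-lived AWS access key is needed.
    (Hypothetical helper, for illustration only.)
    """
    # STS accepts 900 seconds up to the role's maximum session duration,
    # which can be at most 12 hours.
    if not 900 <= duration_seconds <= 43200:
        raise ValueError("DurationSeconds must be between 900 and 43200")
    params = {
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": oidc_token,
        "DurationSeconds": str(duration_seconds),
    }
    return urlencode(params)
```

POSTing this body to the STS endpoint would return temporary credentials (access key, secret key, session token) that expire on their own, which is what removes the need to store and rotate keys.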

Related Issue(s), If Filed

#521

Testing

I've been testing this in an Azure DevOps Services account. The change is not applicable to the Azure DevOps Server product, but I have confirmed that it does not break plugin installation for it. I tested primarily against the AWSPowerShellModuleScript and AWSCLI tasks; some more testing is probably warranted, though the rest of the tasks seem to leverage the authentication code that I updated.

Checklist

  • I have read the README document
  • I have read the CONTRIBUTING document
  • My code follows the code style of this project
  • I have added tests to cover my changes
  • A short description of the change has been added to the changelog using the script npm run newChange

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@davidcorrigan714 davidcorrigan714 requested a review from a team as a code owner April 26, 2024 18:55
@davidcorrigan714
Author

Need to do the npm run newChange command still, somehow missed that the first time I read the README but might as well get the review going and collecting any comments.

@HenrikStanley

Need to do the npm run newChange command still, somehow missed that the first time I read the README but might as well get the review going and collecting any comments.

It seems like a lot of PRs on this repo do not get much attention, with the oldest dating back to 2020.
I am going to go through our company's Enterprise Agreement and ask our Technical Account Manager whether they can help push some priority on this.

I have done a review of your code and docs, and I think you have done a stellar job @davidcorrigan714
In a test on my ADO test tenant it also worked as expected.

@rbbarad
Contributor

rbbarad commented Jul 18, 2024

Thanks for your contribution @davidcorrigan714, and thank you all for your patience with this issue. I'm currently working with @ROunofF to land PR #553, which will provide OIDC support for all of the Toolkit's tasks.

@davidcorrigan714
Author

davidcorrigan714 commented Jul 18, 2024 via email

@ROunofF
Contributor

ROunofF commented Jul 19, 2024

I never saw you could reference another service connection in the way you did...

You're referencing an Azure service connection in the AWS service connection, so the potential security concerns are the same.

We could probably merge both:

  • Have the STS credentials fetched in awsConnectionParameter.ts instead of a task
  • Use a modified AWS service connection that uses the (OIDC) service connection.

@HenrikStanley

HenrikStanley commented Jul 20, 2024

To chime in here, from an ergonomics point of view, the implementation that uses an AWS service connection with an OIDC option is strongly preferred by our organization.

We run some 800 AWS accounts and have more than 900 ADO projects.
Being able to just change the service connections used by pipelines in place would make migrating to an OIDC setup, and finally getting rid of a lot of IAM users, a whole lot more feasible.

To me it also feels more in the spirit of how users would expect this to work from a GUI and setup perspective.
Adding the following to literally thousands of pipelines would be a major barrier to adopting OIDC.

steps:
    - task: AWSTemporaryCredentials@1
      displayName: 'Getting STS Credentials'
      inputs:
          azureSubscription: 'azuredo-poc'
          regionName: 'ca-central-1'
          assumeRole: 'arn:aws:iam::012345678901:role/azdo-s3-read'

Just editing the service connections as an admin on the users' behalf is a lot more feasible, as it requires only admin changes and no modifications to the pipelines themselves.
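
When a role ARN like the one in the snippet is copied into thousands of pipelines or service connections, a malformed value (for example stray whitespace after `iam::`) is an easy mistake to make. The following is a minimal Python sketch of a pre-flight check, assuming the standard IAM role ARN layout; `parse_role_arn` is a hypothetical helper and not part of the Toolkit.

```python
import re
from typing import Tuple

# arn:<partition>:iam::<12-digit account id>:role/<optional path/><role name>
_ROLE_ARN = re.compile(
    r"arn:(?:aws|aws-cn|aws-us-gov):iam::(\d{12}):role/([\w+=,.@/-]+)",
    re.ASCII,
)


def parse_role_arn(arn: str) -> Tuple[str, str]:
    """Return (account_id, role_path_and_name) or raise ValueError.

    Hypothetical helper for illustration; it checks shape only and does
    not confirm that the role exists.
    """
    match = _ROLE_ARN.fullmatch(arn)
    if match is None:
        raise ValueError(f"not a valid IAM role ARN: {arn!r}")
    return match.group(1), match.group(2)
```

For instance, `parse_role_arn("arn:aws:iam::012345678901:role/azdo-s3-read")` yields the account ID and role name, while an ARN containing a space is rejected before any pipeline runs.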

@ROunofF
Contributor

ROunofF commented Jul 20, 2024

Thanks @HenrikStanley for the insights on this.

We run some 800 AWS accounts and have more than 900 ADO projects. Being able to just change the service connections used by pipelines in place would make migrating to an OIDC setup, and finally getting rid of a lot of IAM users, a whole lot more feasible.

Questions:

  1. How do you manage your service connections?
  2. Are you sharing the same service connections across projects?
  3. Do you plan to use the same IAM role, or will you have 800 of them (assuming one per account)?

In both scenarios, you'll need to:

1 - Set up the OIDC provider in all AWS accounts
2 - Create a role with the corresponding trust policy (for the OIDC provider from step 1) and attach the corresponding permissions (this may need to be replicated X times)
3 - Create an Azure service connection so your worker has an OIDC token

Then David's way:
d1 - Change the AWS service connections to refer to the Azure service connection (3) and the role to assume (2). The assumeRole is defined here, so you can't reuse the service connections for multiple projects, and you potentially have 800 of those.

The other scenario, #553:
R1 - Add a task to the pipeline YAML with the corresponding assumeRole (2), possibly reusing the Azure service connection (3), since the assumeRole is defined per pipeline.

There are pros and cons to both...

@davidcorrigan714
Author

davidcorrigan714 commented Jul 20, 2024

We don’t have nearly as many service connections, but we do it all through Terraform. Once this is in I’ll put up a PR for the ADO Terraform provider. We already use OIDC with HCP tokens a lot so we have modules already for setting up all the identity stuff in AWS.

Really nice thing with this is it won’t leave any keys in the state files.

@davidcorrigan714
Author

Thanks @HenrikStanley for the insights on this.

We run some 800 AWS accounts and have more than 900 ADO projects. Being able to just change the service connections used by pipelines in place would make migrating to an OIDC setup, and finally getting rid of a lot of IAM users, a whole lot more feasible.

Questions:

  1. How do you manage your service connections?
  2. Are you sharing the same service connections across projects?
  3. Do you plan to use the same IAM role, or will you have 800 of them (assuming one per account)?

In both scenarios, you'll need to:

1 - Set up the OIDC provider in all AWS accounts
2 - Create a role with the corresponding trust policy (for the OIDC provider from step 1) and attach the corresponding permissions (this may need to be replicated X times)
3 - Create an Azure service connection so your worker has an OIDC token

Then David's way:
d1 - Change the AWS service connections to refer to the Azure service connection (3) and the role to assume (2). The assumeRole is defined here, so you can't reuse the service connections for multiple projects, and you potentially have 800 of those.

The other scenario, #553:
R1 - Add a task to the pipeline YAML with the corresponding assumeRole (2), possibly reusing the Azure service connection (3), since the assumeRole is defined per pipeline.

There are pros and cons to both...

So most of that is wrong. In my method, AWS is configured with the OIDC IdP, existing AWS service connections are switched to use OIDC, and roles are then created in AWS with a trust policy allowing the service connections to assume them. Azure doesn’t come into play at all, other than the OIDC token having an Azure audience. The wonky audience is far from ideal, and I’m a tad ticked at Microsoft for not finishing the feature as originally designed, with better support for third-party services. But IMO the subject of a service connection is already tied to a specific audience, simply because service connections are naturally created one per service.
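
To make the trust relationship described here more concrete, the sketch below generates an IAM role trust policy that lets one specific service connection assume a role through an OIDC provider. The policy structure (`Federated` principal, `sts:AssumeRoleWithWebIdentity`, `aud` and `sub` condition keys) follows standard IAM conventions. The Azure DevOps issuer host, the `api://AzureADTokenExchange` audience, and the `sc://<org>/<project>/<connection>` subject format used in the usage note are assumptions that are not stated in this thread and should be checked against the Toolkit documentation. `build_trust_policy` is a hypothetical helper.

```python
import json
from typing import Any, Dict


def build_trust_policy(account_id: str, provider: str,
                       audience: str, subject: str) -> Dict[str, Any]:
    """Build a role trust policy for a single OIDC subject.

    `provider` is the issuer URL without the https:// prefix, as IAM
    names OIDC providers. (Hypothetical helper, for illustration only.)
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Pin both audience and subject so that only one
                        # service connection can assume this role.
                        f"{provider}:aud": audience,
                        f"{provider}:sub": subject,
                    }
                },
            }
        ],
    }


def to_json(policy: Dict[str, Any]) -> str:
    """Serialize the policy for pasting into IAM or a Terraform module."""
    return json.dumps(policy, indent=2)
```

A call might look like `build_trust_policy("012345678901", "vstoken.dev.azure.com/<organization-id>", "api://AzureADTokenExchange", "sc://my-org/my-project/my-aws-connection")`, with every argument a placeholder. Because the audience is shared across connections, the `sub` condition is what scopes the role to a single service connection.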

@HenrikStanley

@ROunofF

  1. Service connections are currently managed by an admin team. When a dev team requests pipeline access to their AWS account, the admin team creates the IAM user in AWS and then creates a service connection in the team's ADO project. Using David's method, we can simply create the provider trust and the IAM role that uses it in the AWS account on the developers' behalf, then update the existing connection in place. This allows us to update all pipelines and AWS accounts one at a time. The important thing is that we can do so without the developers needing to change anything in their pipelines: with David's method they reference the exact same service connection, and what changes is the under-the-hood implementation of how the AWS task gets its credentials. This is important for a smooth transition away from IAM users, as they are currently a huge pain and pose one of the biggest security risks in our entire setup.

  2. No. In Azure DevOps, service connections are tied to projects. Inside a project a service connection may be used by multiple pipelines, but it only ever gives access to a single account, and settings on the service connection itself select which pipelines are allowed to use it.

  3. It is one IAM role and one provider trust per AWS account (multiple if that account for any reason needs to be deployed to from multiple projects).

So yes, we need to do work regardless, that is a given, but the important part is that the work can be done without needing to perform any changes to the developers' pipelines. The number of pipelines that connect to AWS accounts is in the thousands, so even if it is a small snippet, changing it in each and every pipeline and coordinating these changes with the developers could stretch the implementation time from weeks to years.

I also have a question about the #553 implementation.
Is it correctly understood that it uses an Azure Workload Identity type of service connection?
If that is the case, it would create a huge amount of confusion for large customers like my organization, as we are also on Azure, with 700+ subscriptions. If we start mixing these things in the ADO projects from a service connection point of view, it would not be great.

With David's solution an AWS service connection is clearly of the type AWS, and you configure it with the desired type (in this case OIDC instead of static IAM), but both options reside in the plugin.

Lastly, the whole point of service connections in the first place is that we do not need to directly expose the credentials in the pipelines as environment variables. The #553 implementation seems to require that all tasks now do this.

So all tasks that are configured to use the service connection would need to be changed to not reference it directly but instead rely on the environment variables. These do not persist between server changes (which happen every time you start a new stage, btw), making it quite messy, as every single task that does things in AWS in every pipeline in the entire org would need to be updated.

Have we also tested that ALL tasks that currently rely on a service connection can just fall back to the exported credentials? It has been a long time since I did some of my pipelines, but I do remember years ago that some tasks would only work with the service connection.

All of these things make my organization lean heavily towards the @davidcorrigan714 solution, or at the very least towards any solution that does not require auth to be separate from the selected service connection, as that would require rewrites of all pipelines on top of the role and service connection changes.

@HenrikStanley

HenrikStanley commented Jul 22, 2024

The other scenario #553 : R1 - Add a task in the pipeline yaml with the corresponding assumeRole (2) and possibly re-using the azure Service Connection (3) since the assumeRole is defined per pipeline.

There are pros and cons to both...

There is a problem with this, which is that you break the inherent design philosophy of using service connections in Azure DevOps. While I will be the first to admit that there are many flaws with service connections in general, a big selling point is that each task explicitly references the credentials it is using. Persistence between jobs, stages, and tasks is no issue, as tasks individually reference their authentication.

In the #553 implementation, you flip this on its head, so that all AWS-related tasks would implicitly authenticate using whatever the environment variables are at that moment.

Not only does this require removing the reference from all AWS tasks in all pipelines so they no longer point to the service connection, but it also creates challenges when you want pipelines to run in multiple AWS accounts, or in the same account but with different dedicated roles. In that case I would need to run the STS assume setup every time I want to use a different credential set. And if I mess up and forget to explicitly change my implicit authentication, I could easily end up executing commands in the wrong context.

In my opinion, relying on implicit anything in declarative pipelines is a very dangerous proposition.
So while we can agree that both have pros and cons, I do not think it is an apples-to-apples comparison here.

One implementation relies on the existing design language for authentication and is (IAM roles and provider trusts notwithstanding) a drop-in replacement. The other requires that you fundamentally rewrite all your pipelines to take this new design philosophy into account.

At least to us, these are very important factors that should be considered when choosing an implementation.
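
The wrong-context risk described in this comment can be illustrated with a toy model. This is a sketch under stated assumptions, not how Azure DevOps or the Toolkit actually resolves credentials: `Task`, `resolve_accounts`, and the account IDs are hypothetical. A task either names a service connection explicitly or falls back to whatever ambient credentials an earlier "assume role" step left behind.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Task:
    name: str
    connection: Optional[str] = None  # explicit service connection name
    assume: Optional[str] = None      # account set by an "assume role" step


def resolve_accounts(tasks: List[Task],
                     connections: Dict[str, str]) -> Dict[str, str]:
    """Map each AWS task to the account it would run against."""
    ambient: Optional[str] = None  # models exported environment variables
    result: Dict[str, str] = {}
    for task in tasks:
        if task.assume is not None:
            ambient = task.assume  # silently replaces earlier credentials
        elif task.connection is not None:
            result[task.name] = connections[task.connection]
        elif ambient is not None:
            result[task.name] = ambient
        else:
            raise RuntimeError(f"{task.name}: no credentials available")
    return result


connections = {"aws-dev": "111111111111", "aws-prod": "222222222222"}

# Explicit style: each task states which connection it uses.
explicit = [Task("deploy-dev", connection="aws-dev"),
            Task("deploy-prod", connection="aws-prod")]

# Implicit style: the second "assume role" step was forgotten.
implicit = [Task("assume-dev", assume="111111111111"),
            Task("deploy-dev"),
            Task("deploy-prod")]
```

In this model, `resolve_accounts(implicit, connections)` sends `deploy-prod` to the dev account without any error, whereas the explicit variant cannot drift in that way because each task carries its own reference.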

@ROunofF
Contributor

ROunofF commented Jul 23, 2024

@davidcorrigan714 @HenrikStanley Following your comments on this thread, we added a simple option to enable OIDC in the service connections and made the tasks use it to get the right credentials; see #558 for the implementation.

This is simpler and seems to address the comments you left.

The PowerShell tasks seem to use a different mechanism, which I'm checking right now, but let us know what you think.

This will make #553 unnecessary (and will supersede #550 as well).

@rbbarad
Contributor

rbbarad commented Aug 2, 2024

Thank you all for the contributions and helpful feedback in this PR. We've merged #558 to support OIDC and released this feature in v1.15.0 of the Toolkit. Closing this PR.

@rbbarad rbbarad closed this Aug 2, 2024

4 participants