[New Feature]: Airflow Cognito integration #126

Open
LucaCinquini opened this issue Jun 27, 2024 · 13 comments

@LucaCinquini
Collaborator

No description provided.

@LucaCinquini LucaCinquini converted this from a draft issue Jun 27, 2024
@LucaCinquini
Collaborator Author

LucaCinquini commented Jun 27, 2024

Dependency: CS sets up a Cognito user pool in each target shared services venue and provides connection information and whatever instructions are needed for integration.

Risks: This may not be achievable with the current Airflow version; we may need to wait for the next Airflow version and AuthManager support in Airflow 3.0.x (to be confirmed).

Tests:
o Successful login into Airflow UI with Cognito credentials
o Successful use of Airflow API to submit a job with Cognito credentials
o Successful use of OGC API to submit a job with Cognito credentials

@LucaCinquini LucaCinquini moved this from Todo to In Progress in Unity Project Board Jul 29, 2024
@nikki-t
Collaborator

nikki-t commented Jul 30, 2024

Cognito/Airflow Information

The Airflow web UI uses Flask App Builder (FAB).

Authentication for the API is handled separately from web authentication.

An Amazon Cognito user pool is an OpenID Connect (OIDC) identity provider (IdP). Documentation on implementation options:

  • This documentation goes into a lot of detail around implementing OIDC in Airflow via FAB.
  • This AWS blog post details how to set up ALB with Amazon Cognito to authenticate users to Kubernetes web app. We may need this to facilitate Cognito access via the ALB to Airflow as it shows how the access token, subject, and user claims (JWT format) are passed to a web application.
  • In addition to the modifications made to existing FAB webserver_config.py file, we can create our own auth manager by subclassing BaseAuthManager.

Proposed architecture

  • Users are defined with groups that map to Airflow roles in the Cognito user pool.
    • Airflow roles: Admin, User, Op, Viewer, and Public
  • Modify webserver_config.py to create a subclass of FabAirflowSecurityManagerOverride and override the get_oauth_user_info method to authenticate with the Cognito user pool users and groups. Map Cognito user pool groups to Airflow roles and return the username and role keys (groups).
  • Include the new webserver_config.py file in the helm chart.
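
A minimal sketch of the group-to-role mapping described above, kept free of Airflow imports so it can be run on its own. The group names (airflow-admin and so on) and both helper functions are hypothetical; `cognito:username` and `cognito:groups` are the claim names Cognito places in ID tokens. In a real deployment the body of `claims_to_user_info` would sit inside the overridden get_oauth_user_info method, and FAB would be expected to resolve the role keys through its AUTH_ROLES_MAPPING setting rather than through `resolve_roles`.

```python
from typing import Any, Dict, List

# Hypothetical Cognito group names mapped to the Airflow roles listed above.
# The dict shape (group -> list of roles) follows FAB's AUTH_ROLES_MAPPING.
AUTH_ROLES_MAPPING: Dict[str, List[str]] = {
    "airflow-admin": ["Admin"],
    "airflow-op": ["Op"],
    "airflow-user": ["User"],
    "airflow-viewer": ["Viewer"],
}
DEFAULT_ROLE = "Public"


def claims_to_user_info(claims: Dict[str, Any]) -> Dict[str, Any]:
    """Turn decoded Cognito ID-token claims into a user-info dict.

    The role keys are the raw Cognito groups; mapping them to roles is a
    separate step.
    """
    return {
        "username": claims.get("cognito:username", claims.get("sub")),
        "email": claims.get("email"),
        "role_keys": list(claims.get("cognito:groups", [])),
    }


def resolve_roles(role_keys: List[str]) -> List[str]:
    """Illustrative stand-in for the role lookup; unknown groups are ignored."""
    roles = {role for key in role_keys for role in AUTH_ROLES_MAPPING.get(key, [])}
    return sorted(roles) or [DEFAULT_ROLE]
```

Falling back to Public for users without a recognized group is an assumption made for this sketch; the thread has not settled on a default role.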

Info needed from Cognito

@nikki-t
Collaborator

nikki-t commented Aug 12, 2024

Here are the general steps that are required for OAuth2.0 authentication with Cognito user pool. From: https://aws.amazon.com/blogs/security/how-to-use-oauth-2-0-in-amazon-cognito-learn-about-the-different-oauth-2-0-grants/

  1. HTTP GET request to https://AUTH_DOMAIN/oauth2/authorize where AUTH_DOMAIN=user pool's configured domain.
    a. response_type=code
    b. client_id
    c. redirect_uri=The URL that a user is directed to after successful authentication
    d. state=Random value that is used to prevent CSRF
    e. scope=Space-separated list of scopes to request for the generated tokens
    f. nonce=A random value that you can add to the request which is included in the ID token that Cognito issues.
  2. A CSRF token is returned in a cookie. The user is redirected to https://AUTH_DOMAIN/login (which hosts the auto-generated UI) with the same query parameters set from step 1.
  3. The user authenticates with the auto-generated UI.
  4. Cognito verifies the user pool credentials and redirects the user to the URL specified in the original redirect_uri query parameter. It also sets a code query parameter containing the authorization code that Cognito vends to the user.
  5. The application (Airflow webserver) extracts the authorization code from the query parameters and exchanges it for user pool tokens. The exchange is a POST request to https://AUTH_DOMAIN/oauth2/token with application/x-www-form-urlencoded parameters: grant_type, code, client_id, redirect_uri.
  6. JSON response returned includes: access_token, refresh_token, id_token, expires_in, token_type
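
The steps above can be sketched with the standard library alone, which is handy for checking by hand what each request should look like. This is an illustration under assumptions, not the code FAB or authlib runs: the three function names are hypothetical, and nothing is sent over the network.

```python
from typing import List, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit


def build_authorize_url(auth_domain: str, client_id: str, redirect_uri: str,
                        scopes: List[str], state: str, nonce: str) -> str:
    """Step 1: the GET request to the user pool's /oauth2/authorize endpoint."""
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
        "scope": " ".join(scopes),  # space-separated; encoded as '+'
        "nonce": nonce,
    })
    return f"https://{auth_domain}/oauth2/authorize?{query}"


def extract_code(callback_url: str, expected_state: str) -> str:
    """Steps 4-5: pull the authorization code out of the redirect, checking state."""
    params = parse_qs(urlsplit(callback_url).query)
    if params.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch, possible CSRF")
    return params["code"][0]


def build_token_request(auth_domain: str, client_id: str, redirect_uri: str,
                        code: str) -> Tuple[str, str]:
    """Step 5: URL and form-encoded body for the POST to /oauth2/token."""
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "client_id": client_id,
        "redirect_uri": redirect_uri,
    })
    return f"https://{auth_domain}/oauth2/token", body
```

The redirect_uri in the token request has to be the same value that was sent in step 1, so it is passed in explicitly in both places.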

So far it looks like the traffic passes steps 1 through 3, but the redirect may not be working in step 4. I can't quite isolate where the issue arises: in the Airflow webserver_config.py or in the Cognito authentication flow.

@nikki-t
Collaborator

nikki-t commented Sep 17, 2024

Solutions tried:

  1. FAB documentation for OAuth: gets hung up on the redirect URI and does not seem to reach the get_oauth_user_info function.
  2. Stack Overflow answer that matches the GitHub OAuth configuration: gets stuck on the redirect URI.
  3. Airflow documentation
    a. Set up GitHub authentication; also stuck at the redirect URI.
  4. OIDC provider: does not work with the current Airflow version. It creates a new OIDCView from the existing OIDView, extending its functionality.
  5. CognitoAuthManager extending BaseAuthManager class to define our own authentication operations.
    a. Requires quite a bit of work to build out the class and provide authentication and authorization.
    b. Needs scoping.

Documentation on OAuth 2.0 grants in Cognito: https://aws.amazon.com/blogs/security/how-to-use-oauth-2-0-in-amazon-cognito-learn-about-the-different-oauth-2-0-grants/

It looks like Airflow may be moving away from FAB in the future, so it may make the most sense to implement our own auth manager following the AWS auth manager architecture. (Note: the AWS auth manager does not use Cognito for authentication and authorization.)

@LucaCinquini LucaCinquini added U-SPS and removed U-SPS labels Sep 25, 2024
@LucaCinquini LucaCinquini changed the title Airflow Cognito integration [New Feature]: Airflow Cognito integration Sep 25, 2024
@LucaCinquini LucaCinquini self-assigned this Sep 26, 2024
@nikki-t
Collaborator

nikki-t commented Oct 1, 2024

We made some progress by exploring the Flask AppBuilder and authlib library classes. The error seems to occur with the POST request to the Cognito token issuer endpoint. The POST request seems to be formed correctly but the Airflow web server hangs when making the request and does not return any error messages.

Solutions tried:

  • Modifying the authlib.oauth2.client.OAuth2Client._fetch_token method to use the JupyterHub authentication. This returns a 400 Bad Request response when returning the response object but does seem to retrieve the token data.
  • Using the Python requests library to send a POST request. This hangs and does not return any error message.
  • Using the Python requests_oauthlib library's OAuth2Session class to formulate and send a request to fetch the token. This hangs and does not return any error message.

It seems like the Flask AppBuilder may be interfering with the request somehow. I am not sure if it has to do with async operations and event loops or if I am missing some other aspect of the web server.
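
One way to make a hang like this observable is to put an explicit timeout on the token request, since the requests library waits indefinitely when no timeout is given. The sketch below uses only the standard library and assumes the cause is at the network level, which the thread has not confirmed; `build_token_post` and `send` are hypothetical helper names.

```python
import urllib.error
import urllib.request
from typing import Dict, Tuple
from urllib.parse import urlencode


def build_token_post(url: str, fields: Dict[str, str]) -> urllib.request.Request:
    """Build the form-encoded POST without sending it, so it can be inspected."""
    return urllib.request.Request(
        url,
        data=urlencode(fields).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout_s: float = 10.0) -> Tuple[int, str]:
    """Send the request and return (status, body).

    An HTTP error status (for example 400) is returned together with its
    body. A timeout or connection failure propagates as an exception
    instead of blocking the webserver worker forever.
    """
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")
```

Running `send` from a shell inside the webserver pod, outside of Flask AppBuilder, would help separate a network problem from one caused by the web server itself.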

@LucaCinquini LucaCinquini removed their assignment Oct 6, 2024
@nikki-t
Collaborator

nikki-t commented Oct 7, 2024

A tentative solution can be found here: https://github.com/unity-sds/unity-sps/blob/126-airflow-cognito/airflow/config/webserver_config.py

  • Airflow (Flask AppBuilder) uses the Python authlib library for OAuth authentication (see docs). This library failed in two key areas, both of which required overriding. Both failures occurred when making a request using the requests library.
  • Needed to override the authlib.oauth2.client.OAuth2Client._fetch_token method to return Cognito token data.
  • Needed to override authlib.integrations.base_client.sync_openid.OpenIDMixin.fetch_jwk_set to return the public JSON Web Key (JWK) set.
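
For reference, a sketch of what a fetch_jwk_set replacement has to produce. Cognito publishes each user pool's signing keys at a well-known URL, and the key for a given token is picked by the kid value in the token header. The helper names and the example pool ID are hypothetical, and this is not the code in the linked webserver_config.py; it only shows the lookup, not signature verification.

```python
import base64
import json
from typing import Any, Dict


def cognito_jwks_url(region: str, user_pool_id: str) -> str:
    """Well-known JWKS location for a Cognito user pool."""
    return (
        f"https://cognito-idp.{region}.amazonaws.com/"
        f"{user_pool_id}/.well-known/jwks.json"
    )


def jwt_header(token: str) -> Dict[str, Any]:
    """Decode the (unverified) JOSE header, the first dot-separated segment."""
    segment = token.split(".")[0]
    padded = segment + "=" * (-len(segment) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def select_jwk(jwk_set: Dict[str, Any], kid: str) -> Dict[str, Any]:
    """Return the key whose kid matches the token header."""
    for key in jwk_set.get("keys", []):
        if key.get("kid") == kid:
            return key
    raise KeyError(f"no key with kid={kid!r} in JWK set")
```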

Considerations

  • The current solution allows users to authenticate from the Cognito user pool and assigns them the Admin role. Is it okay for all users to be Admins? If not, more investigation is needed to get the Airflow webserver to register users from Cognito and assign them the correct roles.
  • The current solution does not pull in the user's first or last name but that does not prevent a user record from being created and allowing users to log in.

@LucaCinquini
Collaborator Author

Next, Nikki is going to try to merge Brad's changes to make the full stack (proxies, Cognito, Airflow) work together. This might involve removing the SSL certificates on the SPS ALBs if the proxies stop working.

@nikki-t
Collaborator

nikki-t commented Oct 28, 2024

@jpl-btlunsfo and I were able to route the shared services proxy for unity-dev to my unity-nikki-1 deployment in unity-venue-dev so that the deployment has an HTTPS URL.

I added a callback URL to our Airflow app client: https://www.dev.mdps.mcp.nasa.gov:4443/unity-nikki-1/dev/sps/oauth-authorized/Cognito

I then tested it. After logging in with Cognito, the URL is routed to:

https://www.dev.mdps.mcp.nasa.gov:4443/unity-nikki-1/dev/sps/authorize
    ?response_type=code
    &client_id=xxxxx
    &redirect_uri=http%3A%2F%2Fwww.dev.mdps.mcp.nasa.gov%3A5000%2Foauth-authorized%2FCognito
    &scope=email+openid+profile
    &state=xxxxx
    &nonce=xxxxxx

I set the webserver log level to DEBUG and see the following in the logs:

x.x.x.x - - [28/Oct/2024:20:20:04 +0000] "GET /login/Cognito?next=http%3A//www.dev.mdps.mcp.nasa.gov%3A5000/home HTTP/1.1" 302 1063 "https://www.dev.mdps.mcp.nasa.gov:4443/unity-nikki-1/dev/sps/login/?next=http%3A%2F%2Fwww.dev.mdps.mcp.nasa.gov%3A5000%2Fhome" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36"
x.x.x.x - - [28/Oct/2024:20:20:05 +0000] "GET /authorize?response_type=code&client_id=xxxx&redirect_uri=http%3A%2F%2Fwww.dev.mdps.mcp.nasa.gov%3A5000%2Foauth-authorized%2FCognito&scope=email+openid+profile&state=xxxxx&nonce=xxxxx HTTP/1.1" 404 456 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36"

I can find two issues with the callback URL http://www.dev.mdps.mcp.nasa.gov:5000/oauth-authorized/Cognito:

  1. It is pointing to HTTP and not HTTPS.
  2. It is pointing to port 5000 and not port 443.

I am thinking that this is an issue with the Airflow webserver pointing to the wrong protocol and port. I will have to dig deeper to see if this can be changed in the webserver code.
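
A small sketch for checking a generated redirect_uri against the callback URL registered on the app client, component by component. `redirect_mismatches` is a hypothetical helper written for this thread, not part of Airflow or FAB.

```python
from typing import Dict, Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def redirect_mismatches(registered: str, actual: str) -> Dict[str, Tuple]:
    """Return the URL components where the two callback URLs differ.

    Each value is a (registered, actual) pair. Missing ports are filled in
    with the scheme default so that ':443' and no port compare equal.
    """
    reg, act = urlsplit(registered), urlsplit(actual)
    checks = {
        "scheme": (reg.scheme, act.scheme),
        "host": (reg.hostname, act.hostname),
        "port": (
            reg.port or DEFAULT_PORTS.get(reg.scheme),
            act.port or DEFAULT_PORTS.get(act.scheme),
        ),
        "path": (reg.path, act.path),
    }
    return {name: pair for name, pair in checks.items() if pair[0] != pair[1]}


registered = "https://www.dev.mdps.mcp.nasa.gov:4443/unity-nikki-1/dev/sps/oauth-authorized/Cognito"
actual = "http://www.dev.mdps.mcp.nasa.gov:5000/oauth-authorized/Cognito"
for name, (want, got) in redirect_mismatches(registered, actual).items():
    print(f"{name}: registered={want!r} actual={got!r}")
```

With the two URLs from this comment, the hostname matches but the scheme, port, and path prefix all differ from the registered callback.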

@jpl-btlunsfo
Collaborator

@nikki-t Could you merge in the changes from the 429-fix-proxy-request-ports branch? Those changes are there to help clue Airflow in to its actual running protocol/scheme and port, which might fix the issue you're seeing with the callback URL.

@nikki-t
Collaborator

nikki-t commented Oct 29, 2024

@jpl-btlunsfo - Sorry about that, I had not pushed the branch I created up to GitHub. I did merge in the 429-fix-proxy-request-ports and am still running into the callback issue above.

@nikki-t
Collaborator

nikki-t commented Oct 29, 2024

I completed a deeper dive into the various redirects that are happening and unfortunately could not get much further.

I read through this page on "Running Airflow behind a reverse proxy" and noticed that you can specify a base_url. I tried this using: https://www.dev.mdps.mcp.nasa.gov:4443/unity-nikki-1/dev/sps/. I am not sure that is the right URL, but I wanted something to test with to start out. However, I kept running into an issue with the Helm chart and could not configure this setting for testing. (The Helm release would enter a modification state and never complete, perhaps due to invalid configuration.)

I also tried to investigate how the redirect URL was specified in the Airflow Flask AppBuilder code but kept running into that same Helm chart modification issue as above and could not push any changes to the webserver_config.py file to try to capture the redirect behavior.

I did capture the routes that are taken when trying to log into Cognito using the proxy URL:

  1. Navigate to: https://www.dev.mdps.mcp.nasa.gov:4443/unity-nikki-1/dev/sps/
  2. Redirects to:
    https://unitysds.auth.us-west-2.amazoncognito.com/login?
        response_type=code
        &scope=openid email profile
        &client_id=xxxx
        &state=xxxx
        &redirect_uri=https://www.dev.mdps.mcp.nasa.gov:4443/unity/dev/redirect-url
        &nonce=xxxx
    
  3. User logs into Cognito (Cognito UI).
  4. Redirects to:
    https://www.dev.mdps.mcp.nasa.gov:4443/unity-nikki-1/dev/sps/login/?
        next=http://www.dev.mdps.mcp.nasa.gov:5000/home
    
  5. User clicks on "Sign in with Cognito" (Airflow UI).
  6. Redirects to:
    https://www.dev.mdps.mcp.nasa.gov:4443/unity-nikki-1/dev/sps/authorize?
        response_type=code
        &client_id=xxxx
        &redirect_uri=http://www.dev.mdps.mcp.nasa.gov:5000/oauth-authorized/Cognito
        &scope=email+openid+profile
        &state=xxxx
        &nonce=xxxx
    a. Returns "Airflow 404 Page cannot be found."

  • It looks like the redirect_uri in step 2 is correct, although I might expect it to redirect to /unity-nikki-1/dev/sps.
  • The redirect in step 4's next query parameter has the incorrect port and protocol so the redirect_uri in step 6 also has the incorrect port and protocol.

@jpl-btlunsfo - Do you think this has anything to do with how the proxy is configured? I am not quite sure how those redirects are pulling in the incorrect port and protocol.

@jpl-btlunsfo
Collaborator

jpl-btlunsfo commented Oct 29, 2024

I agree, step 4's next doesn't have the right /unity-nikki-1/dev/sps/ pathing, but that should be rewritten by the proxy when the actual redirect response comes through.

In step 6, that URL looks extra funky:

http://Fwww.dev.mdps.mcp.nasa.gov:5000/oauth-authorized/Cognito

Is that "F" a typo? If it's not, I wonder how that redirect_uri was constructed.

Regarding "noticed that you can specify a base_url":

That's something you can specify in the Helm configuration; however, up until now we hadn't been using that. Instead, that pathing fix has been occurring in the venue-services proxy (specifically in the 015-sps-airflow-ui parameter). If it's necessary to add that base_url to get the Cognito URI working, we'll need to adjust that SSM config (and that might need some testing).

@nikki-t
Collaborator

nikki-t commented Oct 29, 2024

The step 6 URL does have a typo; the "F" shouldn't be there. I did try testing the base_url but couldn't get past the Terraform deployment and the Helm chart.

Maybe I can try digging back into the webserver config and code to see why it might be able to pull the right hostname but not port or protocol.
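
Since decoding %3A%2F%2F by hand is an easy place for a stray character to creep in, here is a short sketch that decodes the redirect_uri straight from the request target in the webserver log. `logged_redirect_uri` is a hypothetical helper; the sample target is the /authorize request from the log excerpt earlier in this thread.

```python
from urllib.parse import parse_qs, urlsplit


def logged_redirect_uri(request_target: str) -> str:
    """Decode the redirect_uri query parameter from a logged request target."""
    query = urlsplit(request_target).query
    return parse_qs(query)["redirect_uri"][0]


target = (
    "/authorize?response_type=code&client_id=xxxx"
    "&redirect_uri=http%3A%2F%2Fwww.dev.mdps.mcp.nasa.gov%3A5000%2Foauth-authorized%2FCognito"
    "&scope=email+openid+profile&state=xxxxx&nonce=xxxxx"
)
print(logged_redirect_uri(target))
```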

Status: In Progress