Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Superset: Fetch superuser/staff status from OAuth #42

Closed
pomegranited opened this issue Mar 8, 2023 · 21 comments · Fixed by openedx-unsupported/tutor-contrib-superset#44
Assignees
Labels
feature 🌟 new addition of functionality good first issue A good task for a newcomer to start with help wanted Ready to be picked up by anyone in the community

Comments

@pomegranited
Copy link
Contributor

pomegranited commented Mar 8, 2023

Currently, the Open edX SSO Security Manager installed by https://github.com/openedx/tutor-contrib-superset queries the Open edX MySQL database to determine if a user is a superuser or global staff, in order to determine their roles and permissions in Superset.

This information should be gleaned from OAuth data instead.
e.g. the openedx:auth_backends/backends.py decodes this information from the JWT access_token. Superset has only an OAuth2 bearer token, but we should be able to convert this to JWT (ref).

cf openedx-unsupported/tutor-contrib-superset#6 (comment)

@pomegranited pomegranited converted this from a draft issue Mar 8, 2023
@pomegranited pomegranited added the feature 🌟 new addition of functionality label Mar 8, 2023
@pomegranited
Copy link
Contributor Author

FYI @bmtcril

@bmtcril
Copy link
Contributor

bmtcril commented Apr 12, 2023

We may be able to fix this one in the same changes as well: openedx-unsupported/tutor-contrib-superset#11

@bmtcril bmtcril moved this from Backlog to Ready for Work in Data Working Group Apr 20, 2023
@Ian2012
Copy link
Contributor

Ian2012 commented Apr 24, 2023

Is there a defined starting point for this issue?

@pomegranited
Copy link
Contributor Author

Hi @Ian2012 thanks for taking on this task!

I don't enough about OAuth2 and JWT to advise you specifically here, but maybe these pointers will help?

  • Superset uses the Flask AppBuilder framework, and we're currently using authlib for SSO. Perhaps the docs for FAB or authlib will help describe how to make our custom auth manager fetch JWT tokens? Or maybe there are other FAB projects that use JWT tokens during authentication?
  • According to these docs, you can hit the /oauth2/access_token/ endpoint (again) with the access token received during SSO, and request a JWT access token. Those docs are old, but seem to be referencing code that was moved to edx-platform/openedx/core/djangoapps/auth_exchange. But I'm not sure if our SSO use case fits the purpose of those auth views?

@Ian2012
Copy link
Contributor

Ian2012 commented Apr 25, 2023

@Ian2012
Copy link
Contributor

Ian2012 commented Apr 28, 2023

@pomegranited Seems like to get a JWT token, we need to authenticate with google or facebook, or send the username and password:

image

@pomegranited
Copy link
Contributor Author

@Ian2012 I'm hoping that comment is out of date..

Have you tried using the superset-sso DOT application created during Superset init?

http://localhost:8000/oauth2/exchange_access_token/superset-sso

Or even just oauth2, since it looks like the view might just use the client_id to determine the DOT Application backend which is passed in as POST data?

http://localhost:8000/oauth2/exchange_access_token/oauth2

@mariajgrimaldi
Copy link
Member

mariajgrimaldi commented May 2, 2023

Hi there @pomegranited: I ran some tests following your suggestions, thank you so much! Here are our findings:

We first tried oauth2/exchange_access_token/superset-sso as you suggested, but we got the following:

After getting the missing backend error, we tried your other suggestion: oauth2/exchange_access_token/oauth2, but we got the same result. These attempts led us to list the backends being used:

This list has just authentication backends which we suppose are used to generate the new token, so my theory is that we need a working backend to be able to generate the token and that's why it's not working. I believe just a few backends like google and facebook are supported, as the docs say.

@mariajgrimaldi
Copy link
Member

mariajgrimaldi commented May 2, 2023

We tried other stuff, like getting the JWT cookie set during the LMS login:
image
And it worked: openedx-unsupported/tutor-contrib-superset#28. But it has its own limitations, the main one being that for it to work, superset needs to be hosted in a subdomain of the LMS so they can share these types of cookies. 🤔

@pomegranited
Copy link
Contributor Author

@mariajgrimaldi Hmm.. that's probably an ok workaround if need be.. What do you think, @bmtcril ?

I dug around and didn't find anything specifically useful to this cause, but I found something kind of related: With python social_auth, there's also a user_details() method in the auth pipeline which gets sent a bunch of extra user info. The details of what is in the userinfo field depends on the SSO provider, and the scope requested from (and granted by) that provider. I'm not sure what user data might be provided by Open edX SSO, but it might include the superuser/staff flags?

Flask-AppBuilder shows an example of its equivalent using OAuth: https://github.com/dpgaspar/Flask-AppBuilder/blob/master/docs/security.rst#authentication-oauth

It shows example configs for various 3rd-party oauth providers , with an example of overwriting the @appbuilder.sm.oauth_user_info_getter method on the security manager class and fetching various extra user info fields like names and emails. We're currently configuring the oauth app with only the user_id scope -- but I think there's profile and email scopes available too? I think you also need to request these same scopes during authentication, so would need to change the OAUTH_PROVIDERS setting to something like "profile,email,user_id". At the very least, this could replace our hit to the user_profile URL made during authentication.

@mariajgrimaldi
Copy link
Member

mariajgrimaldi commented May 4, 2023

@pomegranited: this is kind of a brain dump. Here we go:

About JWT tokens and scopes

This is how I understand the current mechanism works (roughly), let me know if I got something wrong:

  • Superset uses the LMS as IdP with Oauth2 implemented by Django Oauth Toolkit
  • I try to login into superset
  • At some point, Superset redirects to /oauth2/authorize using the client ID previously configured
  • In the LMS, we search for the application with such client ID
  • After finding it, DOT redirects to the redirect uri hosted by the Superset application: http://superset.local.overhang.io/oauth-authorized/openedxsso
  • That authorization view generates a Bearer token
  • After successfully generating the token, tries getting the users information and so on.

Now, if I'm right, the token generated doesn't contain information about the user. So, we could configure other scopes and still our token would have the same amount of data, right? Now, If we were using JWT instead, we could modify the scopes getting that way more info about the user associated with the token, as explained here. This mechanism reminds me a bit of the Studio SSO, which roughly works as follows:

  • Studio uses the LMS as IdP with Oauth2 implemented by Django Oauth Toolkit
  • I try to login into Studio
  • At some point, Studio redirects to the LMS path /oauth2/authorize using the client ID previously configured
  • In the LMS, we search for the application with such client ID
  • After finding it, DOT redirects to the redirect uri hosted by the Studio service: http://studio.local.overhang.io/complete/edx-oauth2/
  • That authorization view generates a JWT token
  • After successfully generating the token, tries getting the users information from the JWT token previously generated.

Studio uses the auth-backend library to implement the callback we specified in the redirect URI field. Calling complete/edx-oauth2 triggers the authentication pipeline on the Studio side, which consists, among other things, of generating a JWT access token that is used afterward to get the user details: https://github.com/openedx/auth-backends/blob/master/auth_backends/backends.py#L100-L119. This is just what we need! But there's a catch, to generate a JWT token in the LMS we need the authorization code returned by our IdP when we called /oauth2/authorize, and while the authorization view from the edx-oauth2 backend gets the authorization code, our superset app doesn't:

Authorization view from the edx-oauth2 backend based on Social Core:

https://github.com/python-social-auth/social-core/blob/master/social_core/backends/oauth.py#L359 -- here, self.data contains the query params sent by the LMS which contain the code parameter.

Authorization view from flask app builder:

https://github.com/dpgaspar/Flask-AppBuilder/blob/master/flask_appbuilder/security/views.py#L650 -- here, the code parameter seems to get ignored entirely

So we can't call the LMS and generate a JWT like Studio does without that code. I'm not sure if we could do a workaround to get the authorization code to generate a JWT token or even generate it using other configurations. I'll be testing some more this token generation 🤔

About flask app builder auth

The pattern I've found in some examples for getting the user's information is calling an API hosted by the IdP, like when they do self.appbuilder.sm.oauth_remotes[provider].get("userinfo") and it means making an API call to the LMS getting: "GET /userinfo HTTP/1.1" 404. I don't think calling an API to get that kind of info (superuser, admin) would be a good idea security-wise 🤔 . What do you think? Also, that pattern makes me think that calling the profile API is not a bad thing at all!

@pomegranited
Copy link
Contributor Author

Hi @mariajgrimaldi , thank you for summarising what you've learned about OAuth and JWT, that really helps me understand!

I've CC'd you on this thread with @robrap on Slack -- he's suggesting there's a way to request the token directly as JWT, bypassing Bearer tokens entirely.

Also, that pattern makes me think that calling the profile API is not a bad thing at all!

Lol yeah, agreed. Please disregard my suggestion about changing how we get this info :)

@robrap
Copy link

robrap commented May 10, 2023

@pomegranited: I responded on Slack, but if it makes sense to move the conversation to this PR, please do that. Thanks.

UPDATE: I didn't see the above comment yet about JWTs, and am reading it now.

@robrap
Copy link

robrap commented May 10, 2023

@mariajgrimaldi: I wonder if you need the equivalent of https://github.com/openedx/edx-platform/blob/f0a9ef21613a780e4c7975de47375fe7f21e3710/openedx/core/djangoapps/oauth_dispatch/views.py#L115C1-L116 in the AuthorizationView? The change I linked to was added in the somewhat recent past as Mobile transitioned from Bearer tokens to JWTs.

@mariajgrimaldi
Copy link
Member

@robrap: I'll research a bit more of that! I'll let you know

@mariajgrimaldi
Copy link
Member

mariajgrimaldi commented May 17, 2023

I tried to override the AuthorizationView from the superset application, specifically the method oauth_authorized which exposes /oauth-authorized/provider. The idea of modifying the method was to change the token type from Bearer to JWT, this is how I did it:

class OpenedxSSOView(AuthOAuthView):

    @expose("/oauth-authorized/<provider>")
    def oauth_authorized(self, provider: str) -> WerkzeugResponse:
        if provider not in self.appbuilder.sm.oauth_remotes:
            return redirect(self.appbuilder.get_url_for_login)
        try:
            resp = self.appbuilder.sm.oauth_remotes[provider].authorize_access_token(token_type="jwt")
            ...

class OpenEdxSsoSecurityManager(SupersetSecurityManager):

    authoauthview = OpenedxSSOView

    def set_oauth_session(self, provider, oauth_response):
    ...

It actually worked! We got the JWT during the user authentication. But we hit a wall afterward:

Authlib oauth2 backend only supports Bearer tokens: https://github.com/lepture/authlib/blob/master/authlib/integrations/requests_client/oauth2_session.py#L16-L17

I posted a question on Stack overflow explaining what I was trying to do: https://stackoverflow.com/questions/76274928/unsupported-jwt-token-type-using-authlib. I was going to open an issue on the repo, but they suggest to ask questions on there.

@robrap
Copy link

robrap commented May 17, 2023

If this is interacting with our APIs, at this time, we require the authorization header to use JWT in place of Bearer. Not sure if this is configurable? https://github.com/lepture/authlib/blob/70a6bfa92aa1007515465a7af1a60a4518abe699/authlib/oauth2/auth.py#L80-L83

@mariajgrimaldi
Copy link
Member

mariajgrimaldi commented May 18, 2023

Well, we kind of did it: openedx-unsupported/tutor-contrib-superset#40. I followed Robert's suggestion to get there:

This is currently looking incredibly hacky, but it's working:

I'm gonna do some other stuff and take a look later with fresher eyes.

@robrap
Copy link

robrap commented May 18, 2023

@mariajgrimaldi: Congrats on getting it working. If we used "Bearer" in the Authorization header, rather than "JWT", but included the JWT as the token, would that be the non-hacky version? I think that would all be part of the BearerAuthentication DEPR: openedx/edx-drf-extensions#284.

FYI: @feanil: Since you are looking into authentication clean-up work and issues related to current state.

@Ian2012
Copy link
Contributor

Ian2012 commented May 29, 2023

There was no need to modify the OAuthView class to accept the JWT token type.

Here is a new implementation of this working: openedx-unsupported/tutor-contrib-superset#44 with some refactoring and fixes

@Ian2012
Copy link
Contributor

Ian2012 commented May 29, 2023

assign me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature 🌟 new addition of functionality good first issue A good task for a newcomer to start with help wanted Ready to be picked up by anyone in the community
Projects
Development

Successfully merging a pull request may close this issue.

5 participants