-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Background sync for datasets from CommCare HQ #41
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the refactor commits @kaapstorm.
I tried reviewing this PR but it was a bit difficult with this PR attending to quite a lot of changes with some big commits.
Plus the part about OAuth that came along with syncing data sources is new to me.
Are you able to break this down further for easier review? Or else if its better the way it is, I will take a more dedicated pass. Please let me know
Tests are failing because they are not configured correctly. I'll fix that. @mkangia you asked:
I agree that Add OAuth 2.0 server is a monster. All the commits that went into it can be found here but I'm not sure that's any more helpful. I'll try to unpack them. I'll timebox it and see how I get on. (Rearranging and squashing the commits from the draft PR took hours. Hopefully unpacking might be quicker.) |
Thanks @kaapstorm Good to timebox or else then If its easier, we could jump on a call where we can walk through the big commits |
d8dc390
to
f9ea93b
Compare
Accepts changes to datasets from CommCare HQ data forwarding | ||
""" | ||
|
||
MAX_REQUEST_LENGTH = 10 * 1024 * 1024 # reject JSON requests > 10MB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kaapstorm How was this limit determined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A wild guess as to how big a "very big" request might be? I'm open to better ways to determine this value.
|
||
@expose('/change/', methods=('POST',)) | ||
@handle_api_exception | ||
def post_dataset_change(self) -> FlaskResponse: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kaapstorm Quickly remind me again, is this endpoint going to be hit for every row change in a data source? (it seems like it considering that the DataSetChange
has a doc_id
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This endpoint will be hit for every change for a doc_id, where a doc_id originally refers to a case or a form. Sometimes one doc_id can result in multiple rows, e.g. a data source definition that pulls out a question inside a repeat group.
@@ -193,3 +204,52 @@ def array_is_falsy(array_values): | |||
return [] | |||
|
|||
return array_values | |||
|
|||
|
|||
def js_to_py_datetime(jsdt, preserve_tz=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring is amazing!
cast_functions = { | ||
# 'BIGINT': int, | ||
# 'TEXT': str, | ||
'TIMESTAMP': partial(js_to_py_datetime, preserve_tz=False), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today I learned: partial functions. (nice!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
hq_superset/models.py
Outdated
@@ -32,3 +45,71 @@ def update_dataset(self): | |||
rows = list(cast_data_for_table(self.data, table)) | |||
insert_stmt = table.insert().values(rows) | |||
connection.execute(insert_stmt) | |||
|
|||
|
|||
class HQClient(db.Model, OAuth2ClientMixin): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to be nitpicking here...
Can we maybe call this something like HQOAuthClient
? The reason is that the current naming made me think this was the request client (don't know if I'm just super pedantic).
Anyway, I'll leave it to your own discretion...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I rebased (before reading this comment) I renamed this class "OAuth2Client", in line with its mixin. I did the same with "Token".
"HQOAuthClient" may have been a better name, considering it's got a "domain" field, but I feel like "OAuth2Client" is close enough, and better than "CommCareHQOAuth2Client".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like "OAuth2Client" is close enough
Perfect!
|
||
### Setup env | ||
The 'User configurable reports UI' feature flag must be enabled for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe more of an HQ discussion, but should we not have a different FF for this workflow so users don't have to be concerned about the User Configurable Reports UI
feature flag until they actually want to user the UCRs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can definitely pull CommCare Analytics into the current discussion around UCR feature flags.
Absolutely great work, @kaapstorm ! Just a couple of comments, nothing big. |
459573e
to
6b78839
Compare
There is a lot happening in this commit. 1. Setting up the test database for datasets from HQ no longer needs a special function. The `tests.utils` module is renamed `tests.const` and the `UnitTestingOnly` exception is no longer needed. 2. The `oauth2_server` module is the heart of the implementation of an OAuth 2.0 server, based heavily on the sample implementation given at https://github.com/authlib/example-oauth2-server/ 3. The OAuth 2.0 server is exposed in `api.py`, and uses models defined in `models.py`. 4. The OAuth 2.0 server uses Fernet symmetric encryption so that OAuth 2.0 client secrets can be read for setting up syncing for multiple data sources. It uses a couple of utility functions defined in `utils.py`.
Resolved offline. |
New commits look good. Will wait for responses on other comments from the last review. |
New commits look good 👍 |
Do you think there is anything still left to do @Charl1996 @mkangia ? |
Hey @kaapstorm I see some open comments from my earlier review. Waiting on those to do a full pass again. I hope these will be quick since they are mostly minor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a brief review of the newer commits and all seems fine. I'm happy with this PR.
@@ -39,6 +40,10 @@ def save_token(token, request): | |||
|
|||
|
|||
def config_oauth2(app): | |||
authlib_logger = logging.getLogger('authlib') | |||
authlib_logger.addHandler(logging.StreamHandler(sys.stdout)) | |||
authlib_logger.setLevel(logging.DEBUG) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗒️
49d575a fixed the problem with syncing from CommCare Staging. I don't like it, but the right way to do it didn't work. "If at first (or second, or third, or fourth) you don't succeed, use a bigger hammer." |
I think that's everything. If there is anything I've missed, please link to it. |
# 2.0 secure transport check despite transport being over HTTPS. I | ||
# hate to do this, but werkzeug.contrib.fixers.ProxyFix didn't fix | ||
# it. So I've run out of better options. (Norman 2024-03-13) | ||
os.environ['AUTHLIB_INSECURE_TRANSPORT'] = '1' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @kaapstorm
Good find on the proxy redirection.
I can make peace with this change to unblock QA/UAT.
It's a positive that this is on analytics server that we have control/access to and not superset.
I was trying to find where in nginx conf, this is getting set and compare that with what we have for HQ though HQ nginx template has a lot happening.
Couple of things I noticed that could be faulty
- we have two "server" blocks with the same server name in nginx conf. I am not sure if that is expected/correct
- I don't know what the following block does
if (!-f $request_filename) {
proxy_pass http://superset;
break;
}
@sravfeyn assuming you have more insight when this was setup.
Can you see something faulty with the current nginx conf or point out what could redirecting urls to http?
I don't know what this block does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a quick look at the nginx config, and realised it will take me a bit longer to change and test it. I'll merge this PR, and I'll create a follow up ticket to see how we can configure nginx so that we can use werkzeug's "ProxyFix" class (or not have to use it at all) instead of using this environment variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay @kaapstorm, thanks for taking a look.
I am fine with this merge till this stays on staging and is not released to production environment until the secure fix is done.
Jira: SC-3212
This pull request includes
Easiest to review 🐡 commit 🐙 by 🐠 commit 🐟
This code has been tested locally.
It will be deployed to Staging on Wed Feb 28.
QA is scheduled.