-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: send course/block tags with the CourseOverview and XBlock sinks #41
Conversation
Tags are stored on the course_block_json/block_data_json, serialized as a list of "tag name=tag value" strings. If the course/block has no tags, its json "tags" value will be null.
Thanks for the pull request, @pomegranited! Please note that it may take us up to several weeks or months to complete a review and merge your PR. Feel free to add as much of the following information to the ticket as you can:
All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here. Please let us know once your PR is ready for our review and all tests are green. |
Coverage reportClick to see where and how coverage changed
This report was generated by python-coverage-comment-action |
Awesome, I hope to have time to test this tomorrow. From the examples it looks like hierarchical taxonomies don't attach each parent tag in the hierarchy, just the leaf node. Is that correct? I imagine people will want to be able to "drill up" as it were, which we wouldn't be able to do on the reporting side without having all of the taxonomies imported. |
@bmtcril No worries -- I can ensure that the "implicit" parent tags are also exported here. I'd also appreciate your feedback on how the tags are being stored in Clickhouse.. is that flat list of "taxonomy name=tag value" OK? It's repetitive, in that the "taxonomy name" is repeated for each tag, but I figured it would be easier to query on than some nested structure. |
Unfortunately the tagging API doesn't have a single-query mechanism for returning all explicit and implicit tags for all blocks in a course, so we need to query once per block.
@bmtcril CC @Ian2012 This is ready for review now -- I've tested it end to end in my tutor dev stack, and it's working now.
Open question :) |
@pomegranited There is no need to do a migration as all MVs are managed in DBT, however that shouldn't be a concern here |
platform_plugin_aspects/utils.py
Outdated
def _get_object_tags(usage_key): # pragma: no cover | ||
""" | ||
Wrap the Open edX tagging API method get_object_tags. | ||
""" | ||
# pylint: disable=import-outside-toplevel,import-error | ||
from openedx.core.djangoapps.content_tagging.api import get_object_tags | ||
|
||
return get_object_tags(object_id=str(usage_key)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should assume this could fail in case of changes to the location of the function or in the case it doesn't exist (older versions than Redwood)
The function get_model
from the utils.py
module is a good pattern to follow on those cases, and we should assume an import can always be None
and handle those cases. Is the operators and developers function to configure it according to their custom developments and backports.
We can rename it to something more useful.
I was able to test this locally, thanks for the excellent directions! I think that having a dictionary for the output makes a little more sense. Aside from the storage I'm already worried that someone will name a tag with an "=" in it or something someday. We're already parsing out the JSON from there, a little more hopefully won't hurt. I also wrapped the platform import in a try, since this may be used in earlier versions than Redwood. We haven't tested that in a while, however, and probably should. |
I also agree that we don't need a migration at this point, since we're just capturing the data for v1. We can add dbt models for it when we know more about how we want to query these. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM in the current state, but some of the changes are mine so I'll wait for a thumb or more feedback from @Ian2012 as I think I addressed his comments with my changes?
@bmtcril Yes, you addressed my concerns here. Just one more comment (not a blocker), does it makes sense to have the nested structured saved? Instead of {
"tags": {
"x": [
"y", "z"
]
}
} use: {
"tags": {
"x": [
{ "y": {"z": null}}
]
}
} It would be way harder to query this, just wanted to make sure we are not losing any context here. |
@Ian2012 that's a good point, but I think we probably don't need that complexity here. When it comes time to start reporting on these we'll need to make a decision about whether to import the whole set of tags to Aspects as well, and the hierarchy can come with it at that point. Otherwise I think this is sufficient to support the filtering cases we've talked about so far. 🤞 |
@pomegranited 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future. |
Tags are stored on the course_block_json/block_data_json, serialized as a list of "tag name=tag value" strings.
If the course/block has no tags, its json "tags" value in the json will be null.
Closes openedx/openedx-aspects#217
Testing instructions
Setup:
Install this branch in your tutor dev/local environment, e.g
Enable tagging and use of the Course Authoring MFE by enabling these waffle flags for "everyone":
Set up some sample taxonomies, orgs, and courses.
E.g. I have mounted edx-platform, and so can do this:
Update generate.py to provide a valid USER_EMAIL and TAXONOMY_SAMPLE_PATH, e.g
Run the script in the CMS shell:
Testing:
Navigate to the Course Authoring MFE and locate a tagged course, e.g.
http://apps.local.edly.io:2001/course-authoring/course/course-v1:SampleTaxonomyOrg1+STC1+2023_1
Navigate to a unit in this course, and click Manage Tags in the right-hand sidebar to Edit Tags and add tags to one or more units.
Publish this course.
Visit Superset as a superuser.
Check that the course_overviews sink contains tag data in its
course_data_json
, e.g.Check that the course_blocks sink contains tag data in its
block_data_json
, e.g.Author Notes & Concerns
There's a bug in the Course Authoring MFE that prevents the Manage Tags UI from saving tags for units (step 2 under Testing). Branch that fixes this: TBD.fixed by Bump openedx-learning to support tagging with multiple taxonomies at once [FC-0036] edx-platform#34490.The Manage Tags UI shows a tag count that includes explicit + implicit tags. When tags from a hierarchical taxonomy are serialized for the course/block, only the explicit "leaf" tags are stored, so the counts will differ.6055922 ensures both implicit and explicit tags are serialized to ClickHouse.tags
field to the materialized views like we do for other block_data_json fields?Merge checklist:
Check off if complete or not applicable: