feat: send course/block tags with the CourseOverview and XBlock sinks #41

Merged
merged 6 commits into from
May 6, 2024

Conversation

pomegranited
Contributor

@pomegranited pomegranited commented May 1, 2024

Tags are stored on the course_block_json/block_data_json, serialized as a list of "tag name=tag value" strings.

If the course/block has no tags, the "tags" value in its JSON will be null.
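
As a rough illustration of the format described above, here is a minimal sketch. The helper name `serialize_tags` and its input shape (pairs of taxonomy name and tag value) are assumptions for the example, not the code from this PR.

```python
import json
from typing import Iterable, Optional


def serialize_tags(pairs: Iterable[tuple[str, str]]) -> Optional[list[str]]:
    """Render (taxonomy name, tag value) pairs as "name=value" strings.

    Returns None when there are no tags, which json.dumps renders as null.
    """
    serialized = [f"{taxonomy}={value}" for taxonomy, value in pairs]
    return serialized or None


# Hypothetical input: one tagged object and one untagged object.
tagged = serialize_tags([("FlatTaxonomy", "flat taxonomy tag 2440")])
untagged = serialize_tags([])

print(json.dumps({"tags": tagged}))
print(json.dumps({"tags": untagged}))
```

Returning `None` rather than an empty list is what produces the `null` value in the JSON for untagged objects.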

Closes openedx/openedx-aspects#217

Testing instructions

Setup:

  1. Install this branch in your Tutor dev/local environment, e.g.:

    tutor config save -a OPENEDX_EXTRA_PIP_REQUIREMENTS="git+https://github.com/openedx/platform-plugin-aspects.git@jill/sync-tags-on-course-sink#egg=platform_plugin_aspects"
    tutor dev launch -I
    
  2. Enable tagging and use of the Course Authoring MFE by enabling these waffle flags for "everyone":

    contentstore.new_studio_mfe.use_tagging_taxonomy_list_page
    contentstore.new_studio_mfe.use_new_home_page
    contentstore.new_studio_mfe.use_new_course_outline_page
    contentstore.new_studio_mfe.use_new_unit_page
    
  3. Set up some sample taxonomies, orgs, and courses.

    E.g. I have mounted edx-platform, and so can do this:

    cd edx-platform
    git clone https://github.com/open-craft/taxonomy-sample-data
    cd taxonomy-sample-data
    

    Update generate.py to provide a valid USER_EMAIL and TAXONOMY_SAMPLE_PATH, e.g.:

    --- a/generate.py
    +++ b/generate.py
    @@ -200,12 +200,12 @@ def import_tarfile_in_course(tarfile_path, course_key, user_id):
    
     User = get_user_model()
    
    -USER_EMAIL = "[email protected]"
    +USER_EMAIL = "[email protected]"
    
     user = User.objects.get(email=USER_EMAIL)
    
     # Set to path where repo was cloned, eg: /edx/src/taxonomy-sample-data
    -TAXONOMY_SAMPLE_PATH = "/openedx/taxonomy-sample-data"
    +TAXONOMY_SAMPLE_PATH = "/openedx/edx-platform/taxonomy-sample-data"

    Run the script in the CMS shell:

    tutor dev run cms bash
    app@2e5e9f0f4f4e:~/edx-platform$ ./manage.py cms shell < taxonomy-sample-data/generate.py
    

Testing:

  1. Navigate to the Course Authoring MFE and locate a tagged course, e.g.

    http://apps.local.edly.io:2001/course-authoring/course/course-v1:SampleTaxonomyOrg1+STC1+2023_1

  2. Navigate to a unit in this course, and click Manage Tags in the right-hand sidebar to Edit Tags and add tags to one or more units.

  3. Publish this course.

  4. Visit Superset as a superuser.

  5. Check that the course_overviews sink contains tag data in its course_data_json, e.g.

    {
       "advertised_start": null, 
       "announcement": null,
       "lowest_passing_grade": 0.5,
       "invitation_only": false,
       "max_student_enrollments_allowed": null,
       "effort": null,
       "enable_proctored_exams": false,
       "entrance_exam_enabled": false,
       "external_id": null,
       "language": "en",
       "tags": "['HierarchicalTaxonomy=hierarchical taxonomy tag 2.5.14', 'HierarchicalTaxonomy=hierarchical taxonomy tag 3.13.28', 'HierarchicalTaxonomy=hierarchical taxonomy tag 3.6.45', 'ESDC Skills and Competencies=Building and Construction', 'Lightcast Open Skills Taxonomy=Stream Processing', 'FlatTaxonomy=flat taxonomy tag 2440']",
    }
    
  6. Check that the course_blocks sink contains tag data in its block_data_json, e.g.

    {
       "course": "STC1",
       "run": "2023_1",
       "block_type": "about",
       "detached": 1,
       "graded": 0,
       "completion_mode": "",
       "section": 2,
       "subsection": 1,
       "unit": 5,
       "tags": "['ESDC Skills and Competencies=Endurance', 'ESDC Skills and Competencies=Physical Strength Abilities']",
    }
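
In the examples above, the "tags" value is a string holding a Python-style list rather than a JSON array. A minimal sketch of reading it back, assuming that exact format (the `parse_tags` helper is hypothetical):

```python
import ast

# Sample value copied from the block_data_json example above.
block_data = {
    "block_type": "about",
    "tags": "['ESDC Skills and Competencies=Endurance', "
            "'ESDC Skills and Competencies=Physical Strength Abilities']",
}


def parse_tags(raw):
    """Turn the stringified list back into a list of "name=value" strings."""
    if not raw:
        return []  # null or missing tags
    # literal_eval only evaluates Python literals, so it is safe for this input.
    return ast.literal_eval(raw)


tags = parse_tags(block_data["tags"])
```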
    

Author Notes & Concerns

  1. There's a bug in the Course Authoring MFE that prevents the Manage Tags UI from saving tags for units (step 2 under Testing). This is fixed by edx-platform#34490, "Bump openedx-learning to support tagging with multiple taxonomies at once [FC-0036]".
  2. The Manage Tags UI shows a tag count that includes explicit + implicit tags. Originally, when tags from a hierarchical taxonomy were serialized for the course/block, only the explicit "leaf" tags were stored, so the counts differed. Commit 6055922 ensures both implicit and explicit tags are serialized to ClickHouse.
  3. Do we need to update the aspects migrations to add a tags field to the materialized views like we do for other block_data_json fields?

Merge checklist:
Check off if complete or not applicable:

  • Version bumped
  • Changelog record added
  • Documentation updated (not only docstrings)
  • Fixup commits are squashed away
  • Unit tests added/updated
  • Manual testing instructions provided
  • Noted any: Concerns, dependencies, migration issues, deadlines, tickets

@openedx-webhooks

openedx-webhooks commented May 1, 2024

Thanks for the pull request, @pomegranited! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label May 1, 2024

github-actions bot commented May 1, 2024

Coverage report

Files with coverage changes:

  • platform_plugin_aspects/utils.py
  • platform_plugin_aspects/sinks/course_overview_sink.py
  • platform_plugin_aspects/sinks/serializers.py

This report was generated by python-coverage-comment-action

@bmtcril
Contributor

bmtcril commented May 1, 2024

Awesome, I hope to have time to test this tomorrow. From the examples it looks like hierarchical taxonomies don't attach each parent tag in the hierarchy, just the leaf node. Is that correct? I imagine people will want to be able to "drill up" as it were, which we wouldn't be able to do on the reporting side without having all of the taxonomies imported.

@pomegranited
Contributor Author

@bmtcril No worries -- I can ensure that the "implicit" parent tags are also exported here.

I'd also appreciate your feedback on how the tags are being stored in ClickHouse. Is that flat list of "taxonomy name=tag value" OK? It's repetitive, in that the "taxonomy name" is repeated for each tag, but I figured it would be easier to query on than some nested structure.

Unfortunately the tagging API doesn't have a single-query mechanism for
returning all explicit and implicit tags for all blocks in a course, so
we need to query once per block.
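
The one-query-per-block approach described in this commit message can be sketched as follows. This is only an illustration: `collect_tags_per_block`, `fake_fetch_tags`, and the usage key strings are all hypothetical stand-ins, and the real code calls the Open edX tagging API instead.

```python
from typing import Callable, Iterable


def collect_tags_per_block(
    usage_keys: Iterable[str],
    fetch_tags: Callable[[str], list[str]],
) -> dict[str, list[str]]:
    """Call fetch_tags once for each block and key the results by usage key."""
    return {usage_key: fetch_tags(usage_key) for usage_key in usage_keys}


# Stand-in for the tagging API; it records how many lookups are issued.
calls = []


def fake_fetch_tags(usage_key: str) -> list[str]:
    calls.append(usage_key)
    if usage_key == "unit1":
        return ["FlatTaxonomy=flat taxonomy tag 2440"]
    return []


tags_by_block = collect_tags_per_block(["unit1", "unit2"], fake_fetch_tags)
```

The number of lookups grows linearly with the number of blocks in the course, which is the cost the commit message is pointing out.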
@pomegranited pomegranited marked this pull request as ready for review May 5, 2024 23:45
@pomegranited pomegranited requested review from bmtcril and Ian2012 May 5, 2024 23:45
@pomegranited
Contributor Author

@bmtcril CC @Ian2012 This is ready for review now -- I've tested it end to end in my tutor dev stack, and it's working now.

Do we need to update the aspects migrations to add a tags field to the materialized views like we do for other block_data_json fields?

Open question :)

@Ian2012
Contributor

Ian2012 commented May 6, 2024

@pomegranited There is no need for a migration, as all MVs are managed in dbt, so that shouldn't be a concern here.

Comment on lines 283 to 290
    def _get_object_tags(usage_key):  # pragma: no cover
        """
        Wrap the Open edX tagging API method get_object_tags.
        """
        # pylint: disable=import-outside-toplevel,import-error
        from openedx.core.djangoapps.content_tagging.api import get_object_tags

        return get_object_tags(object_id=str(usage_key))

We should assume this could fail if the function's location changes or if it doesn't exist (versions older than Redwood).

The get_model function from the utils.py module is a good pattern to follow in those cases: we should assume an import can always be None and handle that. It is up to operators and developers to configure it according to their custom developments and backports.

We can rename it to something more useful.
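
One way to read this suggestion is a guarded import that yields None when the platform function is unavailable. The sketch below is an assumption about the shape of such a guard, not the plugin's actual get_model helper; `optional_import` and `get_tags_for_block` are hypothetical names.

```python
import importlib


def optional_import(module_path: str, attr_name: str):
    """Return module_path.attr_name, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        return None
    return getattr(module, attr_name, None)


def get_tags_for_block(usage_key) -> list:
    """Fetch tags for a block, or an empty list when tagging is unavailable."""
    get_object_tags = optional_import(
        "openedx.core.djangoapps.content_tagging.api", "get_object_tags"
    )
    if get_object_tags is None:
        # Tagging is unavailable, for example on a release older than Redwood.
        return []
    return list(get_object_tags(object_id=str(usage_key)))
```

Outside an Open edX environment the import fails and the function simply reports no tags instead of raising.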

@bmtcril
Contributor

bmtcril commented May 6, 2024

I was able to test this locally, thanks for the excellent directions! I think that having a dictionary for the output makes a little more sense. Aside from the storage I'm already worried that someone will name a tag with an "=" in it or something someday. We're already parsing out the JSON from there, a little more hopefully won't hurt.
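
To make the "=" concern concrete, here is a small sketch. The grouping helper `group_tags` and the sample taxonomy names are made up for illustration; only the idea of a taxonomy-to-values dictionary comes from the comment above.

```python
from collections import defaultdict


def group_tags(pairs):
    """Group (taxonomy name, tag value) pairs into {taxonomy: [values]}."""
    grouped = defaultdict(list)
    for taxonomy, value in pairs:
        grouped[taxonomy].append(value)
    return dict(grouped)


# A flat "name=value" string is ambiguous once either side contains "=".
flat = "Math=Algebra=Linear"
split_left = flat.split("=", 1)    # taxonomy "Math", tag "Algebra=Linear"
split_right = flat.rsplit("=", 1)  # taxonomy "Math=Algebra", tag "Linear"

# The dictionary form keeps names and values apart, so no parsing is needed.
grouped = group_tags([
    ("ESDC Skills and Competencies", "Endurance"),
    ("ESDC Skills and Competencies", "Physical Strength Abilities"),
    ("Math", "Algebra=Linear"),
])
```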

I also wrapped the platform import in a try, since this may be used in earlier versions than Redwood. We haven't tested that in a while, however, and probably should.

@bmtcril
Contributor

bmtcril commented May 6, 2024

I also agree that we don't need a migration at this point, since we're just capturing the data for v1. We can add dbt models for it when we know more about how we want to query these.

@bmtcril
Contributor

bmtcril commented May 6, 2024

Screenshots of the test...

Screenshot 2024-05-06 at 10 47 12 AM

And the resulting JSON in ClickHouse:

Screenshot 2024-05-06 at 10 48 32 AM

Contributor

@bmtcril bmtcril left a comment

This LGTM in the current state, but some of the changes are mine so I'll wait for a thumb or more feedback from @Ian2012 as I think I addressed his comments with my changes?

@Ian2012
Contributor

Ian2012 commented May 6, 2024

@bmtcril Yes, you addressed my concerns here. Just one more comment (not a blocker): does it make sense to save the nested structure?

Instead of

{
   "tags": {
      "x": [
        "y", "z"
       ]
   }
}

use:

{
   "tags": {
      "x": [
        { "y": {"z": null}}
       ]
   }
}

It would be way harder to query this, just wanted to make sure we are not losing any context here.
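
The querying trade-off between the two shapes can be sketched in Python. The two documents below are the examples from this comment; the lookup helpers are hypothetical, and real queries would run in ClickHouse rather than in Python.

```python
flat = {"tags": {"x": ["y", "z"]}}
nested = {"tags": {"x": [{"y": {"z": None}}]}}


def has_tag_flat(doc, taxonomy, value):
    """Flat shape: a single membership test per taxonomy."""
    return value in doc["tags"].get(taxonomy, [])


def has_tag_nested(doc, taxonomy, value):
    """Nested shape: the hierarchy must be walked recursively."""
    def walk(node):
        if isinstance(node, dict):
            return any(key == value or walk(child) for key, child in node.items())
        if isinstance(node, list):
            return any(walk(item) for item in node)
        return False

    return walk(doc["tags"].get(taxonomy, []))
```

Both helpers can answer whether "z" is tagged under "x", but only the nested shape records that "z" sits below "y".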

@bmtcril
Contributor

bmtcril commented May 6, 2024

@Ian2012 that's a good point, but I think we probably don't need that complexity here. When it comes time to start reporting on these we'll need to make a decision about whether to import the whole set of tags to Aspects as well, and the hierarchy can come with it at that point. Otherwise I think this is sufficient to support the filtering cases we've talked about so far. 🤞

@bmtcril bmtcril merged commit 925210b into main May 6, 2024
10 checks passed
@bmtcril bmtcril deleted the jill/sync-tags-on-course-sink branch May 6, 2024 17:13
@openedx-webhooks

@pomegranited 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.

Labels
open-source-contribution PR author is not from Axim or 2U
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Capture tags in ClickHouse on course publish
4 participants