Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add validation step for strain configuration schema and repository URLs #35

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

magajh
Copy link

@magajh magajh commented Oct 22, 2024

Summary

This PR introduces a new step, Validate strain configuration, which includes two Python scripts to validate the strain configuration file. These scripts ensure that the configuration adheres to the required schema and that repo URLs are valid. The goal is to catch errors early, allowing the workflow to display any issues immediately, rather than encountering errors much later in the build process.

Execution

The validation step sequentially runs both scripts:

python $STRAIN_VALIDATION_SCRIPTS_PATH/validate_schema.py $CONFIG_FILE
python $STRAIN_VALIDATION_SCRIPTS_PATH/validate_repos.py $CONFIG_FILE

Schema validation is performed first to catch any missing keys or format errors before proceeding with other checks.

How to Test

  1. Go to the ednx-strains repository, which hosts the caller workflow for Picasso: https://github.com/eduNEXT/ednx-strains/actions

  2. Navigate to Actions > Build Open edX strain and enter the required configuration to manually trigger the workflow.

  3. Under Use workflow from, select the branch magagio/build-workflow-syntax-verification.

  4. In Branch to clone the strain from, you can either use the same branch or create a new one with modifications to the Redwood strain. Examples of modifications to test error handling:

    • Use an incorrect repository or version for a private package or an extra pip requirement.
    • Set an incorrectly formatted version for Tutor.
    • Remove a required key in PICASSO_THEMES.
  5. Click Run workflow.

  6. Verify that the workflow fails at the strain validation step and displays the corresponding error message.

Example Workflow Runs

  • Non-existent branch in a private requirement
    Run example

  • Missing version key in PICASSO_THEMES (e.g., edunext/redwood.master)
    Run example

Exclusions

  • Repository validation in PICASSO_THEMES: This validation is skipped as it fails clearly within 57 seconds if incorrect.
    Run example

  • Validation of repositories in private packages: These also fail in around one minute if incorrect.
    Run example

@magajh magajh force-pushed the mjh/repo-and-syntx-analysis branch 4 times, most recently from 20edbcc to 9f3e3ea Compare November 8, 2024 08:49
@magajh magajh force-pushed the mjh/repo-and-syntx-analysis branch from bf9432b to d74f046 Compare November 8, 2024 14:03
@magajh magajh changed the title feat: add syntax and url verifications feat: add validation step for strain configuration schema and repository URLs Nov 8, 2024
@magajh magajh marked this pull request as ready for review November 8, 2024 14:34
@magajh magajh requested a review from a team as a code owner November 8, 2024 14:34
Comment on lines +50 to +52
"TUTOR_VERSION": And(
str,
Regex(TUTOR_VERSION_REGEX, error="TUTOR_VERSION must be in the format vX.Y.Z (e.g., v5.3.0)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, TUTOR_VERSION should not include the v-prefix. See this issue for more context: #20

Comment on lines +54 to +81
Optional(Regex(r"^PICASSO_.+_DPKG$")): {
"name": And(str, len),
"repo": And(str, len),
"version": And(str, len)
},
Optional("PICASSO_THEMES"): And(
list,
len,
[
{
"name": And(str, len),
"repo": And(str, len),
"version": And(str, len)
}
]
),
Optional("PICASSO_THEMES_NAME"): And(
list,
len,
lambda x: all(isinstance(item, str) and item for item in x),
error=EMPTY_LIST_ERROR_MSG.format("PICASSO_THEMES_NAME"),
),
Optional("PICASSO_THEME_DIRS"): And(
list,
len,
lambda x: all(isinstance(item, str) and item for item in x),
error=EMPTY_LIST_ERROR_MSG.format("PICASSO_THEME_DIRS"),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was under the impression this could be done in tutor-contrib-picasso instead when enabling themes or private packages, was that an approach considered during the implementation? Or is this something that can only be done during the building process and not when executing extra-commands?

Comment on lines +1 to +3
"""
Validate repository URLs and configurations in the strain config file.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add here how this script it's supposed to be used? As we've done here: https://github.com/eduNEXT/picasso/blob/main/.github/workflows/scripts/get_tutor_config.py#L1-L15

edx_platform_version = data.get('EDX_PLATFORM_VERSION', "")
url = f"{edx_platform_repository}/tree/{edx_platform_version}"
if not validate_repo_url(url):
LOG.error("Failed to validate edx_platform_repository URL: %s", url)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LOG.error("Failed to validate edx_platform_repository URL: %s", url)
LOG.error("Failed to validate EDX_PLATFORM_REPOSITORY URL: %s", url)

How do we know if the EDX_PLATFORM_REPOSITORY exists but not the EDX_PLATFORM_VERSION?

data = yaml.safe_load(strain_file)
except yaml.YAMLError:
LOG.exception("Error loading YAML data.")
return 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return 1
return EXIT_FAILURE

return 1

LOG.info("All repository URLs validated successfully.")
return 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return 0
return EXIT_SUCCESS

Comment on lines +43 to +45
if "PICASSO_THEMES" in data and not ("PICASSO_THEMES_NAME" in data or "PICASSO_DEFAULT_SITE_THEME" in data):
raise SchemaError("PICASSO_THEMES_NAME or PICASSO_DEFAULT_SITE_THEME must be defined.")
return data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this be checkec using the schema somehow?

try:
data = yaml.safe_load(strain_file)
except yaml.YAMLError:
LOG.exception("Error loading YAML data.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how you're using logs for getting more details about failures. I'll see how we can fit this into the other scripts. Thank you!

@@ -73,14 +74,21 @@ jobs:

- name: Install necessary dependencies
run: |
pip install pyyaml
pip install pyyaml schema
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to merge #30 so we can use the requirements file instead. I'll put a pin on this so I don't forget.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mariajgrimaldi agree, I left my approval in #30 so we can close it

Comment on lines +34 to +41
Args:
data (Dict[str, Any]): The YAML data dictionary.

Returns:
Dict[str, Any]: Validated data dictionary.

Raises:
SchemaError: If required conditions are not met.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returns:
int: 0 if configuration file is valid; 1 if invalid.
"""
parser = argparse.ArgumentParser(description="Validate YAML file syntax and strain schema.")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we validate the arguments in another function similar to how it is done here?

int: 0 if configuration file is valid; 1 if invalid.
"""
parser = argparse.ArgumentParser(description="Validate YAML file syntax and strain schema.")
parser.add_argument("file", type=argparse.FileType("r"), nargs="+", help="YAML file to validate.")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the nargs=+ necessary? I understand that only the config.yml file will be validated.

int: 0 if all URLs are validated successfully, 1 if any validation fails.
"""
parser = argparse.ArgumentParser(description="Validate repository URLs in strain config file.")
parser.add_argument("file", type=argparse.FileType("r"), nargs="+")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as I comment here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants