
Customize Workflows

Nyckollas Brandão edited this page Jun 30, 2023 · 19 revisions

The PHYLOViZ Web Platform provides the ability to customize workflows, allowing users to define and configure procedures tailored to their specific needs. Workflows are composed of tasks or steps that are executed sequentially, with outputs from one task serving as inputs for subsequent tasks. This section explains the different components of workflows and how they can be customized.
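As a rough illustration of sequential execution, the sketch below derives a run order from a list of tasks. It assumes each task names its successors in a "children" list, as in the workflow templates shown later on this page; the function name execution_order is hypothetical and this is not the platform's own scheduler.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_order(tasks):
    """Return task ids in an order where every task runs before its children."""
    known = {task["taskId"] for task in tasks}
    sorter = TopologicalSorter()
    for task in tasks:
        sorter.add(task["taskId"])
        for child in task.get("children", []):
            if child not in known:
                raise ValueError(f"unknown child task: {child}")
            # The child depends on (runs after) the current task.
            sorter.add(child, task["taskId"])
    # static_order() raises graphlib.CycleError if the tasks form a cycle.
    return list(sorter.static_order())


# Trimmed, illustrative chain using task ids from the example on this page.
tasks = [
    {"taskId": "firstTreeUpload"},
    {"taskId": "downloadDistanceMatrix", "children": ["firstTreeCalculation"]},
    {"taskId": "firstTreeCalculation", "children": ["firstTreeUpload"]},
]
print(execution_order(tasks))
```

The order comes from the "children" links, not from the position of each task in the list.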

For more information on the documents used for workflows, check Data Model - Workflow documents. Knowledge of these is essential to understand how to configure them.

Configuration

By providing configuration flexibility, the PHYLOViZ Web Platform enables users to efficiently customize and adapt workflows to meet their specific requirements.

Customizing workflows, whether it involves adding, removing, or editing elements, requires configuring the following:

  • Workflow templates: These templates define the tasks executed, the corresponding commands to be run, and the arguments received by each task.
  • Tool templates: These templates specify the Docker image associated with each tool and other tool-related configurations, e.g. network mode.
  • Tools: Certain tools themselves may require configuration to seamlessly integrate workflows into the application. These tools are auxiliary tools that handle data input for algorithms or manage application metadata. For example, in our implementation, the "downloader" tool is responsible for downloading files such as typing data from an S3 bucket into the workflow's Docker volume, enabling subsequent tasks to use them as local files.

In the following sections, we will outline the steps for customizing workflows in different scenarios and provide illustrative examples.

Adding a New Workflow

To add a new workflow, several steps are required. As an example, let's add a new workflow that calculates two trees, each using a different algorithm:

  1. Workflow Templates:

    1.1. Create a new workflow template: Assign a unique type to the template and provide a name and description to distinguish it from other workflows.

    For this example, type is "compute-two-trees", name is "Compute Two Trees" and description is "Computes two trees from a single distance matrix, each with a different algorithm.".

    "type": "compute-two-trees",
    "name": "Compute Two Trees",
    "description": "Computes two trees from a single distance matrix, each with a different algorithm."
    

    1.2. Add the input argument schema of the workflow to the template: Define the arguments and their types. This step ensures that badly formatted arguments are filtered out during workflow creation. These arguments are directly inputted into the command arguments of the tools, replacing placeholders.

    For this example, we set an argument schema containing four arguments: one to identify the distance matrix to create the trees from, one to identify the dataset, and two to specify the algorithm used for each tree.

    "arguments": {
      "distanceMatrixId": { "type": "distanceMatrixId" },
      "datasetId": { "type": "datasetId" },
      "firstTreeAlgorithm": {
        "type": "string",
        "allowedValues": [(...)]
      },
      "secondTreeAlgorithm": {
        "type": "string",
        "allowedValues": [(...)]
      }
    }
    

    1.3. Add all the necessary tasks to the template: Include each task with the appropriate tool (referencing its tool template) and the corresponding command to execute. Each command may contain placeholders that are replaced with input arguments during workflow creation. Specify the order of execution by including the name of the task to run afterward in the "children" field of the preceding task.

    For this example, this includes: one task for downloading the distance matrix; two tasks, each for calculating a tree using a different algorithm based on the arguments (firstTreeAlgorithm and secondTreeAlgorithm); two tasks, one for each tree, to upload the trees to S3 and create their metadata, which includes the algorithm used.

    "tasks": [
    {
      "taskId": "downloadDistanceMatrix",
      "tool": "downloader",
      "action": {
        "command": "--project-id=${projectId} --dataset-id=${datasetId} --resource-id=${distanceMatrixId} --resource-type=distance-matrix --workflow-id=${workflowId} --out=/phyloviz-web-platform/distance_matrix.txt"
      },
      "children": [
        "firstTreeCalculation"
      ]
    },
    {
      "taskId": "firstTreeCalculation",
      "tool": "phylolib",
      "action": {
        "command": "algorithm ${firstTreeAlgorithm} --matrix=symmetric:/phyloviz-web-platform/distance_matrix.txt --out=newick:/phyloviz-web-platform/tree1.txt"
      },
      "children": [
        "secondTreeCalculation"
      ]
    },
    {
      "taskId": "secondTreeCalculation",
      "tool": "phylolib",
      "action": {
        "command": "algorithm ${secondTreeAlgorithm} --matrix=symmetric:/phyloviz-web-platform/distance_matrix.txt --out=newick:/phyloviz-web-platform/tree2.txt"
      },
      "children": [
        "firstTreeUpload"
      ]
    },
    {
      "taskId": "firstTreeUpload",
      "tool": "uploader",
      "action": {
        "command": "--file-path=/phyloviz-web-platform/tree1.txt --project-id=${projectId} --dataset-id=${datasetId} --workflow-id=${workflowId} --resource-type=tree --source-type=algorithm-distance-matrix --algorithm=${firstTreeAlgorithm} --distance-matrix-id=${distanceMatrixId} --parameters={}"
      },
      "children": [
        "secondTreeUpload"
      ]
    },
    {
      "taskId": "secondTreeUpload",
      "tool": "uploader",
      "action": {
        "command": "--file-path=/phyloviz-web-platform/tree2.txt --project-id=${projectId} --dataset-id=${datasetId} --workflow-id=${workflowId} --resource-type=tree --source-type=algorithm-distance-matrix --algorithm=${secondTreeAlgorithm} --distance-matrix-id=${distanceMatrixId} --parameters={}"
      }
    }
    ]

  2. Tool Templates: Create a tool template for each new tool, specifying the tool name (the name referenced by the tasks of the workflow template) and the Docker image associated with the tool. This step is unnecessary if all the required tools already exist. Every tool template needed for this example already exists, but for illustration, here is a tool template for the saffrontree tool:

    {
      "general": {
        "name": "saffrontree",
        "description": "The saffrontree tool"
      },
      "access": {
        "_type": "library",
        "details": {
          "address": "localhost",
          "dockerUrl": "unix://var/run/docker.sock",
          "dockerImage": "sangerpathogens/saffrontree",
          "dockerAutoRemove": "never",
          "dockerNetworkMode": "bridge",
          "dockerApiVersion": "auto",
          "dockerVolumes": [
            {
              "source": "/mnt/phyloviz-web-platform/${projectId}/${workflowId}/",
              "target": "/phyloviz-web-platform",
              "_type": "bind"
            }
          ]
        }
      },
      "library": []
    }
    
  3. Tools: Ensure the tools meet the desired requirements: Verify that the tools can understand the specified command arguments in the tasks and perform their intended functions. Pay special attention to auxiliary tools, as they are responsible for reading and writing resource metadata. If desired, these tools should not only create metadata but also specify the source of the resource in the metadata, such as the algorithm used.

    If the required tools do not exist, they need to be created along with their tool templates. If the existing tools do not meet the desired requirements, they should be edited accordingly.

    In this example, let's assume that the required tools already exist. However, for the purpose of explanation, let's say we used a single task of the "uploader" tool instead of two tasks. In that case, the "uploader" tool would need to be modified to handle the arguments for each of the different trees. If a custom tool with the same functionality were used, a tool template for it would need to be created.
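To make steps 1.2 and 1.3 more concrete, here is a minimal sketch of how arguments could be checked against a schema and then substituted into a command. It is not the platform's implementation. The function names are hypothetical, every argument type is simply treated as a non-empty string, and the allowed values are borrowed from the compute-tree example on this page because the template above elides them.

```python
from string import Template

# Assumption: illustrative allowed values, not the elided list from the template.
ALGORITHMS = ["goeburst", "edmonds", "sl"]

# Schema shaped like the "arguments" object of the example workflow template.
SCHEMA = {
    "distanceMatrixId": {"type": "distanceMatrixId"},
    "datasetId": {"type": "datasetId"},
    "firstTreeAlgorithm": {"type": "string", "allowedValues": ALGORITHMS},
    "secondTreeAlgorithm": {"type": "string", "allowedValues": ALGORITHMS},
}


def validate_arguments(schema, arguments):
    """Return a list of problems; an empty list means the arguments pass."""
    problems = [f"unknown argument: {name}" for name in arguments if name not in schema]
    for name, spec in schema.items():
        if name not in arguments:
            problems.append(f"missing argument: {name}")
            continue
        value = arguments[name]
        # Simplification: every type is treated as a non-empty string.
        if not isinstance(value, str) or not value:
            problems.append(f"{name} must be a non-empty string")
            continue
        allowed = spec.get("allowedValues")
        if allowed is not None and value not in allowed:
            problems.append(f"{name} must be one of {allowed}")
    return problems


def fill_command(command, values):
    """Replace ${name} placeholders; raises KeyError if a placeholder has no value."""
    return Template(command).substitute(values)


arguments = {
    "distanceMatrixId": "dm-1",   # hypothetical identifiers
    "datasetId": "ds-1",
    "firstTreeAlgorithm": "goeburst",
    "secondTreeAlgorithm": "sl",
}
if not validate_arguments(SCHEMA, arguments):
    # Fragment of the uploader command from the example template.
    print(fill_command("--dataset-id=${datasetId} --algorithm=${firstTreeAlgorithm}", arguments))
```

Python's string.Template happens to use the same ${name} placeholder syntax as the commands above, which keeps the sketch short. Values such as projectId and workflowId would have to be supplied from elsewhere.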

Editing Existing Workflows

There are multiple possibilities for making changes to existing workflows. For example, let's consider the scenario where you want to support a new algorithm for tree computation.

If the library you are using already includes the desired algorithm and your existing workflow is designed to work generically for any algorithm, you may only need to modify the input argument schema to include the new algorithm as an allowed value.

{
  "type": "compute-tree",
  "name": "Compute Tree",
  "description": "Computes a tree, given an existing distance matrix of the dataset and the tree calculation algorithm.",
  "arguments": {
    "datasetId": {
      "type": "datasetId"
    },
    "distanceMatrixId": {
      "type": "distanceMatrixId"
    },
    "algorithm": {
      "type": "string",
      "allowed-values": [
        "goeburst",
        "edmonds",
        "sl",
        // Add new algorithm (new allowed value)
      ]
    }
  }
}

You may also add new arguments that parametrize the algorithm. In that case, use the argument's "required" field to mark it as not required for every instance of the workflow (other algorithms may not use the new parameter), and use its "prefix" field so that the argument is added to the command line of the task.
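The sketch below shows one possible reading of the "required" and "prefix" fields. It is an assumption, not the platform's documented behaviour: a supplied argument is rendered as its prefix followed by its value, and an omitted optional argument renders as an empty string. The "threshold" argument, its "number" type, its prefix, and the function names are all hypothetical.

```python
from string import Template


def render_arguments(schema, arguments):
    """Map each schema argument to the text that fills its placeholder."""
    rendered = {}
    for name, spec in schema.items():
        if name in arguments:
            # Assumed meaning of "prefix": text placed directly before the value.
            rendered[name] = spec.get("prefix", "") + str(arguments[name])
        elif spec.get("required", True):
            raise ValueError(f"missing required argument: {name}")
        else:
            # Optional argument that was not supplied: nothing is emitted.
            rendered[name] = ""
    return rendered


def build_command(command, schema, arguments):
    """Fill the ${name} placeholders and trim leftover whitespace at the ends."""
    return Template(command).substitute(render_arguments(schema, arguments)).strip()


# Hypothetical schema: "threshold" is only meaningful for some algorithms.
schema = {
    "algorithm": {"type": "string"},
    "threshold": {"type": "number", "required": False, "prefix": "--threshold="},
}
command = "algorithm ${algorithm} ${threshold}"

print(build_command(command, schema, {"algorithm": "goeburst", "threshold": 3}))
print(build_command(command, schema, {"algorithm": "sl"}))
```

Keeping the flag name in the schema rather than in the command means the flag disappears from the command entirely when the argument is omitted.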

However, if the library you are using does not have the desired algorithm in its current version, simply modifying the argument schema to accept that algorithm will not suffice. In such cases, you may need to consider the following options:

  1. If a new version of the library is available and it includes the desired algorithm, you can update the tool template for that library by changing the Docker image (if necessary).
  2. If the algorithm is part of another library or you want to use a different library altogether, you must create a new tool template for the new library and modify the existing workflow template to use it. If the new library does not include all the algorithms from the previous library, or if it performs worse, you can instead create a separate workflow for the new library while keeping the existing one intact. This approach ensures continued support for all algorithms at all times.
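Option 1 amounts to changing a single field of the tool template. A minimal sketch, assuming the template has already been loaded into a Python dictionary with the structure of the saffrontree example above; the helper name and the image names are hypothetical:

```python
import copy


def with_docker_image(tool_template, image):
    """Return a copy of the tool template that points at a different Docker image."""
    updated = copy.deepcopy(tool_template)
    updated["access"]["details"]["dockerImage"] = image
    return updated


# Trimmed template; only the fields needed for the sketch are included.
template = {
    "general": {"name": "phylolib"},
    "access": {"_type": "library", "details": {"dockerImage": "example/phylolib:1.0"}},
}
updated = with_docker_image(template, "example/phylolib:2.0")
print(updated["access"]["details"]["dockerImage"])
```

Working on a copy leaves the original template untouched, which makes it easy to fall back to the previous image if the new version misbehaves.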

Removing Workflows

Removing a workflow is as simple as deleting its corresponding workflow template.

Frontend Changes

Configuring workflows may result in the deprecation of certain views in the frontend application. Therefore, it is necessary to make changes to the frontend to accommodate the modifications to the workflows. This involves creating or editing views that effectively call these workflows, with a particular focus on parametrization, which is closely tied to the chosen UI library, MUI, and its input components.

The specific changes required in the frontend depend on the nature of the workflow configuration:

  • If a new workflow is added (i.e., a new workflow template), the corresponding operation and/or view should be implemented to invoke it.
  • If an existing workflow is renamed, the frontend should reflect the updated name.
  • If the input parameters of an existing workflow are modified, the views should be updated accordingly to accommodate the changes.
  • If tasks are modified within a workflow, no further action is required in the frontend.