Enum with colon symbols cause URI parsing error #2071

Open
fmigneault opened this issue Nov 21, 2024 · 10 comments

@fmigneault
Contributor

Expected Behavior

The part of interest shown below should not cause any error. The same thing happens whether the document is encoded as JSON or YAML.

  - type:
      type: enum
      symbols:
        - 00:00
        - 01:00
        - 02:00
        - 03:00
        - 04:00
        - 05:00
        - 06:00
        - 07:00
        - 08:00
        - 09:00
        - 10:00
        - 11:00
        - 12:00
        - 13:00
        - 14:00
        - 15:00
        - 16:00
        - 17:00
        - 18:00
        - 19:00
        - 20:00
        - 21:00
        - 22:00
        - 23:00
    id: time

Actual Behavior

The following is raised.

[...]
URI prefix '20' of '20:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '20' of '20:00' not recognized, are you missing a $namespaces section?
URI prefix '21' of '21:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '21' of '21:00' not recognized, are you missing a $namespaces section?
URI prefix '22' of '22:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '22' of '22:00' not recognized, are you missing a $namespaces section?
URI prefix '23' of '23:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '23' of '23:00' not recognized, are you missing a $namespaces section?
INFO package.yml:1:1: Unknown hint https://schemas.crim.ca/cwl/weaver#OGCAPIRequirement
ERROR Tool definition failed validation:
package.yml:6:1: Type property {'type': 'enum', 'symbols': ['00', '00', '00', '00', '00', '00',
                 '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00',
                 '00', '00', '00', '00', '00'], 'name': 'timece7ef3bf-8a50-4818-9473-0f36fe26fcfa'}
                 not a valid Avro schema: Duplicate symbol: ['00', '00', '00', '00', '00', '00',
                 '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00',
                 '00', '00', '00', '00', '00']

Somehow, when the loading operation reaches this step:

workflowobj = cast(
    CommentedMap,
    loadingContext.loader.fetch(fileuri, content_types=CWL_CONTENT_TYPES),
)

Schema-Salad performs an in-memory resolution step that attempts to parse each symbol in order to inject the relevant input-ID prefix URI. This converts the HH:MM values under the invalid assumption that the : denotes a namespace reference (as if cwl:something was used).

[Screenshot of the converted symbol values]

Literal strings under the enum containing : should not be mishandled this way. Users should not have to work around the tool to inject some escape mechanism. cwltool should handle this transparently.
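To illustrate the suspected failure mode, here is a minimal Python sketch. It is not Schema-Salad's actual code: `naive_local_name` is a hypothetical helper that assumes anything before the first `:` is a namespace prefix and keeps only the remainder. Under that assumption, every HH:00 symbol collapses to the same value, which would be consistent with the "Duplicate symbol" error reported above.

```python
def naive_local_name(symbol: str) -> str:
    """Hypothetical helper: treat text before the first ':' as a namespace prefix.

    This only mimics the suspected behavior; it is not taken from schema-salad.
    """
    prefix, separator, remainder = symbol.partition(":")
    # Without a ':' there is no apparent prefix, so the symbol is kept as-is.
    return remainder if separator else symbol


# The 24 hourly symbols from the 'time' input of the tool definition.
symbols = [f"{hour:02d}:00" for hour in range(24)]

# Dropping the apparent prefix leaves only the minutes part of each symbol.
collapsed = [naive_local_name(symbol) for symbol in symbols]

# An enum requires unique symbols, so collapsed values trigger a duplicate error.
has_duplicates = len(set(collapsed)) != len(collapsed)
```

Symbols without a colon, such as `2m_temperature`, pass through unchanged in this sketch, which may explain why only the `time` input is affected.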

Workflow Code

cwlVersion: v1.0
class: CommandLineTool
hints:
  weaver:OGCAPIRequirement:
    process: https://cds.climate.copernicus.eu/api/retrieve/v1/processes/reanalysis-era5-land-monthly-means
inputs:
  - type:
      type: enum
      symbols:
        - monthly_averaged_reanalysis
        - monthly_averaged_reanalysis_by_hour_of_day
    id: product_type
  - type:
      type: enum
      symbols:
        - 10m_u_component_of_wind
        - 10m_v_component_of_wind
        - 2m_dewpoint_temperature
        - 2m_temperature
        - evaporation_from_bare_soil
        - evaporation_from_open_water_surfaces_excluding_oceans
        - evaporation_from_the_top_of_canopy
        - evaporation_from_vegetation_transpiration
        - forecast_albedo
        - lake_bottom_temperature
        - lake_ice_depth
        - lake_ice_temperature
        - lake_mix_layer_depth
        - lake_mix_layer_temperature
        - lake_shape_factor
        - lake_total_layer_temperature
        - leaf_area_index_high_vegetation
        - leaf_area_index_low_vegetation
        - potential_evaporation
        - runoff
        - skin_reservoir_content
        - skin_temperature
        - snow_albedo
        - snow_cover
        - snow_density
        - snow_depth
        - snow_depth_water_equivalent
        - snow_evaporation
        - snowfall
        - snowmelt
        - soil_temperature_level_1
        - soil_temperature_level_2
        - soil_temperature_level_3
        - soil_temperature_level_4
        - sub_surface_runoff
        - surface_latent_heat_flux
        - surface_net_solar_radiation
        - surface_net_thermal_radiation
        - surface_pressure
        - surface_runoff
        - surface_sensible_heat_flux
        - surface_solar_radiation_downwards
        - surface_thermal_radiation_downwards
        - temperature_of_snow_layer
        - total_evaporation
        - total_precipitation
        - volumetric_soil_water_layer_1
        - volumetric_soil_water_layer_2
        - volumetric_soil_water_layer_3
        - volumetric_soil_water_layer_4
    id: variable
  - type:
      type: enum
      symbols:
        - "1950"
        - "1951"
        - "1952"
        - "1953"
        - "1954"
        - "1955"
        - "1956"
        - "1957"
        - "1958"
        - "1959"
        - "1960"
        - "1961"
        - "1962"
        - "1963"
        - "1964"
        - "1965"
        - "1966"
        - "1967"
        - "1968"
        - "1969"
        - "1970"
        - "1971"
        - "1972"
        - "1973"
        - "1974"
        - "1975"
        - "1976"
        - "1977"
        - "1978"
        - "1979"
        - "1980"
        - "1981"
        - "1982"
        - "1983"
        - "1984"
        - "1985"
        - "1986"
        - "1987"
        - "1988"
        - "1989"
        - "1990"
        - "1991"
        - "1992"
        - "1993"
        - "1994"
        - "1995"
        - "1996"
        - "1997"
        - "1998"
        - "1999"
        - "2000"
        - "2001"
        - "2002"
        - "2003"
        - "2004"
        - "2005"
        - "2006"
        - "2007"
        - "2008"
        - "2009"
        - "2010"
        - "2011"
        - "2012"
        - "2013"
        - "2014"
        - "2015"
        - "2016"
        - "2017"
        - "2018"
        - "2019"
        - "2020"
        - "2021"
        - "2022"
        - "2023"
        - "2024"
    id: year
  - type:
      type: enum
      symbols:
        - "01"
        - "02"
        - "03"
        - "04"
        - "05"
        - "06"
        - "07"
        - "08"
        - "09"
        - "10"
        - "11"
        - "12"
    id: month
  - type:
      type: enum
      symbols:
        - 00:00
        - 01:00
        - 02:00
        - 03:00
        - 04:00
        - 05:00
        - 06:00
        - 07:00
        - 08:00
        - 09:00
        - 10:00
        - 11:00
        - 12:00
        - 13:00
        - 14:00
        - 15:00
        - 16:00
        - 17:00
        - 18:00
        - 19:00
        - 20:00
        - 21:00
        - 22:00
        - 23:00
    id: time
  - type:
      type: array
      items: float
    id: area
  - type:
      type: enum
      symbols:
        - grib
        - netcdf
    default:
      - grib
    id: data_format
  - type:
      type: enum
      symbols:
        - zip
        - unarchived
    default:
      - unarchived
    id: download_format
outputs:
  - type: File
    format: iana:application/json
    outputBinding:
      glob: '*.json'
    id: asset
$namespaces:
  iana: https://www.iana.org/assignments/media-types/
  weaver: https://schemas.crim.ca/cwl/weaver#

Full Traceback

[2024-11-20 19:25:58,696] ERROR    [MainThread][weaver.processes.wps_package] ../../../../tmp/tmpczjpv51r/package:1:202: Type property {'type': 'enum', 'symbols': ['00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00'], 'name':
                                           'timebc5ec20a-5ecf-4713-a612-acb8f9bf120b'} not a valid
                                           Avro schema: Duplicate symbol: ['00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00']
Traceback (most recent call last):
  File "schema_salad/avro/schema.py", line 307, in __init__
  File "schema_salad/avro/schema.py", line 727, in make_avsc_object
  File "schema_salad/avro/schema.py", line 375, in __init__
schema_salad.avro.schema.AvroException: Duplicate symbol: ['00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00']
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/process.py", line 667, in __init__
    make_avsc_object(convert_to_dict(self.inputs_record_schema), self.names)
  File "schema_salad/avro/schema.py", line 735, in make_avsc_object
  File "schema_salad/avro/schema.py", line 656, in __init__
  File "schema_salad/avro/schema.py", line 627, in make_field_objects
  File "schema_salad/avro/schema.py", line 309, in __init__
schema_salad.avro.schema.SchemaParseException: Type property {'type': 'enum', 'symbols': ['00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00'], 'name': 'timebc5ec20a-5ecf-4713-a612-acb8f9bf120b'} not a valid Avro schema: Duplicate symbol: ['00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00']
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/francis/dev/weaver/weaver/processes/wps_package.py", line 1240, in try_or_raise_package_error
    return call()
  File "/home/francis/dev/weaver/weaver/processes/wps_package.py", line 1268, in <lambda>
    lambda: _load_package_content(
  File "/home/francis/dev/weaver/weaver/processes/wps_package.py", line 696, in _load_package_content
    package = factory.make(tmp_json_cwl)  # type: CWLFactoryCallable
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/factory.py", line 67, in make
    load = load_tool.load_tool(cwl, self.loading_context)
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/load_tool.py", line 621, in load_tool
    return make_tool(uri, loadingContext)
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/load_tool.py", line 598, in make_tool
    tool = loadingContext.construct_tool_object(processobj, loadingContext)
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/workflow.py", line 48, in default_make_tool
    return command_line_tool.CommandLineTool(toolpath_object, loadingContext)
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/command_line_tool.py", line 412, in __init__
    super().__init__(toolpath_object, loadingContext)
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/process.py", line 662, in __init__
    with SourceLine(toolpath_object, "inputs", ValidationException, debug):
  File "schema_salad/sourceline.py", line 249, in __exit__
schema_salad.exceptions.ValidationException: ../../../../tmp/tmpczjpv51r/package:1:202: Type property {'type': 'enum', 'symbols': ['00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00'], 'name':
                                           'timebc5ec20a-5ecf-4713-a612-acb8f9bf120b'} not a valid
                                           Avro schema: Duplicate symbol: ['00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00']
[2024-11-20 19:25:58,698] ERROR    [MainThread][weaver.processes.utils] Invalid package/reference definition. Loading generated error: [Invalid package/reference definition. Loading package content generated error: [../../../../tmp/tmpczjpv51r/package:1:202: Type property {'type': 'enum', 'symbols': ['00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00'], 'name':
                                           'timebc5ec20a-5ecf-4713-a612-acb8f9bf120b'} not a valid
                                           Avro schema: Duplicate symbol: ['00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00']].]
Traceback (most recent call last):
  File "schema_salad/avro/schema.py", line 307, in __init__
  File "schema_salad/avro/schema.py", line 727, in make_avsc_object
  File "schema_salad/avro/schema.py", line 375, in __init__
schema_salad.avro.schema.AvroException: Duplicate symbol: ['00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00']
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/process.py", line 667, in __init__
    make_avsc_object(convert_to_dict(self.inputs_record_schema), self.names)
  File "schema_salad/avro/schema.py", line 735, in make_avsc_object
  File "schema_salad/avro/schema.py", line 656, in __init__
  File "schema_salad/avro/schema.py", line 627, in make_field_objects
  File "schema_salad/avro/schema.py", line 309, in __init__
schema_salad.avro.schema.SchemaParseException: Type property {'type': 'enum', 'symbols': ['00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00'], 'name': 'timebc5ec20a-5ecf-4713-a612-acb8f9bf120b'} not a valid Avro schema: Duplicate symbol: ['00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00']
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/francis/dev/weaver/weaver/processes/wps_package.py", line 1240, in try_or_raise_package_error
    return call()
  File "/home/francis/dev/weaver/weaver/processes/wps_package.py", line 1268, in <lambda>
    lambda: _load_package_content(
  File "/home/francis/dev/weaver/weaver/processes/wps_package.py", line 696, in _load_package_content
    package = factory.make(tmp_json_cwl)  # type: CWLFactoryCallable
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/factory.py", line 67, in make
    load = load_tool.load_tool(cwl, self.loading_context)
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/load_tool.py", line 621, in load_tool
    return make_tool(uri, loadingContext)
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/load_tool.py", line 598, in make_tool
    tool = loadingContext.construct_tool_object(processobj, loadingContext)
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/workflow.py", line 48, in default_make_tool
    return command_line_tool.CommandLineTool(toolpath_object, loadingContext)
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/command_line_tool.py", line 412, in __init__
    super().__init__(toolpath_object, loadingContext)
  File "/home/francis/dev/conda/envs/weaver/lib/python3.10/site-packages/cwltool/process.py", line 662, in __init__
    with SourceLine(toolpath_object, "inputs", ValidationException, debug):
  File "schema_salad/sourceline.py", line 249, in __exit__
schema_salad.exceptions.ValidationException: ../../../../tmp/tmpczjpv51r/package:1:202: Type property {'type': 'enum', 'symbols': ['00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00'], 'name':
                                           'timebc5ec20a-5ecf-4713-a612-acb8f9bf120b'} not a valid
                                           Avro schema: Duplicate symbol: ['00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00']
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/francis/dev/weaver/weaver/processes/utils.py", line 286, in _validate_deploy_process_info
    info = get_process_definition(
  File "/home/francis/dev/weaver/weaver/processes/wps_package.py", line 1267, in get_process_definition
    package_factory, process_type, _ = try_or_raise_package_error(
  File "/home/francis/dev/weaver/weaver/processes/wps_package.py", line 1247, in try_or_raise_package_error
    raise exc_type(f"Invalid package/reference definition. {reason} generated error: [{exc!s}].")
weaver.exceptions.PackageRegistrationError: Invalid package/reference definition. Loading package content generated error: [../../../../tmp/tmpczjpv51r/package:1:202: Type property {'type': 'enum', 'symbols': ['00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00'], 'name':
                                           'timebc5ec20a-5ecf-4713-a612-acb8f9bf120b'} not a valid
                                           Avro schema: Duplicate symbol: ['00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00', '00', '00', '00', '00', '00', '00', '00',
                                           '00', '00']].
[2024-11-20 19:25:58,699] DEBUG    [MainThread][weaver.tweens] http exception -> ows exception response.
[2024-11-20 19:25:58,699] WARNING  [MainThread][weaver.tweens] Handled request exception:
  Cause: [POST http://localhost:4002/processes]
  Error: [(HTTPUnprocessableEntity) <422> Invalid package/reference definition. Loading generated error: [Invalid package/reference definition. Loading package content generated error: [../../../../tmp/tmpczjpv51r/package:1:202: Type property {'type': 'enum', 'symbols': ['00', '00',                                            '00', '00', '00', '00', '00', '00', '00', '00', '00',                                            '00', '00', '00', '00', '00', '00', '00', '00', '00',                                            '00', '00', '00', '00'], 'name':
                                           'timebc5ec20a-5ecf-4713-a612-acb8f9bf120b'} not a valid
                                           Avro schema: Duplicate symbol: ['00', '00', '00', '00',                                            '00', '00', '00', '00', '00', '00', '00', '00', '00',                                            '00', '00', '00', '00', '00', '00', '00', '00', '00',                                            '00', '00']].]]
[2024-11-20 19:25:58,700] DEBUG    [MainThread][weaver.tweens] Handled request details:
(HTTPUnprocessableEntity) <422> Invalid package/reference definition. Loading generated error: [Invalid package/reference definition. Loading package content generated error: [../../../../tmp/tmpczjpv51r/package:1:202: Type property {'type': 'enum', 'symbols': ['00', '00',                                            '00', '00', '00', '00', '00', '00', '00', '00', '00',                                            '00', '00', '00', '00', '00', '00', '00', '00', '00',                                            '00', '00', '00', '00'], 'name':
                                           'timebc5ec20a-5ecf-4713-a612-acb8f9bf120b'} not a valid
                                           Avro schema: Duplicate symbol: ['00', '00', '00', '00',                                            '00', '00', '00', '00', '00', '00', '00', '00', '00',                                            '00', '00', '00', '00', '00', '00', '00', '00', '00',                                            '00', '00']].]

Your Environment

  • cwltool version: main branch
@fmigneault
Contributor Author

@mr-c @tetron
Any guidance on how to work around this issue (either a relevant escape mechanism or a pointer to where the code should be fixed) would be greatly appreciated.
This is blocking me heavily since most geospatial services have time attributes defined in similar ways to limit the amount of processing and available data requests.

I would be happy to open a PR to address wherever the issue occurs, but I'm having trouble even figuring out where that happens because of how schema-salad dynamically applies the URI resolution.

@tetron
Member

tetron commented Nov 21, 2024

@fmigneault looking into it

@mr-c
Member

mr-c commented Nov 21, 2024

@fmigneault with a recent cwltool, running cwltool --validate works with your example.

Try upgrading, and if that doesn't work, then try --skip-schemas and/or --non-strict

@mr-c
Member

mr-c commented Nov 21, 2024

Though I don't personally find that to be a valid CommandLineTool: there is no baseCommand (or a software container with an ENTRYPOINT, though that approach is deprecated).

@tetron
Member

tetron commented Nov 21, 2024

Yes, what version of cwltool and schema-salad are you using? I can't find an obvious place in the current code where it would drop the suspected prefix entirely.

@fmigneault
Contributor Author

I'm actually on https://github.com/fmigneault/cwltool/tree/fix-load-contents-array, but since #2036 has been merged with it, I can probably update to the actual latest and retry. schema-salad is 8.7.20240820070935

@fmigneault
Contributor Author

fmigneault commented Nov 21, 2024

Reinstalled with pip install --force-reinstall cwltool.

Still an error for me.
I've tried all permutations of --skip-schemas and/or --non-strict with --validate.
Same output each time.
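For reference, the flag permutations mentioned above can be enumerated with a short Python sketch. The helper names `validation_commands` and `run_all` are made up for illustration; only the `--validate`, `--skip-schemas`, and `--non-strict` flags come from this thread. Running the commands assumes `cwltool` is available on the PATH.

```python
import itertools
import subprocess

# Optional flags suggested in this thread, combined with --validate.
OPTIONAL_FLAGS = ["--skip-schemas", "--non-strict"]


def validation_commands(tool_path: str) -> list[list[str]]:
    """Build one cwltool command per subset of the optional flags."""
    commands = []
    for count in range(len(OPTIONAL_FLAGS) + 1):
        for flags in itertools.combinations(OPTIONAL_FLAGS, count):
            commands.append(["cwltool", "--validate", *flags, tool_path])
    return commands


def run_all(tool_path: str) -> dict[str, int]:
    """Run every command and map it to its exit code (needs cwltool installed)."""
    results = {}
    for command in validation_commands(tool_path):
        completed = subprocess.run(command, capture_output=True, text=True)
        results[" ".join(command)] = completed.returncode
    return results
```

Calling `run_all("test-tool.cwl")` would report the exit code of each of the four combinations.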

pip show cwltool
Name: cwltool
Version: 3.1.20241112140730
Summary: Common workflow language reference implementation
Home-page: https://github.com/common-workflow-language/cwltool
Author: Common workflow language working group
Author-email: [email protected]
License: 
Location: /home/francis/dev/miniconda/envs/weaver-py310/lib/python3.10/site-packages
Requires: argcomplete, coloredlogs, cwl-utils, mypy-extensions, prov, psutil, pydot, pyparsing, rdflib, requests, ruamel.yaml, schema-salad, spython
Required-by: weaver
pip show schema-salad
Name: schema-salad
Version: 8.7.20241021092521
Summary: Schema Annotations for Linked Avro Data (SALAD)
Home-page: https://schema-salad.readthedocs.io/
Author: Common workflow language working group
Author-email: [email protected]
License: Apache 2.0
Location: /home/francis/dev/miniconda/envs/weaver-py310/lib/python3.10/site-packages
Requires: CacheControl, mistune, mypy-extensions, rdflib, requests, ruamel.yaml
Required-by: cwl-upgrader, cwl-utils, cwltool, weaver
cwltool --validate test-tool.cwl
INFO /home/francis/dev/conda/envs/weaver/bin/cwltool 3.1.20241112140730
INFO Resolved 'test-tool.cwl' to 'file:///tmp/test-tool.cwl'
URI prefix '00' of '00:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '00' of '00:00' not recognized, are you missing a $namespaces section?
URI prefix '01' of '01:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '01' of '01:00' not recognized, are you missing a $namespaces section?
URI prefix '02' of '02:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '02' of '02:00' not recognized, are you missing a $namespaces section?
URI prefix '03' of '03:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '03' of '03:00' not recognized, are you missing a $namespaces section?
URI prefix '04' of '04:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '04' of '04:00' not recognized, are you missing a $namespaces section?
URI prefix '05' of '05:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '05' of '05:00' not recognized, are you missing a $namespaces section?
URI prefix '06' of '06:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '06' of '06:00' not recognized, are you missing a $namespaces section?
URI prefix '07' of '07:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '07' of '07:00' not recognized, are you missing a $namespaces section?
URI prefix '08' of '08:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '08' of '08:00' not recognized, are you missing a $namespaces section?
URI prefix '09' of '09:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '09' of '09:00' not recognized, are you missing a $namespaces section?
URI prefix '10' of '10:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '10' of '10:00' not recognized, are you missing a $namespaces section?
URI prefix '11' of '11:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '11' of '11:00' not recognized, are you missing a $namespaces section?
URI prefix '12' of '12:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '12' of '12:00' not recognized, are you missing a $namespaces section?
URI prefix '13' of '13:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '13' of '13:00' not recognized, are you missing a $namespaces section?
URI prefix '14' of '14:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '14' of '14:00' not recognized, are you missing a $namespaces section?
URI prefix '15' of '15:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '15' of '15:00' not recognized, are you missing a $namespaces section?
URI prefix '16' of '16:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '16' of '16:00' not recognized, are you missing a $namespaces section?
URI prefix '17' of '17:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '17' of '17:00' not recognized, are you missing a $namespaces section?
URI prefix '18' of '18:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '18' of '18:00' not recognized, are you missing a $namespaces section?
URI prefix '19' of '19:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '19' of '19:00' not recognized, are you missing a $namespaces section?
URI prefix '20' of '20:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '20' of '20:00' not recognized, are you missing a $namespaces section?
URI prefix '21' of '21:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '21' of '21:00' not recognized, are you missing a $namespaces section?
URI prefix '22' of '22:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '22' of '22:00' not recognized, are you missing a $namespaces section?
URI prefix '23' of '23:00' not recognized, are you missing a $namespaces section?
WARNING URI prefix '23' of '23:00' not recognized, are you missing a $namespaces section?
INFO test-tool.cwl:1:1: Unknown hint https://schemas.crim.ca/cwl/weaver#OGCAPIRequirement
ERROR Tool definition failed validation:
test-tool.cwl:6:1: Type property {'type': 'enum', 'symbols': ['00', '00', '00', '00', '00', '00',
                   '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00',
                   '00', '00', '00', '00', '00'], 'name':
                   'timeb2626f50-7cd7-454f-a18e-8f727d619e72'} not a valid Avro schema: Duplicate
                   symbol: ['00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00',
                   '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00', '00']

@tetron
Member

tetron commented Nov 21, 2024

@fmigneault

"pip show cwltool"
Location: /home/francis/dev/miniconda/envs/weaver-py310/lib/python3.10/site-packages

cwltool --validate test-tool.cwl
INFO /home/francis/dev/conda/envs/weaver/bin/cwltool 3.1.20241112140730

The first one is /home/francis/dev/miniconda/envs/weaver-py310 and the second one is /home/francis/dev/conda/envs/weaver ?

@fmigneault
Contributor Author

fmigneault commented Nov 21, 2024

I don't know why they show differently, but they are symlinked on my system:

  • conda -> miniconda
  • weaver -> weaver-py310
$(realpath $(which pip)) show cwltool
Name: cwltool
Version: 3.1.20241112140730
Summary: Common workflow language reference implementation
Home-page: https://github.com/common-workflow-language/cwltool
Author: Common workflow language working group
Author-email: [email protected]
License: 
Location: /home/francis/dev/miniconda/envs/weaver-py310/lib/python3.10/site-packages
Requires: argcomplete, coloredlogs, cwl-utils, mypy-extensions, prov, psutil, pydot, pyparsing, rdflib, requests, ruamel.yaml, schema-salad, spython
Required-by: weaver
$(realpath $(which pip)) show schema-salad
Name: schema-salad
Version: 8.7.20241021092521
Summary: Schema Annotations for Linked Avro Data (SALAD)
Home-page: https://schema-salad.readthedocs.io/
Author: Common workflow language working group
Author-email: [email protected]
License: Apache 2.0
Location: /home/francis/dev/miniconda/envs/weaver-py310/lib/python3.10/site-packages
Requires: CacheControl, mistune, mypy-extensions, rdflib, requests, ruamel.yaml
Required-by: cwl-upgrader, cwl-utils, cwltool, weaver
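As a quick sanity check, two paths can be compared after resolving symbolic links. This is a minimal sketch; `same_location` is a hypothetical helper name, and the paths in the usage note are the ones quoted in this thread.

```python
import os


def same_location(path_a: str, path_b: str) -> bool:
    """Return True when both paths resolve to the same real location."""
    # os.path.realpath follows symbolic links and normalizes the result.
    return os.path.realpath(path_a) == os.path.realpath(path_b)
```

For example, `same_location("/home/francis/dev/conda/envs/weaver", "/home/francis/dev/miniconda/envs/weaver-py310")` should return True if the links are set up as described.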

@fmigneault
Contributor Author

fmigneault commented Nov 23, 2024

Ha! Found a workaround. Maybe cwltool could apply it automatically?

Prefix each value with an explicit # so that it looks like a URI fragment, which the parsing step gladly accepts and combines properly with the relative input-ID URI. Also quote the values so that YAML does not treat the # as the start of a comment.

  - type:
      type: enum
      symbols:
        - "#00:00"
        - "#01:00"
        - "#02:00"
        - "#03:00"
        - "#04:00"
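The same workaround could be applied programmatically before handing the document to cwltool. The following Python sketch is only an illustration: `escape_enum_symbols` is a hypothetical helper, not part of cwltool or schema-salad, and it assumes the tool has already been loaded into plain dicts and lists with `inputs` in the list form used in this issue.

```python
import copy


def escape_enum_symbols(tool: dict) -> dict:
    """Return a copy of the tool where enum symbols containing ':' start with '#'.

    Hypothetical pre-processing step; assumes 'inputs' is a list of mappings.
    """
    patched = copy.deepcopy(tool)
    for item in patched.get("inputs", []):
        item_type = item.get("type")
        if isinstance(item_type, dict) and item_type.get("type") == "enum":
            item_type["symbols"] = [
                f"#{symbol}"
                if isinstance(symbol, str) and ":" in symbol and not symbol.startswith("#")
                else symbol
                for symbol in item_type.get("symbols", [])
            ]
    return patched
```

Symbols without a colon are left untouched, and the original mapping is not modified. Whether the prefixed values need to be mapped back when passing inputs is not covered by this sketch.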

The updated CWL with a few patches to make it work properly:

cwlVersion: v1.0
class: CommandLineTool
hints:
  weaver:OGCAPIRequirement:
    process: https://cds.climate.copernicus.eu/api/retrieve/v1/processes/reanalysis-era5-land-monthly-means
baseCommand: ["echo"]
stdout: output.json
inputs:
  - type:
      type: enum
      symbols:
        - monthly_averaged_reanalysis
        - monthly_averaged_reanalysis_by_hour_of_day
    id: product_type
    inputBinding:
      position: 1
  - type:
      type: enum
      symbols:
        - 10m_u_component_of_wind
        - 10m_v_component_of_wind
        - 2m_dewpoint_temperature
        - 2m_temperature
        - evaporation_from_bare_soil
        - evaporation_from_open_water_surfaces_excluding_oceans
        - evaporation_from_the_top_of_canopy
        - evaporation_from_vegetation_transpiration
        - forecast_albedo
        - lake_bottom_temperature
        - lake_ice_depth
        - lake_ice_temperature
        - lake_mix_layer_depth
        - lake_mix_layer_temperature
        - lake_shape_factor
        - lake_total_layer_temperature
        - leaf_area_index_high_vegetation
        - leaf_area_index_low_vegetation
        - potential_evaporation
        - runoff
        - skin_reservoir_content
        - skin_temperature
        - snow_albedo
        - snow_cover
        - snow_density
        - snow_depth
        - snow_depth_water_equivalent
        - snow_evaporation
        - snowfall
        - snowmelt
        - soil_temperature_level_1
        - soil_temperature_level_2
        - soil_temperature_level_3
        - soil_temperature_level_4
        - sub_surface_runoff
        - surface_latent_heat_flux
        - surface_net_solar_radiation
        - surface_net_thermal_radiation
        - surface_pressure
        - surface_runoff
        - surface_sensible_heat_flux
        - surface_solar_radiation_downwards
        - surface_thermal_radiation_downwards
        - temperature_of_snow_layer
        - total_evaporation
        - total_precipitation
        - volumetric_soil_water_layer_1
        - volumetric_soil_water_layer_2
        - volumetric_soil_water_layer_3
        - volumetric_soil_water_layer_4
    id: variable
    inputBinding:
      position: 2
  - type:
      type: enum
      symbols:
        - "1950"
        - "1951"
        - "1952"
        - "1953"
        - "1954"
        - "1955"
        - "1956"
        - "1957"
        - "1958"
        - "1959"
        - "1960"
        - "1961"
        - "1962"
        - "1963"
        - "1964"
        - "1965"
        - "1966"
        - "1967"
        - "1968"
        - "1969"
        - "1970"
        - "1971"
        - "1972"
        - "1973"
        - "1974"
        - "1975"
        - "1976"
        - "1977"
        - "1978"
        - "1979"
        - "1980"
        - "1981"
        - "1982"
        - "1983"
        - "1984"
        - "1985"
        - "1986"
        - "1987"
        - "1988"
        - "1989"
        - "1990"
        - "1991"
        - "1992"
        - "1993"
        - "1994"
        - "1995"
        - "1996"
        - "1997"
        - "1998"
        - "1999"
        - "2000"
        - "2001"
        - "2002"
        - "2003"
        - "2004"
        - "2005"
        - "2006"
        - "2007"
        - "2008"
        - "2009"
        - "2010"
        - "2011"
        - "2012"
        - "2013"
        - "2014"
        - "2015"
        - "2016"
        - "2017"
        - "2018"
        - "2019"
        - "2020"
        - "2021"
        - "2022"
        - "2023"
        - "2024"
    inputBinding:
      position: 3
    id: year
  - type:
      type: enum
      symbols:
        - "01"
        - "02"
        - "03"
        - "04"
        - "05"
        - "06"
        - "07"
        - "08"
        - "09"
        - "10"
        - "11"
        - "12"
    id: month
    inputBinding:
      position: 4
  - type:
      type: enum
      symbols:
        - "#00:00"
        - "#01:00"
        - "#02:00"
        - "#03:00"
        - "#04:00"
        - "#05:00"
        - "#06:00"
        - "#07:00"
        - "#08:00"
        - "#09:00"
        - "#10:00"
        - "#11:00"
        - "#12:00"
        - "#13:00"
        - "#14:00"
        - "#15:00"
        - "#16:00"
        - "#17:00"
        - "#18:00"
        - "#19:00"
        - "#20:00"
        - "#21:00"
        - "#22:00"
        - "#23:00"
    inputBinding:
      position: 5
    id: time
  - type:
      type: array
      items: float
    inputBinding:
      position: 6
    id: area
  - type:
      type: enum
      symbols:
        - grib
        - netcdf
    inputBinding:
      position: 7
    default:
      - grib
    id: data_format
  - inputBinding:
      position: 8
    type:
      type: enum
      symbols:
        - zip
        - unarchived
    default:
      - unarchived
    id: download_format
outputs:
  - type: File
    format: iana:application/json
    outputBinding:
      glob: '*.json'
    id: asset
$namespaces:
  iana: https://www.iana.org/assignments/media-types/
  weaver: https://schemas.crim.ca/cwl/weaver#

A job:

product_type: "monthly_averaged_reanalysis_by_hour_of_day"
variable: "skin_temperature"
year: "2020"
month: "01"
time: "04:00"  # NOTICE: no '#' here
area: [46.8639623, 13.3652612, 45.4236367, 16.5153015]
data_format: "netcdf"
download_format: "unarchived"
cwltool test.cwl job.yml
Running:  [cwltool --disable-color test.cwl job.yml 2>&1 | tee test.log]
Log Path: [/tmp/test.log]
INFO /home/francis/dev/conda/envs/weaver/bin/cwltool 0.1.dev4610+ga492285
INFO Resolved 'test.cwl' to 'file:///tmp/test.cwl'
INFO test.cwl:1:1: Unknown hint https://schemas.crim.ca/cwl/weaver#OGCAPIRequirement
INFO [job test.cwl] /tmp/c7nfkq5t$ echo \
    monthly_averaged_reanalysis_by_hour_of_day \
    skin_temperature \
    2020 \
    01 \
    04:00 \
    46.8639623 \
    13.3652612 \
    45.4236367 \
    16.5153015 \
    netcdf \
    unarchived > /tmp/c7nfkq5t/output.json
INFO [job test.cwl] completed success
{
    "asset": {
        "location": "file:///tmp/output.json",
        "basename": "output.json",
        "class": "File",
        "checksum": "sha1$8710c9a649d5fc4a12ffb9360ef582eaa281225b",
        "size": 136,
        "format": "https://www.iana.org/assignments/media-types/application/json",
        "path": "/tmp/output.json"
    }
}INFO Final process status is success
cat output.json
monthly_averaged_reanalysis_by_hour_of_day skin_temperature 2020 01 04:00 46.8639623 13.3652612 45.4236367 16.5153015 netcdf unarchived
