Investigate slowness while running BSM-Search example #140

Open
roksys opened this issue Jan 30, 2020 · 2 comments

roksys commented Jan 30, 2020

When trying to scale the BSM-Search example, it can take up to 3 minutes for the workflow engine to start submitting the first jobs.

To increase the number of jobs, the following modification is needed:

$ git diff
diff --git a/workflow/databkgmc.yml b/workflow/databkgmc.yml
index b41427e..e41ae7e 100644
--- a/workflow/databkgmc.yml
+++ b/workflow/databkgmc.yml
@@ -16,7 +16,7 @@ stages:
       parameters:
         mcname: [mc1,mc2]
         mcweight: [0.01875,0.0125]  # [Ndata / Ngen * 0.2 * 0.15,  Ndata / Ngen * 0.2 * 0.1] = [10/16*0.03, 1/16 * 0.02]
-        nevents:  [40000,40000,40000,40000]  #160k events / mc sample
+        nevents:   [5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,5000,50005000,5000,5000,5000,50005000,5000,5000,5000,5000,5000,5000]
       workflow: {$ref: workflow/wflow_all_mc.yml}
   - name: data
     scheduler:
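
A long `nevents` list like the one in the diff above could be generated rather than typed by hand, which avoids slips such as a missing comma. The sketch below is only an illustration: the helper name `split_events` is hypothetical, and it assumes the total should stay at the 160k events per MC sample mentioned in the original comment.

```python
def split_events(total, chunk):
    """Split `total` events into chunks of at most `chunk` events each."""
    full, rest = divmod(total, chunk)
    # Full-size chunks first, then one smaller chunk if anything is left over.
    return [chunk] * full + ([rest] if rest else [])


# Assumption: keep 160k events per MC sample, in chunks of 5000.
nevents = split_events(160_000, 5_000)

# Print a line that can be pasted into workflow/databkgmc.yml.
print("nevents: [" + ",".join(str(n) for n in nevents) + "]")
```

With a chunk size of 5000 this yields 32 entries, compared with 4 entries of 40000 in the original file.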

lukasheinrich commented

yadage has two modes:

  • you can start submitting jobs as soon as the jobs are added to the graph
  • you can build the graph as much as possible and only then submit jobs

The latter is the default; it helps with debugging and doing "dry runs", but the behavior can be controlled using engine options. We could expose those in reana.yml.
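
The difference between the two modes can be illustrated with a toy model. This is not yadage code and uses no yadage API: the function `first_submission_step` and its inputs are hypothetical. It only shows why building the graph first delays the first submission, and it ignores that in a real workflow some stages can only be added after earlier jobs finish.

```python
def first_submission_step(expansion_steps, eager):
    """Return the step at which the first job would be submitted.

    expansion_steps: list of lists; each inner list holds the jobs added
                     to the graph during one graph-building step.
    eager:           True  -> submit as soon as any job is in the graph
                     False -> build the whole graph first, then submit
    """
    graph = []
    for step, new_jobs in enumerate(expansion_steps, start=1):
        graph.extend(new_jobs)
        if eager and graph:
            return step  # first job goes out during graph building
    # Deferred mode: submission starts only after the last building step.
    return len(expansion_steps) if graph else None


# Toy workflow whose graph is built in three steps.
steps = [["mc1_job"], ["mc2_job"], ["data_job"]]
print("eager:   ", first_submission_step(steps, eager=True))
print("deferred:", first_submission_step(steps, eager=False))
```

In this model the cost of the deferred mode grows with the number of graph-building steps, which would be consistent with the delay growing as the number of jobs is increased.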

roksys commented Jan 30, 2020

Hi @lukasheinrich,

How could I set the first option? Is it one of the yadage-run options?

$ yadage-run --help
Usage: yadage-run [OPTIONS] DATAARG [WORKFLOW] [INITFILES]...

Options:
  -b, --backend TEXT              packtivity backend string
  -c, --cache TEXT
  -d, --dataopt TEXT              options for the workflow data state
  -e, --schemadir TEXT            schema directory for workflow validation
  -f, --from-file FILENAME        read entire configuration from file, no
                                  other flags settings are read.
  -g, --strategy TEXT             set execution stragegy
  -i, --loginterval INTEGER       adage tracking interval in seconds
  -k, --backendopt TEXT           options for the workflow data state
  -l, --modelopt TEXT             options for the workflow state models
  -m, --metadir TEXT              directory to store workflow metadata
  -o, --ctrlopt TEXT              options for the workflow controller
  -p, --parameter TEXT            <parameter name>=<yaml string> input
                                  parameter specifcations
  -r, --controller TEXT           controller
  -s, --modelsetup TEXT           wflow state model
  -t, --toplevel TEXT             toplevel uri to be used to resolve workflow
                                  name and references from
  -u, --updateinterval FLOAT      adage graph inspection interval in seconds
  -v, --verbosity TEXT            logging verbosity
  --accept-metadir / --no-accept-metadir
  --plugins TEXT
  --validate / --no-validate      en-/disable workflow spec validation
  --visualize / --no-visualize    visualize workflow graph
  --help                          Show this message and exit.
