Document remote file staging #5523
Signed-off-by: Christopher Hakkaart <[email protected]>
@bentsherman - can you point me in the direction of the code for hash generation and caching behavior so I can finish this PR?
Those sections will be tricky to write. I will try to finish them this week.
All good! Thanks!
Signed-off-by: Ben Sherman <[email protected]>
@christopher-hakkaart it's your PR but let me know if the changes look good
Signed-off-by: Ben Sherman <[email protected]>
@pditommaso, could you look over this for technical correctness? We are documenting remote file staging. It is also a nice place to highlight the need for Fusion.
Signed-off-by: Christopher Hakkaart <[email protected]>
Thanks @bentsherman
Co-authored-by: Ben Sherman <[email protected]> Signed-off-by: Chris Hakkaart <[email protected]>
docs/working-with-files.md
Outdated
When a remote file is passed as an input to a process, Nextflow stages the file into the work directory using an appropriate Java SDK.

Remote files are staged in a subdirectory of the work directory with the form `stage-<session-id>/<hash>/<filename>`, where `<hash>` is determined by the remote file path. If multiple tasks request the same remote file, the file will be downloaded once and reused by each task. These files can be reused by resumed runs with the same session ID.
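As a rough illustration of the layout described above, the following Python sketch builds a path of the form `stage-<session-id>/<hash>/<filename>`. The function name `staged_path` and the use of MD5 over the remote path are assumptions made for illustration only; the text says only that `<hash>` is determined by the remote file path, not how Nextflow computes it.

```python
import hashlib
import posixpath
from urllib.parse import urlparse


def staged_path(work_dir: str, session_id: str, remote_url: str) -> str:
    """Build an illustrative staging path for a remote file.

    Assumption: the hash is an MD5 digest of the remote path string.
    Nextflow's real hash function may differ.
    """
    # The hash depends only on the remote path, so it is stable across tasks.
    digest = hashlib.md5(remote_url.encode("utf-8")).hexdigest()
    # Take the last path component of the URL as the file name.
    filename = posixpath.basename(urlparse(remote_url).path)
    return f"{work_dir}/stage-{session_id}/{digest}/{filename}"


# Hypothetical work directory, session ID, and remote file.
first = staged_path("work", "abc123", "s3://my-bucket/data/sample.fastq")
second = staged_path("work", "abc123", "s3://my-bucket/data/sample.fastq")
print(first == second)
```

Because the hash in this sketch depends only on the remote path, two tasks that request the same file within the same session resolve to the same location, which is what allows a single download to be reused.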
It would be better to explain that this kind of file transfer happens whenever the origin or the destination file system is different from the one hosting the workflow work directory.
For example, if the input file is on the local computer or is a remote HTTP file, and the pipeline uses an S3 bucket as the work dir, then Nextflow needs to copy it into S3. The same logic applies when it needs to copy the output files.
For the same reason, it's important to advise keeping the inputs and outputs in the same storage system, e.g. S3 or a shared file system.
Minor: it would be preferable to say "copy", or "download" input files and "upload" output files, instead of "stage", which is too much of a technical slang term.
I still prefer to call it "remote file staging" in the summary because it is concise, but I will explain it as copying.
I have no problem with the staging definition; however, the part to improve is the definition of a "remote" file. A local file is considered remote if the work dir is, for example, S3.
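The definition discussed here could be sketched as follows: a file counts as "remote" when its storage system differs from that of the work directory. This is a simplified Python illustration, not Nextflow's actual logic; the helper names `storage_scheme` and `needs_copy` are hypothetical, and comparing only the URI scheme is an assumption.

```python
from urllib.parse import urlparse


def storage_scheme(location: str) -> str:
    """Return the URI scheme of a location, treating plain paths as local files."""
    scheme = urlparse(location).scheme
    return scheme if scheme else "file"


def needs_copy(file_location: str, work_dir: str) -> bool:
    """Assumption: a file must be copied when its scheme differs from the work dir's."""
    return storage_scheme(file_location) != storage_scheme(work_dir)


# A local file is "remote" relative to an S3 work directory.
print(needs_copy("/home/user/data.txt", "s3://my-bucket/work"))
# A file already in S3 is not, under this simplified rule.
print(needs_copy("s3://my-bucket/inputs/data.txt", "s3://my-bucket/work"))
```

Under this reading, keeping inputs, outputs, and the work directory on the same storage system avoids the extra copy altogether.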
Signed-off-by: Ben Sherman <[email protected]>
To address #5493
@bentsherman - I'll need some help regarding:
I've commented on where I would add these.