Overwrite published outputs only if they are stale #4729
base: master
Signed-off-by: Ben Sherman <[email protected]>
@Override
String getETag() {
    if( metadata == null ) {
        final googleOpts = GoogleOpts.create(Global.session as Session)
        final client = StorageOptions.newBuilder()
            .setProjectId(googleOpts.projectId)
            .build()
            .getService()
        metadata = client.get(target.bucket(), target.toString())
    }
    return metadata?.getMd5ToHexString()
}
I added a custom NIO filesystem for Google Cloud Storage that simply delegates to the existing one but lets us add custom behavior such as this ETag support.
It turns out that the ETag returned by the Java SDK is a CRC32 rather than an MD5. The MD5 is also available through this storage API, but I haven't gotten it to work yet: the metadata is always null, even though I provided the projectId and have the credentials file. I need to investigate further when I have more time.
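To illustrate why a CRC32-style ETag cannot be compared directly against an MD5 checksum, here is a minimal Java sketch that computes both digests for the same bytes. The class and method names (`DigestKinds`, `crc32Hex`, `md5Hex`) are hypothetical and not part of this PR, and `java.util.zip.CRC32` is used only as a stand-in for whatever checksum variant the storage service actually reports.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

// Hypothetical helper, for illustration only: not part of the PR.
final class DigestKinds {

    // 32-bit CRC rendered as 8 lowercase hex characters
    static String crc32Hex(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return String.format("%08x", crc.getValue());
    }

    // 128-bit MD5 rendered as 32 lowercase hex characters
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.println("crc32: " + crc32Hex(data));
        System.out.println("md5:   " + md5Hex(data));
    }
}
```

The two values differ in length and algorithm, so a staleness check has to compare like with like on the source and target side.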
Some notes from discussion today (related to workflow output DSL):
This PR is mostly covered by the workflow output definition #4784. The only remaining piece is the I propose that we merge that piece and leave out the Google NIO filesystem for now since I never quite got it to work. I can finish it in a separate PR. Then we can move this PR forward for S3 and Azure.
Close #3372
This PR builds on #3933, which ensures that each file in a directory is published explicitly. Given this behavior, we can compare the source and target checksums to decide whether a file needs to be published.
The checksum code is taken from #3802, as it is also needed for task provenance and for resumability with automatic cleanup (#3849). It tries to use a pre-computed checksum (such as the ETag for cloud storage paths) where possible to avoid scanning the file contents.
The only qualm I have right now is that the checksum code doesn't use the ETag for Google Cloud Storage paths, because we use the built-in file system provider from the Google SDK rather than our own. Maybe we can extend this provider and register it over the existing one, but I'm not sure yet how to do that.
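The publish decision described above can be sketched roughly as follows. This is a minimal Java illustration under stated assumptions, not the code in this PR: `StaleCheck`, `ChecksumLookup`, `checksum` and `shouldPublish` are hypothetical names, and the lookup stands in for whatever mechanism exposes a pre-computed checksum (such as an ETag). It also assumes that the pre-computed value and the fallback digest use the same algorithm, which is not guaranteed for every cloud ETag.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Optional;

// Hypothetical sketch of "overwrite only if stale"; names are not from the PR.
final class StaleCheck {

    // Hypothetical hook returning a pre-computed checksum (e.g. an ETag) when one exists
    interface ChecksumLookup {
        Optional<String> precomputed(Path path);
    }

    // Fallback: scan the file contents and return the MD5 as lowercase hex
    static String md5Hex(Path path) {
        try (InputStream in = Files.newInputStream(path)) {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest()) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Prefer the pre-computed checksum; hash the contents only when none is available
    static String checksum(Path path, ChecksumLookup lookup) {
        Optional<String> pre = lookup.precomputed(path);
        return pre.isPresent() ? pre.get() : md5Hex(path);
    }

    // Publish when the target is missing or its checksum differs from the source
    static boolean shouldPublish(Path source, Path target, ChecksumLookup lookup) {
        if (!Files.exists(target)) {
            return true;
        }
        return !checksum(source, lookup).equals(checksum(target, lookup));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("publish-demo");
        Path source = Files.write(dir.resolve("source.txt"), "hello".getBytes());
        Path target = dir.resolve("target.txt");
        ChecksumLookup noEtag = p -> Optional.empty(); // local files: no pre-computed value
        System.out.println(shouldPublish(source, target, noEtag));
        Files.copy(source, target);
        System.out.println(shouldPublish(source, target, noEtag));
    }
}
```

With an unchanged target the second call reports that no publish is needed, so the copy can be skipped.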