Fix race condition when resuming an aborted run #5600
base: master
@@ -803,16 +803,18 @@ class TaskProcessor {
         while( true ) {
             hash = HashBuilder.defaultHasher().putBytes(hash.asBytes()).putInt(tries).hash()

-            Path resumeDir = null
+            Path workDir = null
             boolean exists = false
             try {
                 final entry = session.cache.getTaskEntry(hash, this)
-                resumeDir = entry ? FileHelper.asPath(entry.trace.getWorkDir()) : null
-                if( resumeDir )
-                    exists = resumeDir.exists()
+                workDir = entry
+                    ? FileHelper.asPath(entry.trace.getWorkDir())
+                    : task.getWorkDirFor(hash)
+                if( workDir )
+                    exists = workDir.exists()

-                log.trace "[${safeTaskName(task)}] Cacheable folder=${resumeDir?.toUriString()} -- exists=$exists; try=$tries; shouldTryCache=$shouldTryCache; entry=$entry"
-                final cached = shouldTryCache && exists && entry.trace.isCompleted() && checkCachedOutput(task.clone(), resumeDir, hash, entry)
+                log.trace "[${safeTaskName(task)}] Cacheable folder=${workDir?.toUriString()} -- exists=$exists; try=$tries; shouldTryCache=$shouldTryCache; entry=$entry"
+                final cached = shouldTryCache && exists && entry && entry.trace.isCompleted() && checkCachedOutput(task.clone(), workDir, hash, entry)
                 if( cached )
                     break
             }

@@ -826,11 +828,8 @@ class TaskProcessor {
             }

             final lock = lockManager.acquire(hash)
-            final workDir = task.getWorkDirFor(hash)
             try {
-                if( resumeDir != workDir )
-                    exists = workDir.exists()
-                if( !exists && !workDir.mkdirs() )
+                if( !workDir.mkdirs() )
Comment on lines 830 to +832
I've been wondering about this lock over the task directory. I think its purpose is to prevent two tasks from using the same directory. But in that case, maybe it should cover the previous try-catch block? It should prevent two tasks from checking the same directory at the same time, because that check is how a task determines whether to use the directory or try a different one.
It may. However, I think that to solve this issue we should assume that a new task run always uses a newly created directory. I'm not sure this logic satisfies that.
Currently this lock is useless, because if two tasks request the same directory, the lock will serialize the …
                     throw new IOException("Unable to create directory=$workDir -- check file system permissions")
             }
             finally {
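The concern in the review thread above is that checking whether a directory exists and then creating it are two separate steps, so two tasks could both pass the check before either creates the directory. As a rough illustration only, and not what this PR does, the check and the creation can be collapsed into one atomic call with `java.nio.file.Files.createDirectory`, which throws `FileAlreadyExistsException` if the directory is already there. The class name `WorkDirClaim`, the `try-N` directory naming, and the `maxTries` parameter are all hypothetical; Nextflow derives its work directories from the task hash instead.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: not Nextflow code and not the change made in this PR.
final class WorkDirClaim {

    /**
     * Returns the first directory base/try-N (N = 0 .. maxTries-1) that this
     * call managed to create. A directory that already exists is skipped,
     * so a caller never receives a directory created by someone else.
     */
    static Path claim(Path base, int maxTries) throws IOException {
        Files.createDirectories(base); // the parent must exist before createDirectory
        for (int tries = 0; tries < maxTries; tries++) {
            Path candidate = base.resolve("try-" + tries);
            try {
                // Existence check and creation happen as a single atomic operation.
                return Files.createDirectory(candidate);
            } catch (FileAlreadyExistsException e) {
                // Another run already owns this directory: move on to the next one.
            }
        }
        throw new IOException("Unable to claim a work directory under " + base);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("workdir-claim");
        System.out.println(claim(base, 3));
        System.out.println(claim(base, 3));
    }
}
```

With this approach a fresh run always ends up in a directory it created itself, which is the property asked for in the second comment; whether it fits Nextflow's cache lookup is a separate question that this sketch does not address.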
This is one of those pieces of code where the fewer the changes the better, both to keep it simple to review and to keep the history readable.
I know. This is about as simple as it gets while keeping the intent clear.
and further down:
4. if the outputs are cached then use them
5. otherwise, if the work dir exists then use a new work dir
6. otherwise, create the work dir and use it
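Steps 4 to 6 above amount to a three-way decision on two booleans. The following is a minimal Java sketch of that decision, assuming `cached` and `exists` have already been computed as in the diff; the class `WorkDirDecision` and the enum names are hypothetical and do not appear in Nextflow.

```java
// Hypothetical sketch of steps 4 to 6; names are illustrative, not Nextflow's.
final class WorkDirDecision {

    enum Action { USE_CACHED_OUTPUTS, TRY_NEXT_WORK_DIR, CREATE_WORK_DIR }

    /**
     * @param cached true if the outputs in the candidate directory passed the cache check
     * @param exists true if the candidate directory already exists
     */
    static Action decide(boolean cached, boolean exists) {
        if (cached) {
            return Action.USE_CACHED_OUTPUTS; // step 4: reuse the cached outputs
        }
        if (exists) {
            return Action.TRY_NEXT_WORK_DIR;  // step 5: directory is taken, pick a new one
        }
        return Action.CREATE_WORK_DIR;        // step 6: create the directory and use it
    }

    public static void main(String[] args) {
        System.out.println(decide(true, true));
        System.out.println(decide(false, true));
        System.out.println(decide(false, false));
    }
}
```

In the diff, step 5 corresponds to another pass through the `while( true )` loop with a different `tries` value, and step 6 to the `mkdirs()` call made while holding the lock.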
OK, getting better, but why add `entry &&` to the `cached` condition? If the intent is to apply the same logic when the `entry` is missing, shouldn't the condition remain the same?
In the previous logic, it wouldn't check for cached outputs if the cache entry was missing. That is also how it behaves here. The `&& entry` is required to prevent a null reference exception on `entry.trace.isCompleted()`.
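The reason the guard becomes necessary can be read off the diff: previously a missing entry left `resumeDir` null and `exists` false, so the condition short-circuited before touching `entry`. Now `workDir` falls back to `task.getWorkDirFor(hash)`, so `exists` can be true while `entry` is still null. Below is a small Java sketch of the two conditions, with the `checkCachedOutput` call left out; `Entry`, `Trace`, and `CachedGuard` are hypothetical stand-ins, and Groovy's `entry &&` truthiness check is written as `entry != null`.

```java
// Hypothetical stand-ins for the cache entry and its trace record.
final class CachedGuard {

    static final class Trace {
        private final boolean completed;
        Trace(boolean completed) { this.completed = completed; }
        boolean isCompleted() { return completed; }
    }

    static final class Entry {
        final Trace trace;
        Entry(Trace trace) { this.trace = trace; }
    }

    // Shape of the old condition: safe only while a null entry implies exists == false.
    static boolean unguarded(boolean shouldTryCache, boolean exists, Entry entry) {
        return shouldTryCache && exists && entry.trace.isCompleted();
    }

    // Shape of the new condition: the null check short-circuits before entry is dereferenced.
    static boolean guarded(boolean shouldTryCache, boolean exists, Entry entry) {
        return shouldTryCache && exists && entry != null && entry.trace.isCompleted();
    }

    public static void main(String[] args) {
        // Missing entry but existing directory: the guarded form simply reports "not cached".
        System.out.println(guarded(true, true, null));
        try {
            unguarded(true, true, null);
        } catch (NullPointerException e) {
            System.out.println("unguarded condition dereferenced a null entry");
        }
    }
}
```

In both forms a missing entry never leads to a cache hit, which matches the reply above: the behaviour is unchanged and the extra term only prevents the null dereference.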