Prevent potential busy loop in scheduler from jobs > nodes (#3060)
* Prevent potential busy loop in scheduler

This change fixes a potential busy loop in the scheduler. The loop occurs
when the desired job count exceeds the node count, either because
next_adds_job raises it past that limit or because a retry after a
configuration fetch failure leaves the scheduler in the same invalid state.

* Update changelog

* Summarize / simplify the job class increment function
gs-kamnas authored Feb 12, 2024
1 parent 366ac34 commit 7bfc6ca
Showing 3 changed files with 13 additions and 2 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -53,6 +53,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 - Fixed login into Fortigate when post-login-banner is enabled. Fixes #2021 (@chrisr0880, @sahdan, @dangoscomb and @robertcheramy)
 - Fixed pre_logout for BDCOM switches
 - Fix 'wpa passphrase' hashed secret for SonicOS devices with built-in wireless #3036 (@lazynooblet)
+- Fix potential busy wait when retries and/or next_adds_job is enabled (@gs-kamnas)
 
 ## [0.29.1 - 2023-04-24]
 
12 changes: 11 additions & 1 deletion lib/oxidized/jobs.rb
@@ -44,14 +44,24 @@ def new_count
       @want = @max if @want > @max
     end
 
+    def increment
+      # Increments the job count if safe to do so, which means:
+      # a) fewer threads running than the total amount of nodes
+      # b) we want fewer than the max specified number of threads
+
+      @want = [(@want + 1), @nodes.size, @max].min
+    end
+
     def work
       # if a) we want fewer or the same amount of threads as are now running
       # and b) we want fewer threads running than the total amount of nodes
       # and c) there is more than MAX_INTER_JOB_GAP since last one was started
       # then we want one more thread (rationale is to fix hanging thread causing HOLB)
       return unless @want <= size && @want < @nodes.size
 
-      @want += 1 if (Time.now.utc - @last) > MAX_INTER_JOB_GAP
+      return unless @want <= size
+
+      increment if (Time.now.utc - @last) > MAX_INTER_JOB_GAP
     end
   end
 end
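
The clamp that `increment` applies can be tried in isolation. The following is a minimal sketch, not the Oxidized class itself: `JobCounter`, its constructor arguments, and the sample node names are hypothetical, and a plain array stands in for the node list.

```ruby
# Hypothetical stand-in for the job counter; NOT the real Oxidized::Jobs class.
class JobCounter
  attr_reader :want

  def initialize(nodes, max, want = 1)
    @nodes = nodes # any object responding to #size (assumed: a plain Array)
    @max   = max   # upper bound on concurrent jobs
    @want  = want  # desired number of concurrent jobs
  end

  # Raise the desired job count by one, but never above the number of
  # nodes or the configured maximum.
  def increment
    @want = [@want + 1, @nodes.size, @max].min
  end
end

counter = JobCounter.new(%w[sw1 sw2 sw3], 10)
5.times { counter.increment }
puts counter.want # prints 3: capped at the node count
```

Because the new value is the minimum of three candidates, calling `increment` repeatedly settles at whichever limit is lower, the node count or the maximum, and stays there.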
2 changes: 1 addition & 1 deletion lib/oxidized/nodes.rb
@@ -80,7 +80,7 @@ def next(node, opt = {})
         # set last job to nil so that the node is picked for immediate update
         n.last = nil
         put n
-        jobs.want += 1 if Oxidized.config.next_adds_job?
+        jobs.increment if Oxidized.config.next_adds_job?
       end
     end
     alias top next
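
To see why this call site matters, the sketch below simulates several `next` requests arriving with `next_adds_job` enabled. It is illustrative only: the variable names and the counts (two nodes, a maximum of ten jobs, five requests) are assumptions chosen for the example, not values from Oxidized.

```ruby
# Illustrative simulation only; all names and numbers here are assumptions.
node_count = 2
max_jobs   = 10

old_want = 1 # desired job count under the previous `want += 1` behaviour
new_want = 1 # desired job count when the bounded increment is used

# Simulate five `next` requests with next_adds_job enabled.
5.times do
  old_want += 1                                       # unbounded growth
  new_want = [new_want + 1, node_count, max_jobs].min # bounded growth
end

puts "old: #{old_want}, new: #{new_want}" # prints "old: 6, new: 2"
```

With the unbounded increment the desired job count ends up above the node count, which is the invalid state the commit message describes. The bounded version stops at the node count.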