This repository has been archived by the owner on Jul 22, 2021. It is now read-only.

Look into Artificial Stranding #2

Open
Mythra opened this issue Jan 13, 2019 · 1 comment
Labels
enhancement New feature or request

Comments

@Mythra
Owner

Mythra commented Jan 13, 2019

Description

Strands are a really useful mechanic, mainly because they help ensure that one type of job or one particular tenant doesn't stomp all over your job queue (by queuing up so many high-priority jobs that no one else can get work done).

Unfortunately, this requires human intervention. You have to manually set up sharding and maintain it over time. From the DBA who has helped with most of the way we interact with the database:

DBA [14:25]
how hard would it be to make coworker pay attention to how long jobs of a given type take, and then self-throttle when it looks like things are saturated?
Eric Coan [14:27]
hmmm the newest OSS version pending adds in some more metrics that we have internally
saturating would be hard tho, cause you could have one job that gets thrown on three nodes each taking 1 hour each
but each node only sees "one job took an hour"
doesn't have enough info to correlate it always takes an hour by itself
DBA [14:28]
yeah, true.
one of my most common sources of alerts is bouncer pools being saturated, because some job has been enqueued for some tenant on that cluster, and that’s eaten up all the slots, and is piling up on itself.
i’m just wondering if there’s a way to make the job queue smarter about not slamming the db.
[...]
Eric Coan [14:31]
can we not strand those more?
DBA [14:31]
some kind of ramp-up or something
ideally yes. but apparently either we deliberately don’t, or we don’t realize the need to, as the case may be.
but check out the waiting clients spikes for c85 in #dba-alerts
that seems like it could be made not to be so hulk smash
Eric Coan [14:33]
Yeah, it'd be nice if the job queue could apply some artificial stranding if it noticed there was too much coming in

Now, I don't think this is useful in every situation (and it may add more complexity than some users want). It's a trade-off: more complexity in the job queue in exchange for being gentler on the DB.

There are currently two ideas I'm playing around with:

  • Use an "exponential moving average" of the total job time to build a "totalJobTimeMap" that shows how many of each job should be running relative to other jobs. Combine this with "nStrandMap", and for each job take the lower of the two values.
  • Use a "bucket" algorithm, where each job type gets a configurable number of slots. Each time a job starts working, this number decrements; each time one finishes, it increments back up.
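To make the two ideas concrete, here is a rough Python sketch. Nothing here is coworker's actual API: the class names, the `alpha` smoothing factor, and `effective_limit` are all made up for illustration, and how an averaged job time would be turned into a per-job limit is left open, since that part hasn't been worked out yet.

```python
from collections import defaultdict


class JobTimeAverages:
    """Exponential moving average of run time, keyed by job type (hypothetical)."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha  # weight given to the newest sample (assumed value)
        self.averages = {}

    def record(self, job_type, seconds):
        previous = self.averages.get(job_type)
        if previous is None:
            # The first sample seeds the average.
            self.averages[job_type] = seconds
        else:
            self.averages[job_type] = (
                self.alpha * seconds + (1 - self.alpha) * previous
            )
        return self.averages[job_type]


class JobBucket:
    """Per-job-type slots: starting a job takes one, finishing gives it back."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.available = defaultdict(lambda: capacity)

    def try_start(self, job_type):
        if self.available[job_type] <= 0:
            return False  # bucket is empty, so this job type has to wait
        self.available[job_type] -= 1
        return True

    def finish(self, job_type):
        # Never refill past the configured capacity.
        self.available[job_type] = min(
            self.capacity, self.available[job_type] + 1
        )


def effective_limit(job_type, time_limits, n_strand_limits, default):
    """Take the lower of the two per-job limits, as described in the first idea."""
    return min(
        time_limits.get(job_type, default),
        n_strand_limits.get(job_type, default),
    )
```

With a capacity of 2, a third `try_start("report")` returns `False` until one of the running jobs calls `finish("report")`. Note that both structures only see what a single node sees, so the cross-node problem raised in the chat above still applies.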
@Mythra Mythra added the enhancement New feature or request label Jan 13, 2019
@Mythra
Owner Author

Mythra commented Jan 21, 2019

Another teammate of ours suggested looking into how Go does GC. It uses a "debit system", where each time you borrow you accumulate more "debt". Eventually you have to earn "credit" by doing something useful for the GC.

Perhaps this "usefulness" could be taking up little time on the DB, or perhaps just how often you work? I don't know, but it seems like a more fleshed-out version of what I was going for with averages.
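A minimal sketch of what such a debt/credit ledger might look like, in Python. This is only loosely inspired by the idea above and is not how Go's GC or coworker actually implements it: `CreditLedger`, `debt_limit`, and the choice of what counts as "credit" are all assumptions.

```python
class CreditLedger:
    """Tracks how much "debt" each job type has run up (hypothetical)."""

    def __init__(self, debt_limit=10.0):
        self.debt_limit = debt_limit  # assumed cap before a job type is throttled
        self.debt = {}

    def borrow(self, job_type, cost=1.0):
        """Charge a job type for starting work; refuse if it would exceed the limit."""
        current = self.debt.get(job_type, 0.0)
        if current + cost > self.debt_limit:
            return False
        self.debt[job_type] = current + cost
        return True

    def repay(self, job_type, credit):
        """Pay down debt, never below zero. What earns credit is an open question."""
        current = self.debt.get(job_type, 0.0)
        self.debt[job_type] = max(0.0, current - credit)
```

A job type that keeps borrowing without repaying eventually gets `False` back from `borrow`, which is roughly the "artificial stranding" behaviour: it has to wait until some of its debt is paid off.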

Mythra pushed a commit that referenced this issue Jul 14, 2019
refs #2

this is a bare-bones, far from perfect credit api. one that can
be used to "auto-n-strand" certain jobs.