Look into Artificial Stranding #2

Mythra · 2019-01-13T02:33:22Z

Description

Strands are a really useful mechanic, mainly because they help ensure that one type of job or one particular tenant doesn't stomp all over your job queue (by queuing up so many high-priority jobs that no one else can get work done).

Unfortunately, this requires manual human intervention. You have to set up sharding by hand and maintain it over time. From the DBA who has helped with most of the way we interact with the database:

DBA [14:25]
how hard would it be to make coworker pay attention to how long jobs of a given type take, and then self-throttle when it looks like things are saturated?
Eric Coan [14:27]
hmmm the newest OSS version pending adds in some more metrics that we have internally
saturating would be hard tho, cause you could have one job that gets thrown on three nodes each taking 1 hour each
but each node only sees "one job took an hour"
doesn't have enough info to correlate it always takes an hour by itself
DBA [14:28]
yeah, true.
one of my most common sources of alerts is bouncer pools being saturated, because some job has been enqueued for some tenant on that cluster, and that’s eaten up all the slots, and is piling up on itself.
i’m just wondering if there’s a way to make the job queue smarter about not slamming the db.
[...]
Eric Coan [14:31]
can we not strand those more?
DBA [14:31]
some kind of ramp-up or something
ideally yes. but apparently either we deliberately don’t, or we don’t realize the need to, as the case may be.
but check out the waiting clients spikes for c85 in #dba-alerts
that seems like it could be made not to be so hulk smash
Eric Coan [14:33]
Yeah, it'd be nice if the job queue could apply some artificial stranding if it noticed there was too much coming in

Now, I don't think this is ultimately useful for every situation (and it may add more complexity than some users want). It's a trade-off: more complexity in the job queue in exchange for being gentler on the DB.

There are currently two ideas I'm playing around with:

Use an "exponential moving average" of the total job time to build a "totalJobTimeMap" that shows how many of each job should be running in relation to other jobs. Combine this with "nStrandMap", and take the lowest value from the two maps for each job.
Use a "bucket" algorithm, where each job type gets a configurable number of slots. Each time a job of that type starts working the number decrements, and each time one finishes it increments back up.
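
To make the two ideas above concrete, here is a rough sketch of how they might fit together. It is not coworker code: `EmaTracker`, `effective_limits`, `SlotBucket`, the `alpha` smoothing factor, and the inverse-run-time weighting are all assumptions made for illustration. The issue does not say how an average run time should turn into a limit, so that part in particular is a guess.

```python
import math


class EmaTracker:
    """Exponential moving average of run time per job type (hypothetical)."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha  # weight given to the newest sample
        self.ema = {}       # job type -> smoothed run time in seconds

    def record(self, job_type, seconds):
        prev = self.ema.get(job_type)
        if prev is None:
            self.ema[job_type] = float(seconds)
        else:
            self.ema[job_type] = self.alpha * seconds + (1 - self.alpha) * prev

    def limits(self, total_slots):
        # Assumption: slower job types get a smaller share of the slots,
        # weighted by the inverse of their average run time.
        weights = {j: 1.0 / max(t, 1e-9) for j, t in self.ema.items()}
        total = sum(weights.values())
        return {j: max(1, math.floor(total_slots * w / total))
                for j, w in weights.items()}


def effective_limits(total_job_time_map, n_strand_map):
    """Take the lowest limit from either map for each job type."""
    out = dict(total_job_time_map)
    for job_type, n in n_strand_map.items():
        out[job_type] = min(out.get(job_type, n), n)
    return out


class SlotBucket:
    """Bucket idea: starting a job takes a slot, finishing returns it."""

    def __init__(self, limits):
        self.capacity = dict(limits)
        self.available = dict(limits)

    def try_start(self, job_type):
        if job_type not in self.capacity:
            return True  # assumption: unknown job types are not throttled
        if self.available[job_type] <= 0:
            return False
        self.available[job_type] -= 1
        return True

    def finish(self, job_type):
        if job_type in self.capacity:
            # Never hand back more slots than the configured capacity.
            self.available[job_type] = min(
                self.capacity[job_type], self.available[job_type] + 1)
```

A worker would call `try_start` before picking up a job and `finish` when it completes, rebuilding the bucket from `effective_limits(tracker.limits(total_slots), n_strand_map)` whenever the averages change. Note this only tracks what a single node sees, so it shares the limitation raised in the chat above.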


Mythra · 2019-01-21T21:01:46Z

Another teammate of ours suggested looking into how Golang does GC. It uses a "debit system": each time you borrow, you accumulate more "debt", and eventually you have to earn "credit" by doing something useful for the GC.

Perhaps this "usefulness" could be taking up little time on the DB, or perhaps just how often you work? I don't know, but it seems like a more fleshed-out version of what I was going for with averages.
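
As a thought experiment, a debt/credit ledger loosely inspired by that description could look like the sketch below. This is not how Go's runtime is implemented, and it is not the credit API from any commit on this issue. `CreditLedger`, `max_debt`, `target_seconds`, and the rule that a job earns credit in proportion to how quickly it finished are all hypothetical choices, picking "little time taken up" as the measure of usefulness.

```python
class CreditLedger:
    """Per-job-type debt: starting work borrows, finishing quickly repays."""

    def __init__(self, max_debt=3.0, target_seconds=1.0):
        self.max_debt = max_debt              # debt at which a type is throttled
        self.target_seconds = target_seconds  # run time that earns full credit
        self.debt = {}                        # job type -> outstanding debt

    def can_run(self, job_type):
        return self.debt.get(job_type, 0.0) < self.max_debt

    def borrow(self, job_type):
        # Every started job owes one unit until it proves itself cheap.
        self.debt[job_type] = self.debt.get(job_type, 0.0) + 1.0

    def finish(self, job_type, elapsed_seconds):
        # Assumption: credit shrinks as the job takes longer than the target,
        # so slow job types pile up debt and fast ones break even.
        credit = min(1.0, self.target_seconds / max(elapsed_seconds, 1e-9))
        self.debt[job_type] = max(0.0, self.debt.get(job_type, 0.0) - credit)

    def forgive(self, amount):
        # Called periodically so a throttled job type can eventually run again.
        for job_type in self.debt:
            self.debt[job_type] = max(0.0, self.debt[job_type] - amount)
```

With `max_debt=2.0` and `target_seconds=1.0`, a job type that always takes four seconds repays only a quarter unit per run and is throttled after its third start, while a type that finishes in half a second never accumulates debt.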


Mythra added the enhancement New feature or request label Jan 13, 2019

Mythra pushed a commit that referenced this issue Jul 14, 2019

very experimental credit api

b259f3d

refs #2 this is a bare-bones, far from perfect credit api. one that can be used to "auto-n-strand" certain jobs.
