-
Broadly this makes a lot of sense.
A couple of things occur to me... I may be zeroing in on one sentence a bit too much here, but...
Some thoughts on this. If you update the S3 bucket every time something gets updated in YNR, there are a few possible issues...

...although the big upside is that, as long as everything works, YNR and the S3 bucket should be pretty much in sync at any given point in time. The other option would be to have some kind of background/scheduled job to sync the YNR DB to the S3 bucket. As long as you can pick up what's changed, that gets rid of a lot of those problems. The tradeoff is that there's always going to be some lag between the data being updated in YNR and the sync job updating the S3 bucket. You've also got to build and maintain that sync job (and whatever queue or schedule drives it).
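For illustration, a scheduled delta sync along those lines might look something like this. The YNR endpoint, the `last_updated` query parameter, the response shape and the bucket name are all assumptions here, not existing code:

```python
# Hypothetical scheduled sync job: copy ballots changed since the last run to S3.
# The YNR endpoint, query parameter, response shape and bucket name are all
# assumptions for illustration, not existing code.
import json
from datetime import datetime

import boto3
import requests

BUCKET = "dc-ballot-cache"  # hypothetical bucket name
YNR_BALLOTS_URL = "https://candidates.democracyclub.org.uk/api/next/ballots/"  # illustrative

s3 = boto3.client("s3")


def sync_changed_ballots(last_run: datetime) -> None:
    """Fetch everything updated since the last successful run and push it to S3."""
    url = YNR_BALLOTS_URL
    params = {"last_updated": last_run.isoformat()}
    while url:
        resp = requests.get(url, params=params, timeout=30)
        resp.raise_for_status()
        data = resp.json()
        for ballot in data["results"]:
            # Key each object on the ballot paper ID so lookups stay trivial.
            s3.put_object(
                Bucket=BUCKET,
                Key=f"ballots/{ballot['ballot_paper_id']}.json",
                Body=json.dumps(ballot),
                ContentType="application/json",
            )
        url = data.get("next")  # follow DRF-style pagination, if any
        params = None  # the "next" URL already carries the query string
```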
The other thing worth flagging here is that at the moment devs.DC can call WDIV and WCIVF in parallel. Having to have the ballots back from WDIV before you can get the candidates means you have to do the calls in sequence. Getting stuff from an S3 bucket within AWS should be a fast call, but worth flagging that there is a response time impact to consider here.
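To make the ordering constraint concrete, here's a rough sketch of the current parallel fan-out versus the proposed sequential flow. The URLs and response shapes are made up for illustration:

```python
# Illustration only: the URLs and response shapes below are made up.
import asyncio

import httpx


async def current_flow(client: httpx.AsyncClient, postcode: str):
    # Today: WDIV and WCIVF can be queried at the same time.
    wdiv, wcivf = await asyncio.gather(
        client.get(f"https://wdiv.example/api/postcode/{postcode}/"),
        client.get(f"https://wcivf.example/api/postcode/{postcode}/"),
    )
    return wdiv.json(), wcivf.json()


async def proposed_flow(client: httpx.AsyncClient, postcode: str):
    # Proposed: the ballot IDs have to come back from WDIV before the
    # candidate data can be fetched from S3, so the calls are sequential.
    wdiv = await client.get(f"https://wdiv.example/api/postcode/{postcode}/")
    ballot_ids = [b["ballot_paper_id"] for b in wdiv.json()["ballots"]]
    candidates = await asyncio.gather(
        *(client.get(f"https://ballot-cache.example/ballots/{b}.json") for b in ballot_ids)
    )
    return wdiv.json(), [c.json() for c in candidates]
```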
-
Thanks Chris, I did wonder if I could nerd snipe your brain into this thread 😆 In order:
(And some related points.) This is all sensible to think about, but something you might not be aware of is some of the existing changes to YNR/WCIVF. Since ~Sept 21 we moved away from WCIVF doing a large import of the whole database every night to a purely delta based import. This is done by exposing an endpoint that returns everything updated since a given timestamp. This means that we're already in a position where we can be sure that "give me all the things updated since X" is going to be true. That in turn means that we can remove the need to write to S3 on every DB write, and just have a job mop up the writes every so often. We could also supplement this with some nightly job that just writes everything out, or at least writes current ballots. There are currently 33,442 ballots in YNR. Not trivial to write to S3, but also not something that's going to take long inside AWS.

Also, don't forget that WCIVF will need more detailed person data than will be in the ballot JSON. If you think about the static JSON on S3 only containing the person IDs (alongside other ballot info), then WCIVF would need to do a lookup for the full person data anyway.

I'm not sure, but I think all of that covers all your points above? Using a Lambda on the existing API (something I only thought of when writing this) would mean the maintenance of the queue would be almost nothing, too.

As for performance: yeah, I did talk about that above, but I don't think it's going to be a large problem for us. There are also options for optimization, for example by sticking CloudFront between S3 and the devs.DC API we could get the benefit of connection pooling and edge caching, rather than creating a new S3 client for each boot of the Lambda instance (cold starts are where we'd slow things down most, I think). We could just use S3's web hosting too, and even some in-memory caching in Lambda for more speed still. Either way, I think the slight slowdown will be ok for most of our users at worst, or unnoticeable at best.
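As a rough sketch of the Lambda end of that, assuming a module-level S3 client reused across warm invocations plus a small in-memory cache (none of the names here are real devs.DC code):

```python
# Sketch of the Lambda side: a module-level S3 client (created once per container,
# so only paid for on a cold start) plus a small in-memory cache.
# Bucket, key layout and event shape are illustrative, not real devs.DC code.
import json
import os

import boto3

BUCKET = os.environ.get("BALLOT_BUCKET", "dc-ballot-cache")  # hypothetical

s3 = boto3.client("s3")  # reused across warm invocations
_cache: dict[str, dict] = {}


def get_ballot(ballot_paper_id: str) -> dict:
    if ballot_paper_id not in _cache:
        obj = s3.get_object(Bucket=BUCKET, Key=f"ballots/{ballot_paper_id}.json")
        _cache[ballot_paper_id] = json.loads(obj["Body"].read())
    return _cache[ballot_paper_id]


def handler(event, context):
    # e.g. invoked via API Gateway with the ballot paper ID as a path parameter
    ballot_id = event["pathParameters"]["ballot_paper_id"]
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(get_ballot(ballot_id)),
    }
```

The cache here only lives as long as the warm container, which is fine for mostly static ballot data; CloudFront or S3 website hosting in front would do the same job a layer earlier.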
-
This is done now 🎉
-
Right, but hear me out...
This is roughly what we do at the moment (address pickers left out for now):
This requires both WCIVF and WDIV to be up and scaling well. This is working ok for the most part.
However, in recent years the biggest use case for the data from WCIVF is just the simple ballot data that we get from YNR. That is, a ballot containing:
We have added some extra data like hustings and leaflets, but, by volume of requests, we don't use them a lot.
What I'm suggesting is that we can use the WDIV EE lookup to return the list of ballots directly.
YNR, as the canonical source for ballot data, can write the JSON that we need to an S3 bucket whenever a ballot is updated. Then, the devs.DC API can simply get the basic data from S3 directly, removing WCIVF from the lookup. We can have calls to `get_hustings_for_ballot` etc that can either come from WCIVF, or come from other S3 buckets. The point is, at this stage the key is the ballot paper ID, so it's fairly easy to cache the data we want.
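As a sketch of what the YNR end of that could look like, a `post_save` signal keyed on the ballot paper ID would be one way to do it. The model path, serializer and bucket name are assumptions, not YNR's actual code:

```python
# One way the YNR side could be wired up: push a ballot's JSON to S3 whenever it
# changes. The model path, serializer and bucket name are assumptions for
# illustration, not YNR's actual code.
import json

import boto3
from django.db.models.signals import post_save
from django.dispatch import receiver

from candidates.models import Ballot          # assumed model location
from api.serializers import BallotSerializer  # assumed serializer

BUCKET = "dc-ballot-cache"  # hypothetical

s3 = boto3.client("s3")


@receiver(post_save, sender=Ballot)
def write_ballot_to_s3(sender, instance, **kwargs):
    # Key on the ballot paper ID so the devs.DC API can fetch it directly.
    s3.put_object(
        Bucket=BUCKET,
        Key=f"ballots/{instance.ballot_paper_id}.json",
        Body=json.dumps(BallotSerializer(instance).data),
        ContentType="application/json",
    )
```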
Why though?

This model has a few advantages.
Performance and resilience
At the moment if either WCIVF or WDIV is down, the API is down. We can mitigate this by failing the requests gracefully, but there's no escaping the fact that we can't serve candidate data if WCIVF isn't around.
Getting the same, mostly static, data from S3 would mitigate this a lot.
We won't be able to perform both queries in parallel like we do at the moment, but the "candidates for ballots" query would be as fast as getting the content from S3, so it's hardly going to slow things down.
Address picker on WCIVF
The suggested change actually has huge implications for WCIVF: it means that WCIVF itself can become a client of the API, and use the API to perform postcode (and address) lookups.
It can still store person profile data locally, and just get the list of ballots it needs to show from the API response. This is a super quick way to get address pickers on WCIVF without having to worry about AddressBase.
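A minimal sketch of WCIVF acting as an API client, assuming a postcode endpoint on the devs.DC API; the path, auth parameter and response shape here are illustrative:

```python
# Sketch of WCIVF acting as a client of the devs.DC API: resolve a postcode to a
# list of ballot paper IDs, then join person profiles from WCIVF's own data.
# The endpoint path, auth parameter and response shape are assumptions here.
import requests

DEVS_DC_API = "https://developers.democracyclub.org.uk/api/v1"  # illustrative


def ballots_for_postcode(postcode: str, api_key: str) -> list[str]:
    resp = requests.get(
        f"{DEVS_DC_API}/postcode/{postcode}/",
        params={"auth_token": api_key},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    # Assumed shape: a list of dates, each carrying its ballots.
    return [
        ballot["ballot_paper_id"]
        for date in data.get("dates", [])
        for ballot in date.get("ballots", [])
    ]
```

WCIVF would then only need the returned ballot paper IDs to decide which of its locally stored person profiles to render.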