
Refactored restore to better use network resources #62

Open · wants to merge 17 commits into master
Conversation

vitrvvivs

- Staggers write requests to reduce the number of unprocessed items.
- Combines unprocessed items into new batches, so there are no more batches of only a few items (a sketch follows this list).
- Allows restoring from a local file, because S3 likes to close long-running connections.
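
The unprocessed-item handling could look roughly like the sketch below. This is a minimal illustration rather than the PR's exact code; the `sendBatch` and `requestItems` names mirror the description, while the table handling and error path are assumptions. It relies on the AWS SDK for JavaScript v2 `batchWriteItem` call, whose response lists `UnprocessedItems` that DynamoDB declined to write.

```js
// Minimal sketch: merge UnprocessedItems back into the queue so later batches stay full.
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();

const requestItems = [];            // shared queue of pending { PutRequest: { Item } } entries

function sendBatch(tableName) {
  const batch = requestItems.splice(0, 25);   // BatchWriteItem accepts at most 25 items
  if (batch.length === 0) return;

  dynamodb.batchWriteItem({ RequestItems: { [tableName]: batch } }, (err, data) => {
    if (err) {
      // on a hard error, put the whole batch back so nothing is lost
      requestItems.unshift(...batch);
      return;
    }
    const unprocessed = (data.UnprocessedItems || {})[tableName] || [];
    // instead of retrying a tiny leftover batch on its own,
    // re-queue the leftovers so the next batch is full again
    requestItems.unshift(...unprocessed);
  });
}
```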

Implementation
It now has two completely separate loops:

  1. readline (created in _startDownload), which parses each line and pushes it into an array (requestItems).
  2. _sendBatch (started in _checkTableReady), which pulls items from that array and sends them as batches.

This separation allows _sendBatch to call itself after a fixed amount of time has passed (every 1000 / concurrency milliseconds). The previous implementation allowed a fixed number of concurrent requests regardless of speed; on a fast network (a large EC2 instance), even 1 concurrent request was enough to sustain 2500 writes per second, so limiting concurrency was not an effective throttle. A sketch of this scheduling loop follows.
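
A minimal sketch of the two loops, assuming a newline-delimited JSON dump and a hypothetical writeBatch helper that performs the actual batchWriteItem call (and re-queues any UnprocessedItems as shown above):

```js
// Hypothetical sketch of the two loops described above.
const fs = require('fs');
const readline = require('readline');

const requestItems = [];
const concurrency = 4;              // assumed option; 4 => one batch every 250 ms

// Loop 1: parse the dump line by line and queue each item
// (assumes each line is one JSON-encoded DynamoDB item)
function startDownload(filePath) {
  const rl = readline.createInterface({ input: fs.createReadStream(filePath) });
  rl.on('line', (line) => {
    requestItems.push({ PutRequest: { Item: JSON.parse(line) } });
  });
}

// Loop 2: drain the queue at a fixed rate, independent of request latency
function sendBatch() {
  const batch = requestItems.splice(0, 25);
  if (batch.length > 0) {
    writeBatch(batch);              // hypothetical helper; leftovers get re-queued
  }
  // schedule the next batch by wall-clock time, not by when the request returns
  setTimeout(sendBatch, 1000 / concurrency);
}
```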

Matt Geskey added 17 commits September 26, 2017 10:09
S3 has a chance of randomly closing the connection before the download is finished, which makes restoring directly from large files impossible. This is a hack: download the file quickly first, then do the much slower restore from the local copy.
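
A rough sketch of that workaround, assuming the AWS SDK for JavaScript v2 and a hypothetical downloadToLocalFile helper name; the actual commit may structure this differently:

```js
// Hypothetical sketch: pull the whole dump to disk first, then restore from it.
const fs = require('fs');
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

function downloadToLocalFile(bucket, key, localPath, callback) {
  const file = fs.createWriteStream(localPath);
  s3.getObject({ Bucket: bucket, Key: key })
    .createReadStream()
    .on('error', callback)          // e.g. connection closed by S3 mid-download
    .pipe(file)
    .on('finish', () => callback(null, localPath));
}
```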
Most of the time was spent in Node (CPU bound). Timing only how long the request took failed to account for that overhead, and thus throttled down to 20% of the target rate.