
fix: fetch snapshotter proxy object without holding cache lock #685

Closed · wants to merge 1 commit

Conversation

@austinvazquez (Contributor) commented on Jun 23, 2022

Signed-off-by: Austin Vazquez [email protected]

Issue #, if available:
None

Description of changes:
The fetch operation should not occur while the cache lock is held.

This unblocks snapshot requests that have already dialed their microVM, allowing them to continue in the edge case where the lock is held by a slow dialer.
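
For context, a minimal sketch of the shape the description points at, using stand-in names (snapshotterCache, Snapshotter, and dialSnapshotter are hypothetical, not the repository's actual identifiers): the lookup happens under the lock, but the slow dial does not.

```go
package proxy // hypothetical package

import (
	"context"
	"sync"
)

// Stand-in declarations; the real types live in the snapshotter proxy code.
type Snapshotter interface{ Close() error }

// dialSnapshotter stands in for whatever dials the snapshotter's microVM;
// assume it is injected elsewhere.
var dialSnapshotter func(ctx context.Context, key string) (Snapshotter, error)

type snapshotterCache struct {
	mutex        sync.RWMutex
	snapshotters map[string]Snapshotter
}

// Get looks the proxy up under the lock, but performs the slow dial
// with no lock held, so one slow dialer cannot block every other
// snapshotter request.
func (cache *snapshotterCache) Get(ctx context.Context, key string) (Snapshotter, error) {
	cache.mutex.RLock()
	snapshotter, ok := cache.snapshotters[key]
	cache.mutex.RUnlock()
	if ok {
		return snapshotter, nil
	}

	snapshotter, err := dialSnapshotter(ctx, key) // slow: dials the microVM
	if err != nil {
		return nil, err
	}

	cache.mutex.Lock()
	cache.snapshotters[key] = snapshotter
	cache.mutex.Unlock()
	return snapshotter, nil
}
```

As the review thread below works out, this simple version lets two goroutines race to dial the same key, so the losing proxy has to be closed; that refinement is sketched further down.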

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@austinvazquez marked this pull request as ready for review on June 23, 2022 21:27
@austinvazquez requested a review from a team as a code owner on June 23, 2022 21:27
Review thread on the following hunk:

```go
snapshotter, ok = cache.snapshotters[key]
cache.mutex.Unlock()
```
Contributor:

We use a reader lock above. Should we be doing that here too?

Contributor Author (@austinvazquez):

Oof, that adds a new level of complexity if we need to account for a second double-checked lock.

Contributor (@kzys):

I'm confused. If !ok, wouldn't cache.snapshotters[key] always be nil?

Contributor Author (@austinvazquez):

@kzys, unless another thread populates the cache after we have a cache miss but before we acquire the writer's lock. The cache entry needs to be .Close()'d before it is garbage collected, to clean up system resources for the metrics proxy.

Contributor Author (@austinvazquez):

But @ginglis13 is correct that this solution won't work for the edge cases, because we can now leak. So it requires the fetch to happen after the reader's-lock cache miss but before the writer's lock is acquired.
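
Put as code, the pattern being described might look like the following: a revised version of the Get sketched earlier, reusing the same stand-in names (this is a sketch for the discussion, not the PR's actual diff).

```go
// Double-checked variant: fetch after the read-lock miss, then
// re-check under the write lock and Close() the losing dial.
func (cache *snapshotterCache) Get(ctx context.Context, key string) (Snapshotter, error) {
	cache.mutex.RLock()
	snapshotter, ok := cache.snapshotters[key]
	cache.mutex.RUnlock()
	if ok {
		return snapshotter, nil
	}

	// The fetch happens after the reader's-lock cache miss but before
	// the writer's lock is acquired.
	snapshotter, err := dialSnapshotter(ctx, key)
	if err != nil {
		return nil, err
	}

	cache.mutex.Lock()
	if existing, ok := cache.snapshotters[key]; ok {
		// Another goroutine populated the entry while we were dialing;
		// close ours so the metrics proxy's resources are not leaked.
		cache.mutex.Unlock()
		snapshotter.Close()
		return existing, nil
	}
	cache.snapshotters[key] = snapshotter
	cache.mutex.Unlock()
	return snapshotter, nil
}
```

The loser of the race closes its own freshly dialed proxy, so nothing leaks, and the write lock is held only for the map check and store, never for the dial.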

Contributor (@kzys) commented on Jun 24, 2022:

How about moving cache.mutex.RUnlock() into this if block? The snapshotter doesn't do many time-consuming operations between RUnlock and Lock.

Contributor Author (@austinvazquez):

That may result in deadlock: we'd block on Lock without ever releasing the read lock. I don't believe sync.RWMutex is smart enough to grant the writer's lock even when the only reader's lock is held by the same goroutine.
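
This matches how sync.RWMutex behaves: Lock() waits for all readers to drain, including a read lock held by the calling goroutine, so an in-place "upgrade" self-deadlocks. A runnable illustration (not code from this PR):

```go
package main

import "sync"

// A sync.RWMutex cannot be upgraded: RLock() followed by Lock() on
// the same goroutine deadlocks, because Lock() waits for every
// reader to release, including the caller's own read lock.
func main() {
	var mu sync.RWMutex

	mu.RLock()
	// mu.Lock() // would block forever: it waits for our own RUnlock
	mu.RUnlock()

	// The workable pattern: release the read lock first, then take the
	// write lock and re-check whatever the read lock observed.
	mu.Lock()
	mu.Unlock()
}
```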

@Kern-- (Contributor) commented on Jun 24, 2022:

I think this deserves an issue to discuss options. We need to serialize dictionary writes, but you're right that we don't need to block all remote snapshotters while we dial any one snapshotter. We probably need something smarter than a single cache mutex.
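
One possible shape for "something smarter", sketched here purely for discussion (keyedCache and the dial parameter are hypothetical, and Snapshotter is the stand-in interface from the earlier sketch): the mutex guards only map membership, and each entry carries its own sync.Once, so dialing one snapshotter never blocks requests for other keys.

```go
type entry struct {
	once        sync.Once
	snapshotter Snapshotter
	err         error
}

type keyedCache struct {
	mu      sync.Mutex
	entries map[string]*entry
}

// Get takes the map lock only to find or create the entry, then dials
// under the entry's own sync.Once: a slow dial for one key never
// blocks other keys, and concurrent requests for the same key dial
// exactly once.
func (c *keyedCache) Get(key string, dial func() (Snapshotter, error)) (Snapshotter, error) {
	c.mu.Lock()
	e, ok := c.entries[key]
	if !ok {
		e = &entry{}
		c.entries[key] = e
	}
	c.mu.Unlock() // map lock is released before the slow dial

	e.once.Do(func() { e.snapshotter, e.err = dial() })
	return e.snapshotter, e.err
}
```

A trade-off of this exact sketch is that sync.Once pins a failed dial's error forever; golang.org/x/sync/singleflight provides similar per-key deduplication while still allowing retries after a failure.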

@austinvazquez (Contributor Author) commented:

@kzys @Kern-- @ginglis13 I have created #687 to discuss options. Closing this PR for now; I will re-open it based on our conversations there.
