
fix: fetch snapshotter proxy object without holding cache lock #685

Closed · wants to merge 1 commit

Conversation

@austinvazquez (Contributor) commented on Jun 23, 2022

Signed-off-by: Austin Vazquez [email protected]

Issue #, if available:
None

Description of changes:
The fetch operation should not occur while the cache lock is held.

This unblocks snapshot requests that have already dialed their microVM, allowing them to continue in the edge case where the lock is held by a slow dialer.
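
For context, a minimal sketch of the shape the description points at, using stand-in names (snapshotterCache, Snapshotter, and dialSnapshotter are hypothetical, not the repository's actual identifiers): the lookup happens under the lock, but the slow dial does not.

```go
package proxy // hypothetical package

import (
	"context"
	"sync"
)

// Stand-in declarations; the real types live in the snapshotter proxy code.
type Snapshotter interface{ Close() error }

// dialSnapshotter stands in for whatever dials the snapshotter's microVM;
// assume it is injected elsewhere.
var dialSnapshotter func(ctx context.Context, key string) (Snapshotter, error)

type snapshotterCache struct {
	mutex        sync.RWMutex
	snapshotters map[string]Snapshotter
}

// Get looks the proxy up under the lock, but performs the slow dial
// with no lock held, so one slow dialer cannot block every other
// snapshotter request.
func (cache *snapshotterCache) Get(ctx context.Context, key string) (Snapshotter, error) {
	cache.mutex.RLock()
	snapshotter, ok := cache.snapshotters[key]
	cache.mutex.RUnlock()
	if ok {
		return snapshotter, nil
	}

	snapshotter, err := dialSnapshotter(ctx, key) // slow: dials the microVM
	if err != nil {
		return nil, err
	}

	cache.mutex.Lock()
	cache.snapshotters[key] = snapshotter
	cache.mutex.Unlock()
	return snapshotter, nil
}
```

As the review thread below works out, this simple version lets two goroutines race to dial the same key, so the losing proxy has to be closed; that refinement is sketched further down.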

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@austinvazquez marked this pull request as ready for review on June 23, 2022 21:27
@austinvazquez requested a review from a team as a code owner on June 23, 2022 21:27
Review thread on the following hunk:

```go
snapshotter, ok = cache.snapshotters[key]
cache.mutex.Unlock()
```
Contributor:

We use a reader lock above. Should we be doing that here too?

Contributor Author (@austinvazquez):

Oof, that adds a new level of complexity if we need to account for a second double-checked lock.

Contributor (@kzys):

I'm confused. If !ok, wouldn't cache.snapshotters[key] always be nil?

Contributor Author (@austinvazquez):

@kzys, unless another thread populates the cache after we have a cache miss but before we acquire the writer's lock. The cache entry needs to be .Close()'d before it is garbage collected, to clean up system resources for the metrics proxy.

Contributor Author (@austinvazquez):

But @ginglis13 is correct that this solution won't work for the edge cases, because we can now leak. So it requires the fetch to happen after the reader's-lock cache miss but before the writer's lock is acquired.
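
Put as code, the pattern being described might look like the following: a revised version of the Get sketched earlier, reusing the same stand-in names (this is a sketch for the discussion, not the PR's actual diff).

```go
// Double-checked variant: fetch after the read-lock miss, then
// re-check under the write lock and Close() the losing dial.
func (cache *snapshotterCache) Get(ctx context.Context, key string) (Snapshotter, error) {
	cache.mutex.RLock()
	snapshotter, ok := cache.snapshotters[key]
	cache.mutex.RUnlock()
	if ok {
		return snapshotter, nil
	}

	// The fetch happens after the reader's-lock cache miss but before
	// the writer's lock is acquired.
	snapshotter, err := dialSnapshotter(ctx, key)
	if err != nil {
		return nil, err
	}

	cache.mutex.Lock()
	if existing, ok := cache.snapshotters[key]; ok {
		// Another goroutine populated the entry while we were dialing;
		// close ours so the metrics proxy's resources are not leaked.
		cache.mutex.Unlock()
		snapshotter.Close()
		return existing, nil
	}
	cache.snapshotters[key] = snapshotter
	cache.mutex.Unlock()
	return snapshotter, nil
}
```

The loser of the race closes its own freshly dialed proxy, so nothing leaks, and the write lock is held only for the map check and store, never for the dial.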

Contributor (@kzys) commented on Jun 24, 2022:

How about moving cache.mutex.RUnlock() into this if block? The snapshotter doesn't do many time-consuming operations between RUnlock and Lock.

Contributor Author (@austinvazquez):

That may result in deadlock: we'd block on Lock without ever releasing the read lock. I don't believe sync.RWMutex is smart enough to grant the writer's lock even when the only reader's lock is held by the same goroutine.
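
This matches how sync.RWMutex behaves: Lock() waits for all readers to drain, including a read lock held by the calling goroutine, so an in-place "upgrade" self-deadlocks. A runnable illustration (not code from this PR):

```go
package main

import "sync"

// A sync.RWMutex cannot be upgraded: RLock() followed by Lock() on
// the same goroutine deadlocks, because Lock() waits for every
// reader to release, including the caller's own read lock.
func main() {
	var mu sync.RWMutex

	mu.RLock()
	// mu.Lock() // would block forever: it waits for our own RUnlock
	mu.RUnlock()

	// The workable pattern: release the read lock first, then take the
	// write lock and re-check whatever the read lock observed.
	mu.Lock()
	mu.Unlock()
}
```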

@Kern-- (Contributor) commented on Jun 24, 2022:

I think this deserves an issue to discuss options. We need to serialize dictionary writes, but you're right that we don't need to block all remote snapshotters while we dial any one snapshotter. We probably need something smarter than a single cache mutex.
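
One possible shape for "something smarter", sketched here purely for discussion (keyedCache and the dial parameter are hypothetical, and Snapshotter is the stand-in interface from the earlier sketch): the mutex guards only map membership, and each entry carries its own sync.Once, so dialing one snapshotter never blocks requests for other keys.

```go
type entry struct {
	once        sync.Once
	snapshotter Snapshotter
	err         error
}

type keyedCache struct {
	mu      sync.Mutex
	entries map[string]*entry
}

// Get takes the map lock only to find or create the entry, then dials
// under the entry's own sync.Once: a slow dial for one key never
// blocks other keys, and concurrent requests for the same key dial
// exactly once.
func (c *keyedCache) Get(key string, dial func() (Snapshotter, error)) (Snapshotter, error) {
	c.mu.Lock()
	e, ok := c.entries[key]
	if !ok {
		e = &entry{}
		c.entries[key] = e
	}
	c.mu.Unlock() // map lock is released before the slow dial

	e.once.Do(func() { e.snapshotter, e.err = dial() })
	return e.snapshotter, e.err
}
```

A trade-off of this exact sketch is that sync.Once pins a failed dial's error forever; golang.org/x/sync/singleflight provides similar per-key deduplication while still allowing retries after a failure.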

@austinvazquez (Contributor Author) commented:

@kzys @Kern-- @ginglis13 I have created #687 to discuss options. Closing this PR for now; I will re-open it based on our conversations there.
