Remote snapshotter locks cache during microVM vsock dial #687

Open
austinvazquez opened this issue Jun 24, 2022 · 1 comment
@austinvazquez
Contributor

Context:

The demux snapshotter utilizes a snapshotter caching mechanism for funneling requests to the appropriate remote snapshotter.

The cache serves two purposes. First, performance: creating the proxy object can be an expensive operation. Second, the cache backs service discovery for the metrics proxy.
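
For reference, a minimal sketch of the cache shape (the mutex and snapshotters fields appear in the code further down; anything beyond that is assumed):

type SnapshotterCache struct {
	// Guards snapshotters; readers take RLock, writers take Lock.
	mutex sync.RWMutex
	// Maps a request key to its cached remote snapshotter proxy.
	snapshotters map[string]*proxy.RemoteSnapshotter
}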

Snapshotter requests can occur in parallel, so access to the cache must be protected. The existing implementation performs the following operations (a sketch of this follows the list):

  1. Acquire reader's lock.
  2. Fetch snapshotter from cache.
  3. Release reader's lock.
  4. If cache hit, done.
  5. If cache miss, acquire writer's lock.
  6. Fetch snapshotter from cache.
  7. If cache hit, jump to step 10.
  8. If cache miss, continue.
  9. Create cache entry using fetch function.
  10. Release writer's lock.
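
For reference, roughly what the current Get looks like (a sketch reconstructed from the steps above, not the exact code in the repository):

// Sketch of the current behavior: the potentially expensive fetch (the vsock
// dial into the microVM) runs at step 9 while the writer's lock is still held.
func (cache *SnapshotterCache) Get(ctx context.Context, key string, fetch SnapshotterProvider) (*proxy.RemoteSnapshotter, error) {
	cache.mutex.RLock()                        // step 1
	snapshotter, ok := cache.snapshotters[key] // step 2
	cache.mutex.RUnlock()                      // step 3

	if !ok { // steps 4 and 5
		cache.mutex.Lock()
		defer cache.mutex.Unlock() // step 10

		// Steps 6-8: double check under the writer's lock.
		if snapshotter, ok = cache.snapshotters[key]; !ok {
			newSnapshotter, err := fetch(ctx, key) // step 9: expensive, lock held
			if err != nil {
				return nil, err
			}
			cache.snapshotters[key] = newSnapshotter
			snapshotter = newSnapshotter
		}
	}
	return snapshotter, nil
}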

We use the double-checked lock to ensure no system resources are leaked when two threads race to populate the same cache entry. For example, without the second check:
Thread A - acquires reader's lock, cache miss, releases reader's lock, gets context switched.
Thread B - acquires reader's lock, cache miss, releases reader's lock, gets context switched.
Note: at this point both threads have had a cache miss and are on course to populate the cache.
Thread A - acquires writer's lock, populates cache entry, releases writer's lock.
Thread B - acquires writer's lock, populates cache entry, releases writer's lock.
Note: at this point the cache entry created by Thread A has been overwritten and is leaked. While garbage collection will eventually reclaim the object itself, these entries manage system resources that back the metrics proxy, in this case the system port on which the metrics proxy HTTP server listens.
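
To make the leak concrete, here is a purely hypothetical shape for a cached entry; the real proxy.RemoteSnapshotter differs, but the point is that only Close() returns the metrics proxy port to the system:

// Hypothetical illustration only: the cached entry owns the listener backing
// its metrics proxy HTTP server. If a duplicate entry is overwritten in the
// cache and never Close()d, the port it bound is never released.
type remoteSnapshotterEntry struct {
	metricsListener net.Listener
}

func (e *remoteSnapshotterEntry) Close() error {
	return e.metricsListener.Close() // frees the bound system port
}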

Challenge:

The issue is that the writer's lock is held during the cache entry fetch (the vsock dial into the microVM), which we have observed can be an expensive operation on some systems. The ideal solution would release the lock after a writer's-lock cache miss rather than holding it through the fetch; however, we must stay mindful of the scenario above and avoid leaking resources.

@austinvazquez
Contributor Author

If we are to stick with the current mechanism of a single RWMutex and map, then the solution may look like this:

// Get fetches and caches the snapshotter for a given key.
func (cache *SnapshotterCache) Get(ctx context.Context, key string, fetch SnapshotterProvider) (*proxy.RemoteSnapshotter, error) {
	cache.mutex.RLock()
	snapshotter, ok := cache.snapshotters[key]
	cache.mutex.RUnlock()

	if !ok {
		// Fetch outside the lock; the dial into the microVM can be slow.
		newSnapshotter, err := fetch(ctx, key)
		if err != nil {
			return nil, err
		}

		cache.mutex.Lock()
		defer cache.mutex.Unlock()

		// Double check: another thread may have populated the entry while we fetched.
		if snapshotter, ok = cache.snapshotters[key]; !ok {
			cache.snapshotters[key] = newSnapshotter
			snapshotter = newSnapshotter
		} else {
			// Another thread won the race; release the resources we allocated.
			newSnapshotter.Close()
		}
	}
	return snapshotter, nil
}

Even as the author, I admit it looks somewhat ugly. It could be improved by breaking the separate pieces of functionality out into named functions. I also have a cache refactor out for review that makes the fetch function a required instance variable, which will eliminate the need to pass it through. For example, with the pull-through path broken out:

// Get fetches and caches the snapshotter for a given key.
func (cache *SnapshotterCache) Get(ctx context.Context, key string, fetch SnapshotterProvider) (*proxy.RemoteSnapshotter, error) {
	cache.mutex.RLock()
	snapshotter, ok := cache.snapshotters[key]
	cache.mutex.RUnlock()

	if !ok {
		var err error
		if snapshotter, err = cache.pullThrough(ctx, key, fetch); err != nil {
			return nil, err
		}
	}

	return snapshotter, nil
}

func (cache *SnapshotterCache) pullThrough(ctx context.Context, key string, pull SnapshotterProvider) (*proxy.RemoteSnapshotter, error) {
	// Pull outside the lock; the vsock dial into the microVM can be slow.
	snapshotter, err := pull(ctx, key)
	if err != nil {
		return nil, err
	}
	cache.mutex.Lock()
	defer cache.mutex.Unlock()

	if s, ok := cache.snapshotters[key]; ok {
		// Entry pulled through by another thread. Clean up the resources allocated.
		snapshotter.Close()
		return s, nil
	}
	cache.snapshotters[key] = snapshotter
	return snapshotter, nil
}
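
Call sites stay the same either way; for example (dialRemoteSnapshotter below is a stand-in for the real fetch function, not something in the repository):

// Illustrative call site only. dialRemoteSnapshotter is a hypothetical
// SnapshotterProvider that performs the vsock dial into the microVM; the
// double-checked locking is hidden entirely behind Get.
snapshotter, err := cache.Get(ctx, key, dialRemoteSnapshotter)
if err != nil {
	return nil, err
}
// Use snapshotter to forward the request to the remote snapshotter in the VM.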
