
Concurrent commit IsolatePath type env in root_state, system memory leak #159

Closed
lizhihongTest opened this issue Apr 3, 2023 · 3 comments
Assignees
Labels
bug Something isn't working CYFS Stack This is CYFS Stack cyfs-base Functions and components in cyfs-base project Performance about performance issues

Comments

@lizhihongTest
Collaborator

Describe the bug
Concurrent commits of IsolatePath-type op_envs in root_state cause a system memory leak.

To Reproduce
Each op_env inserts 1000 key-value pairs, and 100 op_env.commit() operations run concurrently; system memory overflows and is not released after the commits succeed.

    let create_result = await stack.root_state_stub().create_isolate_path_op_env_with_access({
        access: cyfs.AccessPermissions.Full,
        path: env_path
    });
    let op_env = create_result.unwrap();
    let create_info = await op_env.create_new(cyfs.ObjectMapSimpleContentType.Map);
    assert.ok(!create_info.err);
    let object_list = [];
    for (let j = 0; j < key_data_number; j++) {
        let id = `message-${RandomGenerator.string(20)}-${Date.now()}`;
        let request_json = JSON.stringify({
            message: RandomGenerator.string(20)
        });
        let begin = Date.now();
        let save_object = HandlerRequestObject.create(stack.local_device_id().object_id, HandlerType.PutObject, id, request_json, new Uint8Array(0));
        let create_time = Date.now() - begin;
        let result = await stack.non_service().put_object({
            common: {
                flags: 1,
                level: cyfs.NONAPILevel.NOC
            },
            object: cyfs.NONObjectInfo.new_from_object_raw(save_object.to_vec().unwrap()).unwrap(),
        });
        let put_time = Date.now() - begin - create_time;
        assert.ok(!result.err);
        let object_id = save_object.calculate_id().to_base_58();
        logger.info(`task ${i} put object success ${object_id}, create_time = ${create_time} put_time = ${put_time}`);
        let object_path = `${env_path}/${object_id}`;
        let insert_root = await op_env.insert_with_key(object_path, object_id, save_object.calculate_id());
        assert.ok(!insert_root.err);
        object_list.push({
            object_path,
            key: object_id,
            object_id,
            id,
            request_json,
            create_time,
            put_time
        });
    }
    let commit = await op_env.commit();

Expected behavior
The overall size of the 100,000 key-value pairs is only about 40MB, yet the OOD consumes about 69% * 4GB = 2.8GB of memory; there should be no memory overflow in the op_env.commit operation of the CYFS Stack.

System information
Linux OOD
4GB memory

@lizhihongTest lizhihongTest added the bug Something isn't working label Apr 3, 2023
@lurenpluto lurenpluto added CYFS Stack This is CYFS Stack cyfs-base Functions and components in cyfs-base project labels Apr 3, 2023
@lurenpluto lurenpluto self-assigned this Apr 3, 2023
@lurenpluto lurenpluto moved this to 🐞Discovered Bugs in CYFS-Stack & Services Apr 3, 2023
@lurenpluto lurenpluto moved this from 🐞Discovered Bugs to 💬To Discuss in CYFS-Stack & Services Apr 8, 2023
@lurenpluto
Member

According to your method, I've written a more straightforward test case:

  • For each op_env, call insert_with_key to insert 1000 pairs of (key, value), where key = object_id.to_string(), and finally commit.
  • The above test case runs concurrently 100 times (concurrency is crucial).

However, there's a difference: I removed an interfering factor from your test case:

  • No longer creating objects and calling noc.put_object, because in the case of constructing an object-map based solely on op-env, this step is not necessary. In other words, the corresponding object in noc can be non-existent, so in my test case, I directly constructed a unique object_id for each insert_with_key operation.

The test case is in the cyfs-stack-test project, compiled and run under the following conditions:

  • Compile the release version.
  • Set the log level to warn.

During the run, memory usage peaks at around 6GB, and once the commits complete, it drops to 160MB. From the perspective of op-env, the memory usage is analyzed as follows:

First, let's look at the following code snippet and what it does internally:

let path_env = root_state.create_isolate_path_op_env().await.unwrap();
path_env.create_new(ObjectMapSimpleContentType::Map).await.unwrap();
for j in 0..1000 {
    let object_id = random_object(i, j);
    let path = format!("/{}", object_id.to_string());
    path_env.insert_with_path(&path, &object_id).await.unwrap();
}
warn!("test isolate path will commit, index={}", i);
path_env.commit().await.unwrap();
warn!("test isolate path commit complete, index={}", i);

Two key points of object_map design

Let's start with two key points of object_map design and see how many intermediate objects and result objects are generated during the operation process, as well as the corresponding memory usage:

  • The object_map itself is immutable. Each modification will use the copy-modify method to generate a new object-map, but the old object-map still exists and is saved as a temporary object in the cache of op-env.

Therefore, after the 1000 insert_with_path operations above, a garbage collection of the internal objects is performed before the commit completes. From the log analysis, there are about 1200 temporary objects on average.

gc for target single=false, target=95RvaS5AZbf6iySX7R73JqpW64XxsmHjXpCeYijuK8yx, 2001 -> 787
  • When the desc size of the object_map exceeds the 64KB limit, it will switch to hub mode, generating a large number of sub object_map objects and increasing memory usage.

In the test case, the size of each (key, value) pair is about 78 bytes.

let object_id = ObjectId::from_str("95RvaS5Wy2UZC4Pzy7A6PZTs47bsWLCPtZe834oVvbQ4").unwrap();
let s = object_id.to_string();
let len = object_id.raw_measure(&None).unwrap() + s.raw_measure(&None).unwrap();
println!("len={}", len);

It can be estimated that 65536 / 78 ≈ 840, so these 1000 operations will eventually cause the root object-map to change modes, switching from simple mode to hub mode, generating a large number of sub object-maps and increasing memory usage.
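The mode-switch threshold can be reproduced with a few lines of arithmetic (the 64KB desc limit and the ~78-byte entry size come from the analysis above; `DESC_LIMIT` and `ENTRY_SIZE` are illustrative names, not real cyfs-base constants):

```rust
fn main() {
    // Values from the analysis above: 64KB desc limit, ~78 bytes per entry.
    const DESC_LIMIT: usize = 64 * 1024; // 65536
    const ENTRY_SIZE: usize = 78;        // object_id + its base58 string, raw-encoded

    let threshold = DESC_LIMIT / ENTRY_SIZE;
    println!("simple -> hub switch after ~{} entries", threshold); // ~840

    // So 1000 inserts comfortably exceed the threshold and force hub mode.
    assert!(1000 > threshold);
}
```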

Some results and data analysis obtained by adding some logs

From the analysis in point 1, we can deduce the distribution of the 1,200 temporary objects mentioned as follows:

  • 1,000 root object copies: For each insert_with_key, the root object changes once, and its size gradually increases. When the entry count reaches about 840, the 64KB desc limit is triggered, causing a switch to hub mode. After the switch, the root object size remains mostly constant; each new value either adds a new child object or updates an existing one (selected by the hash of the key % 16). The main size comes from the temporary copies generated by the first 840 operations: they grow roughly linearly from near zero to 64KB, so their total is about 840 * 64KB / 2 ≈ 26MB. The remaining 160 root objects in hub mode are about 25KB each, for a total of about 4MB.

  • 200 child object copies: Generally, different keys are hashed to the same slot, so the same sub-object map also needs to perform copy-modify. However, the memory usage of these 200 objects should not be significant. On average, each object occupies (19 + 76) bytes, with a total usage of 18KB.

Additionally, the logs show that after the 1,000 operations, there are on average around 770 sub object_map child objects created. These child objects need to be saved to the noc during commit. The size of these 770 sub object_maps is not very large, occupying about 71KB.

Thus, the total raw-encoded size of the objects created by a single op-env is about 30MB, the majority being the temporary objects created by copying. This figure is based only on the raw_codec encoding; the actual in-memory footprint of the objects is much larger. From the test case analysis, the roughly 2,000 objects generated by 1,000 operations in a single op-env actually occupy about 60MB of memory.
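The per-op-env figure above can be reproduced with back-of-the-envelope arithmetic (all constants are the estimates quoted in this comment, not measured values):

```rust
fn main() {
    const KB: usize = 1024;
    const MB: usize = 1024 * 1024;

    // ~840 root copies in simple mode, growing roughly linearly from ~0 to
    // 64KB, so the average copy is about 64KB / 2.
    let simple_mode_copies = 840 * (64 * KB / 2); // ~26 MB
    // ~160 root copies in hub mode at ~25KB each.
    let hub_mode_copies = 160 * 25 * KB;          // ~4 MB
    // ~200 child object copies at ~(19 + 76) bytes each.
    let child_copies = 200 * (19 + 76);           // ~19 KB
    // ~770 final sub object_maps saved at commit, ~71KB in total.
    let committed_children = 71 * KB;

    let total = simple_mode_copies + hub_mode_copies + child_copies + committed_children;
    println!("estimated raw-encoded size per op-env: ~{} MB", total / MB); // ~30 MB
    // The in-memory footprint is larger than the raw encoding; the comment
    // estimates ~60MB resident per op-env, hence ~6GB for 100 concurrent.
}
```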

Some conclusions

Based on the current data analysis, the cache of op-env occupies around 60MB. When 100 op-envs are running concurrently, the total memory usage is about 6GB. Several points need optimization:

  • The copy-on-write strategy during op-env operations should be considered for optimization. Since a single op-env's operations are serialized by the lock and it has its own memory cache (all object-maps to be modified are loaded into the op-env's cache), there may be no need to apply copy-on-write to object-maps already loaded into this exclusive cache. Direct in-place modification may be possible.

  • Saving the result objects generated by op-env to the noc at commit time is very slow, which also needs optimization. This may require further performance analysis of noc.put_object, as the operation involves a lot of work and some global locks, which may have room for improvement. One planned improvement is to support batch operations, which should significantly improve performance when concurrently putting a large number of objects.

@lurenpluto
Member

Regarding the optimization of this case, I have opened two feature requests for the optimizations described above.

Welcome to follow their progress!

@lurenpluto lurenpluto added the Performance about performance issues label Apr 11, 2023
@lurenpluto
Member


In addition, during the above test run I did not observe any memory leak. After the op-env commit operation completes, memory usage decreases to the previous level. However, commit performance is poor at 100-way concurrency and takes a long time.

From the test, memory usage can be divided into three stages:

pub async fn update(&self) -> BuckyResult<ObjectId> {
    let _write_lock = self.write_lock.lock().await;

    // First gc temporary objects that are generated
    let root = self.path()?.root();
    if let Err(e) = self.cache.gc(false, &root).await {
        error!("path env's cache gc error! root={}, {}", root, e);
    }

    // Save all result objects to noc
    self.cache.commit().await?;

    Ok(root)
}

  • After the completion of the 1000 operations in op-env and before commit, memory usage reaches its maximum.
  • During the internal garbage collection (GC) of op-env's commit, all temporary objects are reclaimed, and memory usage is significantly reduced.
  • After the commit operation of op-env is completed, the memory usage returns to the level before running the test case (this process takes a longer time).

So, in your test case, the "memory leak" may be due to op-env not having completed the commit operation, or it could be a side effect caused by other factors introduced in the test case.

@lurenpluto lurenpluto moved this from 💬To Discuss to ✅Done in CYFS-Stack & Services Apr 28, 2023