What happened in your environment?
We have multiple containers running on the same node with overlaybd as the container snapshotter, all of which lazily pull their rootfs contents. When running this in production, we found a large gap between the P95 and P50 latency (20s vs 10s). After checking some logs, we noticed an interesting coincidence:
For image pulls with unexpectedly high latency:
Oct 02 06:04:28 [Event] Start to pull image for container executor: image harbor-xxxxx
Oct 02 06:04:37 [Event] Finish pulling image for container executor: image harbor-xxxx
There was a container creation event happening inside containerd at the same time:
Oct 02 06:04:29 ip-10-1-162-245 containerd[387]: time="2023-10-02T06:04:29.671653617Z" level=info msg="CreateContainer within sandbox \"e0d9308c3259dc01251575ad5c27d2efdbdaf00b7c267f06a7ab15ed6d827e23\""
Oct 02 06:04:29 ip-10-1-162-245 containerd[387]: time="2023-10-02T06:04:29.672341423Z" level=info msg="StartContainer for \"3a6b0dce5e9168993ccd0c3213929af87e4765304774773188e1830631e2ff39\""
Oct 02 06:04:29 ip-10-1-162-245 containerd[387]: time="2023-10-02T06:04:29.672417656Z" level=info msg="container start request for xxxx"
Oct 02 06:04:29 ip-10-1-162-245 containerd[387]: time="2023-10-02T06:04:29.837229175Z" level=info msg="StartContainer for \"3a6b0dce5e9168993ccd0c3213929af87e4765304774773188e1830631e2ff39\" returns successfully"
We suspect that the container creation events (which include part of the container rootfs construction process) interfere with container image pulling and increase the latency of lazy image pulls.
We are looking for insights from upstream into the potential cause of this performance regression.
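To check this coincidence across more pulls, the overlap test can be scripted. The following is a minimal sketch, not part of overlaybd or containerd: it extracts the `time="..."` field from containerd log lines and reports which ones fall inside a pull window. The helper names (`containerd_time`, `events_in_window`) are hypothetical, the sample lines are abbreviated from the logs above, and the year and UTC timezone of the pull events are assumptions, since those lines carry neither.

```python
import re
from datetime import datetime, timezone

# Matches the RFC 3339 timestamp that containerd embeds in each log line.
TIME_RE = re.compile(r'time="(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z"')

def containerd_time(line):
    """Return the UTC timestamp of a containerd log line, or None."""
    m = TIME_RE.search(line)
    if m is None:
        return None
    base = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
    # Truncate nanoseconds to microseconds; datetime cannot hold more.
    micros = int((m.group(2) or "0")[:6].ljust(6, "0"))
    return base.replace(microsecond=micros, tzinfo=timezone.utc)

def events_in_window(lines, start, end):
    """Return (timestamp, line) pairs whose timestamp lies in [start, end]."""
    hits = []
    for line in lines:
        ts = containerd_time(line)
        if ts is not None and start <= ts <= end:
            hits.append((ts, line))
    return hits

# Pull window taken from the event log above (year and UTC are assumptions).
pull_start = datetime(2023, 10, 2, 6, 4, 28, tzinfo=timezone.utc)
pull_end = datetime(2023, 10, 2, 6, 4, 37, tzinfo=timezone.utc)

# Abbreviated copies of two containerd lines from the logs above.
sample = [
    'containerd[387]: time="2023-10-02T06:04:29.671653617Z" level=info msg="CreateContainer within sandbox"',
    'containerd[387]: time="2023-10-02T06:04:29.837229175Z" level=info msg="StartContainer returns successfully"',
]
overlapping = events_in_window(sample, pull_start, pull_end)
for ts, line in overlapping:
    print(ts.isoformat(), line)
```

Running this over the full containerd journal for each slow pull would show whether every high-latency pull overlaps a container creation, or only some of them.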
What did you expect to happen?
No response
How can we reproduce it?
Use overlaybd as the snapshotter and overlap a container creation with a container image download.
What is the version of your Overlaybd?
0.6.17
What is your OS environment?
Ubuntu 20.04
Are you willing to submit PRs to fix it?
Yes, I am willing to fix it.
@shuochen0311
What was the workload in the container created at 06:04:29? Did it load a large amount of data that could have affected image pulling?
Were there any other logs between 06:04:29 and 06:04:37?
@liulanzheng thanks for responding. Let me see what else I can find in the logs from that period.
One question on my side: if the container creation/start requires pulling a lot of data, will it affect the performance of rpull (metadata pulling), which is on the critical path before the container starts?
@lihuiba how do I know if my container is downloading a lot of data? Meanwhile, I think the real question is: is it expected that data downloading affects rpull performance?