We use Ubuntu 18.04.3, python 3.6.9, yandexcloud 0.34.0, grpc 1.28.1.
Our application continuously starts and stops instances in YC, making no more than a few hundred API requests an hour (probably fewer). After running this way for some time (perhaps a couple of days), the application inevitably crashes with a stack trace like the following:
Traceback (most recent call last):
File "./dispatcher.py", line 86, in runInstance
disks = ysdk.client(DiskServiceStub).List(ListDisksRequest(folder_id = CONF['folder_id'])).disks
File "/home/ubuntu/.local/lib/python3.6/site-packages/grpc/_interceptor.py", line 221, in __call__
compression=compression)
File "/home/ubuntu/.local/lib/python3.6/site-packages/grpc/_interceptor.py", line 257, in _with_call
return call.result(), call
File "/home/ubuntu/.local/lib/python3.6/site-packages/grpc/_channel.py", line 333, in result
raise self
File "/home/ubuntu/.local/lib/python3.6/site-packages/grpc/_interceptor.py", line 247, in continuation
compression=new_compression)
File "/home/ubuntu/.local/lib/python3.6/site-packages/grpc/_channel.py", line 837, in with_call
return _end_unary_response_blocking(state, call, True, None)
File "/home/ubuntu/.local/lib/python3.6/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "Getting metadata from plugin failed with error: HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/token (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc975e55a58>: Failed to establish a new connection: [Errno 24] Too many open files',))"
debug_error_string = "{"created":"@1590184822.768147799","description":"Getting metadata from plugin failed with error: HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/token (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc975e55a58>: Failed to establish a new connection: [Errno 24] Too many open files',))","file":"src/core/lib/security/credentials/plugin/plugin_credentials.cc","file_line":79,"grpc_status":14}"
>
Before the crash the grpc library also outputs error messages, e.g.:
E0522 22:00:22.710785667 14565 ev_epollex_linux.cc:1458] pollset_set_add_pollset: {"created":"@1590184822.710768750","description":"Too many open files","errno":24,"file":"src/core/lib/iomgr/wakeup_fd_eventfd.cc","file_line":38,"os_error":"Too many open files","syscall":"eventfd"}
E0522 22:00:26.276082489 14563 ev_epollex_linux.cc:1306] pollset_add_fd: {"created":"@1590184826.276050019","description":"pollset_transition_pollable_from_empty_to_fd","file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":325,"referenced_errors":[{"created":"@1590184826.276048606","description":"get_fd_pollable","file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":325,"referenced_errors":[{"created":"@1590184826.276041326","description":"Too many open files","errno":24,"file":"src/core/lib/iomgr/wakeup_fd_eventfd.cc","file_line":38,"os_error":"Too many open files","syscall":"eventfd"}]}]}
E0522 22:00:27.901743430 14560 ev_epollex_linux.cc:1458] pollset_set_add_pollset: {"created":"@1590184827.901723028","description":"Too many open files","errno":24,"file":"src/core/lib/iomgr/wakeup_fd_eventfd.cc","file_line":38,"os_error":"Too many open files","syscall":"eventfd"}
E0522 22:00:29.869932962 14563 ev_epollex_linux.cc:1306] pollset_add_fd: {"created":"@1590184829.869899748","description":"pollset_transition_pollable_from_empty_to_fd","file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":325,"referenced_errors":[{"created":"@1590184829.869898650","description":"get_fd_pollable","file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":325,"referenced_errors":[{"created":"@1590184829.869897060","description":"Too many open files","errno":24,"file":"src/core/lib/iomgr/ev_epollex_linux.cc","file_line":568,"os_error":"Too many open files","syscall":"epoll_create1"}]}]}
E0522 22:00:33.867603041 27147 ev_epollex_linux.cc:1408] assertion failed: i != pss->pollset_count
This may be caused by a known problem in grpc. E.g. see grpc/grpc#15759 and related issues.
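One way to check whether descriptors really accumulate over time is to log the process's open file descriptor count periodically. The sketch below is only an illustration: it assumes Linux (it reads `/proc/self/fd`), and the helper names `count_open_fds` and `fd_growth` are hypothetical, not part of yandexcloud or grpc.

```python
import os


def count_open_fds() -> int:
    """Return the number of file descriptors currently open in this process.

    Linux only: each entry in /proc/self/fd corresponds to one open descriptor.
    The directory listing itself briefly holds one descriptor, so the absolute
    number is off by one, but differences between two calls are exact.
    """
    return len(os.listdir("/proc/self/fd"))


def fd_growth(baseline: int) -> int:
    """Return how many more descriptors are open now than at the baseline."""
    return count_open_fds() - baseline
```

Taking a baseline at startup and logging `fd_growth(baseline)` after each batch of API calls should show a steadily rising number if the leak described here is present.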
As a workaround, we tried setting the nofile OS limit to a very high value. This results in the following behavior: over the course of several days (or weeks), the average CPU load of the application grows (presumably because of the ever-growing number of open files) until it hits 100% and the app becomes completely unresponsive.
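For reference, the nofile limit can also be inspected and raised from inside the process with the standard `resource` module. This is a minimal sketch, not what we actually ran: the function name `raise_nofile_soft_limit` is hypothetical, and it only raises the soft limit up to the existing hard limit.

```python
import resource


def raise_nofile_soft_limit(target: int) -> int:
    """Raise the soft RLIMIT_NOFILE towards `target` and return the new soft limit.

    The soft limit is never lowered and never set above the hard limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        # An unprivileged process cannot exceed the hard limit.
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

As described above, a higher limit only postpones the failure; it does not stop the descriptor count from growing.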
It should be noted that when using AWS EC2 SDK/cloud for instance management in an otherwise identical app under a very similar load, no issues of this kind occur. This is an indication that the problem is truly an issue in YC SDK.
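A possible mitigation on the application side is to create each service stub once and reuse it, instead of calling `ysdk.client(...)` on every request as in the traceback above. This is an untested sketch built on an assumption: that each `ysdk.client(...)` call opens a new gRPC channel, which nothing in this issue confirms. `StubCache` is a hypothetical helper, not part of yandexcloud.

```python
from typing import Any, Callable, Dict


class StubCache:
    """Memoize stubs so the factory runs at most once per stub class."""

    def __init__(self, factory: Callable[[type], Any]) -> None:
        # `factory` would be something like `ysdk.client` (assumption).
        self._factory = factory
        self._stubs: Dict[type, Any] = {}

    def get(self, stub_class: type) -> Any:
        """Return the cached stub for `stub_class`, creating it on first use."""
        if stub_class not in self._stubs:
            self._stubs[stub_class] = self._factory(stub_class)
        return self._stubs[stub_class]
```

With `stubs = StubCache(ysdk.client)`, the call from the traceback would become `stubs.get(DiskServiceStub).List(ListDisksRequest(folder_id = CONF['folder_id'])).disks`. Whether this actually stops the descriptor growth depends on the assumption above holding.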
Hi! I cannot test this on the latest version right now. We are currently using an old version (0.60.0) and the issue is still there.
I am pretty sure this is caused by a gRPC issue, which does not appear to be fixed. See e.g. grpc/grpc#20418
I will get back to you if I can confirm this on an up to date version of yandexcloud.
Posting here as recommended by YC support.