Turn on gRPC keepalive
Fixes hangs on gRPC connections that last longer than 10 minutes.
jysohn23 authored and dlibenzi committed Oct 3, 2019
1 parent dd6f071 commit 3d52569
Showing 2 changed files with 26 additions and 0 deletions.
17 changes: 17 additions & 0 deletions TROUBLESHOOTING.md
@@ -173,6 +173,23 @@ only be enabled for debugging.
expensive, so setting this flag might help. It should be verified by the user that truncating
to 32bit values is a valid operation according to the use of _PyTorch_ _Long_ values in it.

* ```TF_CPP_LOG_THREAD_ID```: If set to 1, the TF logs show the thread ID,
which helps when debugging multithreaded processes.

* ```TF_CPP_VMODULE```: Environment variable that controls TF VLOGs. It takes the
form `TF_CPP_VMODULE=name=value,...`. For PyTorch/XLA, a configuration like
`TF_CPP_VMODULE=tensor=5` would enable logging such as:

```
2019-10-03 17:23:56.419040: I 27891 torch_xla/csrc/tensor.cpp:1104]
Executing IR graph hash 4211381954965020633 on device TPU:3 done!
2019-10-03 17:23:56.419448: I 27890 torch_xla/csrc/tensor.cpp:1104]
Executing IR graph hash 15483856951158150605 on device TPU:5 done!
2019-10-03 17:23:56.419539: I 27896 torch_xla/csrc/tensor.cpp:1104]
Executing IR graph hash 4211381954965020633 on device TPU:4 done!
...
```
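
A minimal sketch of setting both variables from Python rather than the shell. It assumes the variables need to be in the environment before `torch_xla` is imported, which is the conservative choice but is not stated in this document. `build_vmodule` is a hypothetical helper, not part of PyTorch/XLA or TF.

```python
import os


def build_vmodule(levels):
    """Join {'module': level} pairs into the name=value,... form.

    Hypothetical helper for illustration only.
    """
    return ','.join(f'{name}={level}' for name, level in levels.items())


# Show thread IDs in the TF logs.
os.environ['TF_CPP_LOG_THREAD_ID'] = '1'

# Enable VLOG level 5 for the 'tensor' module, as in the example above.
os.environ['TF_CPP_VMODULE'] = build_vmodule({'tensor': 5})

# Assumption: `import torch_xla` would follow here, after the
# environment has been prepared.
```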

### Retrieving Stack Traces

In the event that the _PyTorch_ process is hanging, it might be useful to include the stack
9 changes: 9 additions & 0 deletions torch_xla/__init__.py
@@ -1,3 +1,12 @@
import os
GRPC_OPTIONS = [
    'grpc.keepalive_time_ms=60000',  # 1 min
    'grpc.keepalive_timeout_ms=14400000',  # 4 hrs
    'grpc.http2.max_pings_without_data=0',  # unlimited
    'grpc.http2.min_ping_interval_without_data_ms=300000',  # 5 min
]
os.environ['TF_GRPC_DEFAULT_OPTIONS'] = ','.join(GRPC_OPTIONS)

import torch
from .version import __version__
import _XLAC
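
The following is a reader's sketch, not part of the commit. It parses the comma-joined string that the hunk above writes to `TF_GRPC_DEFAULT_OPTIONS` back into a dictionary, so the millisecond values can be checked against the inline comments. `parse_grpc_options` is a hypothetical helper, and the option list is copied from the diff.

```python
# Option list copied verbatim from the torch_xla/__init__.py hunk.
GRPC_OPTIONS = [
    'grpc.keepalive_time_ms=60000',
    'grpc.keepalive_timeout_ms=14400000',
    'grpc.http2.max_pings_without_data=0',
    'grpc.http2.min_ping_interval_without_data_ms=300000',
]


def parse_grpc_options(joined):
    """Split 'key=value,key=value' into a dict of integer values.

    Hypothetical helper for illustration only.
    """
    parsed = {}
    for item in joined.split(','):
        key, _, value = item.partition('=')
        parsed[key] = int(value)
    return parsed


joined = ','.join(GRPC_OPTIONS)  # same string the commit stores
options = parse_grpc_options(joined)

# Convert the millisecond values to the units used in the comments.
keepalive_minutes = options['grpc.keepalive_time_ms'] / 60_000
timeout_hours = options['grpc.keepalive_timeout_ms'] / 3_600_000
min_ping_minutes = (
    options['grpc.http2.min_ping_interval_without_data_ms'] / 60_000)
```

Note that the commit assigns `os.environ['TF_GRPC_DEFAULT_OPTIONS']` unconditionally, so a value already set in the environment is overwritten when `torch_xla` is imported.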