Cleanup TROUBLESHOOTING.md a bit (pytorch#1105)
asuhan authored and ailzhang committed Sep 26, 2019
1 parent bd868b3 commit bceac47
Showing 1 changed file (TROUBLESHOOTING.md) with 13 additions and 17 deletions.
@@ -12,7 +12,7 @@ However, constraints in XLA/hardware and the lazy evaluation model suggest certa

_Possible sources_:
* Direct or indirect uses of `nonzero` introduce dynamic shapes; for example, masked indexing `base[index]` where `index` is a mask tensor.
-* Loops with a different number of iterations between steps can result in different execution graphs thus require recompilations.
+* Loops with a different number of iterations between steps can result in different execution graphs, thus require recompilations.

_Solution_:
* Tensor shapes should be the same between iterations, or a low number of shape variations should be used.
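To illustrate the dynamic-shape issue above, here is a minimal sketch (plain PyTorch on CPU; the tensor values are made up for illustration) contrasting masked indexing with a shape-preserving `torch.where` rewrite:

```python
import torch

base = torch.tensor([1.0, -2.0, 3.0, -4.0])
mask = base > 0

# Masked indexing: the output size depends on how many elements of `mask`
# are True, so every new mask pattern yields a new shape and, on an XLA
# device, a fresh compilation.
dynamic = base[mask]  # shape (2,) for these values, but value-dependent

# Shape-preserving alternative: torch.where keeps the full (4,) shape no
# matter what the values are, so the compiled graph can be reused.
static = torch.where(mask, base, torch.zeros_like(base))

print(dynamic.shape)  # torch.Size([2])
print(static.shape)   # torch.Size([4])
```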
@@ -25,11 +25,11 @@ However, constraints in XLA/hardware and the lazy evaluation model suggest certa

_Possible sources_:

-- The `item()` operation explicitly asks for evaluating the result. Don't use it unless it's necessary.
+- The `item()` operation explicitly asks to evaluate the result. Don't use it unless it's necessary.

_Solution_:

-- For most ops we can lower them to XLA to fix it. Checkout [metrics report section](#metrics-report) to find out the missing ops and open a feature request on github.
+- For most ops we can lower them to XLA to fix it. Checkout [metrics report section](#metrics-report) to find out the missing ops and open a feature request on [GitHub](https://github.com/pytorch/xla/issues).
- Even when a PyTorch tensor is known as a scalar, avoid using `tensor.item()`. Keep it as a tensor and use tensor operations on it.
- Use `torch.where` to substitute control flow when applicable.
E.g., the control flow with `item()` used in [clip_grad_norm_](https://github.com/pytorch/pytorch/blob/de19eeee99a2a282fc441f637b23d8e50c75ecd1/torch/nn/utils/clip_grad.py#L33) can simply be replaced by `torch.where`, with a dramatic performance improvement.
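As a hedged illustration of that rewrite (not the actual `clip_grad_norm_` implementation; the toy gradient values and the simplified norm handling are made up), the branch-free formulation could look like:

```python
import torch

grads = torch.tensor([3.0, 4.0])  # toy gradient vector, norm 5.0
max_norm = 1.0
total_norm = grads.norm()

# item()-based control flow forces a device sync on XLA:
#   if total_norm.item() > max_norm:
#       grads.mul_(max_norm / total_norm)
# Equivalent branch-free formulation that stays on-device:
clip_coef = torch.where(total_norm > max_norm,
                        max_norm / total_norm,
                        torch.ones_like(total_norm))
clipped = grads * clip_coef
print(clipped)  # tensor([0.6000, 0.8000])
```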
@@ -58,11 +58,11 @@ However, constraints in XLA/hardware and the lazy evaluation model suggest certa

# Debugging

-Sometimes bad things happen and a deeper look into the _PyTorch/TPU_ stack is necessary.
+Sometimes, bad things happen and a deeper look into the _PyTorch/TPU_ stack is necessary.
In order to do that, _PyTorch/TPU_ has a series of environment variables and function calls
which can help in understanding its internal behavior.

-Note that the infromation in this section is subject to be removed in future releases of
+Note that the information in this section is subject to be removed in future releases of
the _PyTorch/TPU_ software, since many of them are peculiar to a given internal implementation
which might change.

@@ -77,13 +77,12 @@ torch_xla._XLAC._xla_metrics_report()

Printing out that information can help during the debug phases and while reporting issues.

-The information included within the metrics report include things like
+The information included within the metrics report includes things like:
- how many times we issue _XLA_ compilations and the time spent on issuing.
- how many times we execute and the time spent on execution.
-- how many device data handles we create/destroy etc...
+- how many device data handles we create/destroy etc.

-These information is reported in terms of percentiles of the samples.
-An example is:
+This information is reported in terms of percentiles of the samples. An example is:

```
Metric: CompileTime
@@ -94,9 +93,7 @@ Metric: CompileTime
Percentiles: 1%=001ms32.778us; 5%=001ms61.283us; 10%=001ms79.236us; 20%=001ms110.973us; 50%=001ms228.773us; 80%=001ms339.183us; 90%=001ms434.305us; 95%=002ms921.063us; 99%=21s102ms853.173us
```

-The _PyTorch/TPU_ stack also has counters, which are named integer variables tracks
-internal software status.
-Example:
+We also provide counters, which are named integer variables which track internal software status. For example:

```
Counter: CachedSyncTensors
@@ -107,17 +104,16 @@ In this report, any counter that starts with `aten::`
indicates a context switch between the XLA device and CPU, which can be a
potential performance optimization area in the model code.

-Counters are useful to understand which operations the _PyTorch/TPU_ stack is routing
-back to the CPU engine of _PyTorch_.
-Things which looks like a _C++_ namespace are part of this category:
+Counters are useful to understand which operations are routed back to the CPU engine of _PyTorch_.
+They are fully qualified with their C++ namespace:

```
Counter: aten::nonzero
Value: 33
```

If you see `aten::` ops other than `nonzero` and `_local_scalar_dense`, that usually means a missing
-lowering in PyTorch/XLA, feel free to open a feature request for it on github issues.
+lowering in PyTorch/XLA. Feel free to open a feature request for it on [GitHub issues](https://github.com/pytorch/xla/issues).
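For instance, the `aten::` entries can be pulled out of a report with a few lines of plain Python (the report text below is a made-up excerpt in the format shown above):

```python
# Scan a metrics report for aten:: counters, i.e. ops routed back to the
# PyTorch CPU engine and therefore candidates for a lowering request.
report = """\
Counter: CachedSyncTensors
  Value: 395
Counter: aten::nonzero
  Value: 33
"""

aten_ops = [line.split()[1] for line in report.splitlines()
            if line.startswith("Counter: aten::")]
print(aten_ops)  # ['aten::nonzero']
```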

## Environment Variables

@@ -170,7 +166,7 @@ only be enabled for debugging.
## Retrieving Stack Traces

In the event that the _PyTorch_ process is hanging, it might be useful to include the stack
-traces together with the _Github_ issue.
+traces together with the GitHub issue.

The first thing to do is find out which PID the _PyTorch_ process is associated with. Using the `ps`
command it is possible to find that information. It will be a _python_ process running your
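A possible sequence is sketched below; the script name `train.py` and the use of `gdb` for the actual trace dump are assumptions for illustration, not part of the official instructions:

```shell
# Find the PID of the python process running the (hypothetical) training
# script train.py; substitute your own script name.
PID=$(pgrep -f train.py | head -n 1)
echo "Training PID: ${PID:-not found}"

# With the PID in hand, native stack traces for all threads can be dumped
# with gdb, if it is installed:
#   gdb -p "$PID" -batch -ex 'thread apply all bt' > stacks.txt
```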
