[T170073014] Rewrite distributed examples for Tensor Parallel, Sequence Parallel, 2D (FSDP + TP) #1201
Conversation
✅ Deploy Preview for pytorch-examples-preview canceled.
First pass: I think we can do things a lot simpler. Please see the inline comments.
# while for SP, input can be different across all ranks.
# We will use dp_rank for setting the random seed
# to mimic the behavior of the dataloader.
dp_rank = dist.get_rank(dp_pg)
I see, this needs to be consolidated into a device mesh API, cc @wz337
Looks much better! Wondering what the reason is to keep original.py; I also left some inline comments about imports, etc.
This looks great! I only have some nits for logging. Thanks for addressing the comments!
This PR updates the three distributed examples for Tensor Parallel, Sequence Parallel, and 2D (FSDP + TP) with the following main changes:
(note - internal reference - task [T170073014] Rewrite TensorParallel/SequenceParallel Examples using our new UX)
1 - Move to torchrun launching (see the run_.sh files) with world-topology introspection in the setup, instead of mp.spawn.
2 - Move device mesh creation to the new API, init_device_mesh.
3 - Use custom parallelization plans (ColwiseParallel and RowwiseParallel) rather than the previous prebuilt PairwiseParallel() and SequenceParallel().
4 - For the 2D example, use a more relevant SwiGLU MLP model to showcase applying 2D parallelism to a more sophisticated, LLaMA-style architecture.
5 - Add more interactive output for the user (start, per-iteration, and completion feedback).
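To illustrate change 1, here is a minimal stdlib-only sketch of the world-topology introspection that torchrun enables. The helper name is hypothetical, but torchrun does export RANK, WORLD_SIZE, and LOCAL_RANK to each worker process:

```python
import os

def topology_from_env():
    # With torchrun, each process reads its place in the world from
    # environment variables instead of receiving it via mp.spawn arguments.
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    return rank, world_size, local_rank

# Simulate what torchrun would export for rank 2 of a 4-process job.
os.environ.update({"RANK": "2", "WORLD_SIZE": "4", "LOCAL_RANK": "2"})
rank, world_size, local_rank = topology_from_env()
assert (rank, world_size, local_rank) == (2, 4, 2)
```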