add torchbench for Distributed Shampoo Optimizer v2 (pytorch#2616)
Summary:
Pull Request resolved: pytorch#2616

- No optimizer has yet been integrated into TorchBench. Distributed Shampoo is quite complicated and has a direct dependency on PyTorch, which creates a need to add it to TorchBench to guardrail it against PyTorch 2.0 changes.
- This diff implements that feature, specifically enabling Distributed Shampoo on TorchBench in eager mode. A follow-up diff will add the PT2 compile feature.
- Current design of the integration:
-- Pick the Ads DHEN CMF 5x model, since CMF is a major MC model.
-- Benchmark the optimizer stage alone rather than end-to-end. The optimizer step is computationally light relative to fwd and bwd, so in an e2e benchmark the optimizer-step results would be shadowed by the other stages (fwd, bwd) and the benchmark would lose sensitivity.
-- Build on top of the original ads_dhen_5x pipeline, skip the fwd and bwd stages, and set up the Shampoo config inside the model's __init__ stage.
-- Distributed Shampoo performs a matrix root inverse computation. In production its frequency is controlled by precondition_frequency, so its cost is negligible in the overall computation. For TorchBench we skip it as well, by advancing the iteration count to bypass the first root inverse computation (inside the _prepare_before_optimizer function).
-- Overall, TorchBench will: 1. initialize the ads_dhen_cmf 5x model on a local GPU, preload the data, and run fwd and bwd; 2. adjust some Shampoo state (e.g., the iteration step for preconditioning) to get the optimizer ready; 3. benchmark the optimizer step with the TorchBench pipeline and return the results. A hedged sketch of this flow appears below.
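Below is a minimal, hypothetical Python sketch of that three-step flow. It is not the actual TorchBench or Shampoo code: benchmark_optimizer_step is an illustrative name, and the way the step counter is advanced in _prepare_before_optimizer is an assumption about Shampoo's per-parameter state. Only the overall shape (gradients populated once, first root inverse bypassed, optimizer.step() timed alone) comes from the description above.

import time

import torch

def _prepare_before_optimizer(optimizer: torch.optim.Optimizer,
                              precondition_frequency: int) -> None:
    # Assumption: Shampoo keeps a per-parameter "step" counter. Advance it
    # past the first matrix root inverse so only the steady-state optimizer
    # step is measured.
    for state in optimizer.state.values():
        if "step" in state:
            state["step"] = precondition_frequency + 1

def benchmark_optimizer_step(model: torch.nn.Module,
                             optimizer: torch.optim.Optimizer,
                             batch: torch.Tensor,
                             num_iters: int = 100) -> float:
    # 1. Run fwd and bwd once, untimed, so gradients are populated.
    loss = model(batch).sum()
    loss.backward()
    # 2. Bypass the first root inverse computation, as described above.
    _prepare_before_optimizer(optimizer, precondition_frequency=100)
    # 3. Time only optimizer.step(), the stage under test.
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(num_iters):
        optimizer.step()
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / num_iters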

05/16:
- update the diff given the Shampoo v2 implementation

Reviewed By: xuzhao9

Differential Revision: D51192560

fbshipit-source-id: 247dceec1587a837aa9ca128252c47e9e0cf42b7
minddrummer authored and facebook-github-bot committed May 22, 2024
1 parent 8e335d3 commit d7a5500
Showing 1 changed file with 1 addition and 0 deletions.
fbgemm_gpu/fbgemm_gpu/split_embedding_configs.py
@@ -30,6 +30,7 @@ class EmbOptimType(enum.Enum):
     PARTIAL_ROWWISE_LAMB = "partial_row_wise_lamb"
     ROWWISE_ADAGRAD = "row_wise_adagrad"
     SHAMPOO = "shampoo"  # not currently supported for sparse embedding tables
+    SHAMPOO_V2 = "shampoo_v2"  # not currently supported for sparse embedding tables
     MADGRAD = "madgrad"
     EXACT_ROWWISE_WEIGHTED_ADAGRAD = "exact_row_wise_weighted_adagrad"  # deprecated
     NONE = "none"
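For reference, a minimal usage sketch of the new enum value, assuming fbgemm_gpu is installed and importable from its actual module path; the assert simply echoes the string added by this diff.

from fbgemm_gpu.split_embedding_configs import EmbOptimType

# Select the new Shampoo v2 optimizer type added by this diff.
opt_type = EmbOptimType.SHAMPOO_V2
assert opt_type.value == "shampoo_v2"
# Note: like SHAMPOO, SHAMPOO_V2 is not currently supported for sparse
# embedding tables.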
