Perform gradient clipping on global batch when using gradient accumulation #6

ashors1 · 2023-02-14T18:39:34Z

Refactors gradient clipping so that, when using ShardedStaticAccumulator, it is performed on the full (global) batch rather than on each subbatch. The refactor keeps support for enable_skip_step_on_gradient_anomalies. With x subbatches, it requires x+1 grad norm calculations per global batch (one per subbatch to determine whether the step should be skipped, plus one when applying gradient clipping in the base optimizer update) and one grad clip per global batch.
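To illustrate the accounting described above, here is a minimal plain-Python sketch, not the Praxis implementation. The helper names (`global_norm`, `clip_by_global_norm`, `accumulate_then_clip`) and the flat-list gradient representation are assumptions made for illustration; the real code operates on a NestedMap of JAX arrays.

```python
import math


def global_norm(grads):
    """L2 norm over a flat list of gradient values."""
    return math.sqrt(sum(g * g for g in grads))


def clip_by_global_norm(grads, max_norm):
    """Scales grads so their L2 norm is at most max_norm."""
    norm = global_norm(grads)
    scale = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
    return [g * scale for g in grads]


def accumulate_then_clip(subbatch_grads, max_norm):
    """Averages subbatch gradients, then clips once on the global batch.

    Returns (clipped_grads, num_norm_calls). clipped_grads is None when a
    subbatch norm is NaN/Inf, standing in for a skipped step.
    """
    num_norm_calls = 0
    total = [0.0] * len(subbatch_grads[0])
    for grads in subbatch_grads:
        # One norm per subbatch, used only to decide whether to skip the step.
        norm = global_norm(grads)
        num_norm_calls += 1
        if not math.isfinite(norm):
            return None, num_norm_calls
        total = [t + g for t, g in zip(total, grads)]
    mean = [t / len(subbatch_grads) for t in total]
    # One more norm, and the only clip, on the accumulated global-batch gradient.
    clipped = clip_by_global_norm(mean, max_norm)
    num_norm_calls += 1
    return clipped, num_norm_calls
```

With two subbatch gradients `[3.0, 0.0]` and `[0.0, 4.0]` and `max_norm=1.0`, the accumulated mean `[1.5, 2.0]` has norm 2.5 and is clipped to approximately `[0.6, 0.8]` using three norm computations. Clipping each subbatch first and then averaging would instead give `[0.5, 0.5]`, which is why the order matters.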

This PR should be taken together with the corresponding Paxml PR.

zhangqiaorjc

thanks Anna!

zhangqiaorjc · 2023-03-05T22:19:48Z

praxis/optimizers.py

+
+        raw_grad_norm = _compute_grad_norm(raw_grads)
+
+        grads, grad_scale = clip_grads(raw_grads, raw_grad_norm)


do we need to compute and return grad_scale?

ashors1 (Mar 6, 2023): This is not needed. I no longer return grad_scale as of the latest commit.

zhangqiaorjc · 2023-03-05T22:28:12Z

praxis/optimizers.py

+            grad_scale = jnp.array(1.0)
+          return grads, grad_scale
+
+        raw_grad_norm = _compute_grad_norm(raw_grads)


iiuc, if clip_grad_single_norm_to_value is True, then raw_grad_norm is not used and we have to compute grad_single_norm separately anyways?

can we move the if-elif-else statement inside out and avoid redundant computation?

ashors1 (Mar 6, 2023): Definitely. I have addressed this in my latest commit.
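One way to read the suggestion above is to branch on the clipping mode first and compute only the norm that the chosen branch needs. The following is a hedged plain-Python sketch of that shape, not the code from the PR: `clip_grads_by_mode` is a hypothetical name, a dict of lists stands in for NestedMap, and the `min(1.0, clip / norm)` scale is an assumed clipping rule.

```python
import math


def _l2_norm(values):
    """L2 norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def _scale_to(values, norm, max_norm):
    """Scales values so that a vector of the given norm has norm <= max_norm."""
    scale = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
    return [v * scale for v in values]


def clip_grads_by_mode(raw_grads,
                       clip_grad_norm_to_value=0.0,
                       clip_grad_single_norm_to_value=0.0):
    """raw_grads maps a variable name to a flat list of gradient values."""
    if clip_grad_norm_to_value:
        # The global norm is computed only in the branch that uses it.
        norm = _l2_norm([g for v in raw_grads.values() for g in v])
        return {k: _scale_to(v, norm, clip_grad_norm_to_value)
                for k, v in raw_grads.items()}
    if clip_grad_single_norm_to_value:
        # Per-variable norms; no global norm is needed here.
        return {k: _scale_to(v, _l2_norm(v), clip_grad_single_norm_to_value)
                for k, v in raw_grads.items()}
    # No clipping configured, so no norm is computed at all.
    return raw_grads
```

For example, `clip_grads_by_mode({"w": [3.0, 4.0], "b": [0.0, 12.0]}, clip_grad_norm_to_value=6.5)` halves every gradient, because the global norm is 13.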

zhangqiaorjc · 2023-03-17T17:19:02Z

praxis/optimizers.py

+
+      def scale_gradients(
+          raw_grads: NestedMap,
+          clip_grad_norm_to_value: Optional[float] = None,


Looking at the Praxis optimizers, clip_gradient_norm_to_value and clip_gradient_single_norm_to_value default to 0.0, not None, right?

so perhaps the types here should be float and default 0.0 instead of Optional?

zhangqiaorjc · 2023-03-17T17:21:35Z

praxis/optimizers.py

+          clip_grad_single_norm_to_value: Optional[float] = None):
+
+        def clip_grads(grads):
+          if clip_grad_norm_to_value:


maybe assert only one of them is true?
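The two review comments on this signature (float defaults of 0.0 instead of Optional, and a check that the two options are not both set) could be combined roughly as follows. This is a sketch under assumptions: `resolve_clip_mode` and its returned mode strings are hypothetical and exist only to make the check testable, and a ValueError is used here in place of a bare assert.

```python
def resolve_clip_mode(clip_grad_norm_to_value: float = 0.0,
                      clip_grad_single_norm_to_value: float = 0.0) -> str:
    """Validates the clipping options and reports which mode is active.

    A value of 0.0 means the corresponding clipping mode is disabled.
    """
    if clip_grad_norm_to_value < 0.0 or clip_grad_single_norm_to_value < 0.0:
        raise ValueError('Clipping thresholds must be non-negative.')
    if clip_grad_norm_to_value and clip_grad_single_norm_to_value:
        # The two modes are mutually exclusive, so fail loudly if both are set.
        raise ValueError(
            'Set at most one of clip_grad_norm_to_value and '
            'clip_grad_single_norm_to_value.')
    if clip_grad_norm_to_value:
        return 'global_norm'
    if clip_grad_single_norm_to_value:
        return 'single_norm'
    return 'none'
```

Using 0.0 as the "disabled" sentinel keeps the parameter type a plain float and lets the truthiness checks in the diff (`if clip_grad_norm_to_value:`) work unchanged.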

perform gradient clipping on global batch when using ShardedStaticAccumulator

a8dfeeb

ashors1 mentioned this pull request Feb 14, 2023

Perform gradient clipping on global batch when using gradient accumulation google/paxml#9

Open

remove AUTHORS file

4380135

zhangqiaorjc self-assigned this Mar 3, 2023

zhangqiaorjc self-requested a review March 3, 2023 04:17

zhangqiaorjc requested changes Mar 5, 2023

ashors1 added 5 commits March 6, 2023 09:36

minor refactor, do not return grad_scale

08e4292

Merge branch 'main' of github.com:ashors1/praxis into ga_grad_clip

400cb40

fix indent

42932ea

fix formatting, small ga bug fix

54bdc12

sync with upstream

44c67f7

zhangqiaorjc requested changes Mar 17, 2023

ashors1 added 2 commits March 18, 2023 15:56

Merge branch 'main' of github.com:ashors1/praxis into ga_grad_clip

d5051c1

address PR comments

40a6d80

zhangqiaorjc approved these changes Mar 19, 2023

zhangqiaorjc added the pull ready label Mar 19, 2023
