
feat: add mixed-precision and agc to gradaccum optimizer #131

Open · wants to merge 30 commits into base: main
Conversation

@dPys commented Jan 21, 2024

  • Optimizer now supports mixed precision with loss scaling (a rough sketch of the pattern follows this list)
  • Purges eager ops from the optimizer in favor of pure TF and @tf.function, supports static graph mode, and reduces excessive graph retracing
  • Cleans up docstrings, simplifies conditional branching where possible, abstracts repetitive code into helper methods, and adds missing properties expected by the API in some TF versions (e.g. lr)
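For readers unfamiliar with the pattern, here is a rough, self-contained sketch of how loss scaling and gradient accumulation can be combined inside a @tf.function train step. It uses only stock TensorFlow APIs; the model, data, and accumulation count are placeholders, and this is not the package's actual optimizer implementation.

```python
# Hedged sketch only: mixed-precision loss scaling plus gradient accumulation
# in graph mode. Everything here is illustrative, not this package's code.
import tensorflow as tf

tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1, dtype="float32"),  # keep outputs in float32
])
optimizer = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.SGD(learning_rate=1e-2)
)
accum_steps = 4

# One float32 accumulator per trainable variable.
accumulated = [tf.Variable(tf.zeros_like(v), trainable=False)
               for v in model.trainable_variables]

@tf.function  # static graph; no eager ops inside
def accumulate_step(x, y):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(model(x, training=True) - y))
        scaled_loss = optimizer.get_scaled_loss(loss)  # avoid fp16 underflow
    scaled_grads = tape.gradient(scaled_loss, model.trainable_variables)
    grads = optimizer.get_unscaled_gradients(scaled_grads)
    for acc, grad in zip(accumulated, grads):
        acc.assign_add(grad / accum_steps)  # running average of gradients
    return loss

@tf.function
def apply_accumulated():
    grads = [acc.read_value() for acc in accumulated]
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    for acc in accumulated:
        acc.assign(tf.zeros_like(acc))  # reset for the next accumulation window

# Accumulate over `accum_steps` mini-batches, then apply once.
data = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal((32, 8)), tf.random.normal((32, 1)))
).batch(8)
for i, (x, y) in enumerate(data):
    accumulate_step(x, y)
    if (i + 1) % accum_steps == 0:
        apply_accumulated()
```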

@andreped self-requested a review on January 22, 2024
@andreped (Owner) left a comment

@dPys Great job! Looks very interesting!

I have enabled CI runs for this PR. Note that linting fails straight away; please fix that, then run the tests to see whether the proposed modifications are compatible with the unit/integration tests.

@andreped (Owner) commented

@dPys Do you have time to address my concerns, so that the tests can be run? I cannot add new features/changes/PRs before the tests are passing.

@dPys (Author) commented Jan 29, 2024

Hi @andreped, yes -- I'm currently very sick / out of commission, but I should be able to get to this in the coming days. Stay tuned!

Review threads (now outdated and resolved) on: setup.py, gradient_accumulator/accumulators.py, gradient_accumulator/agc.py (two threads)
@andreped self-requested a review on February 5, 2024
@andreped (Owner) left a comment

I see that tests seem to be failing:
https://github.com/andreped/GradientAccumulator/actions/runs/7779143998/job/21215129854?pr=131

You should run these locally first and see if you can resolve them there. The easiest way to run the tests is with pytest; see:
https://github.com/andreped/GradientAccumulator/blob/main/.github/workflows/test.yml#L92
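For convenience, a programmatic equivalent of running the suite locally might look like the following; the tests/ path is an assumption based on the repository layout, and this is just pytest's standard entry point, not a project-specific script.

```python
# Equivalent to running `pytest -v tests` from the repository root, assuming
# pytest and the package's test dependencies are already installed.
import sys

import pytest

sys.exit(pytest.main(["-v", "tests"]))
```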

@dPys requested a review from andreped on February 6, 2024
@dPys (Author) commented Feb 6, 2024

Tests should now be passing (or at least very close) :-)

@andreped (Owner) commented Feb 7, 2024

I recommend running the tests listed here locally and verifying that they pass:

A good idea would be to build the wheel in the original virtual env you are working in locally, then install it in a new virtual env before running all the tests. That's a nice way of verifying that the wheel works as it should.
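As an illustration of that workflow, a rough sketch is shown below; the dist/ output folder, the wheel filename pattern, and the fresh-env directory are placeholders rather than project conventions.

```python
# Rough sketch of the build-then-test-in-a-clean-env workflow described above.
# Paths, the wheel name pattern, and the env location are illustrative only.
import glob
import subprocess
import sys
import venv

# 1. Build the wheel from the working checkout into dist/.
subprocess.run([sys.executable, "-m", "pip", "wheel", ".", "--no-deps", "-w", "dist"],
               check=True)
wheel = sorted(glob.glob("dist/gradient_accumulator-*.whl"))[-1]

# 2. Create a clean virtual environment.
venv.EnvBuilder(with_pip=True).create("fresh-env")
fresh_python = "fresh-env/bin/python"  # fresh-env\Scripts\python.exe on Windows

# 3. Install the freshly built wheel plus pytest, then run the tests there.
subprocess.run([fresh_python, "-m", "pip", "install", wheel, "pytest"], check=True)
subprocess.run([fresh_python, "-m", "pytest", "-v", "tests"], check=True)
```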

I have enabled CIs to run now without requiring that you have had a PR merged previously, so running new tests should be possible for future commits in this PR.


EDIT: I also see that you have modified the tests/test_mixed_precision.py test, which I guess you used when debugging, but it should not be altered in the main branch. So feel free to revert that.


codecov bot commented Feb 7, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (2eb285e) 97.58% compared to head (58e42e8) 98.00%.
Report is 2 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| gradient_accumulator/accumulators.py | 98.09% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #131      +/-   ##
==========================================
+ Coverage   97.58%   98.00%   +0.41%     
==========================================
  Files           5        5              
  Lines         248      350     +102     
==========================================
+ Hits          242      343     +101     
- Misses          6        7       +1     

☔ View full report in Codecov by Sentry.

@dPys (Author) commented Feb 7, 2024

> I recommend running the tests listed here locally and verifying that they pass:
>
> A good idea would be to build the wheel in the original virtual env you are working in locally, then install it in a new virtual env before running all the tests. That's a nice way of verifying that the wheel works as it should.
>
> I have enabled CIs to run now without requiring that you have had a PR merged previously, so running new tests should be possible for future commits in this PR.
>
> EDIT: I also see that you have modified the tests/test_mixed_precision.py test, which I guess you used when debugging, but it should not be altered in the main branch. So feel free to revert that.

Whoops, I had accidentally pushed the wrong commit earlier to trigger the build; that has now been corrected. As for the tests/test_mixed_precision.py change, this was actually intentional, in order to cover the mixed-precision case for the optimizer, which was not covered before.
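For context, a mixed-precision smoke test along these lines could look roughly as follows. This is only a sketch with a plain SGD optimizer, not the contents of tests/test_mixed_precision.py; the real test would wrap the optimizer with the package's gradient-accumulation class.

```python
# Hypothetical sketch of a mixed-precision smoke test; not the actual test.
import numpy as np
import tensorflow as tf


def test_mixed_precision_smoke():
    tf.keras.mixed_precision.set_global_policy("mixed_float16")
    try:
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
            tf.keras.layers.Dense(1, dtype="float32"),
        ])
        # The real test would use the package's gradient-accumulation optimizer
        # here; plain SGD keeps this sketch self-contained.
        model.compile(optimizer=tf.keras.optimizers.SGD(1e-2), loss="mse")
        x = np.random.rand(16, 4).astype("float32")
        y = np.random.rand(16, 1).astype("float32")
        history = model.fit(x, y, batch_size=4, epochs=1, verbose=0)
        assert np.isfinite(history.history["loss"][0])
    finally:
        tf.keras.mixed_precision.set_global_policy("float32")
```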

@dPys (Author) commented Feb 7, 2024

Happy to add further unit tests to cover the remaining uncovered lines of the new optimizer class following code review.

@andreped (Owner) commented Feb 7, 2024

> Happy to add further unit tests to cover the remaining uncovered lines of the new optimizer class following code review.

You are free to do that now, or in a new PR once this is merged. But let's see if all tests pass first. Looks very promising :] Great work!!!

@andreped (Owner) commented Feb 7, 2024

All tests seem to succeed. I see that you have removed a lot of the documentation on various methods that we use with autodoc to build the docs. We need to follow this standard, at least to verify that the documentation still builds and shows what we want the user to see.

You can see how I build the docs here, and use that as a reference when adding documentation to the different classes and methods:
https://github.com/andreped/GradientAccumulator/tree/main/docs
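For reference, a docstring in roughly this shape is what Sphinx autodoc picks up; the method and parameters below are placeholders, and the exact field convention should follow whatever the existing modules already use.

```python
# Illustrative docstring layout only; the method signature is hypothetical.
def apply_gradients(self, grads_and_vars, name=None, **kwargs):
    """Apply the accumulated gradients to the trainable variables.

    :param grads_and_vars: List of (gradient, variable) pairs.
    :param name: Optional name for the returned operation.
    :param kwargs: Keyword arguments forwarded to the wrapped optimizer.
    :return: The operation that applies the gradients.
    """
    ...
```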

> Happy to add further unit tests to cover the remaining uncovered lines of the new optimizer class following code review.

Lastly, yes, the code coverage seems to have taken a massive hit. We need to bring the unit tests back up to ensure that our code has been properly tested. This does not need to be perfect, but we should verify that as many of the methods and components as possible at least don't crash on first usage.

Feel free to ask questions when you make an attempt :]

@dPys (Author) commented Feb 7, 2024

> All tests seem to succeed. I see that you have removed a lot of the documentation on various methods that we use with autodoc to build the docs. We need to follow this standard, at least to verify that the documentation still builds and shows what we want the user to see.
>
> You can see how I build the docs here, and use that as a reference when adding documentation to the different classes and methods: https://github.com/andreped/GradientAccumulator/tree/main/docs
>
> > Happy to add further unit tests to cover the remaining uncovered lines of the new optimizer class following code review.
>
> Lastly, yes, the code coverage seems to have taken a massive hit. We need to bring the unit tests back up to ensure that our code has been properly tested. This does not need to be perfect, but we should verify that as many of the methods and components as possible at least don't crash on first usage.
>
> Feel free to ask questions when you make an attempt :]

Absolutely. I'm very used to a high standard for code coverage, both in and out of open-source projects, so I can very much appreciate this aspect. I've essentially refactored the majority of the optimizer, so the reduction in code coverage comes as no surprise. Provided that you are on board with all of the proposed changes in functionality and code organization, I'm certainly happy to put in the time to get the test coverage to where it needs to be :-)

@andreped (Owner) commented Feb 7, 2024

> Provided that you are on board with all of the proposed changes in functionality and code organization, I'm certainly happy to put in the time to get the test coverage to where it needs to be :-)

It would be better for me to have everything ready and in a similar state to the original code before I do my full review, but the solution looks fine for now, as all tests seem to pass. So feel free to get the code coverage to the same level as, or better than, it was :] Looking forward to testing this!

I will have to run some benchmarks to see if there are any performance differences, as well as test whether it still works in a distributed setting (to some extent). But I will do that after my full review.

@dPys (Author) commented Feb 7, 2024

> > Provided that you are on board with all of the proposed changes in functionality and code organization, I'm certainly happy to put in the time to get the test coverage to where it needs to be :-)
>
> It would be better for me to have everything ready and in a similar state to the original code before I do my full review, but the solution looks fine for now, as all tests seem to pass. So feel free to get the code coverage to the same level as, or better than, it was :] Looking forward to testing this!
>
> I will have to run some benchmarks to see if there are any performance differences, as well as test whether it still works in a distributed setting (to some extent). But I will do that after my full review.

Excited to be working on this with you!

@dPys (Author) commented Feb 7, 2024

As for the distributed setting, I've tested it for TensorFlow 2.0-2.10, but after 2.10, when the optimizer API changed, my guess is that apply_gradients could very well break (as it already does in main?), so that part might have to be addressed in a follow-up PR...
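For illustration, one common way to paper over the 2.10/2.11 optimizer API split is a version guard like the one below; this only sketches the general idea and is not necessarily the approach taken in this PR or in main.

```python
# Hedged sketch: select an optimizer base class depending on the TF version.
# The alias name is hypothetical; apply_gradients overrides would differ
# between the two base classes.
import tensorflow as tf

_TF_MAJOR_MINOR = tuple(int(p) for p in tf.__version__.split(".")[:2])

if _TF_MAJOR_MINOR >= (2, 11):
    # From TF 2.11 the new Keras optimizer API is the default; the previous
    # behaviour lives under tf.keras.optimizers.legacy.
    OptimizerBase = tf.keras.optimizers.legacy.Optimizer
else:
    OptimizerBase = tf.keras.optimizers.Optimizer
```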

@andreped (Owner) commented Feb 7, 2024

> As for the distributed setting, I've tested it for TensorFlow 2.0-2.10, but after 2.10, when the optimizer API changed, my guess is that apply_gradients could very well break (as it already does in main?), so that part might have to be addressed in a follow-up PR...

Nah, no need to address distributed support in this PR; that is outside its scope. Let's just get this PR through :]

Regarding distributed support, see issue #132, where we aim to find a solution. We can make a new PR if we ever get one, but I have yet to see a solution that does what I expect it to do.

@andreped (Owner) left a comment

The code looks technically sound, but let's get the code coverage back to a similarly high level as before this PR, and then I can do a more comprehensive review and thorough testing :]

@andreped (Owner) commented Feb 8, 2024

I see that you are making great progress on code coverage :]

I guess you are aware of it, but if you want to see exactly what remains to be covered, you can see the current report here:
https://app.codecov.io/gh/andreped/GradientAccumulator/pull/131?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=checks&utm_campaign=pr+comments&utm_term=André+Pedersen

Feel free to tag me when you feel it is good enough :] I noticed now that I had some "# pragma: no cover" statements, which you removed; removing them naturally lowers the reported coverage (as those lines were previously ignored altogether). I had added them because some of the methods were not that critical, but then again, having coverage on these as well would be better.
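For anyone skimming, this is the kind of marker being referred to; the class and method below are purely hypothetical and only show where the pragma sits.

```python
# A line or block ending in "# pragma: no cover" is excluded from the coverage
# report; removing the marker makes those lines count again. Hypothetical class.
class ExampleAccumulator:
    def get_config(self):  # pragma: no cover
        # Rarely exercised serialization path; previously skipped by coverage.
        return {"accum_steps": 4}
```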

@dPys (Author) commented Feb 11, 2024

Done

@andreped (Owner) commented

> Done

Great work, @dPys!! I will have a look tomorrow :] Looking forward to it!

@andreped (Owner) commented Feb 28, 2024

Sorry for the really late reply, @dPys! I have been quite preoccupied lately with work. I can review this PR after work today :]

@dPys (Author) commented Apr 16, 2024

Hi @andreped! Just wanted to follow up and see whether the clouds have parted for you yet with work. Eager to hear any final thoughts you might have on this contribution, and really just hopeful to see it merged :-)
