
experimental support to LOMO optimizer #681

Open
wants to merge 10 commits into base: llm
Conversation

zhenqincn

@CLAassistant

CLAassistant commented Aug 17, 2023

CLA assistant check
All committers have signed the CLA.

@yxdyc yxdyc requested review from rayrayraykk, qbc2016 and yxdyc August 17, 2023 07:22
@zhenqincn zhenqincn changed the title Zhenqin/llm experimental support to LOMO optimizer Aug 22, 2023
# SOFTWARE.

import torch
from torch.optim import Optimizer
Collaborator

Since FS has a TF backend, we should use try-except to avoid errors:

try:
    import torch
except ImportError:
    torch = None

Author

Currently, LOMO does not support TF. A try-except statement has been added following this suggestion:

try:
    import torch
except ImportError:
    torch = None
    raise ImportError('Currently, LOMO optimizer is only implemented with `pytorch`')

Collaborator

Please don't raise ImportError (or anything else) at module level (do it in __init__ or somewhere similar), since TF-backend users might run into an error when this file is imported.

Author

OK, in the latest commit, this error raising has been moved into __init__ of the corresponding optimizer.

try:
    import torch
except ImportError:
    torch = None

class LOMO(Optimizer):
    """
    an optimizer for LOMOTrainer
    """

    def __init__(self, model, lr=1e-3, clip_grad_norm=None, clip_grad_value=None):
        if torch is None:
            raise ImportError('Currently, LOMO optimizer is only implemented with `pytorch`')
        self.model = model

@@ -160,6 +161,9 @@ def get_trainer(model=None,
dict_path = "federatedscope.nlp.hetero_tasks.trainer"
elif config.trainer.type.lower() in ['llmtrainer']:
dict_path = "federatedscope.llm.trainer.trainer"
elif config.trainer.type.lower() in ['lomotrainer']:
print('in type')
Collaborator

Use logger instead of print.
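
A minimal sketch of the suggested change, assuming the standard logging module is used here as it is elsewhere in FederatedScope (the function name below is hypothetical, not from the PR):

import logging

logger = logging.getLogger(__name__)


def dispatch_example(trainer_type):
    # Illustrative stand-in for the branch in get_trainer():
    # prefer the module-level logger over a bare print for debug output.
    if trainer_type.lower() in ['lomotrainer']:
        logger.debug('Selected LOMOTrainer')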

Author

This was an oversight during development; it has been removed in a new commit.


class LOMOTrainer(LLMTrainer):
    def _hook_on_epoch_start(self, ctx):
        if not isinstance(ctx.optimizer, LOMO):
Collaborator

this check should be in _hook_on_fit_start_init

Author

This check has been moved into _hook_on_fit_start_init:

    def _hook_on_fit_start_init(self, ctx):
        ret = super()._hook_on_fit_start_init(ctx)
        if not isinstance(ctx.optimizer, LOMO):
            raise AttributeError('"lomo" must be set as the type of '
                                 '`train.optimizer` if the trainer is LOMOTrainer')
        return ret

        return super()._hook_on_epoch_start(ctx)


    def _hook_on_batch_forward(self, ctx):
Collaborator

Since train and eval use the same hook function, we should add an if-else, as eval only needs one forward pass, right?

Author

Sure, an additional check has been added as follows:

if ctx.cur_mode in [MODE.TRAIN, MODE.FINETUNE] \
    and (
            not ctx.skip_this_batch 
            and ctx.optimizer.clip_grad_norm is not None 
            and ctx.optimizer.clip_grad_norm > 0
        ):

@rayrayraykk
Collaborator

Another minor issue: since FS-LLM is publicly available now, the dev/llm branch is deprecated. You can change the target branch of this PR to llm, thanks!

@zhenqincn zhenqincn changed the base branch from dev/llm to llm September 5, 2023 09:32
@zhenqincn
Author

All suggestions above have been adopted, and a review has been re-requested. Many thanks.

@zhenqincn
Author

In the latest commit, the code has been formatted and the pre-commit checks pass.

try:
    import torch
except ImportError:
    torch = None
from torch.optim import Optimizer
Collaborator

torch.optim.Optimizer should also be inside the try-except.

Author

Many thanks. This has been fixed.
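
For reference, a sketch of one way to guard both imports so the module stays importable under a TF-only backend; the `object` fallback for the base class is an assumption, not necessarily what the final commit does:

try:
    import torch
    from torch.optim import Optimizer
except ImportError:
    torch = None
    # Fallback base class so that `class LOMO(Optimizer)` still parses;
    # the ImportError raised in LOMO.__init__ guards any actual use.
    Optimizer = object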

def func(x):
    with torch.no_grad():
        for n, p in self.model.named_parameters():
            if p.requires_grad and p.grad is not None:
Collaborator

Lines 87~115: there are too many judgement branches and they are too deeply nested. I suggest changing them to a series of self-explanatory boolean variables to increase readability.

Author

Thanks for this suggestion. The mentioned lines have been reformatted and code comments have been added in the latest commit.
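
For illustration, a sketch of the kind of refactor the reviewer asks for, applied to the condition quoted earlier (the helper name is hypothetical and not taken from the commit):

def should_clip_grad_norm(ctx, train_modes):
    """Illustrative helper: name each condition instead of nesting branches."""
    is_train_or_finetune = ctx.cur_mode in train_modes
    batch_not_skipped = not ctx.skip_this_batch
    clip_norm_enabled = (ctx.optimizer.clip_grad_norm is not None
                         and ctx.optimizer.clip_grad_norm > 0)
    return is_train_or_finetune and batch_not_skipped and clip_norm_enabled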


# check if zero3 is enabled
p0 = list(self.model.parameters())[0]
if hasattr(p0, 'ds_tensor'): # zero3 is enabled
Collaborator

Perhaps add some reference to justify this check, e.g., stage3_code.
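
For example, the reference could take the form of a short code comment pointing at DeepSpeed's ZeRO stage-3 partitioning code, which is where the ds_tensor attribute gets attached to parameters (a sketch with a hypothetical helper name; the exact wording in the commit may differ):

def zero3_is_enabled(model):
    """Sketch of the check with an explanatory reference comment."""
    # DeepSpeed ZeRO stage 3 partitions every parameter and attaches a
    # `ds_tensor` attribute to it (see DeepSpeed's ZeRO-3 partitioning code),
    # so its presence on the first parameter signals that ZeRO-3 is enabled.
    first_param = next(iter(model.parameters()))
    return hasattr(first_param, 'ds_tensor')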

Author

Many thanks. This reference has been added in the latest commit.
