
raise ValueError('Optimizer must have a "lr" attribute.') #133

Closed
21-10-4 opened this issue Feb 28, 2024 · 5 comments
Labels
bug Something isn't working

Comments

21-10-4 commented Feb 28, 2024

Describe the bug

Traceback (most recent call last):
  File "train_moat_tfrecord.py", line 424, in <module>
    history = model.fit(
  File "/home/c/anaconda3/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/c/anaconda3/lib/python3.8/site-packages/keras/callbacks.py", line 2240, in on_epoch_begin
    raise ValueError('Optimizer must have a "lr" attribute.')
ValueError: Optimizer must have a "lr" attribute.

To Reproduce
code:

with strategy.scope():
    model = MoAt(
        in_shape=(image_size, image_size, 3),
        out_classes=total_labels,
        definition_name=definition_name,
        window_sides=window_sides,
        input_scaling="inception",
        stochdepth_rate=stochdepth_rate,
    )
    if args.checkpoint is None and not args.origin:
        save_path = "outputs/%s_%s" % (definition_name, date_time)
    elif args.checkpoint is not None and not args.origin:
        save_path = f"{checkpoint_dir}"
        """# Before training starts, check for a saved checkpoint; if one exists, load its weights
        print("Resuming training from checkpoint:", args.checkpoint)
        # Create a checkpoint object
        checkpoint = tf.train.Checkpoint(model=model)

        # Path of the checkpoint file to load
        # checkpoint.restore(args.checkpoint)
        # model = tf.keras.models.load_model(args.checkpoint) """
        model = tf.keras.models.load_model(args.checkpoint, compile=False)  # skip compiling for now
    elif args.origin:
        save_path = "outputs/%s_%s" % (definition_name, date_time)
        origin_model = tf.keras.models.load_model("results/reference")
        i = 0
        for layer_original, layer_modified in zip(origin_model.layers[:-2], model.layers[:-2]):
            if layer_original.get_weights():
                i += 1
                layer_modified.set_weights(layer_original.get_weights())
        print("load from reference:", i)
    f1 = F1Score(total_labels, "micro", 0.4)  # scores > 0.4 count as 1, others as 0; computed globally
    rec_at_p65 = tf.keras.metrics.RecallAtPrecision(0.65, num_thresholds=1024)  # best recall at precision >= 0.65; threshold picked from 1024 candidates
    loss = AsymmetricLoss(
        reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE,
        gamma_neg=asl_gamma_neg,
        gamma_pos=asl_gamma_pos,
        clip=asl_clip,
    )
    curr_opt = Adam(
        learning_rate=warmup_learning_rate,
        weight_decay=weight_decay_rate,
    )
    curr_opt.exclude_from_weight_decay(var_names=[
        r".*(gamma|beta|bias|mean|variance|embedding):0$"
    ])
    opt = GradientAccumulateOptimizer(optimizer=curr_opt, accum_steps=6)  # 6*6=36
    model.compile(optimizer=opt, loss=loss, metrics=[f1, rec_at_p65])

t800 = tf.keras.callbacks.TerminateOnNaN()
sched = tf.keras.callbacks.LearningRateScheduler(scheduler, verbose=True)

rmc_loss = tf.keras.callbacks.ModelCheckpoint(
    "%s/variables/best_model/best" % save_path,
    save_best_only=True,
    save_freq="epoch",
    save_weights_only=True,
)

# Set up TensorBoard
tensorboard_step_writer = tf.summary.create_file_writer(f"{save_path}/tensorboard_step")
tensorboard_epoch_writer = tf.summary.create_file_writer(f"{save_path}/tensorboard_epoch")
""" if args.wandb:
    cb_list = [t800, rmc_loss, sched, WandbCallback(save_model=False), CustomCallback(), metrics_csv_logger]
else: """
cb_list = [t800, rmc_loss, sched, CustomCallback(), metrics_csv_logger]

print("initial_epoch:", initial_epoch)
history = model.fit(
    training_dataset,
    validation_data=validation_dataset,
    initial_epoch=initial_epoch,
    epochs=total_epochs,
    steps_per_epoch=math.ceil(train_dataset_len / global_batch_size),
    validation_steps=math.ceil(val_dataset_len / global_batch_size),
    callbacks=cb_list,
)

Desktop (please complete the following information):

  • OS: CentOS Linux release 7.8.2003 (Core) (LSB Version: :core-4.1-amd64:core-4.1-noarch, Release: 7.8.2003, Codename: Core)
  • Python: 3.8.5
  • TensorFlow: 2.12.0

Additional context
Error source (keras/callbacks.py):

@keras_export("keras.callbacks.LearningRateScheduler")
class LearningRateScheduler(Callback):
    ...
    def on_epoch_begin(self, epoch, logs=None):
        if not hasattr(self.model.optimizer, "lr"):
            raise ValueError('Optimizer must have a "lr" attribute.')
        try:  # new API
            lr = float(backend.get_value(self.model.optimizer.lr))
            lr = self.schedule(epoch, lr)
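The failing check can be reproduced without TensorFlow: any wrapper object that does not forward attribute lookups to the optimizer it wraps will fail `hasattr(optimizer, "lr")`. A minimal sketch with hypothetical stand-in classes (not the library's actual code):

```python
class InnerOptimizer:
    """Hypothetical inner optimizer exposing a plain `lr` attribute."""
    def __init__(self, lr):
        self.lr = lr

class NaiveWrapper:
    """Does not forward attribute lookups, so hasattr(wrapper, "lr") is False."""
    def __init__(self, optimizer):
        self._optimizer = optimizer

class ForwardingWrapper:
    """Delegates unknown attributes to the wrapped optimizer."""
    def __init__(self, optimizer):
        self._optimizer = optimizer

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails,
        # so `_optimizer` itself is found without recursion.
        return getattr(self._optimizer, name)

naive = NaiveWrapper(InnerOptimizer(lr=1e-3))
forwarding = ForwardingWrapper(InnerOptimizer(lr=1e-3))
print(hasattr(naive, "lr"), hasattr(forwarding, "lr"))  # False True
```

This is only an illustration of the check the Keras callback performs; the wrapper in the library fails it for its own reasons.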
@21-10-4 21-10-4 added the bug Something isn't working label Feb 28, 2024
andreped (Owner) commented Feb 28, 2024
There is a PR that should address this issue. See PR #131.

I have been quite preoccupied lately, but I can take a look at this PR after work today and do some tests.

For now, to test if using this new implementation resolves your issue, try installing it from the PR branch in question:

pip install git+https://github.com/dPys/GradientAccumulator.git@optimizer-refactor --force-reinstall

Note that I have yet to test this implementation fully, but this would be a nice way of doing it.

@21-10-4 Would be really helpful if you reported your findings :]

21-10-4 (Author) commented Feb 28, 2024

> There is a PR that should address this issue. See PR #131.
>
> I have been quite preoccupied lately, but I can take a look at this PR after work today and do some tests.
>
> For now, to test if using this new implementation resolves your issue, try installing it from the PR branch in question:
>
> pip install git+https://github.com/dPys/GradientAccumulator.git@optimizer-refactor --force-reinstall
>
> Note that I have yet to test this implementation fully, but this would be a nice way of doing it.
>
> @21-10-4 Would be really helpful if you reported your findings :]

Thank you very much, I did as you said, but a new problem appeared:

Epoch 1: LearningRateScheduler setting learning rate to 4e-05.
Epoch 1/100
Traceback (most recent call last):
  File "train_moat_tfrecord.py", line 426, in <module>
    history = model.fit(
  File "/home/c/anaconda3/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/tmp/__autograph_generated_filex4w61m3f.py", line 15, in tf__train_function
    retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
  File "/var/lib/docker/c/res/GradientAccumulator/gradient_accumulator/accumulators.py", line 368, in apply_gradients
    train_op = super().apply_gradients(grads_and_vars, name, **kwargs)
  File "/var/lib/docker/c/res/GradientAccumulator/gradient_accumulator/accumulators.py", line 306, in _create_slots
    self.base_optimizer._create_slots(var_list=var_list)
AttributeError: in user code:

    File "/home/c/anaconda3/lib/python3.8/site-packages/keras/engine/training.py", line 1284, in train_function  *
        return step_function(self, iterator)
    File "/home/c/anaconda3/lib/python3.8/site-packages/keras/engine/training.py", line 1268, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/c/anaconda3/lib/python3.8/site-packages/keras/engine/training.py", line 1249, in run_step  **
        outputs = model.train_step(data)
    File "/home/c/anaconda3/lib/python3.8/site-packages/keras/engine/training.py", line 1054, in train_step
        self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/home/c/anaconda3/lib/python3.8/site-packages/keras/optimizers/legacy/optimizer_v2.py", line 588, in minimize
        return self.apply_gradients(grads_and_vars, name=name)
    File "/var/lib/docker/c/res/GradientAccumulator/gradient_accumulator/accumulators.py", line 368, in apply_gradients
        train_op = super().apply_gradients(grads_and_vars, name, **kwargs)
    File "/home/c/anaconda3/lib/python3.8/site-packages/keras/optimizers/legacy/optimizer_v2.py", line 704, in apply_gradients
        self._create_all_weights(var_list)
    File "/home/c/anaconda3/lib/python3.8/site-packages/keras/optimizers/legacy/optimizer_v2.py", line 968, in _create_all_weights
        self._create_slots(var_list)
    File "/var/lib/docker/c/res/GradientAccumulator/gradient_accumulator/accumulators.py", line 306, in _create_slots
        self.base_optimizer._create_slots(var_list=var_list)

    AttributeError: 'Adam' object has no attribute '_create_slots'

andreped (Owner) commented Feb 28, 2024

> Thank you very much, I did as you said, but a new problem appeared:
> [...]
> AttributeError: 'Adam' object has no attribute '_create_slots'

@21-10-4 This issue I have seen before. I can make a gist to reproduce the issue and see if we can resolve it.

For now, could you test the model wrapper approach instead, as described in the README:
https://github.com/andreped/GradientAccumulator?tab=readme-ov-file#getting-started

IMO, the model wrapper is MUCH more robust for single-GPU applications. Or are you trying to do distributed training?

andreped (Owner) commented Feb 28, 2024

@21-10-4 Aaah, scratch that! Now I remember why the first issue was so familiar.

TF/Keras changed their Optimizer implementation considerably from tf >= 2.11. For newer versions we only support the legacy Optimizer implementation, which should still be available in TF 2.12, AFAIK. This is actually mentioned in the documentation:
https://gradientaccumulator.readthedocs.io/en/latest/faq/optimizer_legacy.html
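The traceback above boils down to the wrapper calling a private hook, `_create_slots`, that only the legacy `optimizer_v2` base class defines; the optimizer API introduced in tf >= 2.11 creates its state in `build()` instead. A simplified, hypothetical illustration (these classes merely stand in for the two Keras base classes):

```python
class LegacyStyleAdam:
    """Stand-in for the old optimizer_v2 API: slots made via a private hook."""
    def _create_slots(self, var_list):
        self.slots = {name: 0.0 for name in var_list}

class NewStyleAdam:
    """Stand-in for the new (tf >= 2.11) API: state is created in build()."""
    def build(self, var_list):
        self.momentums = [0.0 for _ in var_list]

def wrapper_create_slots(base_optimizer, var_list):
    # Roughly what the accumulator wrapper does internally.
    base_optimizer._create_slots(var_list=var_list)

wrapper_create_slots(LegacyStyleAdam(), ["kernel", "bias"])  # works

try:
    wrapper_create_slots(NewStyleAdam(), ["kernel", "bias"])
except AttributeError:
    print("no _create_slots on the new-style optimizer")
```

That mirrors the reported `'Adam' object has no attribute '_create_slots'`: the new-style Adam simply never had that method.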

Basically, do this instead if you wish to use the Optimizer wrapper:

import tensorflow as tf
from gradient_accumulator import GradientAccumulateOptimizer

opt = tf.keras.optimizers.legacy.SGD(learning_rate=1e-2)
opt = GradientAccumulateOptimizer(optimizer=opt, accum_steps=4)

Replace the optimizer above with whichever optimizer you please (e.g., Adam), just remember to use the tf.keras.optimizers.legacy.X path.
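For context on what the wrapper buys you: accumulating over `accum_steps` mini-batches and averaging the gradients before one update is equivalent to a single step on the combined batch, which is presumably what the `# 6*6=36` comment in the reproduction code refers to. A library-free sketch of that arithmetic (plain SGD, made-up numbers):

```python
def sgd_step(w, grad, lr):
    # One plain SGD update on a scalar weight.
    return w - lr * grad

def accumulated_step(w, grads, lr):
    # Average the gradients from several mini-batches, then apply
    # a single update -- equivalent to one step on the combined batch.
    avg = sum(grads) / len(grads)
    return sgd_step(w, avg, lr)

grads = [0.2, 0.4, 0.6]  # gradients from 3 small mini-batches
w_accum = accumulated_step(1.0, grads, lr=0.1)
w_large = sgd_step(1.0, sum(grads) / 3, lr=0.1)  # one large-batch step
print(w_accum == w_large)  # True
```

The wrapper does this per-variable inside `apply_gradients`; the sketch only shows why the two schedules land on the same weights.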

21-10-4 (Author) commented Feb 29, 2024

Thank you ~
