
raise ValueError('Optimizer must have a "lr" attribute.') #133

Closed
21-10-4 opened this issue Feb 28, 2024 · 5 comments
Labels
bug Something isn't working

Comments

21-10-4 commented Feb 28, 2024

Describe the bug

Traceback (most recent call last):
  File "train_moat_tfrecord.py", line 424, in <module>
    history = model.fit(
  File "/home/c/anaconda3/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/c/anaconda3/lib/python3.8/site-packages/keras/callbacks.py", line 2240, in on_epoch_begin
    raise ValueError('Optimizer must have a "lr" attribute.')
ValueError: Optimizer must have a "lr" attribute.

To Reproduce
code:

with strategy.scope():
    model = MoAt(
        in_shape=(image_size, image_size, 3),
        out_classes=total_labels,
        definition_name=definition_name,
        window_sides=window_sides,
        input_scaling="inception",
        stochdepth_rate=stochdepth_rate,
    )
    if args.checkpoint is None and not args.origin:
        save_path = "outputs/%s_%s" % (definition_name, date_time)
    elif args.checkpoint is not None and not args.origin:
        save_path = f"{checkpoint_dir}"
        """# Before training starts, check for a saved checkpoint; if one exists, load its weights
        print("Resuming training from checkpoint:", args.checkpoint)
        # Create a checkpoint object
        checkpoint = tf.train.Checkpoint(model=model)

        # Path of the checkpoint file to load
        # checkpoint.restore(args.checkpoint)
        # model = tf.keras.models.load_model(args.checkpoint) """
        model = tf.keras.models.load_model(args.checkpoint, compile=False)  # skip compiling for now
    elif args.origin:
        save_path = "outputs/%s_%s" % (definition_name, date_time)
        origin_model = tf.keras.models.load_model("results/reference")
        i = 0
        for layer_original, layer_modified in zip(origin_model.layers[:-2], model.layers[:-2]):
            if layer_original.get_weights():
                i += 1
                layer_modified.set_weights(layer_original.get_weights())
        print("load from reference:", i)
    f1 = F1Score(total_labels, "micro", 0.4)  # scores > 0.4 count as 1, others as 0; computed globally
    rec_at_p65 = tf.keras.metrics.RecallAtPrecision(0.65, num_thresholds=1024)  # best recall at precision >= 0.65; threshold picked from 1024 candidates
    loss = AsymmetricLoss(
        reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE,
        gamma_neg=asl_gamma_neg,
        gamma_pos=asl_gamma_pos,
        clip=asl_clip,
    )
    curr_opt = Adam(
        learning_rate=warmup_learning_rate,
        weight_decay=weight_decay_rate,
    )
    curr_opt.exclude_from_weight_decay(var_names=[
        r".*(gamma|beta|bias|mean|variance|embedding):0$"
    ])
    opt = GradientAccumulateOptimizer(optimizer=curr_opt, accum_steps=6)  # 6*6=36
    model.compile(optimizer=opt, loss=loss, metrics=[f1, rec_at_p65])

t800 = tf.keras.callbacks.TerminateOnNaN()
sched = tf.keras.callbacks.LearningRateScheduler(scheduler, verbose=True)

rmc_loss = tf.keras.callbacks.ModelCheckpoint(
    "%s/variables/best_model/best" % save_path,
    save_best_only=True,
    save_freq="epoch",
    save_weights_only=True,
)

# Set up TensorBoard
tensorboard_step_writer = tf.summary.create_file_writer(f"{save_path}/tensorboard_step")
tensorboard_epoch_writer = tf.summary.create_file_writer(f"{save_path}/tensorboard_epoch")
""" if args.wandb:
    cb_list = [t800, rmc_loss, sched, WandbCallback(save_model=False), CustomCallback(), metrics_csv_logger]
else: """
cb_list = [t800, rmc_loss, sched, CustomCallback(), metrics_csv_logger]

print("initial_epoch:", initial_epoch)
history = model.fit(
    training_dataset,
    validation_data=validation_dataset,
    initial_epoch=initial_epoch,
    epochs=total_epochs,
    steps_per_epoch=math.ceil(train_dataset_len / global_batch_size),
    validation_steps=math.ceil(val_dataset_len / global_batch_size),
    callbacks=cb_list,
)

Desktop (please complete the following information):

  • OS: CentOS Linux release 7.8.2003 (Core) (LSB Version: :core-4.1-amd64:core-4.1-noarch, Release: 7.8.2003, Codename: Core)
  • Python: 3.8.5
  • TensorFlow: 2.12.0

Additional context
Error source (keras/callbacks.py):

@keras_export("keras.callbacks.LearningRateScheduler")
class LearningRateScheduler(Callback):
    ...
    def on_epoch_begin(self, epoch, logs=None):
        if not hasattr(self.model.optimizer, "lr"):
            raise ValueError('Optimizer must have a "lr" attribute.')
        try:  # new API
            lr = float(backend.get_value(self.model.optimizer.lr))
            lr = self.schedule(epoch, lr)
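The failing check can be reproduced without TensorFlow: any wrapper object that does not forward attribute lookups to the optimizer it wraps will fail `hasattr(optimizer, "lr")`. A minimal sketch with hypothetical stand-in classes (not the library's actual code):

```python
class InnerOptimizer:
    """Hypothetical inner optimizer exposing a plain `lr` attribute."""
    def __init__(self, lr):
        self.lr = lr

class NaiveWrapper:
    """Does not forward attribute lookups, so hasattr(wrapper, "lr") is False."""
    def __init__(self, optimizer):
        self._optimizer = optimizer

class ForwardingWrapper:
    """Delegates unknown attributes to the wrapped optimizer."""
    def __init__(self, optimizer):
        self._optimizer = optimizer

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails,
        # so `_optimizer` itself is found without recursion.
        return getattr(self._optimizer, name)

naive = NaiveWrapper(InnerOptimizer(lr=1e-3))
forwarding = ForwardingWrapper(InnerOptimizer(lr=1e-3))
print(hasattr(naive, "lr"), hasattr(forwarding, "lr"))  # False True
```

This is only an illustration of the check the Keras callback performs; the wrapper in the library fails it for its own reasons.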
@21-10-4 21-10-4 added the bug Something isn't working label Feb 28, 2024
andreped (Owner) commented Feb 28, 2024
There is a PR that should address this issue. See PR #131.

I have been quite preoccupied lately, but I can take a look at this PR after work today and do some tests.

For now, to test if using this new implementation resolves your issue, try installing it from the PR branch in question:

pip install git+https://github.com/dPys/GradientAccumulator.git@optimizer-refactor --force-reinstall

Note that I have yet to test this implementation fully, but this would be a nice way of doing it.

@21-10-4 Would be really helpful if you reported your findings :]

21-10-4 (Author) commented Feb 28, 2024

> There is a PR that should address this issue. See PR #131.
>
> I have been quite preoccupied lately, but I can take a look at this PR after work today and do some tests.
>
> For now, to test if using this new implementation resolves your issue, try installing it from the PR branch in question:
>
> pip install git+https://github.com/dPys/GradientAccumulator.git@optimizer-refactor --force-reinstall
>
> Note that I have yet to test this implementation fully, but this would be a nice way of doing it.
>
> @21-10-4 Would be really helpful if you reported your findings :]

Thank you very much, I did as you said, but a new problem appeared:

Epoch 1: LearningRateScheduler setting learning rate to 4e-05.
Epoch 1/100
Traceback (most recent call last):
  File "train_moat_tfrecord.py", line 426, in <module>
    history = model.fit(
  File "/home/c/anaconda3/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/tmp/__autograph_generated_filex4w61m3f.py", line 15, in tf__train_function
    retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
  File "/var/lib/docker/c/res/GradientAccumulator/gradient_accumulator/accumulators.py", line 368, in apply_gradients
    train_op = super().apply_gradients(grads_and_vars, name, **kwargs)
  File "/var/lib/docker/c/res/GradientAccumulator/gradient_accumulator/accumulators.py", line 306, in _create_slots
    self.base_optimizer._create_slots(var_list=var_list)
AttributeError: in user code:

    File "/home/c/anaconda3/lib/python3.8/site-packages/keras/engine/training.py", line 1284, in train_function  *
        return step_function(self, iterator)
    File "/home/c/anaconda3/lib/python3.8/site-packages/keras/engine/training.py", line 1268, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/c/anaconda3/lib/python3.8/site-packages/keras/engine/training.py", line 1249, in run_step  **
        outputs = model.train_step(data)
    File "/home/c/anaconda3/lib/python3.8/site-packages/keras/engine/training.py", line 1054, in train_step
        self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/home/c/anaconda3/lib/python3.8/site-packages/keras/optimizers/legacy/optimizer_v2.py", line 588, in minimize
        return self.apply_gradients(grads_and_vars, name=name)
    File "/var/lib/docker/c/res/GradientAccumulator/gradient_accumulator/accumulators.py", line 368, in apply_gradients
        train_op = super().apply_gradients(grads_and_vars, name, **kwargs)
    File "/home/c/anaconda3/lib/python3.8/site-packages/keras/optimizers/legacy/optimizer_v2.py", line 704, in apply_gradients
        self._create_all_weights(var_list)
    File "/home/c/anaconda3/lib/python3.8/site-packages/keras/optimizers/legacy/optimizer_v2.py", line 968, in _create_all_weights
        self._create_slots(var_list)
    File "/var/lib/docker/c/res/GradientAccumulator/gradient_accumulator/accumulators.py", line 306, in _create_slots
        self.base_optimizer._create_slots(var_list=var_list)

    AttributeError: 'Adam' object has no attribute '_create_slots'

andreped (Owner) commented Feb 28, 2024

> Thank you very much, I did as you said, but a new problem appeared:
> [...]
> AttributeError: 'Adam' object has no attribute '_create_slots'

@21-10-4 This issue I have seen before. I can make a gist to reproduce the issue and see if we can resolve it.

For now, could you test the model wrapper approach instead, as described in the README:
https://github.com/andreped/GradientAccumulator?tab=readme-ov-file#getting-started

IMO, the model wrapper is MUCH more robust for single-GPU applications. Or are you trying to do distributed training?

andreped (Owner) commented Feb 28, 2024

@21-10-4 Aaah, scratch that! Now I remember why the first issue was so familiar.

TF/Keras changed their Optimizer implementation considerably from tf >= 2.11. For newer versions we only support the legacy Optimizer implementation, which should still be available in TF 2.12, AFAIK. This is actually mentioned in the documentation:
https://gradientaccumulator.readthedocs.io/en/latest/faq/optimizer_legacy.html
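The traceback above boils down to the wrapper calling a private hook, `_create_slots`, that only the legacy `optimizer_v2` base class defines; the optimizer API introduced in tf >= 2.11 creates its state in `build()` instead. A simplified, hypothetical illustration (these classes merely stand in for the two Keras base classes):

```python
class LegacyStyleAdam:
    """Stand-in for the old optimizer_v2 API: slots made via a private hook."""
    def _create_slots(self, var_list):
        self.slots = {name: 0.0 for name in var_list}

class NewStyleAdam:
    """Stand-in for the new (tf >= 2.11) API: state is created in build()."""
    def build(self, var_list):
        self.momentums = [0.0 for _ in var_list]

def wrapper_create_slots(base_optimizer, var_list):
    # Roughly what the accumulator wrapper does internally.
    base_optimizer._create_slots(var_list=var_list)

wrapper_create_slots(LegacyStyleAdam(), ["kernel", "bias"])  # works

try:
    wrapper_create_slots(NewStyleAdam(), ["kernel", "bias"])
except AttributeError:
    print("no _create_slots on the new-style optimizer")
```

That mirrors the reported `'Adam' object has no attribute '_create_slots'`: the new-style Adam simply never had that method.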

Basically, do this instead if you wish to use the Optimizer wrapper:

import tensorflow as tf
from gradient_accumulator import GradientAccumulateOptimizer

opt = tf.keras.optimizers.legacy.SGD(learning_rate=1e-2)
opt = GradientAccumulateOptimizer(optimizer=opt, accum_steps=4)

Replace the optimizer above with whichever optimizer you please (e.g., Adam), just remember to use the tf.keras.optimizers.legacy.X path.
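For context on what the wrapper buys you: accumulating over `accum_steps` mini-batches and averaging the gradients before one update is equivalent to a single step on the combined batch, which is presumably what the `# 6*6=36` comment in the reproduction code refers to. A library-free sketch of that arithmetic (plain SGD, made-up numbers):

```python
def sgd_step(w, grad, lr):
    # One plain SGD update on a scalar weight.
    return w - lr * grad

def accumulated_step(w, grads, lr):
    # Average the gradients from several mini-batches, then apply
    # a single update -- equivalent to one step on the combined batch.
    avg = sum(grads) / len(grads)
    return sgd_step(w, avg, lr)

grads = [0.2, 0.4, 0.6]  # gradients from 3 small mini-batches
w_accum = accumulated_step(1.0, grads, lr=0.1)
w_large = sgd_step(1.0, sum(grads) / 3, lr=0.1)  # one large-batch step
print(w_accum == w_large)  # True
```

The wrapper does this per-variable inside `apply_gradients`; the sketch only shows why the two schedules land on the same weights.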

21-10-4 (Author) commented Feb 29, 2024

Thank you ~
