
Reward classifier and training #528

Open · wants to merge 40 commits into base: user/michel-aractingi/2024-11-27-port-hil-serl
Conversation

@ChorntonYoel commented Nov 26, 2024

What this does

Item 2 of Reward Classifier in issue #504 (Label: Feature)

This PR adds a reward classifier (used to classify whether an image of a robot performing a task should receive a reward or not), a training script with logging and resuming support, a config.yaml file that can be used to launch a training run, and a few tests for the training loop.
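
For intuition, here is a minimal sketch of what such a classifier can look like: a pretrained vision backbone with a single-logit binary head. The names are illustrative rather than the PR's actual code, and it assumes the transformers library plus the facebook/convnext-base-224 checkpoint mentioned below.

import torch
import torch.nn as nn
from transformers import AutoModel

class RewardClassifier(nn.Module):
    """Hypothetical sketch: pretrained backbone + binary reward head."""

    def __init__(self, backbone_name: str = "facebook/convnext-base-224"):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(backbone_name)
        hidden = self.backbone.config.hidden_sizes[-1]  # width of the last ConvNeXt stage
        self.head = nn.Linear(hidden, 1)  # one logit: reward (1) vs. no reward (0)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(pixel_values).pooler_output  # pooled image features
        return self.head(feats)  # train with BCE-with-logits against 0/1 labels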

How it was tested

Using 10 episodes recorded with the reward system from this PR: #518.
I also added a test file for the classifier training script. Lots of things are mocked, but I believe it covers the basics.
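
To give a sense of the mocking pattern (a sketch only: train_step and the logger interface here are hypothetical stand-ins, not the PR's actual test code), a test like the following exercises one training step without a real dataset or a wandb account:

from unittest.mock import MagicMock

import torch

def train_step(model, batch, optimizer, logger):
    # One optimization step on binary reward labels.
    logits = model(batch["image"]).squeeze(-1)
    loss = torch.nn.functional.binary_cross_entropy_with_logits(
        logits, batch["reward"].float()
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    logger.log({"loss": loss.item()})
    return loss

def test_train_step_runs_and_logs():
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 1))
    batch = {"image": torch.randn(4, 3, 8, 8), "reward": torch.randint(0, 2, (4,))}
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    logger = MagicMock()  # stands in for a wandb-style logger

    loss = train_step(model, batch, optimizer, logger)

    assert torch.isfinite(loss)
    logger.log.assert_called_once()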

How to checkout & try? (for the reviewer)

python lerobot/scripts/train_classifier.py \
    --config-name policy/reward_classifier.yaml

With the wandb entity and the dataset name adapted.
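
For example, with Hydra-style overrides (the key names wandb.entity and dataset_repo_id are assumptions and may differ in the actual config):

python lerobot/scripts/train_classifier.py \
    --config-name policy/reward_classifier.yaml \
    wandb.entity=<your-entity> \
    dataset_repo_id=<your-dataset>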

I was able to reach 95%+ accuracy after a few epochs with facebook/convnext-base-224 as the backbone and a dataset of 10 episodes of ~15 sec each.
This branch was built on top of the branch from #518, so it will need to wait for that one to be merged before this can be merged.

@michel-aractingi (Collaborator)

Great work! Looking forward to trying it; let me know when I can review it.

@ChorntonYoel changed the base branch from user/aliberts/2024_09_25_reshape_dataset to user/michel-aractingi/2024-11-27-port-hil-serl November 27, 2024 17:36
@Cadene (Collaborator) left a comment


What do you think of adding lerobot/common/policies/classifier/README.md with a very short explanation and example commands?

Also, Classifier is not a policy (it doesn't output actions). However, I don't know what to do about it, so the current implementation is good for now. Does anyone have a better idea?

(PS: waiting for the first branch to merge before approving)

@@ -0,0 +1,280 @@
# test_train_classifier.py
Collaborator
Nice file!

Out of curiosity, why this comment # test_train_classifier.py?

@ChorntonYoel (Author)
No good reason, haha, force of habit. For some pre-commit setups you need a comment at the beginning of the file, so I start with a dummy one. Will take it off.

@@ -0,0 +1,49 @@
# @package _global_
Collaborator
Nice file!

lerobot/scripts/control_robot.py: review thread (outdated, resolved)
@ChorntonYoel marked this pull request as ready for review November 29, 2024 16:25
@michel-aractingi (Collaborator)

Nice work @ChorntonYoel! Could you move the classifier directory from lerobot/common/policies/classifier/ to lerobot/common/policies/hilserl/classifier?

Since for now we will only use the reward classifier for hil-serl, we will put everything in its directory. In the future, when classifiers are more established, we can have a separate directory in lerobot/common/classifiers to host different kinds of reward recognition models.
