[Feature Request] Update Datasets Version, so that lmms-eval can be used in Offline Environment #335

Open
jungle-gym-ac opened this issue Oct 18, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@jungle-gym-ac

I ran into an issue similar to this one when running lmms-eval on an offline machine (no Internet access): load_dataset still tries to reach the Hugging Face Hub even when HF_DATASETS_OFFLINE is set to 1.

I looked into this issue in the Hugging Face Datasets repository and found that it is a bug in the datasets library: load_dataset still tries to reach the Hugging Face Hub after HF_DATASETS_OFFLINE is set to 1.

The bug was fixed by this PR, which has been included since Datasets version 2.19.0. It has also been verified here that updating Datasets to a newer version actually lets lmms-eval run without this error in an offline environment.

So I suggest updating the Datasets requirement to >= 2.19.0 so that lmms-eval can be used in a fully offline environment. Are there any plans for that?
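As a quick way to tell whether an existing environment already has a new enough release, here is a minimal sketch in Python. The helper names (`parse_version`, `has_offline_fix`) are made up for illustration, the 2.19.0 threshold is simply the version quoted in this issue, and the parser deliberately ignores pre-release suffixes, so treat it as a rough check rather than a full version comparison.

```python
from importlib.metadata import PackageNotFoundError, version

# Threshold taken from this issue: the fix is reported to ship in 2.19.0.
MIN_DATASETS = (2, 19, 0)


def parse_version(text):
    """Return the leading numeric components of a version string.

    '2.20.0' -> (2, 20, 0); '2.19.1.dev0' -> (2, 19, 1); '2.19' -> (2, 19, 0).
    Pre-release or local suffixes are ignored (a simplification).
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '2.19' compares equal to '2.19.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_offline_fix(installed):
    """True if the given version string is at least MIN_DATASETS."""
    return parse_version(installed) >= MIN_DATASETS


if __name__ == "__main__":
    try:
        installed = version("datasets")
    except PackageNotFoundError:
        print("datasets is not installed in this environment")
    else:
        status = "new enough" if has_offline_fix(installed) else "older than 2.19.0"
        print(f"datasets {installed}: {status}")
```

Running this inside the environment that lmms-eval uses prints the installed version and whether it meets the threshold.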

(Although 179 and 21 describe workarounds for running lmms-eval in an offline environment, I find them inconvenient when you need to evaluate many tasks. I think supporting offline use in lmms-eval would help a lot of users.)

@Luodian
Contributor

Luodian commented Oct 19, 2024

Thanks for this nice suggestion!

Do you often use lmms-eval in an offline environment? We would really appreciate it if you could send a PR that updates the version and also adds some guidance in ./docs/xxx.md on usage in an offline environment.

@kcz358
Collaborator

kcz358 commented Oct 21, 2024

Hi @jungle-gym-ac, have you confirmed on your side that, once you export the HF_DATASETS_OFFLINE env var with the newest datasets version, lmms-eval no longer needs internet access? If that is the case, I think we will update the dependency in pyproject.toml and update the docs.

@jungle-gym-ac
Author

jungle-gym-ac commented Oct 24, 2024

Hi @kcz358, sorry for the late reply! I have tried datasets version 2.20.0 and successfully ran lmms-eval without internet access with HF_DATASETS_OFFLINE=1.

The main steps are:

  • Suppose you have successfully run lmms-eval on a source machine with Internet access. Transfer the datasets you need to test from the source machine to the target machine without Internet access (under $HF_HOME/datasets/ on both machines), and also pack and move the conda environment with the updated datasets version.
  • Run lmms-eval on the target machine with the following commands:
export HF_DATASETS_OFFLINE=1
python3 -m accelerate.commands.launch ...  # the exact same command you ran on the source machine
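For anyone who prefers to drive the steps above from a script, here is a rough Python sketch of a pre-flight check. It is not part of lmms-eval: the helper names (`offline_env`, `datasets_cache_dir`, `cached_entries`) are hypothetical. It assumes the cache lives under $HF_HOME/datasets as described above, falling back to ~/.cache/huggingface when HF_HOME is unset, and it does not handle other cache overrides such as HF_DATASETS_CACHE.

```python
import os
from pathlib import Path


def offline_env(base=None):
    """Copy an environment mapping and switch datasets to offline mode."""
    env = dict(os.environ if base is None else base)
    env["HF_DATASETS_OFFLINE"] = "1"
    return env


def datasets_cache_dir(env):
    """Locate $HF_HOME/datasets, using the default HF_HOME if it is unset."""
    hf_home = env.get("HF_HOME") or str(Path.home() / ".cache" / "huggingface")
    return Path(hf_home) / "datasets"


def cached_entries(env):
    """List the sub-directories copied over from the source machine."""
    cache = datasets_cache_dir(env)
    if not cache.is_dir():
        return []
    return sorted(p.name for p in cache.iterdir() if p.is_dir())


if __name__ == "__main__":
    env = offline_env()
    entries = cached_entries(env)
    print(f"cache directory: {datasets_cache_dir(env)}")
    print(f"{len(entries)} cached entries found")
    for name in entries:
        print("  ", name)
```

If the list is empty on the target machine, the transfer step probably did not land in the expected directory. The `env` mapping returned by `offline_env()` can be passed to `subprocess.run(..., env=env)` together with the same launch command used on the source machine.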

And sure, I can create a PR for that, as @Luodian suggested.

@Luodian
Contributor

Luodian commented Oct 24, 2024

Great to hear that, this helps a lot!

@Luodian added the enhancement (New feature or request) label Oct 27, 2024