[Feature Request] Update Datasets Version, so that lmms-eval can be used in Offline Environment #335

jungle-gym-ac · 2024-10-18T16:42:31Z

I encountered an issue similar to this one when running lmms-eval on an offline machine (no Internet access). The load_dataset method still tries to reach the Hugging Face Hub even when I set HF_DATASETS_OFFLINE to 1.

I looked into this issue in Hugging Face Datasets and found that it is a bug in the datasets library: the load_dataset method still tries to reach the Hugging Face Hub after setting HF_DATASETS_OFFLINE to 1.

The bug was fixed by this PR, and the fix has been included since Datasets version 2.19.0. It has also been verified here that updating Datasets to a newer version ACTUALLY lets lmms-eval run without this bug in an offline environment.

So I suggest updating the Datasets version to >= 2.19.0 so that lmms-eval can be used in a fully offline environment. Are there any plans for that?
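As a rough illustration of the version requirement, here is a minimal Python sketch that checks whether the installed datasets package is at least 2.19.0. The helper names `parse_release` and `version_at_least` are hypothetical and not part of lmms-eval or datasets; only `importlib.metadata` from the standard library is used, and pre-release suffixes are deliberately ignored to keep the comparison simple.

```python
from importlib import metadata

# First release that, according to this issue, contains the offline fix.
MIN_DATASETS_VERSION = "2.19.0"


def parse_release(version: str) -> tuple:
    """Turn '2.20.0' into (2, 20, 0); suffixes such as 'rc1' or '.dev0' are ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def version_at_least(installed: str, minimum: str = MIN_DATASETS_VERSION) -> bool:
    """Compare numeric release tuples, padding the shorter one with zeros."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        installed = metadata.version("datasets")
    except metadata.PackageNotFoundError:
        print("datasets is not installed")
    else:
        status = "ok" if version_at_least(installed) else "too old"
        print(f"datasets {installed}: {status} (need >= {MIN_DATASETS_VERSION})")
```

Running the script on the target machine before going offline gives a quick sanity check of the environment.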

(There are currently some workarounds for running lmms-eval in an offline environment, see 179 and 21, but I find them inconvenient when you need to evaluate MANY tasks. I think supporting lmms-eval in offline environments would help a lot of users.)

Luodian · 2024-10-19T10:12:44Z

Thanks for this nice suggestion!

Do you often use lmms-eval in an offline environment? It would be much appreciated if you could send a PR to update the version, and also give us some guidance by adding a page under ./docs/xxx.md that introduces usage in an offline environment.

kcz358 · 2024-10-21T05:12:28Z

Hi @jungle-gym-ac, have you confirmed on your side that, once you export the HF_DATASETS_OFFLINE env var with the newest datasets version, you no longer need internet access to use lmms-eval? If that is the case, I think we will update the dependency in pyproject.toml and update the docs.

jungle-gym-ac · 2024-10-24T03:40:46Z

Hi @kcz358, sorry for the late reply! I have tried with datasets version 2.20.0 and SUCCESSFULLY ran lmms-eval without internet access with HF_DATASETS_OFFLINE=1.

The main steps are:

1. Suppose you have already run lmms-eval successfully on a source machine with Internet access. Transfer the datasets you need to test from the source machine to the target machine without Internet access (under $HF_HOME/datasets/ on both machines), and also pack and move the conda environment with the updated datasets version.
2. Run lmms-eval on the target machine with the following commands:

export HF_DATASETS_OFFLINE=1
python3 -m accelerate.commands.launch ... # The exact same command you ran on the source machine
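The two steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper names `pack_datasets_cache`, `offline_env` and `run_offline` are hypothetical, the cache is assumed to live under `<HF_HOME>/datasets` as described in step 1, and the environment variable is spelled `HF_DATASETS_OFFLINE` as in the original report at the top of this issue. The launch command itself is left to the caller because it is elided in the steps.

```python
import os
import subprocess
import tarfile
from pathlib import Path


def pack_datasets_cache(hf_home: str, archive_path: str) -> str:
    """Archive <hf_home>/datasets so it can be copied to the offline machine."""
    cache_dir = Path(hf_home) / "datasets"
    if not cache_dir.is_dir():
        raise FileNotFoundError(f"no datasets cache found at {cache_dir}")
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store entries under 'datasets/' so the archive can be unpacked
        # directly inside $HF_HOME on the target machine.
        tar.add(str(cache_dir), arcname="datasets")
    return archive_path


def offline_env(base=None) -> dict:
    """Return a copy of the environment with offline mode switched on."""
    env = dict(os.environ if base is None else base)
    env["HF_DATASETS_OFFLINE"] = "1"
    return env


def run_offline(command: list) -> int:
    """Run the caller-supplied lmms-eval command with offline mode enabled."""
    return subprocess.run(command, env=offline_env()).returncode
```

On the source machine you would call `pack_datasets_cache` with your own `$HF_HOME`, copy and unpack the archive under `$HF_HOME` on the target machine, and then pass the same launch command you used online to `run_offline`.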

And sure, I can create a PR for that, as @Luodian suggested.

Luodian · 2024-10-24T03:47:45Z

Great to hear that, this helps a lot!

zjwu0522 mentioned this issue Oct 21, 2024

Bug: Unable to calculate metrics from saved predictions using --predict_only and --from_log #337

Open

Luodian added the enhancement New feature or request label Oct 27, 2024
