Training fails on Apple Silicon with device='mps' #91

Open
zqhuang211 opened this issue Aug 20, 2024 · 0 comments

ultravox-py3.11zhuang@macbook-pro-z:ultravox$  cd /Users/zhuang/expts/2024-08-12-ultravox/input_kd-1a/ultravox ; /usr/bin/env /Users/zhuang/expts/2024-08-12-ultravox/input_kd-1a/ultravox/.venv/bin/python /Users/zhuang/.vscode/extensions/ms-python.debugpy-2024.10.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher 54140 -- -m ultravox.training.train --config ultravox/training/configs/tinyllama_whisper.yaml --adapter_type CFORMER 
[2024-08-20 14:05:03,016] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
INFO:root:Instantiating processor...
INFO:root:Instantiating model...
INFO:root:Model and processor instantiated.
INFO:root:trainable params: 30,513,920 || all params: 1,218,716,416 || trainable%: 2.5%
INFO:root:Trainable%:    LLM: 0.0% || Audio Encoder: 0.0% || Adapter: 100.0%
INFO:root:Using dtype and device (world_size): torch.float32, mps:0 (1)
INFO:root:Loading dataset fixie-ai/librispeech_asr clean train.100 False True
Resolving data files: 100%|██████████| 84/84 [00:00<00:00, 518.63it/s]
Resolving data files: 100%|██████████| 24/24 [00:00<00:00, 81114.66it/s]
Resolving data files: 100%|██████████| 84/84 [00:00<00:00, 122291.40it/s]
Resolving data files: 100%|██████████| 24/24 [00:00<00:00, 102092.59it/s]
INFO:root:Loaded [DataDictConfig(path='fixie-ai/librispeech_asr', name='clean', splits=['train.100'], num_samples=100000, weight=1.0, streaming=True, user_template='Continue the following text using less than 50 words:\n\n<|audio|>', assistant_template='{{ continuation }}', transcript_template='{{ text }}')] data sets, sample limit: None (val sample limit: 64)
INFO:root:Config Params: TrainConfig(data_sets=[DataDictConfig(path='fixie-ai/librispeech_asr', name='clean', splits=['train.100'], num_samples=100000, weight=1.0, streaming=True, user_template='Continue the following text using less than 50 words:\n\n<|audio|>', assistant_template='{{ continuation }}', transcript_template='{{ text }}')], val_sets=['anyinstruct'], text_model='TinyLlama/TinyLlama-1.1B-Chat-v1.0', audio_model='openai/whisper-small', adapter_type=<AdapterType.CFORMER: 'cformer'>, adapter_config=UltravoxCFormerAdapterConfig(num_pre_cif_layers=2, num_post_cif_layers=2), data_dicts=[DataDictConfig(path='fixie-ai/librispeech_asr', name='clean', splits=['train.100'], num_samples=100000, weight=1.0, streaming=True, user_template='Continue the following text using less than 50 words:\n\n<|audio|>', assistant_template='{{ continuation }}', transcript_template='{{ text }}')], do_train=True, do_eval=True, stop_strategy=<StopStrategy.LAST_EXHAUSTED: 'last_exhausted'>, data_dir=None, mds=False, num_samples=None, val_num_samples=64, eval_num_samples=256, eval_max_new_tokens=32, eval_num_procs=16, num_prompts=1, num_workers=1, train_on_inputs=False, shuffle_data=True, max_audio_duration_secs=16, verbose=False, device='mps', data_type='float32', model_load_dir=None, text_model_lora_config=None, audio_model_lora_config=None, disable_layerdrop=False, exp_name='tinyllama_whisper_s', output_dir=PosixPath('runs/tinyllama_whisper_s'), logs_dir=PosixPath('runs/tinyllama_whisper_s/logs'), optimizer='adamw_torch', num_epochs=1, max_steps=100, val_steps=1000, save_steps=0.25, logging_steps=100, grad_accum_steps=1, val_accum_steps=1, batch_size=4, lr=0.002, lr_scheduler='cosine', lr_warmup_steps=10, weight_decay=0.0, seed=42, shuffle_seed=42, report_logs_to=['tensorboard'], run_tags=[], loss_config=None)
/Users/zhuang/expts/2024-08-12-ultravox/input_kd-1a/ultravox/.venv/lib/python3.11/site-packages/transformers/training_args.py:1525: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
  warnings.warn(
max_steps is given, it will override any value given in num_train_epochs
INFO:root:Starting training...
INFO:root:train start time: 2024-08-20 14:05:12.865207
Traceback (most recent call last):
  File "/Users/zhuang/.pyenv/versions/3.11.9/lib/python3.11/runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zhuang/.pyenv/versions/3.11.9/lib/python3.11/runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "/Users/zhuang/.vscode/extensions/ms-python.debugpy-2024.10.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/Users/zhuang/.vscode/extensions/ms-python.debugpy-2024.10.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/Users/zhuang/.vscode/extensions/ms-python.debugpy-2024.10.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 317, in run_module
    run_module_as_main(options.target, alter_argv=True)
  File "/Users/zhuang/.vscode/extensions/ms-python.debugpy-2024.10.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 238, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zhuang/.vscode/extensions/ms-python.debugpy-2024.10.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/Users/zhuang/expts/2024-08-12-ultravox/input_kd-1a/ultravox/ultravox/training/train.py", line 333, in <module>
    main()
  File "/Users/zhuang/expts/2024-08-12-ultravox/input_kd-1a/ultravox/.venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/zhuang/expts/2024-08-12-ultravox/input_kd-1a/ultravox/ultravox/training/train.py", line 294, in main
    trainer.evaluate()
  File "/Users/zhuang/expts/2024-08-12-ultravox/input_kd-1a/ultravox/.venv/lib/python3.11/site-packages/transformers/trainer_seq2seq.py", line 180, in evaluate
    return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zhuang/expts/2024-08-12-ultravox/input_kd-1a/ultravox/.venv/lib/python3.11/site-packages/transformers/trainer.py", line 3648, in evaluate
    dataset_metrics = self.evaluate(
                      ^^^^^^^^^^^^^^
  File "/Users/zhuang/expts/2024-08-12-ultravox/input_kd-1a/ultravox/.venv/lib/python3.11/site-packages/transformers/trainer_seq2seq.py", line 180, in evaluate
    return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zhuang/expts/2024-08-12-ultravox/input_kd-1a/ultravox/.venv/lib/python3.11/site-packages/transformers/trainer.py", line 3666, in evaluate
    output = eval_loop(
             ^^^^^^^^^^
  File "/Users/zhuang/expts/2024-08-12-ultravox/input_kd-1a/ultravox/.venv/lib/python3.11/site-packages/transformers/trainer.py", line 3847, in evaluation_loop
    for step, inputs in enumerate(dataloader):
  File "/Users/zhuang/expts/2024-08-12-ultravox/input_kd-1a/ultravox/.venv/lib/python3.11/site-packages/accelerate/data_loader.py", line 671, in __iter__
    main_iterator = super().__iter__()
                    ^^^^^^^^^^^^^^^^^^
  File "/Users/zhuang/expts/2024-08-12-ultravox/input_kd-1a/ultravox/.venv/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 439, in __iter__
    return self._get_iterator()
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/zhuang/expts/2024-08-12-ultravox/input_kd-1a/ultravox/.venv/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 387, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zhuang/expts/2024-08-12-ultravox/input_kd-1a/ultravox/.venv/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1040, in __init__
    w.start()
  File "/Users/zhuang/.pyenv/versions/3.11.9/lib/python3.11/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/Users/zhuang/.pyenv/versions/3.11.9/lib/python3.11/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zhuang/.pyenv/versions/3.11.9/lib/python3.11/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/zhuang/.pyenv/versions/3.11.9/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/zhuang/.pyenv/versions/3.11.9/lib/python3.11/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/zhuang/.pyenv/versions/3.11.9/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/zhuang/.pyenv/versions/3.11.9/lib/python3.11/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "/Users/zhuang/expts/2024-08-12-ultravox/input_kd-1a/ultravox/.venv/lib/python3.11/site-packages/torch/multiprocessing/reductions.py", line 557, in reduce_storage
    metadata = storage._share_filename_cpu_()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zhuang/expts/2024-08-12-ultravox/input_kd-1a/ultravox/.venv/lib/python3.11/site-packages/torch/storage.py", line 294, in wrapper
    return fn(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zhuang/expts/2024-08-12-ultravox/input_kd-1a/ultravox/.venv/lib/python3.11/site-packages/torch/storage.py", line 368, in _share_filename_cpu_
    return super()._share_filename_cpu_(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: _share_filename_: only available on CPU
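
Reading the traceback: the failure happens while `trainer.evaluate()` starts a DataLoader worker process (`_MultiProcessingDataLoaderIter`, then `w.start()`). On macOS the worker is spawned, so the objects handed to it are pickled, and `reduce_storage` hits a tensor storage that is not on CPU. The config above has `num_workers=1` and `device='mps'`. A possible workaround, not verified against this repo, is to load data in the main process when the device is MPS so that nothing has to be pickled. The sketch below is only an illustration: `pick_num_workers` is a hypothetical helper, not part of ultravox, and it assumes the config's `num_workers` is what ends up as the DataLoader worker count.

```python
def pick_num_workers(device: str, requested: int) -> int:
    """Hypothetical helper (not in ultravox): choose a DataLoader worker count.

    Assumption: the returned value is what gets passed on as the DataLoader
    worker count. With 0 workers, data is loaded in the main process and no
    objects are pickled for a spawned worker.
    """
    # Accept both "mps" and "mps:0" style device strings.
    backend = device.split(":")[0]
    if backend == "mps":
        # Avoids the spawn-time pickling that raised
        # "_share_filename_: only available on CPU" in the traceback above.
        return 0
    return requested


if __name__ == "__main__":
    print(pick_num_workers("mps", 1))     # 0
    print(pick_num_workers("cuda:0", 4))  # 4
```

This only sidesteps the error. The underlying cause is presumably an MPS tensor reachable from the dataset or collator that is sent to the worker, and keeping those objects on CPU would be the more direct fix.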