CUBLAS_STATUS_INTERNAL_ERROR on spacy 3.0 #7428
Hi, first off I'd like to thank the developers for their hard work on the spaCy 3.0 release. It's been a great experience and has run smoothly so far. I ran into a problem when I installed spaCy 3.0 on another VM and tried to train a textcat/transformer model on it. Here's the config file that I used:
Then I run the following command and get the following error:
When I drop the --gpu-id=0 option from the training command, it runs fine. What could be going wrong? CUDA and CuPy are installed correctly, and CUDA itself works fine; the error only appears in spaCy when I use a transformer model. When I drop the transformer component from the above config file, training works even with the --gpu-id=0 option, but then my accuracy gets nuked. System specs:

Here's the weird thing: the same config file works perfectly fine on my other VM, which runs Ubuntu 20.04 with Python 3.8.5, CUDA 11.1, CuPy 8.3.0, and spaCy 3.0.5. No errors were thrown and the model performed perfectly fine.
Replies: 1 comment 1 reply
Hi, this is probably related to a problem with the torch installation. We'd recommend uninstalling torch and reinstalling it with the command you get from the PyTorch quickstart after picking the right options for your system: https://pytorch.org/get-started/locally/

Googling the error led to some issues related to the most recent version of torch (1.8.0), so it's also possible that downgrading to 1.7.1 might help. I also don't see support for CUDA 11.2 there, so 11.1 might be a better choice for now.
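For reference, the reinstall described above might look roughly like the following. This is only a sketch: the authoritative command is the one the PyTorch quickstart generates for your exact OS, package manager, and CUDA version, and the wheel versions and index URL below are assumptions based on what PyTorch documented around the 1.7/1.8 releases.

```shell
# Remove the current torch build (possibly built against a mismatched CUDA)
pip uninstall -y torch

# Reinstall a CUDA 11.1 build of torch 1.8.0
# (URL and version tags as documented by PyTorch at the time; verify against
# https://pytorch.org/get-started/locally/ before running)
pip install torch==1.8.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html

# If 1.8.0 still triggers the error, try downgrading as suggested above;
# 1.7.1 shipped CUDA 11.0 wheels (+cu110) rather than 11.1:
# pip install torch==1.7.1+cu110 -f https://download.pytorch.org/whl/torch_stable.html
```

After reinstalling, a quick sanity check is to open a Python shell and confirm that torch.cuda.is_available() returns True and torch.version.cuda matches the CUDA toolkit you intend to use.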