CUBLAS_STATUS_INTERNAL_ERROR on spacy 3.0 #7428
Hi, first off I'd like to thank the developers for their hard work on the spaCy 3.0 release. It's been a great experience and has run smoothly so far. I ran into a problem when I installed spaCy 3.0 on another VM and tried to train a textcat/transformer model on it. Here's the config file that I used:
Then I run the following command and get the following error:
When I drop the --gpu-id=0 option from the training command, it runs fine. What could be going wrong? CUDA and CuPy are installed correctly, and CUDA itself works fine; the error only appears in spaCy when I use a transformer model. When I drop the transformer component from the above config file, training works even with the --gpu-id=0 option, but then my accuracy gets nuked. System specs:

Here's the weird thing: the same config file works perfectly fine on my other VM, which runs Ubuntu 20.04 with Python 3.8.5, CUDA 11.1, CuPy 8.3.0, and spaCy 3.0.5. No errors were thrown and the model performed perfectly fine.
Replies: 1 comment 1 reply
Hi, this is probably related to a problem with the torch installation. We'd recommend uninstalling torch and reinstalling it with the command you get from the PyTorch quickstart after picking the right options for your system: https://pytorch.org/get-started/locally/

Googling the error led to some issues related to the most recent version of torch (1.8.0), so it's also possible that downgrading to 1.7.1 might help. I also don't see support for CUDA 11.2 there, so 11.1 might be a better choice for now.
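For reference, the reinstall described above might look roughly like the following. This is only a sketch: the authoritative command is the one the PyTorch quickstart generates for your exact OS, package manager, and CUDA version, and the wheel versions and index URL below are assumptions based on what PyTorch documented around the 1.7/1.8 releases.

```shell
# Remove the current torch build (possibly built against a mismatched CUDA)
pip uninstall -y torch

# Reinstall a CUDA 11.1 build of torch 1.8.0
# (URL and version tags as documented by PyTorch at the time; verify against
# https://pytorch.org/get-started/locally/ before running)
pip install torch==1.8.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html

# If 1.8.0 still triggers the error, try downgrading as suggested above;
# 1.7.1 shipped CUDA 11.0 wheels (+cu110) rather than 11.1:
# pip install torch==1.7.1+cu110 -f https://download.pytorch.org/whl/torch_stable.html
```

After reinstalling, a quick sanity check is to open a Python shell and confirm that torch.cuda.is_available() returns True and torch.version.cuda matches the CUDA toolkit you intend to use.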