Spark NLP 5.2.3: ONNX support for XLM-RoBERTa Token and Sequence Classifications, and Question Answering task, AWS SDK optimizations, New notebooks, Over 400 new state-of-the-art Transformer Models in ONNX, and bug fixes! #14142
Announced by maziyarpanahi in Announcement
📢 Overview
Spark NLP 5.2.3 🚀 comes with an array of exciting features and optimizations. We're thrilled to announce support for ONNX Runtime in the `XLMRoBertaForTokenClassification`, `XLMRoBertaForSequenceClassification`, and `XLMRoBertaForQuestionAnswering` annotators. This release also brings a significant refinement to the use of the AWS SDK in Spark NLP, shifting from `aws-java-sdk-bundle` to `aws-java-sdk-s3`, resulting in a substantial ~320MB reduction in library size and a 20% faster startup, plus new notebooks for importing external models from Hugging Face, over 400 new state-of-the-art transformer models in ONNX, and more!

We're also pleased to announce that our Models Hub now hosts 36,000+ free and truly open-source models & pipelines 🎉. Our deepest gratitude goes out to our community for their invaluable feedback, feature suggestions, and contributions.
🔥 New Features & Enhancements
- Introducing support for ONNX Runtime in the `XLMRoBertaForTokenClassification` annotator
- Introducing support for ONNX Runtime in the `XLMRoBertaForSequenceClassification` annotator
- Introducing support for ONNX Runtime in the `XLMRoBertaForQuestionAnswering` annotator
- Refactored the AWS SDK usage in Spark NLP, moving from the `aws-java-sdk-bundle` to the `aws-java-sdk-s3` dependency. This change has resulted in a 318MB reduction in the library's overall size and has improved Spark NLP startup time by 20%. For instance, using `sparknlp.start()` in Google Colab is now 14 to 20 seconds faster. Special thanks to @c3-avidmych for requesting this feature.
- Added new notebooks to import `DeBertaForQuestionAnswering`, `DebertaForSequenceClassification`, and `DeBertaForTokenClassification` models from HuggingFace
- Added a new `DocumentTokenSplitter` notebook
- Added a new `INSTRUCTOR` Embeddings notebook
- Added a new `RoBertaForTokenClassification` notebook
- Added a new `RoBertaForSequenceClassification` notebook
- Updated the `OpenAICompletion` notebook with the new `gpt-3.5-turbo-instruct` model

🐛 Bug Fixes
- Fixed `BGEEmbeddings` not downloading in Python

ℹ️ Known Issues
- ONNX models crash when they are used in Colab's T4 GPU runtime #14109

📓 New Notebooks
📖 Documentation
❤️ Community support
Installation
Python
# PyPI
pip install spark-nlp==5.2.3
Spark Packages
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x: (Scala 2.12):
GPU
Apple Silicon (M1 & M2)
AArch64
Maven
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x:
spark-nlp-gpu:
spark-nlp-silicon:
spark-nlp-aarch64:
FAT JARs
CPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x/3.5.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-5.2.3.jar
GPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x/3.5.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-5.2.3.jar
M1 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x/3.5.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-silicon-assembly-5.2.3.jar
AArch64 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x/3.5.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-aarch64-assembly-5.2.3.jar
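The FAT JARs above can be passed straight to a Spark session at launch; a sketch using the CPU jar URL from the list (the memory flag is illustrative, not a requirement from this release):

```shell
# Launch spark-shell with the Spark NLP 5.2.3 CPU fat jar attached.
# Standard Spark flags; adjust driver memory to your environment.
spark-shell \
  --driver-memory 16g \
  --jars https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-5.2.3.jar
```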
What's Changed
New Contributors
Full Changelog: 5.2.2...5.2.3