John Snow Labs Spark-NLP 3.1.1: New and improved ALBERT with support for external Transformers, real-time metrics in Python notebooks, bug fixes, and many more improvements! #5725
maziyarpanahi announced in Announcement
Overview
We are pleased to release Spark NLP 🚀 3.1.1! We have a new and much-improved ALBERT annotator with support for HuggingFace 🤗 models in Spark NLP. We managed to make AlbertEmbeddings almost 7x faster on GPU compared to prior releases!
As always, we would like to thank our community for their feedback, questions, and feature requests.
New Features
sparknlp.start(): thanks to PySpark 3.x, you can now call sparknlp.start(real_time_output=True) to have the outputs of Spark NLP (such as metrics during training) appear right in your Jupyter, Colab, and Kaggle notebooks.
Bug Fixes & Enhancements
Relative dates such as day after tomorrow or day before yesterday: fix DateMatcher utils for relative dates (#5706)
logger inside session on some setups: fix logger issue in some clusters / add signatures to the sessions (#5715)
init_all_tables: fix logger issue in some clusters / add signatures to the sessions (#5715)
concepts.md: #5664, thanks to @roger-yu-ds
Performance Improvements
Spark NLP 3.1.1 introduces a new batch annotation technique for the AlbertEmbeddings annotator that radically improves prediction/inference performance. From now on, batchSize for this annotator means the number of rows that can be fed into the model for prediction, instead of sentences per row. On accelerated hardware such as a GPU, you can tune it to control throughput and fully utilize the device.
Performance achievements by using Spark NLP 2.x/3.0.x vs. Spark NLP 3.1.1
(Performed on a Databricks cluster)
We will update this benchmark table in future pre-releases.
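The batchSize setting described in this section can be sketched in Python as follows. This is a minimal sketch, not a tuned configuration: the column names ("document", "token", "embeddings") and the default batch size of 8 are assumptions chosen for illustration, and chunk_rows is a hypothetical helper that only illustrates what "rows per batch" means; it is not how Spark NLP batches rows internally. The Spark NLP import is done lazily so the helper can be read and run without the library installed.

```python
from typing import List, Sequence


def chunk_rows(rows: Sequence[str], batch_size: int) -> List[List[str]]:
    """Group rows into batches of at most batch_size (illustration only)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [list(rows[i:i + batch_size]) for i in range(0, len(rows), batch_size)]


def build_albert_embeddings(batch_size: int = 8):
    """Return a pretrained AlbertEmbeddings annotator with the given batchSize.

    Requires Spark NLP 3.1.1 and an active Spark session.
    """
    # Imported lazily so chunk_rows works without Spark NLP installed.
    from sparknlp.annotator import AlbertEmbeddings

    return (
        AlbertEmbeddings.pretrained()         # default model, downloaded automatically
        .setInputCols(["document", "token"])  # assumed upstream column names
        .setOutputCol("embeddings")           # assumed output column name
        .setBatchSize(batch_size)             # number of rows fed to the model at once
    )
```

On a GPU, a larger batch_size generally keeps the device busier, at the cost of more memory per call; a suitable value depends on the hardware and the length of the texts.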
Backward compatibility
We have migrated AlbertEmbeddings to TensorFlow v2, so models from releases prior to 3.1.1 won't work after this release. We have already updated the models and uploaded them to Models Hub. You can use pretrained(), which takes care of this automatically; otherwise, please make sure you download the new models manually.
Documentation
Installation
Python
# PyPI
pip install spark-nlp==3.1.1
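Once the package is installed, the real-time output option mentioned under New Features can be switched on when the session is started. The wrapper below is a small sketch: start_spark_nlp is a hypothetical helper name, and only the sparknlp.start(real_time_output=True) call comes from the release notes. The import is deferred so that defining the function does not require Spark NLP or a JVM.

```python
def start_spark_nlp(real_time_output: bool = True):
    """Start a Spark NLP session, optionally streaming output into the notebook."""
    # Deferred import: Spark NLP (and Java) are only needed when this is called.
    import sparknlp

    # real_time_output=True surfaces Spark NLP output, such as training
    # metrics, directly in Jupyter, Colab, and Kaggle notebooks.
    return sparknlp.start(real_time_output=real_time_output)


# In a notebook cell:
# spark = start_spark_nlp()
```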
Spark Packages
spark-nlp on Apache Spark 3.0.x and 3.1.x (Scala 2.12 only):
GPU
spark-nlp on Apache Spark 2.4.x (Scala 2.11 only):
GPU
spark-nlp on Apache Spark 2.3.x (Scala 2.11 only):
GPU
Maven
spark-nlp on Apache Spark 3.0.x and 3.1.x:
spark-nlp-gpu:
spark-nlp on Apache Spark 2.4.x:
spark-nlp-gpu:
spark-nlp on Apache Spark 2.3.x:
spark-nlp-gpu:
FAT JARs
CPU on Apache Spark 3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-3.1.1.jar
GPU on Apache Spark 3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-3.1.1.jar
CPU on Apache Spark 2.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark24-assembly-3.1.1.jar
GPU on Apache Spark 2.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-spark24-assembly-3.1.1.jar
CPU on Apache Spark 2.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark23-assembly-3.1.1.jar
GPU on Apache Spark 2.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-spark23-assembly-3.1.1.jar
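The six FAT JAR links above follow a regular naming pattern, which the sketch below captures in a small helper for picking the right one. The function name fat_jar_url and the "3.x" / "2.4.x" / "2.3.x" keys are assumptions made for illustration; the base URL, version, and file names are taken from the list above.

```python
BASE_URL = "https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars"
VERSION = "3.1.1"

# Suffix added to the artifact name for each supported Apache Spark line.
_SPARK_SUFFIX = {"3.x": "", "2.4.x": "-spark24", "2.3.x": "-spark23"}


def fat_jar_url(spark_line: str, gpu: bool = False) -> str:
    """Return the FAT JAR URL for a Spark line ("3.x", "2.4.x", "2.3.x")."""
    if spark_line not in _SPARK_SUFFIX:
        raise ValueError(f"unsupported Spark line: {spark_line!r}")
    name = "spark-nlp" + ("-gpu" if gpu else "") + _SPARK_SUFFIX[spark_line]
    return f"{BASE_URL}/{name}-assembly-{VERSION}.jar"


print(fat_jar_url("2.4.x", gpu=True))
```

The resulting URL (or a downloaded copy of the JAR) can then be handed to Spark, for example through the --jars option or the spark.jars setting, depending on how the cluster is configured.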
This discussion was created from the release John Snow Labs Spark-NLP 3.1.1: New and improved ALBERT with support for external Transformers, real-time metrics in Python notebooks, bug fixes, and many more improvements!.