This is a Databricks notebook utility, written in Python, that exports libraries from either a given list of clusters or all clusters in the workspace to a filesystem location (DBFS, ADLS, or S3). It also maintains date-wise versioning, so a later import can pick the libraries exported from any given cluster on a specific date.
The following variables need to be set:
- EXPORT_LOG_TABLE : Delta Table for storing logs of clusters whose libraries are exported
- LIBRARY_EXPORT_PATH : Location where exported Libraries will be stored
- CLUSTER_NAMES : Names of the clusters whose libraries need to be exported; leave blank to export from all clusters
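As a rough illustration, the three variables above could be set in a notebook cell along these lines. The table name, path, and cluster names are placeholders, and both the comma-separated format for CLUSTER_NAMES and the `parse_cluster_names` helper are assumptions made for this sketch, not part of the utility itself.

```python
# Placeholder values; replace them with ones valid in your workspace.
EXPORT_LOG_TABLE = "default.library_export_log"    # hypothetical Delta table name
LIBRARY_EXPORT_PATH = "dbfs:/tmp/library_exports"  # hypothetical DBFS location
CLUSTER_NAMES = "etl-cluster, ml-cluster"          # assumed comma-separated; "" means all clusters


def parse_cluster_names(raw):
    """Split a comma-separated string into a list of cluster names.

    Returns an empty list for a blank value, which this sketch
    treats as "export from every cluster in the workspace".
    """
    return [name.strip() for name in raw.split(",") if name.strip()]


selected = parse_cluster_names(CLUSTER_NAMES)
print(selected if selected else "all clusters")
```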
All prerequisites are now in place. All we need to do is create an object of the ExportJars class and call its export_cluster_libraries method.
Once the processing is done you can validate the export metrics in the table mentioned in EXPORT_LOG_TABLE:
On the filesystem you will be able to see the exported libraries along with a CLUSTER_NAME.json file.
This JSON file contains the information of all the libraries exported for the given cluster and will be utilised while importing the libraries to another cluster.
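The exact layout of CLUSTER_NAME.json is not shown here. As an assumption, the sketch below keeps each library in the shape returned by the Databricks Libraries API `cluster-status` endpoint (a `library_statuses` list whose entries carry a `library` spec). The `build_manifest` helper, the `cluster_name` and `libraries` keys, and the sample values are hypothetical.

```python
import json


def build_manifest(cluster_name, status_response):
    """Collect the library specs from a cluster-status style response."""
    libraries = [
        entry["library"]
        for entry in status_response.get("library_statuses", [])
        if "library" in entry
    ]
    # Assumed manifest shape; the real file may use different keys.
    return {"cluster_name": cluster_name, "libraries": libraries}


# Sample data shaped like a Libraries API cluster-status response.
sample_response = {
    "cluster_id": "0000-000000-example",
    "library_statuses": [
        {"library": {"jar": "dbfs:/FileStore/jars/example.jar"}, "status": "INSTALLED"},
        {"library": {"pypi": {"package": "requests==2.31.0"}}, "status": "INSTALLED"},
    ],
}

manifest = build_manifest("etl-cluster", sample_response)
print(json.dumps(manifest, indent=2))
```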
This is a Databricks notebook utility, written in Python, that imports libraries previously exported from a cluster to a filesystem location into a target cluster.
The following variables need to be set:
- BASE_PATH : Location where the libraries were exported; the libraries will be imported from this location
- SOURCE_CLUSTER : Name of the cluster whose exported libraries need to be imported
- TARGET_CLUSTER : Name of the cluster where the source cluster's libraries need to be installed
- LOG_TABLE_NAME : Delta Table for storing logs of clusters where libraries were imported
- IMPORT_DATE : Export date identifying the version of the libraries to import
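A minimal sketch of these settings follows. All values are placeholders, and the directory layout `BASE_PATH/IMPORT_DATE/SOURCE_CLUSTER/SOURCE_CLUSTER.json`, the `YYYY-MM-DD` date format, and the `manifest_path` helper are assumptions for illustration only; check the layout actually produced by your export before relying on it.

```python
from datetime import datetime

BASE_PATH = "dbfs:/tmp/library_exports"        # hypothetical; same location used for the export
SOURCE_CLUSTER = "etl-cluster"                 # hypothetical source cluster name
TARGET_CLUSTER = "etl-cluster-v2"              # hypothetical target cluster name
LOG_TABLE_NAME = "default.library_import_log"  # hypothetical Delta table name
IMPORT_DATE = "2024-01-15"                     # assumed YYYY-MM-DD format


def manifest_path(base_path, import_date, source_cluster):
    """Build the assumed path of the exported JSON file for one cluster and date.

    Raises ValueError if import_date is not a valid YYYY-MM-DD date.
    """
    datetime.strptime(import_date, "%Y-%m-%d")  # used only to validate the date
    return f"{base_path.rstrip('/')}/{import_date}/{source_cluster}/{source_cluster}.json"


print(manifest_path(BASE_PATH, IMPORT_DATE, SOURCE_CLUSTER))
```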
All prerequisites are now in place. All we need to do is create an object of the ImportJars class and call its import_libraries_to_cluster method.
Once the processing is done you can validate the import metrics in the table mentioned in LOG_TABLE_NAME:
Alternatively, you can use it to manually specify a list of libraries and install them on any cluster, without needing to export them first.
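How the utility accepts a manual list is not shown here. As a sketch, the list below uses the library spec format of the Databricks Libraries API (`jar`, `pypi`, and `maven` entries) and builds the request body for its `POST /api/2.0/libraries/install` endpoint. The `build_install_payload` helper, the cluster ID, and the library names are hypothetical, and no request is actually sent.

```python
import json


def build_install_payload(cluster_id, libraries):
    """Build a request body for POST /api/2.0/libraries/install."""
    if not libraries:
        raise ValueError("at least one library is required")
    return {"cluster_id": cluster_id, "libraries": list(libraries)}


# Hypothetical libraries, written in the Libraries API spec format.
manual_libraries = [
    {"jar": "dbfs:/FileStore/jars/example.jar"},
    {"pypi": {"package": "requests==2.31.0"}},
    {"maven": {"coordinates": "org.apache.commons:commons-lang3:3.12.0"}},
]

payload = build_install_payload("0000-000000-example", manual_libraries)
print(json.dumps(payload, indent=2))
```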
On the Cluster page you will be able to see the libraries being installed.