I was wondering what's the best practice for using this package with pandas.
It's possible to create a connection with databricks.sql.connect and pass it to pandas.read_sql. This works, but it raises:
UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
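For reference, the first approach looks roughly like the sketch below. The function names read_with_dbapi and quiet_pandas_dbapi_warning are placeholders of mine, not part of either library, and the connection parameters are assumed to come from a SQL warehouse's connection details. The warning filter only hides the message quoted above; it does not change how the data is fetched.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def quiet_pandas_dbapi_warning():
    """Hide only the 'pandas only supports SQLAlchemy ...' UserWarning."""
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message="pandas only supports SQLAlchemy",
            category=UserWarning,
        )
        yield


def read_with_dbapi(query, server_hostname, http_path, access_token):
    """Run `query` over a raw databricks.sql connection and return a DataFrame."""
    # Imported lazily so this sketch can be loaded without the packages installed.
    import pandas as pd
    from databricks import sql

    with sql.connect(
        server_hostname=server_hostname,
        http_path=http_path,
        access_token=access_token,
    ) as connection:
        with quiet_pandas_dbapi_warning():
            # pandas treats the connection as a generic DBAPI2 object here.
            return pd.read_sql(query, connection)
```

Silencing the warning is optional; the call works the same way without the filter.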
Alternatively, it's possible to use SQLAlchemy with a databricks:// URL and pass that to pandas. Doesn't that add an extra serialization step, though, which would hurt performance?
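The SQLAlchemy variant would look something like the following sketch. The helper names build_databricks_url and read_with_sqlalchemy are hypothetical, and I'm assuming the databricks:// dialect is registered in the environment (depending on the connector version, that may require a separate dialect package). The URL layout is my understanding of the format, with the personal access token as the password for the user "token".

```python
from urllib.parse import quote, urlencode


def build_databricks_url(server_hostname, http_path, access_token,
                         catalog=None, schema=None):
    """Build a databricks:// SQLAlchemy URL (hypothetical helper)."""
    params = {"http_path": http_path}
    if catalog:
        params["catalog"] = catalog
    if schema:
        params["schema"] = schema
    # Percent-encode the token and query values so special characters survive.
    token = quote(access_token, safe="")
    return f"databricks://token:{token}@{server_hostname}?{urlencode(params)}"


def read_with_sqlalchemy(query, url):
    """Run `query` through a SQLAlchemy engine and return a DataFrame."""
    # Imported lazily so this sketch can be loaded without the packages installed.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine(url)
    try:
        with engine.connect() as connection:
            # No UserWarning here: pandas receives a SQLAlchemy connectable.
            return pd.read_sql(query, connection)
    finally:
        engine.dispose()
```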
What's the recommended way, particularly with regard to performance? Would both use CloudFetch for larger queries? I see that some pandas-related fixes and improvements have landed in PRs, so which API should be used to benefit from those?
Thanks!
cc @kravets-levko