Python client for Ballista.
This project is versioned and released independently of the main Ballista project. It is intentionally excluded from the default Cargo workspace so that it adds no overhead for maintainers of the main Ballista codebase.
Creates a new context and connects to a Ballista scheduler process.
>>> from pyballista import SessionContext
>>> ctx = SessionContext("localhost", 50050)
>>> ctx.sql("create external table t stored as parquet location '/mnt/bigdata/tpch/sf10-parquet/lineitem.parquet'")
>>> df = ctx.sql("select * from t limit 5")
>>> pyarrow_batches = df.collect()
>>> df = ctx.read_parquet('/mnt/bigdata/tpch/sf10-parquet/lineitem.parquet').limit(5)
>>> pyarrow_batches = df.collect()
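The example above assumes a scheduler is already listening on localhost:50050. As a rough sketch, a quick TCP check before creating the context can make a missing scheduler easier to diagnose. The helper name `scheduler_reachable` and its default timeout are hypothetical and not part of pyballista; the check only confirms that something accepts connections on the port, not that it is a Ballista scheduler.

```python
import socket


def scheduler_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper, not part of pyballista. It only verifies that
    something is listening on the port, not that it is a Ballista scheduler.
    """
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host, port = "localhost", 50050  # host and port taken from the example above
    if scheduler_reachable(host, port):
        print(f"Something is listening on {host}:{port}")
    else:
        print(f"Nothing reachable at {host}:{port}; is the scheduler running?")
```

If the check passes, the SessionContext call shown earlier can be attempted with the same host and port.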
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
maturin develop
Note that you can also run maturin develop --release to get a release build locally.
python3 -m pytest