PyBallista

Python client for Ballista.

This project is versioned and released independently from the main Ballista project and is intentionally not part of the default Cargo workspace so that it doesn't cause overhead for maintainers of the main Ballista codebase.

Creating a SessionContext

Create a new context that connects to a running Ballista scheduler process, here on localhost port 50050:

>>> from pyballista import SessionContext
>>> ctx = SessionContext("localhost", 50050)
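Because the context needs a scheduler listening at the given host and port, it can help to confirm the port is open before connecting. The following is a minimal sketch using only the Python standard library; `scheduler_reachable` is a hypothetical helper, not part of PyBallista, and it only checks that a TCP connection can be opened, not that the process on the other end is actually a Ballista scheduler.

```python
import socket


def scheduler_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it is not part of PyBallista.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

Calling `scheduler_reachable("localhost", 50050)` before constructing the `SessionContext` gives a quick yes/no answer about whether anything is listening on that port.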

Example SQL Usage

>>> ctx.sql("create external table t stored as parquet location '/mnt/bigdata/tpch/sf10-parquet/lineitem.parquet'")
>>> df = ctx.sql("select * from t limit 5")
>>> pyarrow_batches = df.collect()
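When registering several Parquet files, the `create external table` statement above can be generated rather than typed by hand. The sketch below is one possible way to do that; `external_parquet_table_sql` is a hypothetical helper, not a PyBallista function, and it assumes that table names are plain identifiers and that single quotes in the path are escaped by doubling them, as in standard SQL.

```python
def external_parquet_table_sql(table: str, location: str) -> str:
    """Build a 'create external table' statement for a Parquet location.

    Hypothetical helper for illustration; it is not part of PyBallista.
    """
    # Restrict table names to simple identifiers to avoid malformed SQL.
    if not table.isidentifier():
        raise ValueError(f"invalid table name: {table!r}")
    # Escape single quotes in the path by doubling them (standard SQL).
    escaped = location.replace("'", "''")
    return (
        f"create external table {table} "
        f"stored as parquet location '{escaped}'"
    )
```

The resulting string can then be passed to `ctx.sql(...)` in the same way as the literal statement shown above.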

Example DataFrame Usage

>>> df = ctx.read_parquet('/mnt/bigdata/tpch/sf10-parquet/lineitem.parquet').limit(5)
>>> pyarrow_batches = df.collect()

Creating Virtual Environment

python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt

Building

maturin develop

To get a release build locally, run maturin develop --release instead.

Testing

python3 -m pytest