Skip to content

Latest commit

 

History

History
120 lines (86 loc) · 3.84 KB

README.md

File metadata and controls

120 lines (86 loc) · 3.84 KB

Spice.ai Quickstart Tutorial using Dremio

The Dremio quickstart uses a publicly accessible demo instance of Dremio loaded with sample datasets. Thus, setting up your own Dremio instance is not required to complete the quickstart, but the same steps can be used to connect to any Dremio instance.

Step 2. Initialize a Spice project:

spice init dremio-demo
cd dremio-demo

Step 2. Set the login credentials that the Spice runtime will use when accessing Dremio. Ensure this command is run in the dremio-demo directory.

spice login dremio -u demo -p demo1234

Step 3. Start the runtime.

spice run

Step 4. Configure the dataset to connect to Dremio:

In another terminal window in the dremio-demo folder:

cd dremio-demo
spice dataset configure

Enter the name of the dataset:

dataset name: (dremio-demo)  taxi_trips

Enter the description of the dataset:

description: taxi trips data in Dremio

Specify the location of the dataset:

from: dremio:datasets.taxi_trips

Specify the Dremio endpoint:

endpoint: grpc://20.163.171.8:32010

Select "y" when prompted whether to locally accelerate the dataset:

Locally accelerate (y/n)? y

The CLI will confirm the dataset has been configured with the following output:

Saved datasets/taxi_trips/dataset.yaml

And the content of dataset.yaml is the following:

from: dremio:datasets.taxi_trips
name: taxi_trips
description: taxi trips data in Dremio
params:
  endpoint: grpc://20.163.171.8:32010
acceleration:
  enabled: true
  refresh_check_interval: 10s
  refresh_mode: full

The Spice runtime terminal will show that the dataset has been loaded:

2024-08-05T05:04:16.524586Z  INFO runtime: Dataset taxi_trips registered (dremio:datasets.taxi_trips), acceleration (arrow, 10s refresh), results cache enabled.
2024-08-05T05:04:16.526366Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset taxi_trips
2024-08-05T05:04:18.915971Z  INFO runtime::accelerated_table::refresh_task: Loaded 100,000 rows (27.91 MiB) for dataset taxi_trips in 2s 389ms.

Step 5. Run queries against the dataset using the Spice SQL REPL.

In a new terminal, start the Spice SQL REPL.

spice sql

Now query taxi_trips:

sql> select avg(total_amount), avg(tip_amount), count(1), passenger_count from taxi_trips group by passenger_count order by passenger_count asc;
+------------------------------+----------------------------+-----------------+-----------------+
| avg(taxi_trips.total_amount) | avg(taxi_trips.tip_amount) | count(Int64(1)) | passenger_count |
+------------------------------+----------------------------+-----------------+-----------------+
| 23.2                         | 3.8666666666666667         | 3               | 0               |
| 15.394881909237037           | 1.9225555830345065         | 80870           | 1               |
| 17.714222499449637           | 1.967968738533792          | 13627           | 2               |
| 17.441359661495056           | 1.7930070521861776         | 3545            | 3               |
| 17.219515789473686           | 1.3453947368421055         | 1900            | 4               |
| 21.122631578947367           | 1.36                       | 38              | 5               |
| 22.401176470588236           | 2.764705882352941          | 17              | 6               |
+------------------------------+----------------------------+-----------------+-----------------+

Time: 0.03241875 seconds. 7 rows.

Next Steps This quickstart accelerates query performance using Spice Data Accelerators. Experiment with different acceleration options by editing the dataset.yaml file or removing acceleration altogether to have the Spice runtime federate the query to Dremio directly.