Back-end implementation of the Open Data Fabric protocol

kamu-data/kamu-node

kamu - planet-scale data pipeline

About

Kamu Node is a set of Kubernetes-native applications that can be deployed in any cloud or on-prem to:

  • Operate the stream processing pipelines for a certain set of data flows
  • Continuously verify datasets that you are interested in to catch malicious behavior
  • Execute queries on co-located data

Nodes are the building blocks of the Open Data Fabric and the primary way of contributing resources to the network. Unlike blockchain nodes that maintain a single ledger, Kamu nodes can form loosely connected clusters based on the vested interests of their operators in certain data pipelines.

If you are new to ODF, we recommend starting with Kamu CLI for a gradual introduction.

You should consider Kamu Node when you:

  • Want to build a horizontally-scalable lakehouse for your data
  • Need a decentralized infrastructure for sharing data with your partners or globally without intermediaries
  • Want to continuously operate ODF data pipelines or verify data
  • Need a rich set of data APIs
  • Want to provide data to the ODF blockchain oracle

API Server

Prerequisites:

  • Install rustup
  • Install bunyan crate (cargo install bunyan) to get human-readable log output when running services in the foreground

To run the API server against a local kamu workspace:

# 1. Create a configuration file
{
  echo 'repo:'
  echo '  repoUrl: workspace/.kamu/datasets'
  echo 'datasetEnvVars:'
  echo '  encryptionKey: QfnEDcnUtGSW2pwVXaFPvZOwxyFm2BOC'
} > config.yaml
# 2. Run
cargo run --bin kamu-api-server -- --config config.yaml run | bunyan

# Alternative: pass the repo URL via an environment variable:
KAMU_API_SERVER_CONFIG_repo__repoUrl=workspace/.kamu/datasets \
  kamu-api-server run | bunyan
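
The environment override above suggests a naming rule: the `KAMU_API_SERVER_CONFIG_` prefix followed by the config path, with nesting levels joined by a double underscore. Below is a minimal sketch of that rule, assuming it holds for keys other than `repo.repoUrl`. The helper name `config_env_name` is hypothetical and not part of the project.

```shell
# Build the environment variable name for a dotted config path.
# Assumption: nesting levels are joined with "__", as in the
# repo.repoUrl example; this is inferred, not documented here.
config_env_name() {
  # Replace every "." in the path with "__" and add the prefix.
  printf 'KAMU_API_SERVER_CONFIG_%s\n' "$(printf '%s' "$1" | sed 's/\./__/g')"
}

config_env_name repo.repoUrl
# prints: KAMU_API_SERVER_CONFIG_repo__repoUrl
```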

To control log verbosity, use the standard RUST_LOG environment variable:

RUST_LOG="trace,mio::poll=info" cargo run ...

To explore the GQL schema, run the server and open http://127.0.0.1:8080/playground.

To test GQL queries from the CLI:

cargo run --bin kamu-api-server -- gql query '{ apiVersion }' | jq
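
The same query can presumably be sent over HTTP to a running server. Below is a sketch, assuming the GraphQL endpoint is served at `/graphql` next to the playground (this path is not stated above) and that `curl` is installed. The helpers `gql_payload` and `gql_post` and the `KAMU_GQL_URL` variable are hypothetical names.

```shell
# Wrap a single-line GraphQL query in a JSON request body.
gql_payload() {
  # Escape backslashes and double quotes so the query is a valid JSON string.
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"query":"%s"}\n' "$escaped"
}

# POST the query to the server. The /graphql path is an assumption;
# override it with the (hypothetical) KAMU_GQL_URL variable if it differs.
gql_post() {
  curl -sS -H 'Content-Type: application/json' \
    --data "$(gql_payload "$1")" \
    "${KAMU_GQL_URL:-http://127.0.0.1:8080/graphql}"
}

# Usage (requires a running server):
#   gql_post '{ apiVersion }' | jq
```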

API Server with Remote Repository (S3 bucket)

To serve datasets from a remote repository, point repoUrl at the S3 bucket:

# 1. Create a configuration file
{
  echo 'repo:'
  echo '  repoUrl: s3://example.com/kamu_repo'
} > config.yaml
# 2. Run
cargo run --bin kamu-api-server -- --config config.yaml run | bunyan

# Alternative: pass the repo URL via an environment variable:
KAMU_API_SERVER_CONFIG_repo__repoUrl=s3://example.com/kamu_repo \
  kamu-api-server run | bunyan

GitHub Auth

To use the API server with GitHub OAuth, set the following configuration settings:

auth:
  providers:
    - kind: github
      clientId: CLIENT_ID_OF_YOUR_GITHUB_OAUTH_APP
      clientSecret: CLIENT_SECRET_OF_YOUR_GITHUB_OAUTH_APP

Then you can log in with the following mutation, passing the OAuth code received from GitHub:

mutation GithubLogin {
  auth {
    githubLogin (code: "...") {
      token {
        accessToken
        scope
        tokenType
      }
      accountInfo {
        login
        email
        name
        avatarUrl
        gravatarId
      }
    }
  }
}
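
The `code` argument is the temporary code that GitHub appends to your OAuth app's callback URL after the user authorizes it. Below is a sketch of the two client-side steps, assuming the standard GitHub web application flow. The helper names and the callback URL in the usage example are hypothetical.

```shell
# Step 1: URL to send the user to. After authorization, GitHub redirects
# back to the app's callback URL with a ?code=... query parameter.
github_authorize_url() {
  printf 'https://github.com/login/oauth/authorize?client_id=%s\n' "$1"
}

# Step 2: pull the code out of the callback URL so it can be passed
# to the githubLogin mutation.
extract_oauth_code() {
  printf '%s\n' "$1" | sed -n 's/.*[?&]code=\([^&#]*\).*/\1/p'
}

github_authorize_url CLIENT_ID_OF_YOUR_GITHUB_OAUTH_APP
# The callback URL below is a made-up example.
extract_oauth_code 'http://localhost:4200/callback?code=abc123&state=xyz'
# prints: abc123
```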