Ollama, get up and running with large language models, locally.
This chart deploys Ollama on Kubernetes.
- Kubernetes >= 1.16.0-0 for CPU-only deployments
- Kubernetes >= 1.26.0-0 for stable GPU support (NVIDIA and AMD)

Not all GPUs are currently supported by Ollama (especially AMD GPUs).
To install the ollama chart in the ollama namespace:
helm repo add ollama-helm https://otwld.github.io/ollama-helm/
helm repo update
helm install ollama ollama-helm/ollama --namespace ollama
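The install steps above can be wrapped in a small script. This is a minimal sketch that assumes Helm 3.2 or later. It adds `--create-namespace`, which creates the ollama namespace if it does not already exist; the plain command above expects the namespace to be present. The `install_ollama` and `run` functions and the `DRY_RUN` variable are hypothetical names used only for illustration. With `DRY_RUN=1` the commands are printed instead of executed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the install commands from this README.
# Assumes Helm >= 3.2 for --create-namespace.
# Usage: install_ollama [values-file]
install_ollama() {
  values_file="${1:-}"
  run helm repo add ollama-helm https://otwld.github.io/ollama-helm/
  run helm repo update
  # Pass --values only when a values file was given.
  if [ -n "$values_file" ]; then
    run helm install ollama ollama-helm/ollama --namespace ollama --create-namespace --values "$values_file"
  else
    run helm install ollama ollama-helm/ollama --namespace ollama --create-namespace
  fi
}

# Example: preview the commands without touching the cluster.
#   DRY_RUN=1 install_ollama values.yaml
```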
First, read the Ollama release notes to make sure there are no backward-incompatible changes.
Adjust your values as needed, then run helm upgrade:
# -- Refresh the local repository index so that the latest ollama chart version is available.
helm repo update
helm upgrade ollama ollama-helm/ollama --namespace ollama --values values.yaml
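Before upgrading, it can help to save the values currently applied to the release. That makes a failed upgrade easier to diagnose or roll back. This is a minimal sketch that assumes Helm 3:

- `helm get values` prints the user-supplied values of a release.
- `helm rollback` returns to the previous revision when no revision number is given.
- The `upgrade_ollama` and `run` functions, the `DRY_RUN` variable, and the backup file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: back up the current user-supplied values, then upgrade.
# Usage: upgrade_ollama [values-file] [backup-file]
upgrade_ollama() {
  values_file="${1:-values.yaml}"
  backup_file="${2:-ollama-values-backup.yaml}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "helm get values ollama --namespace ollama > $backup_file"
  else
    helm get values ollama --namespace ollama > "$backup_file" || return 1
  fi
  run helm repo update
  run helm upgrade ollama ollama-helm/ollama --namespace ollama --values "$values_file"
}

# If the upgraded release misbehaves, roll back to the previous revision:
#   helm rollback ollama --namespace ollama
```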
To uninstall/delete the ollama deployment in the ollama namespace:
helm delete ollama --namespace ollama
Substitute your values if they differ from the examples. See helm delete --help for a full reference on delete parameters and flags.
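In Helm 3, `helm delete` is an alias of `helm uninstall`. It removes the release but does not delete the ollama namespace. Depending on how storage is configured, PersistentVolumeClaims holding downloaded models may remain, so it is worth listing them before cleaning up. This is a minimal sketch. The `uninstall_ollama` and `run` functions and the `DRY_RUN` variable are hypothetical, and deleting the namespace is opt-in because it removes everything left inside it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical cleanup wrapper.
# Usage: uninstall_ollama [yes]   (pass "yes" to also delete the namespace)
uninstall_ollama() {
  delete_namespace="${1:-no}"
  run helm delete ollama --namespace ollama
  # Show any claims left behind (for example ones holding model data).
  run kubectl get pvc --namespace ollama
  # Deleting the namespace removes everything still in it, so only do it when asked.
  if [ "$delete_namespace" = "yes" ]; then
    run kubectl delete namespace ollama
  fi
}
```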
- Refer to the Ollama documentation for general usage
- Interact with the REST API: Ollama API
- Interact with the official client libraries: ollama-js and ollama-python
- Interact with LangChain: langchain-js and langchain-python
- Running a recent Kubernetes version (>= 1.26) is highly recommended when deploying Ollama with GPU support
Example values.yaml enabling two NVIDIA GPUs and pulling two models at container startup:
ollama:
  gpu:
    # -- Enable GPU integration
    enabled: true
    # -- GPU type: 'nvidia' or 'amd'
    type: 'nvidia'
    # -- Number of GPUs to allocate
    number: 2
  # -- List of models to pull at container startup
  models:
    - mistral
    - llama2
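The ollama.gpu.type row of the values table says that when the type is 'amd' and image.tag is not overridden, a 'rocm' suffix is added to the image tag. This is because the AMD (ROCm) image differs from the CPU/CUDA one. The hypothetical `resolve_image_tag` function below restates that rule in shell; it is not the chart's template. It rests on two assumptions:

- The suffix is joined with a hyphen (for example `<appVersion>-rocm`).
- The rule applies only when ollama.gpu.enabled is true.

```shell
#!/usr/bin/env bash
# Hypothetical restatement of the documented image tag rule (not the chart template).
# Assumptions: "-rocm" is appended with a hyphen, and only when the GPU is enabled.
# Usage: resolve_image_tag <image.tag> <appVersion> <gpu.enabled> <gpu.type>
resolve_image_tag() {
  image_tag="$1"
  app_version="$2"
  gpu_enabled="$3"
  gpu_type="$4"
  # An explicit image.tag always wins and is used unchanged.
  if [ -n "$image_tag" ]; then
    echo "$image_tag"
    return 0
  fi
  # Otherwise fall back to the chart appVersion.
  tag="$app_version"
  # AMD GPUs need the ROCm image variant.
  if [ "$gpu_enabled" = "true" ] && [ "$gpu_type" = "amd" ]; then
    tag="${tag}-rocm"
  fi
  echo "$tag"
}
```

Setting image.tag explicitly bypasses the suffix entirely. With an AMD GPU you would then need to choose a ROCm tag yourself.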
Example values.yaml pulling llama2 at startup and exposing the API through an Ingress:
ollama:
  models:
    - llama2
ingress:
  enabled: true
  hosts:
    - host: ollama.domain.lan
      paths:
        - path: /
          pathType: Prefix
- The API is now reachable at ollama.domain.lan
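Once the Ingress is in place, the REST API can be exercised with curl. This is a minimal sketch that assumes the Ingress example above (host ollama.domain.lan served over plain HTTP) and that llama2 has finished downloading. `OLLAMA_URL`, `build_generate_payload` and `generate` are hypothetical names. The `/api/generate` endpoint with the `model`, `prompt` and `stream` fields is part of the Ollama API. Setting `stream` to false returns a single JSON object instead of a stream. For simplicity, the payload builder rejects input that would need JSON escaping rather than escaping it; a JSON tool such as jq is a better fit for arbitrary prompts.

```shell
#!/usr/bin/env bash
# Base URL of the Ollama API; defaults to the Ingress host from the example (assumption).
OLLAMA_URL="${OLLAMA_URL:-http://ollama.domain.lan}"

# Build a non-streaming /api/generate request body.
# Rejects double quotes, backslashes and newlines instead of escaping them.
build_generate_payload() {
  model="$1"
  prompt="$2"
  nl='
'
  case "$model$prompt" in
    *'"'* | *'\'* | *"$nl"*)
      echo "model/prompt contains characters that need JSON escaping" >&2
      return 1
      ;;
  esac
  printf '{"model":"%s","prompt":"%s","stream":false}\n' "$model" "$prompt"
}

# Send the request and print the server's JSON response.
generate() {
  payload="$(build_generate_payload "$1" "$2")" || return 1
  curl -sS "$OLLAMA_URL/api/generate" -H 'Content-Type: application/json' -d "$payload"
}

# Usage (requires a reachable server):
#   generate llama2 "Why is the sky blue?"
```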
- See values.yaml for the chart's default values.
Key | Type | Default | Description |
---|---|---|---|
affinity | object | {} | Affinity for pod assignment |
autoscaling.enabled | bool | false | Enable autoscaling |
autoscaling.maxReplicas | int | 100 | Maximum number of replicas |
autoscaling.minReplicas | int | 1 | Minimum number of replicas |
autoscaling.targetCPUUtilizationPercentage | int | 80 | Target average CPU utilization (percent) for autoscaling |
extraArgs | list | [] | Additional arguments on the output Deployment definition |
extraEnv | list | [] | Additional environment variables on the output Deployment definition |
fullnameOverride | string | "" | String to fully override the fullname template |
image.pullPolicy | string | "IfNotPresent" | Docker pull policy |
image.repository | string | "ollama/ollama" | Docker image repository |
image.tag | string | "" | Docker image tag; overrides the image tag, whose default is the chart appVersion |
imagePullSecrets | list | [] | Docker registry secret names as an array |
ingress.annotations | object | {} | Additional annotations for the Ingress resource |
ingress.className | string | "" | IngressClass that will be used to implement the Ingress (Kubernetes 1.18+) |
ingress.enabled | bool | false | Enable the Ingress resource |
ingress.hosts[0].host | string | "ollama.local" | |
ingress.hosts[0].paths[0].path | string | "/" | |
ingress.hosts[0].paths[0].pathType | string | "Prefix" | |
ingress.tls | list | [] | The TLS configuration for hostnames to be covered with this ingress record |
livenessProbe.enabled | bool | true | Enable livenessProbe |
livenessProbe.failureThreshold | int | 6 | Failure threshold for livenessProbe |
livenessProbe.initialDelaySeconds | int | 60 | Initial delay seconds for livenessProbe |
livenessProbe.path | string | "/" | Request path for livenessProbe |
livenessProbe.periodSeconds | int | 10 | Period seconds for livenessProbe |
livenessProbe.successThreshold | int | 1 | Success threshold for livenessProbe |
livenessProbe.timeoutSeconds | int | 5 | Timeout seconds for livenessProbe |
nameOverride | string | "" | String to partially override the fullname template (will maintain the release name) |
nodeSelector | object | {} | Node labels for pod assignment |
ollama.gpu.enabled | bool | false | Enable GPU integration |
ollama.gpu.number | int | 1 | Number of GPUs to allocate |
ollama.gpu.type | string | "nvidia" | GPU type: 'nvidia' or 'amd'. If 'ollama.gpu.enabled' is true, the default is 'nvidia'. If set to 'amd', a 'rocm' suffix is added to the image tag unless 'image.tag' is overridden, because the AMD and CPU/CUDA images are different. |
ollama.insecure | bool | false | Add the insecure flag when pulling models at container startup |
ollama.models | object | {} | List of models to pull at container startup. The more you add, the longer the container takes to start if the models are not already present. Example: models: [llama2, mistral] |
persistentVolume.accessModes | list | ["ReadWriteOnce"] | Ollama server data Persistent Volume access modes. Must match those of the existing PV or dynamic provisioner. Ref: http://kubernetes.io/docs/user-guide/persistent-volumes/ |
persistentVolume.annotations | object | {} | Ollama server data Persistent Volume annotations |
persistentVolume.enabled | bool | true | Enable persistence using a PVC |
persistentVolume.existingClaim | string | "" | If you'd like to bring your own PVC for persisting Ollama state, pass the name of the created and ready PVC here. If set, this chart will not create the default PVC. Requires persistentVolume.enabled: true |
persistentVolume.size | string | "30Gi" | Ollama server data Persistent Volume size |
persistentVolume.storageClass | string | "" | Ollama server data Persistent Volume Storage Class. If defined, storageClassName is set to this value. If set to "-", storageClassName: "" is set, which disables dynamic provisioning. If undefined (the default) or set to null, no storageClassName spec is set, choosing the default provisioner (gp2 on AWS, standard on GKE, AWS & OpenStack). |
persistentVolume.subPath | string | "" | Subdirectory of the Ollama server data Persistent Volume to mount. Useful if the volume's root directory is not empty |
persistentVolume.volumeMode | string | "" | Ollama server data Persistent Volume mode. If defined, volumeMode is set to this value. If empty (the default) or set to null, no volumeMode spec is set, choosing the default mode. |
podAnnotations | object | {} | Map of annotations to add to the pods |
podLabels | object | {} | Map of labels to add to the pods |
podSecurityContext | object | {} | Pod Security Context |
readinessProbe.enabled | bool | true | Enable readinessProbe |
readinessProbe.failureThreshold | int | 6 | Failure threshold for readinessProbe |
readinessProbe.initialDelaySeconds | int | 30 | Initial delay seconds for readinessProbe |
readinessProbe.path | string | "/" | Request path for readinessProbe |
readinessProbe.periodSeconds | int | 5 | Period seconds for readinessProbe |
readinessProbe.successThreshold | int | 1 | Success threshold for readinessProbe |
readinessProbe.timeoutSeconds | int | 3 | Timeout seconds for readinessProbe |
replicaCount | int | 1 | Number of replicas |
resources.limits.cpu | string | "" | CPU limit |
resources.limits.memory | string | "" | Memory limit |
resources.requests.cpu | string | "" | CPU request |
resources.requests.memory | string | "" | Memory request |
runtimeClassName | string | "" | Specify runtime class |
securityContext | object | {} | Container Security Context |
service.port | int | 11434 | Service port |
service.type | string | "ClusterIP" | Service type |
serviceAccount.annotations | object | {} | Annotations to add to the service account |
serviceAccount.automount | bool | true | Automatically mount a ServiceAccount's API credentials |
serviceAccount.create | bool | true | Specifies whether a service account should be created |
serviceAccount.name | string | "" | The name of the service account to use. If not set and create is true, a name is generated using the fullname template |
tolerations | list | [] | Tolerations for pod assignment |
volumeMounts | list | [] | Additional volumeMounts on the output Deployment definition |
volumes | list | [] | Additional volumes on the output Deployment definition |
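The probe settings in the table combine into a rough time budget. Assume the usual kubelet behaviour: the first probe runs after initialDelaySeconds, then one runs every periodSeconds, and the container is considered failed after failureThreshold consecutive failures. Under that assumption, initialDelaySeconds + failureThreshold × periodSeconds is a rough upper bound on how long a container that never becomes healthy is given, as long as timeoutSeconds does not exceed periodSeconds. With the defaults this is 60 + 6 × 10 = 120 seconds for liveness and 30 + 6 × 5 = 60 seconds for readiness. `probe_budget` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Rough upper bound (seconds) before a never-healthy container is marked failed.
# Assumes timeoutSeconds <= periodSeconds (true for the chart defaults).
# Usage: probe_budget <initialDelaySeconds> <periodSeconds> <failureThreshold>
probe_budget() {
  initial_delay="$1"
  period="$2"
  failure_threshold="$3"
  echo $(( initial_delay + failure_threshold * period ))
}

# Chart defaults from the values table.
echo "liveness:  $(probe_budget 60 10 6)s"   # liveness:  120s
echo "readiness: $(probe_budget 30 5 6)s"    # readiness: 60s
```

If pods are restarted before they finish starting, for example while pulling many models at startup as the ollama.models description warns, raising livenessProbe.initialDelaySeconds or failureThreshold is a reasonable first adjustment.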
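The persistentVolume.* rows interact. Setting existingClaim replaces the chart-created PVC, and storageClass has three documented cases: unset, "-", or a class name. The hypothetical `persistence_plan` function below restates those descriptions in shell; it is not the chart's template. It prints what the pod would mount and which storageClassName a chart-created PVC would carry.

```shell
#!/usr/bin/env bash
# Hypothetical restatement of the persistentVolume.* descriptions (not the chart template).
# Usage: persistence_plan <persistentVolume.enabled> <existingClaim> <storageClass>
persistence_plan() {
  enabled="$1"
  existing_claim="$2"
  storage_class="$3"
  # existingClaim requires persistentVolume.enabled: true, so nothing is mounted otherwise.
  if [ "$enabled" != "true" ]; then
    echo "no persistent volume"
    return 0
  fi
  # Bring-your-own PVC: the chart does not create one.
  if [ -n "$existing_claim" ]; then
    echo "mount existing PVC $existing_claim"
    return 0
  fi
  case "$storage_class" in
    "")  echo "create PVC without storageClassName (cluster default provisioner)" ;;
    "-") echo 'create PVC with storageClassName: "" (dynamic provisioning disabled)' ;;
    *)   echo "create PVC with storageClassName: $storage_class" ;;
  esac
}
```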
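The autoscaling.* values presumably feed a standard Kubernetes HorizontalPodAutoscaler (an assumption; the table does not say so). An HPA computes desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization) and then clamps the result to [minReplicas, maxReplicas]. The hypothetical `desired_replicas` helper below applies that formula with integer arithmetic. It ignores the HPA's default 10% tolerance and stabilization windows.

CPU utilization is measured relative to the container's CPU request. So resources.requests.cpu, which is empty by default, needs to be set for targetCPUUtilizationPercentage to take effect.

```shell
#!/usr/bin/env bash
# Sketch of the HPA replica formula, ignoring tolerance and stabilization windows.
# Usage: desired_replicas <current> <currentCPU%> <targetCPU%> <minReplicas> <maxReplicas>
desired_replicas() {
  current="$1"
  current_util="$2"
  target_util="$3"
  min="$4"
  max="$5"
  # Integer ceil(current * current_util / target_util).
  desired=$(( (current * current_util + target_util - 1) / target_util ))
  # Clamp to the configured replica range.
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}

# With the default 80% target: 1 replica at 100% CPU scales to 2.
#   desired_replicas 1 100 80 1 100
```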
- For questions, suggestions, and discussion about Ollama, please refer to the Ollama issue page
- For questions, suggestions, and discussion about this chart, please visit the Ollama-Helm issue page