OpenJDK 17 + Cassandra 5.0.2 + Prometheus JMX exporter + Jolokia exporter + jemalloc2 + Jaeger tracing
Current version: Cassandra v5.0.2, now with more configurability through the envs!
Because of the many different licenses involved, please take a look at the summary detailed here.
If you're migrating from Cassandra 4, just scroll to the bottom.
- 7199 - JMX
- 7198 - Prometheus metrics exporter
- 9042 - Native transport
- 7000 - Internode communications
- 9160 - Thrift client (disabled by default, set env `START_RPC` to `true` to enable it)
- /var/lib/cassandra - data partition
- /var/lib/cassandra/commitlog - commitlog partition
- /var/lib/cassandra/logs - logs
- /var/lib/cassandra/heapdump - heap dumps in case Cassandra crashes
Since this uses OpenJDK 17, you no longer need to set any unusual environment variables. Just enjoy!
The G1 garbage collector is enabled by default.
You don't need to build your own image based on this one.
`cassandra.yaml` is filled in from the environment variables you set.
Just set envs as needed. See the Dockerfile and entrypoint.py for details.
This exports three volumes:
- for data (/var/lib/cassandra),
- for commitlog (/var/lib/cassandra/commitlog),
- for logs (/var/log/cassandra)
It is best to use bind mounts for them.
Recommended options are `--network host --privileged`, although passing the external host IP in the BROADCAST_ADDRESSes and using `auto` ONLY for the normal addresses works fine with a bridge network.
NEVER USE `auto` if HOST NETWORKING IS ENABLED!
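As an illustration of the bridge-network setup, here is a minimal Python sketch that assembles a `docker run` command. It assumes that "BROADCAST_ADDRESSes" means `BROADCAST_ADDRESS` and `RPC_BROADCAST_ADDRESS`, and that the "normal addresses" are `LISTEN_ADDRESS` and `RPC_ADDRESS`. The image name, the published ports and the IP addresses are placeholders, not values taken from this image.

```python
import shlex

# Hypothetical image name; substitute the tag you actually pull.
IMAGE = "example/cassandra:5.0.2"


def bridge_run_command(host_ip, seed_nodes, cluster_name="Test Cluster"):
    """Build a `docker run` command line for a node on a bridge network."""
    env = {
        # Broadcast addresses must be reachable from outside the container,
        # so they carry the external IP of the Docker host.
        "BROADCAST_ADDRESS": host_ip,
        "RPC_BROADCAST_ADDRESS": host_ip,
        # `auto` is used only for the in-container addresses.
        "LISTEN_ADDRESS": "auto",
        "RPC_ADDRESS": "auto",
        "SEED_NODES": ",".join(seed_nodes),
        "CLUSTER_NAME": cluster_name,
    }
    # Publishing these two ports is an assumption made for this sketch.
    args = ["docker", "run", "-d", "-p", "7000:7000", "-p", "9042:9042"]
    for name, value in env.items():
        args += ["-e", f"{name}={value}"]
    args.append(IMAGE)
    # shlex.join quotes arguments that contain spaces.
    return shlex.join(args)


print(bridge_run_command("192.0.2.10", ["192.0.2.10", "192.0.2.11"]))
```

With host networking you would drop the `-p` flags, add `--network host`, and pass real IP addresses instead of `auto`.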
Any arguments passed to the entry point are treated as though Cassandra itself had been called: they are passed on to it, after `cassandra -f`.
Or, if you pass `bash` as the command, a bash shell will be started for you with the required envs set.
For the love of God, disable ADDRESS_FOR_ALL while setting up a second or third node. It sets SEED_NODES to point to this node. It's also deprecated.
Set `ADDRESS_FOR_ALL` to a value that will replace every `*_ADDRESS` variable.
The values of the following envs will be placed in cassandra.yaml verbatim (i.e. without quotes):
- BROADCAST_ADDRESS, LISTEN_ADDRESS, RPC_ADDRESS, RPC_BROADCAST_ADDRESS
- CLUSTER_NAME (will be automatically escaped with quotes), default is Test Cluster
- SEED_NODES - list of comma separated IP addresses to bootstrap the cluster from
In general, if a name appears in cassandra.yaml with a dollar sign preceding it, it is safe to assume that the environment variable with that name will be substituted for it.
If you need quotes, bring them with you. See for example how `CLUSTER_NAME` is set.
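The substitution rule can be pictured with a small Python sketch. This is not the image's actual entrypoint.py; it only assumes that placeholders look like `$NAME` and are replaced verbatim, with `CLUSTER_NAME` as the one value that gets quoted automatically. The template fragment below is hypothetical.

```python
from string import Template

# Hypothetical fragment; the real cassandra.yaml in the image is much longer.
YAML_TEMPLATE = Template(
    "cluster_name: $CLUSTER_NAME\n"
    "num_tokens: $NUM_TOKENS\n"
    "listen_address: $LISTEN_ADDRESS\n"
)


def render(env):
    """Substitute environment values into the template verbatim."""
    values = dict(env)
    # CLUSTER_NAME is wrapped in quotes; every other value is used as given.
    values["CLUSTER_NAME"] = '"%s"' % values.get("CLUSTER_NAME", "Test Cluster")
    values.setdefault("NUM_TOKENS", "256")
    return YAML_TEMPLATE.substitute(values)


print(render({"CLUSTER_NAME": "My Cluster", "LISTEN_ADDRESS": "10.0.0.5"}))
```

Because values are inserted as-is, anything that YAML would otherwise misread (for example a string containing a colon) needs its quotes supplied in the environment variable itself.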
Extra parameters for RTFM
Note that where sizes are required, you should postfix them with MiB or KiB. Where times are required, use milliseconds (ms).
- NUM_TOKENS - by default 256, but take care
- START_RPC - whether to start classic Cassandra Thrift RPC. Default is false, but you might wish to use true
- RPC_PORT - port on which to start Thrift RPC, if it is requested.
- DISK_OPTIMIZATION_STRATEGY - pass spinning or ssd, any other option will fail with an error. Default is ssd
- ENDPOINT_SNITCH - endpoint snitch to use, by default it's SimpleSnitch
- AUTHENTICATOR - by default AllowAllAuthenticator, can also use PasswordAuthenticator
- AUTHORIZER - by default AllowAllAuthorizer, can also use CassandraAuthorizer
- PARTITIONER - partitioner to use, by default org.apache.cassandra.dht.Murmur3Partitioner
- ROW_CACHE_SIZE - row cache size to use. Default is 0MiB, which means disabled.
- TOMBSTONE_WARN_THRESHOLD and TOMBSTONE_FAIL_THRESHOLD - there's no unit. RTFM
- COLUMN_INDEX_SIZE - RTFM, default is 64KiB
- BATCH_SIZE_FAIL_THRESHOLD - batch size above which Cassandra will fail the batch. Unit is KiB. RTFM
- BATCHLOG_REPLAY_THROTTLE - maximum speed at which the batchlog will be replayed. Default is 512 MiB, which means 512 MiB/s.
- REQUEST_SCHEDULER - defaults to org.apache.cassandra.scheduler.NoScheduler
- READ_REQUEST_TIMEOUT - defaults to 5000ms
- RANGE_REQUEST_TIMEOUT - defaults to 10000ms
- STREAM_THROUGHPUT_OUTBOUND - defaults to 25MiB/s
- WRITE_REQUEST_TIMEOUT - defaults to 2000ms
- MAX_HEAP_SIZE - defaults to 48g
- NEW_HEAP_SIZE - defaults to 10g. Don't confuse it with HEAP_NEWSIZE!
- COUNTER_WRITE_REQUEST_TIMEOUTS - defaults to 5000ms
- JMX_AUTH - defaults to yes, set to no to disable JMX auth
- CAS_CONTENTION_TIMEOUT - defaults to 2000ms
- TRUNCATE_REQUEST_TIMEOUT - defaults to 60000ms
- REQUEST_TIMEOUT - defaults to 15000ms
- COMPACTION_THROUGHPUT - defaults to 64MiB/s
- MAX_HINT_WINDOW - defaults to 3h
- ENABLE_USER_DEFINED_FUNCTIONS - defaults to false
- ENABLE_SCRIPTED_USER_DEFINED_FUNCTIONS - defaults to false
- COMMITLOG_SEGMENT_SIZE - size of a commit log segment. Defaults to 32MiB.
- DISABLE_PROMETHEUS_EXPORTER - if set, Prometheus' exporter will be disabled
- KEY_CACHE_SIZE - default is auto, unit is MiB
- FILE_CACHE_SIZE - size of chunk cache, unit is MiB
- COMMITLOG_TOTAL_SPACE - space to use for commit log. Please specify the values, the defaults are difficult to explain.
- COMMITLOG_SYNC - RTFM. Defaults to periodic
- MEMTABLE_HEAP_SIZE - heap space to use for memtables. Default is 1024MiB. Postfix it with MiB please.
- MEMTABLE_OFF_HEAP_SIZE - size of off-heap memtables. Default is 512MiB. Postfix it with MiB please.
- STORAGE_COMPATIBILITY_MODE - used when upgrading; please read the migration steps at the end of this README. Default is NONE (bootstrap in a Cassandra 5 cluster)
To enable remote JMX (without SSL), set the environment variable `LOCAL_JMX` to `no`, and the environment variable `JMX_REMOTE_PASSWORD` to the password you want to use.
This way you will have two users created: `monitorRole` with read-only permissions, and `controlRole` with read-write JMX permissions, both having the password that you set.
If you want JMX to bind to a specific interface, define `JMX_ADDRESS`.
The following envs are nice to have, but are not required:
- CASSANDRA_DC - name of this DC that Cassandra is in. dc1 by default.
- CASSANDRA_RACK - name of the rack that Cassandra is in, rack1 by default.
Jolokia is enabled by default and listens on port 8080. If you define the env `DISABLE_JOLOKIA`, it won't be loaded.
This simply launches cassandra with a -f flag, and passes any extra arguments to that cassandra.
This container sports a built-in health check. It works by invoking `nodetool status` and checking its exit code.
This assumes that 30 minutes is enough time for your Cassandra to get up, read its commit logs and initialize.
If this is not the case, start the container with a suitable `docker run --health-start-period`.
To enable the health check, just set the environment variable `HEALTHCHECK_ENABLE` to `1`.
If you choose not to enable the health check, the container will always be marked as healthy.
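The behaviour described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the script shipped in the image: the function name and its parameters are made up here, and only `HEALTHCHECK_ENABLE` and `nodetool status` come from this README.

```python
import os
import subprocess


def is_healthy(env=os.environ, command=("nodetool", "status")):
    """Return True if the node should be reported as healthy."""
    # Health check not enabled: the container is always marked healthy.
    if env.get("HEALTHCHECK_ENABLE") != "1":
        return True
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # The command is missing, so the node cannot be checked.
        return False
    # A zero exit code from the command means the node is up.
    return result.returncode == 0


# With the check disabled, no command is run at all.
print(is_healthy({"HEALTHCHECK_ENABLE": "0"}))
```

The `command` parameter exists only so the logic can be exercised without a running Cassandra node.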
To disable the Prometheus exporter, just define an env called `DISABLE_PROMETHEUS`.
In order to enable Jaeger tracing, just define the env `JAEGER_AGENT_HOST`, and optionally `JAEGER_AGENT_PORT`, which is 6831 by default.
Note that this uses our custom version of cassandra-jaeger-tracing.
In order to trace spans that don't fit within a UDP packet (you'll see lots of logs saying that the frame is too large),
just define an env called `JAEGER_ENDPOINT` that points directly to the collector, e.g. `http://jaeger_collector:14268/api/traces`.
In this case you don't need to set either `JAEGER_AGENT_HOST` or `JAEGER_AGENT_PORT`.
Not setting any of these will result in Cassandra's built-in tracer being used.
If you invoke this container with a single argument of "bash", it will drop you to a shell without starting anything.
If you set an env called EXTRA1 it will get automatically appended to cassandra-env.sh, producing an extra line of:
JVM_OPTS="$JVM_OPTS ${EXTRA1}"\n
You can add any number of them, numbered from EXTRA1 upwards, without any limit.
It's important that the numbers are consecutive. These will simply extend your `JVM_OPTS`. You can, for example, use this to replace a dead node.
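A minimal Python sketch of the numbering rule follows. It assumes that the scan stops at the first missing number, which is why a gap would silently drop later variables; the real entrypoint.py may behave differently. The function name and the example values are illustrative only.

```python
def extra_jvm_opts_lines(env):
    """Return the lines to append to cassandra-env.sh for EXTRA1, EXTRA2, ..."""
    lines = []
    n = 1
    # Stop at the first missing number: EXTRA1, EXTRA2, EXTRA4 yields two lines.
    while f"EXTRA{n}" in env:
        lines.append('JVM_OPTS="$JVM_OPTS ${EXTRA%d}"\n' % n)
        n += 1
    return lines


example_env = {
    # Replacing a dead node; the IP address is a placeholder.
    "EXTRA1": "-Dcassandra.replace_address_first_boot=10.0.0.5",
    "EXTRA2": "-XX:+HeapDumpOnOutOfMemoryError",
    # Skipped under the assumption above, because EXTRA3 is missing.
    "EXTRA4": "-Dexample.ignored=true",
}
print("".join(extra_jvm_opts_lines(example_env)), end="")
```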
Assertions are disabled by default in order to provide a modest speed-up. To enable them, use an env called `ENABLE_ASSERTIONS` and set it to `1`.
GC can be logged to:
- not logged (default value of `LOG_GC=none`)
- file /var/log/cassandra.gc (`LOG_GC=file`)
- standard output (`LOG_GC=stdout`)
- vm.max_map_count = 1048575
- echo 8 > /sys/block/sda/queue/read_ahead_kb for the drive storing Cassandra data
Remember to change the env of `JAVA_HOME` to `/usr/lib/jvm/java-11-openjdk-amd64`!
For every node:
1. Stop it
2. Check that environment variables match (they changed a lot, they added units, this README details them all)
3. Set `STORAGE_COMPATIBILITY_MODE` to `UPGRADING`
4. Start the node, wait for it to join the cluster.
5. Run `nodetool upgradesstables` on all SSTables that this node has.
Now for every node:
1. Stop it
2. Change `STORAGE_COMPATIBILITY_MODE` to `NONE`
3. Start it
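The ordering of the two passes can be sketched in Python. The node names are placeholders, and the function only lists the actions; it does not run anything. The point it illustrates is that no node switches to `NONE` until every node has finished the `UPGRADING` pass.

```python
def upgrade_plan(nodes):
    """Return the ordered (node, action) steps for a rolling upgrade."""
    plan = []
    # First pass: bring every node up in UPGRADING mode, one at a time.
    for node in nodes:
        plan += [
            (node, "stop"),
            (node, "check environment variables"),
            (node, "set STORAGE_COMPATIBILITY_MODE=UPGRADING"),
            (node, "start and wait for it to join the cluster"),
            (node, "nodetool upgradesstables"),
        ]
    # Second pass: only after all nodes are done, switch each one to NONE.
    for node in nodes:
        plan += [
            (node, "stop"),
            (node, "set STORAGE_COMPATIBILITY_MODE=NONE"),
            (node, "start"),
        ]
    return plan


for node, action in upgrade_plan(["node1", "node2"]):
    print(f"{node}: {action}")
```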