Stream state resets if the Jetstream store directory goes into read only mode [v2.10.23] #6211

sudojha · 2024-12-04T06:39:12Z

Observed behavior

When the JetStream storage directory goes into read-only mode, the in-memory stream state gets reset.

It takes a restart to restore the stream state.

Code inspection revealed the following:

For a stream with a TTL (max age) configured, expireMsgs is called periodically.

This internally calls removeMsgBlock. Once the last message block is removed, a new tombstone block is created and needs to be assigned to lmb. Since the store directory is in read-only mode, the error is returned before the new block gets assigned to lmb.

This causes future publishes to fail until the file system recovers and the server is restarted.

Why does subsequent publish fail?

When a message block is about to be removed from the file system, dirtyCloseWithRemove is called. This method clears the block's state, in particular mfn, which stores the qualified file name of the message block on disk.
The in-memory representation of each message block is stored in the filestore under blks, and the filestore's lmb holds a reference to the same object as the last entry. When the last block, which is also the lmb, is removed, its state is cleared, and because the file system is read-only no replacement block is assigned to lmb. From then on, publishes fail with these errors even after the file system recovers:

Note that the mfn of lmb is empty, hence the msg block file cannot be opened.
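
The sequence described above can be sketched with a toy model. This is not the nats-server code: the msgBlock and fileStore types, the readOnly flag, and the helper names newMsgBlock, removeLastBlock and publish are all simplified stand-ins invented for illustration. Only the names blks, lmb and mfn are taken from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// msgBlock is a toy stand-in for a message block; only the file name matters here.
type msgBlock struct {
	index uint32
	mfn   string // qualified file name on disk; empty once the block is removed
}

// fileStore is a toy stand-in for the file store.
type fileStore struct {
	blks     []*msgBlock
	lmb      *msgBlock // same pointer as the last entry of blks
	readOnly bool      // hypothetical flag simulating a read-only store directory
}

var errReadOnly = errors.New("read-only file system")

// newMsgBlock simulates creating the block file first and assigning lmb afterwards.
func (fs *fileStore) newMsgBlock(index uint32) (*msgBlock, error) {
	if fs.readOnly {
		return nil, errReadOnly // returns before lmb is reassigned
	}
	mb := &msgBlock{index: index, mfn: fmt.Sprintf("%d.blk", index)}
	fs.blks = append(fs.blks, mb)
	fs.lmb = mb
	return mb, nil
}

// removeLastBlock mimics the expire path: clear the block, then try to replace it.
func (fs *fileStore) removeLastBlock() error {
	mb := fs.lmb
	mb.mfn = "" // state cleared, as dirtyCloseWithRemove is described to do
	fs.blks = fs.blks[:len(fs.blks)-1]
	_, err := fs.newMsgBlock(mb.index + 1)
	return err
}

// publish fails when lmb has no file name to open.
func (fs *fileStore) publish() error {
	if fs.lmb == nil || fs.lmb.mfn == "" {
		return errors.New("cannot open msg block file: empty file name")
	}
	return nil
}

func main() {
	fs := &fileStore{}
	if _, err := fs.newMsgBlock(1); err != nil {
		panic(err)
	}
	fs.readOnly = true
	fmt.Println("expire:", fs.removeLastBlock())
	fs.readOnly = false // file system recovers
	fmt.Println("publish after recovery:", fs.publish())
}
```

In this model the publish still fails after readOnly is flipped back, because lmb keeps pointing at the cleared block. The real server may differ in the details, but this is the stale-pointer pattern the code inspection describes.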

Expected behavior

Ideally, the NATS server should recover automatically once the file system recovers, without needing a restart.

Server and client version

Server version: 2.10.11, running in standalone mode with JetStream enabled.
Also reproducible in 2.10.23-RC.7.

Host environment

macOS 14.4 (23E214) (should be reproducible on any system)

Steps to reproduce

Create a stream with max age configured, e.g.:
{ "name": "sko", "subjects": [ "sko" ], "retention": "workqueue", "max_consumers": -1, "max_msgs_per_subject": -1, "max_msgs": -1, "max_bytes": -1, "max_age": 10000000000, "max_msg_size": -1, "storage": "file", "discard": "new", "num_replicas": 1, "duplicate_window": 5000000000, "sealed": false, "deny_delete": false, "deny_purge": false, "allow_rollup_hdrs": false, "allow_direct": true, "mirror_direct": false, "consumer_limits": {} }

Publish messages to the stream.
Remove write access from the JetStream store directory (chmod -R u-w *).
Stop publishing messages.
Wait for ageCheck to kick in.

This causes the stream's first and last sequence to reset to 0.

Side Note

We are also observing a situation where the stream state has been reset but publishing still works (similar to #6159).
The consumer sequence ID here is way ahead of the stream sequence ID (by millions). We have not been able to reproduce this exact scenario. Any help in reproducing it would be appreciated.

wallyqs · 2024-12-04T06:51:10Z

@sudojha about the side note issue, do you have a setup using leafnodes or with streams being deleted and recreated?

sudojha · 2024-12-04T06:52:04Z

@wallyqs I tried this locally. I don't have a setup with leaf nodes or with deletion and recreation of streams.

pranavmehta94 · 2024-12-10T12:29:13Z

@wallyqs We have raised a draft PR for initial review and comments on the approach we propose for handling the stream state reset observed when the filesystem enters a read-only state.
Happy to build/modify the PR based on suggestions!

sudojha added the defect label (Suspected defect such as a bug or regression) Dec 4, 2024

pranavmehta94 mentioned this issue Dec 10, 2024

Stop stream when stream state inconsistency is detected during flush #6237

Open

wallyqs changed the title ~~Stream state resets if the Jetstream store directory goes into read only mode~~ Stream state resets if the Jetstream store directory goes into read only mode [v2.10.23] Dec 11, 2024

souravagrawal linked a pull request Dec 22, 2024 that will close this issue

Disable JetStream on disk errors #6292

Open
