Checkpointing v2 #333

daniil-quix · 2024-04-11T14:50:34Z

New checkpointing

Why?

The existing commit and recovery implementation has the following issues:

Delivery of produced messages (both changelog and output) is not guaranteed, and the library doesn't check the delivery callbacks.
The changelog offsets are calculated on the fly before the messages are produced, which leaves room for errors.
Local state stores are updated for each processed message, which puts more pressure on the disks.

Goals

Provide better state consistency guarantees in the At-Least-Once setting.
Optimize the state performance.
Build a foundation for exactly-once semantics (EOS) processing.

What's changed

Introduced a new Checkpoint class responsible for keeping three steps in sync: flushing the Producer, committing topic offsets, and flushing the state.
A checkpoint first flushes the Producer and ensures that all outgoing messages are delivered, then synchronously commits the topic offsets, and only then flushes the state to disk.
The Checkpoint commits on a schedule provided by the commit_interval Application setting.
The default interval is 5 seconds, the same as Kafka Consumer's default autocommit interval.
Kafka Consumer autocommit is now always disabled.
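The commit order described above can be sketched in plain Python. This is only an illustration, not the code from this PR: `CheckpointSketch`, the `Fake*` classes, `store_offset()`, and the `clock` parameter are hypothetical names, and the fakes record calls instead of talking to Kafka or RocksDB.

```python
import time


class FakeProducer:
    """Hypothetical stand-in for a Kafka producer; records calls only."""

    def __init__(self, log):
        self.log = log

    def flush(self):
        self.log.append("producer.flush")


class FakeConsumer:
    """Hypothetical stand-in for a Kafka consumer; records calls only."""

    def __init__(self, log):
        self.log = log

    def commit(self, offsets):
        self.log.append(("consumer.commit", dict(offsets)))


class FakeStore:
    """Hypothetical stand-in for a local state store; records calls only."""

    def __init__(self, log):
        self.log = log

    def flush(self):
        self.log.append("store.flush")


class CheckpointSketch:
    def __init__(self, commit_interval, producer, consumer, store,
                 clock=time.monotonic):
        self._interval = commit_interval
        self._producer = producer
        self._consumer = consumer
        self._store = store
        self._clock = clock
        self._created_at = clock()
        self._offsets = {}  # (topic, partition) -> last processed offset

    def expired(self):
        # True once commit_interval seconds have passed since creation.
        return self._clock() - self._created_at >= self._interval

    def store_offset(self, topic, partition, offset):
        self._offsets[(topic, partition)] = offset

    def commit(self):
        # 1. Make sure every outgoing message is delivered.
        self._producer.flush()
        # 2. Commit source offsets. By Kafka convention the committed
        #    offset is the next offset to read, hence the +1.
        self._consumer.commit(
            {tp: offset + 1 for tp, offset in self._offsets.items()}
        )
        # 3. Only now persist the state to disk.
        self._store.flush()


log = []
now = [0.0]  # mutable fake clock so the example is deterministic
checkpoint = CheckpointSketch(
    5.0, FakeProducer(log), FakeConsumer(log), FakeStore(log),
    clock=lambda: now[0],
)
checkpoint.store_offset("input", 0, 41)
checkpoint.commit()
print(log)
```

If any step raises, the later steps never run, so the state is not flushed for offsets that were not committed.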
The state updates are batched in memory until the checkpoint is committed.
This improves state performance because the disks are touched only once every few seconds, at the cost of higher memory usage.
The batches are kept per key, so the more keys are processed during a checkpoint interval, the more memory is required to hold them.
To use less memory, users can reduce the commit_interval value.
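To make the memory/disk trade-off concrete, here is a minimal sketch of per-key batching. It assumes a plain dict standing in for the on-disk store; `BatchedTransactionSketch` and its `disk_writes` counter are hypothetical and are not the RocksDB transaction from this PR (deletes are left out for brevity).

```python
class BatchedTransactionSketch:
    """Keeps the latest value per key in memory and writes once on flush."""

    def __init__(self, disk):
        self._disk = disk        # dict standing in for the on-disk store
        self._batch = {}         # key -> latest value, held in memory
        self.disk_writes = 0     # counts how often the "disk" is touched

    def set(self, key, value):
        # Overwrites any earlier update for the same key, so memory grows
        # with the number of distinct keys, not the number of messages.
        self._batch[key] = value

    def get(self, key, default=None):
        # Read-your-writes: pending updates win over what is on disk.
        if key in self._batch:
            return self._batch[key]
        return self._disk.get(key, default)

    def pending_keys(self):
        return len(self._batch)

    def flush(self):
        if not self._batch:
            return
        self._disk.update(self._batch)  # one write for the whole batch
        self.disk_writes += 1
        self._batch.clear()


disk = {}
tx = BatchedTransactionSketch(disk)

# Count 1000 messages spread over 3 keys within one checkpoint interval.
for i in range(1000):
    key = f"sensor-{i % 3}"
    tx.set(key, tx.get(key, 0) + 1)

pending_before_flush = tx.pending_keys()
tx.flush()
print(pending_before_flush, tx.disk_writes, disk)
```

A shorter interval means `flush()` runs more often, so fewer keys accumulate between flushes.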
The changelog messages now carry the source topic-partition-offset info in their headers, and recovery checks whether each changelog message belongs to a committed source topic message.
This ensures that state changes are not applied for messages that are not yet committed.
The app should always* recover state consistently, minimizing the chance of double-counting in the At-Least-Once setting.
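The recovery check can be illustrated roughly as follows. This is a sketch under assumptions, not the PR's implementation: the header name `processed_offset`, the message layout, and the choice to apply messages that have no header are all hypothetical. It also assumes the Kafka convention that the committed offset is the next offset to read, so a source message counts as committed when its offset is strictly smaller.

```python
# Hypothetical header name; the real one is not shown in this description.
PROCESSED_OFFSET_HEADER = "processed_offset"


def should_apply_changelog(headers, committed_offset):
    """Return True if the changelog update came from a committed message."""
    raw = headers.get(PROCESSED_OFFSET_HEADER)
    if raw is None:
        # Assumption: messages without source info are applied as-is.
        return True
    processed_offset = int(raw.decode())
    # Committed offset points at the NEXT message to read.
    return processed_offset < committed_offset


def recover(changelog, committed_offset):
    """Rebuild state from changelog messages, skipping uncommitted ones."""
    state = {}
    for message in changelog:
        if should_apply_changelog(message["headers"], committed_offset):
            state[message["key"]] = message["value"]
    return state


changelog = [
    {"key": "a", "value": 1, "headers": {PROCESSED_OFFSET_HEADER: b"10"}},
    {"key": "a", "value": 2, "headers": {PROCESSED_OFFSET_HEADER: b"11"}},
    # Produced, but the source offset 12 was never committed.
    {"key": "b", "value": 7, "headers": {PROCESSED_OFFSET_HEADER: b"12"}},
]

recovered = recover(changelog, committed_offset=12)
print(recovered)
```

With a committed offset of 12, the update produced while processing offset 12 is skipped, so re-processing that message after restart does not double-count it.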
The store changelog offsets are now taken from the delivery callbacks (no manual calculation anymore).
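A rough simulation of tracking offsets through delivery callbacks is shown below. Everything here is hypothetical (`TrackingProducerSketch`, `FakeBroker`, `DeliveryError`, and the callback signature); a real Kafka client delivers callbacks asynchronously and with a different signature. The sketch only shows the idea: offsets come from the broker's acknowledgement, and a delivery error is stored and raised on the next `produce()` or `flush()`.

```python
class DeliveryError(Exception):
    """Hypothetical error type raised when a delivery callback fails."""


class FakeBroker:
    """Assigns offsets per partition; fails for values in fail_values."""

    def __init__(self, fail_values=()):
        self._next_offset = {}
        self._fail_values = set(fail_values)

    def deliver(self, topic, partition, value, on_delivery):
        if value in self._fail_values:
            on_delivery(DeliveryError(f"failed to deliver {value!r}"),
                        topic, partition, None)
            return
        offset = self._next_offset.get((topic, partition), 0)
        self._next_offset[(topic, partition)] = offset + 1
        on_delivery(None, topic, partition, offset)


class TrackingProducerSketch:
    def __init__(self, broker):
        self._broker = broker
        self._pending = []
        self._error = None
        self.offsets = {}  # (topic, partition) -> latest delivered offset

    def _on_delivery(self, error, topic, partition, offset):
        if error is not None:
            self._error = error  # remember it; raised later
            return
        self.offsets[(topic, partition)] = offset

    def _raise_for_error(self):
        if self._error is not None:
            error, self._error = self._error, None
            raise error

    def produce(self, topic, partition, value):
        self._raise_for_error()
        self._pending.append((topic, partition, value))

    def poll(self):
        """Send queued messages and run callbacks without raising."""
        pending, self._pending = self._pending, []
        for topic, partition, value in pending:
            self._broker.deliver(topic, partition, value, self._on_delivery)

    def flush(self):
        self.poll()
        self._raise_for_error()


producer = TrackingProducerSketch(FakeBroker())
producer.produce("changelog", 0, "a")
producer.produce("changelog", 0, "b")
producer.flush()
print(producer.offsets)
```

Because the offsets are reported by the broker rather than predicted locally, they cannot drift from what was actually written.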
The internal RowProducer now has enable.idempotence = True by default to provide stronger delivery guarantees.
This setting can be disabled by providing producer_extra_config = {'enable.idempotence': False} to the Application class.
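One way to picture how a default and a user-supplied override might combine is the sketch below. `build_producer_config` and `DEFAULT_PRODUCER_CONFIG` are hypothetical helpers for illustration; only the `enable.idempotence` key and the `producer_extra_config` value come from the description above.

```python
# Default mentioned in this PR: idempotence is on unless overridden.
DEFAULT_PRODUCER_CONFIG = {"enable.idempotence": True}


def build_producer_config(extra_config=None):
    """Merge user-provided settings over the defaults (user wins)."""
    return {**DEFAULT_PRODUCER_CONFIG, **(extra_config or {})}


default_config = build_producer_config()

# Same dict a user would pass as producer_extra_config to Application.
overridden_config = build_producer_config({"enable.idempotence": False})

print(default_config, overridden_config)
```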
Lots of refactoring and tests

Caveats

Although the stateful processing performance increases, the apps will require more memory to batch the state updates.
In the At-Least-Once setting, it is still possible that unwanted changelog changes get applied.
Example:

Checkpoint successfully produces changelog updates
Checkpoint fails to commit the source topic offsets
The user changes the application code and some of the input messages get filtered during re-processing.
Since the changelogs are already produced, during recovery from scratch the app will apply them to the state.
To mitigate this, EOS should be used; we will implement it in the future.

Docs

I'll update the docs after the code review


tim-quix

Really nice work.

Nothing major (and some things you already addressed).

For documentation reasons: Daniil and I also discussed potentially implementing a caching class for the RDB transactions, and probably some sort of batching for recovery as well. He also has some other outstanding refactoring he'd like to do, but that is out of scope for these changes.

Also, don't forget to build/update any docs as necessary (or maybe we just do that prior to a release?) =)


tim-quix

Woo, let's merge!

daniil-quix marked this pull request as draft April 11, 2024 14:50

daniil-quix marked this pull request as ready for review April 30, 2024 17:09

daniil-quix added 18 commits May 1, 2024 11:17

Batch state updates in RocksDBPartitionTransaction

f469cca

Do not write updates to Writebatch but do it once in the end instead

New Checkpointing flow (part 1)

e7aeb57

- Added new Checkpoint class to sync state updates and Kafka commits
- Updated state transactions to span across multiple offsets
- Added new ProcessingContext class to share dependencies and checkpoints between Application and SDF

Accept delivery callbacks in Producer.produce()

877ead9

Update RowProducer to track message delivery

d4b5f37

- Save the latest produced TPs and offsets using delivery callbacks
- If delivery callback returns an error, raise it on next produce() or flush()
- Move KafkaMessageError outside `rowconsumer.py` and rename it to KafkaException

Separate changelog producing and flush in State

0267e00

- Make flushing state to the disk and producing changelogs separate operations
- Rename "maybe_flush" -> "flush"

Separate exceptions for RowProducer and RowConsumer

00df6c3

Fix failing Application tests

f1a7b19

Move checkpoint to a module and add tests

9ad15dd

Expose changelog name and partition on ChangelogProducer

8a4ba03

Add missing recovery_manager_factory

a7cffe4

Add more logs to store transaction

5c98975

Log elapsed time of the checkpoint commit

d4ed15c

Fix recovery test

a1ff890

Don't commit the checkpoint if application fails

f46d202

Add source topic-partition-offset to changelog messages

4a8d2ec

- Move changelog producing code from Partition to StateTransaction
- Make "processed_offset" a required param in "prepare()"
- Pass the source topic info to the ChangelogProducer and add it to the changelog messages

Pass latest committed offset to the store partition for recovery, refactor tests

cf6236c

Remove ApplicationStatus enum

d784d31

Implement consistent recovery

847dfaa

daniil-quix force-pushed the feature/checkpointing-v2 branch from edfa3bd to 847dfaa Compare May 1, 2024 09:31

daniil-quix changed the title ~~Feature/checkpointing v2~~ Checkpointing v2 May 1, 2024

daniil-quix added 2 commits May 1, 2024 13:08

Enable idempotence for internal RowProducer

d8e95e4

Remove topic and partition values from the changelog messages

db2523e

tim-quix reviewed May 6, 2024


tim-quix self-requested a review May 6, 2024 20:47

daniil-quix and others added 4 commits May 7, 2024 16:23

Update Checkpoint.commit docstring

b328a7d

Add commit_interval to Application docstring

c3723e9

Update quixstreams/state/rocksdb/transaction.py

21918f0

Fix typo Co-authored-by: Tim Sawicki <[email protected]>

Update quixstreams/app.py

8873f78

Fix typo Co-authored-by: Tim Sawicki <[email protected]>

daniil-quix and others added 4 commits May 7, 2024 16:46

Update quixstreams/state/rocksdb/transaction.py

0b2b5ee

Fix typo Co-authored-by: Tim Sawicki <[email protected]>

Rename _should_skip_changelog -> _should_apply_changelog

2b22515

Remove source_topic_name from changelog classes

3787271

Re-generate API docs

b640699

tim-quix approved these changes May 7, 2024


daniil-quix merged commit 207d3f0 into main May 7, 2024
4 checks passed

daniil-quix deleted the feature/checkpointing-v2 branch May 7, 2024 15:55
