Is it wise to use Kafka as the 'source of truth' for mission-critical data?
The setup is:
- Kafka is the underlying source of truth for the data
- Querying is done against caches (e.g. Redis, KTables) hydrated from Kafka
- Kafka is configured for durability (infinite topic retention, replication factor of 3+, etc.; see the config sketch after this list)
- The architecture follows the CQRS pattern (writes go to Kafka, reads come from the caches)
- The architecture allows for eventual consistency between reads and writes
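For reference, here is a minimal sketch of the topic-level durability settings described above, using Kafka's Java AdminClient. The broker address, topic name ("orders"), and partition count are placeholders; `min.insync.replicas=2` and disabling unclean leader election are my assumptions about what a "no data loss" configuration would add on top of infinite retention and a replication factor of 3:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

import java.util.Map;
import java.util.Properties;
import java.util.Set;

public class CreateDurableTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker

        try (AdminClient admin = AdminClient.create(props)) {
            // "orders" and 6 partitions are arbitrary; replication factor 3 per the setup above
            NewTopic topic = new NewTopic("orders", 6, (short) 3).configs(Map.of(
                    TopicConfig.RETENTION_MS_CONFIG, "-1",                        // infinite retention
                    TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG, "2",                 // a write must land on 2 replicas
                    TopicConfig.UNCLEAN_LEADER_ELECTION_ENABLE_CONFIG, "false")); // never elect a stale leader
            admin.createTopics(Set.of(topic)).all().get();
        }
    }
}
```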
We are not allowed to lose data under any circumstances.
In theory, the replication should guarantee durability and resiliency, and Confluent themselves encourage the above pattern.
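One nuance worth stating explicitly: replication only guarantees durability if the producer actually waits for the in-sync replicas to acknowledge each write. A minimal sketch of what that looks like on the write path, assuming a String-keyed "orders" topic and a local broker (both placeholders, not part of the actual setup):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class DurableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Wait for all in-sync replicas to acknowledge before treating a write as committed
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Idempotence prevents duplicates when retries kick in after transient broker failures
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Blocking on get() surfaces broker-side failures to the caller instead of losing them silently
            producer.send(new ProducerRecord<>("orders", "order-123", "{\"status\":\"created\"}")).get();
        } catch (Exception e) {
            // A failed send must be retried or rejected upstream, never swallowed
            throw new RuntimeException("write was not durably committed", e);
        }
    }
}
```

If a send fails even after retries, the write has to be rejected back to the caller; "no data loss" ultimately means never acknowledging upstream anything Kafka hasn't durably committed.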
The only flaws I can think of are:
- The cache blows up and needs to be rehydrated from scratch, leaving queries stale or unavailable until the replay catches up (see the sketch after this list)
- A broker disk gets wiped or corrupted, forcing Kafka to re-replicate that broker's partitions, which could mean prolonged recovery if topics contain mountains of data
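To make the first flaw concrete: rehydration amounts to replaying the topic from offset zero into a fresh store, and on a huge topic that replay is exactly where the prolonged query downtime would come from. A minimal sketch, using a plain HashMap to stand in for Redis or a KTable state store (topic name and broker address are again placeholders):

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

public class CacheRehydrator {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // Latest-value-per-key map standing in for Redis / a KTable state store
        Map<String, String> cache = new HashMap<>();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            List<TopicPartition> partitions = consumer.partitionsFor("orders").stream()
                    .map(p -> new TopicPartition(p.topic(), p.partition()))
                    .collect(Collectors.toList());
            consumer.assign(partitions);
            consumer.seekToBeginning(partitions); // replay the full history

            // End offsets captured up front tell us when the initial replay has caught up
            Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);

            boolean caughtUp = false;
            while (!caughtUp) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    cache.put(record.key(), record.value()); // last write per key wins
                }
                caughtUp = partitions.stream()
                        .allMatch(tp -> consumer.position(tp) >= endOffsets.get(tp));
            }
        }
        System.out.println("Rehydrated " + cache.size() + " keys");
    }
}
```

If the topic is compacted (cleanup.policy=compact) rather than purely retained forever, the replay only has to cover the latest value per key, which shortens this considerably.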
Has anyone run and battle-tested this kind of setup in production, i.e. encountered disk corruption or brokers going down but still retained the data?
My gut feeling is that this is a bad setup because Kafka wasn't originally designed for RDBMS levels of durability, but I can't point to a concrete reason why that would be the case.