I am designing a system that uses Event Sourcing, CQRS and microservices. I am led to understand this isn't an uncommon pattern. A key feature of the service needs to be the ability to rehydrate/restore from a system of record. Microservices will produce commands and queries on an MQ (Kafka). Other microservices will respond with events. Commands and queries will be persisted on S3 for the purposes of auditing and restoring.
Our current thinking is that, to restore the system, we could extract the event log from S3 and simply feed it back into Kafka.
However, this fails to account for changes in both producers and consumers over time. Versioning at the command/query level seems to go some way toward solving the problem, but I can't wrap my head around versioning consumers so that, when a command is received and processed during a restore, I can guarantee it is handled by the exact same version of the code that processed it the first time.
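To make the consumer-versioning idea concrete, here is a minimal in-memory sketch of what I have in mind. It assumes each logged command is stamped with a hypothetical `handler_version` field, and that every past handler stays registered under its version. The names `HANDLERS`, `handler`, `process` and `replay` are made up for illustration; nothing here is a Kafka or S3 API.

```python
# Hypothetical registry: consumer version -> the handler deployed under it.
HANDLERS = {}

def handler(version):
    """Register a handler function under a given consumer version."""
    def register(fn):
        HANDLERS[version] = fn
        return fn
    return register

@handler(1)
def send_question_v1(state, args):
    state["questions"].append(args)

@handler(2)
def send_question_v2(state, args):
    state["questions"].append(args)
    state["unread_count"] += 1

def new_state():
    return {"questions": [], "unread_count": 0}

def process(state, log, args, version):
    """Handle a live command and record which handler version processed it."""
    HANDLERS[version](state, args)
    log.append({"command": "send-question", "args": args,
                "handler_version": version})

def replay(log):
    """Rebuild state, dispatching each entry to the version that first handled it."""
    state = new_state()
    for entry in log:
        HANDLERS[entry["handler_version"]](state, entry["args"])
    return state

# Live traffic: one command under version 1, then one under version 2.
live_state, command_log = new_state(), []
process(live_state, command_log, ["text a", "seller-1", "user-1"], version=1)
process(live_state, command_log, ["text b", "seller-1", "user-2"], version=2)

restored_state = replay(command_log)
```

In this sketch the replayed state matches the live state, but only because every old handler version is still available and the state shape is compatible across versions, which is exactly the part I don't know how to enforce across real services.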
Are there any patterns I can use to solve this? Is anyone aware of other systems that advertise this feature?
EDIT: Adding an example.
A 'buyer' sends a 'question' to a 'seller' on my auction site. The flow looks as follows:
UI -> Web App: POST /question {:text text :to seller-id :from user-id}
Web App -> MQ: SEND {:command send-question :args [text seller-id user-id]}
MQ -< Audit: <command + args appended to log in S3>
MQ -< Questions service: - Record question in DB
                         - Email seller 'You have a question'
Now, as a result of a new business requirement, I adjust the 'Questions service' consumer to persist a count of all unread questions. The DB schema is changed. Until now, we had no notion of whether or not a question had been read by the seller. The 'Questions service' step becomes:
MQ -< Questions service: - Record question in DB
                         - Email seller 'You have a question'
                         - Increment 'unread questions count'
Two commands are issued, one before the change, one after the change. The 'unread questions count' equals 1.
The system crashes. We restore by replaying the commands through the new code. At the end of the restore, our 'unread questions count' equals 2. Even though, in this contrived example, the result is not a catastrophe, the restored state is not what it was before the crash.
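The discrepancy can be reproduced with a small in-memory sketch. The two handler functions stand in for the 'Questions service' before and after the change. The function names, the state dictionary and the sample question texts are illustrative assumptions, the email step is left out, and no Kafka, S3 or real DB is involved.

```python
def handle_before_change(state, command):
    """Original consumer: records the question only."""
    state["questions"].append(command["args"])

def handle_after_change(state, command):
    """Changed consumer: also increments the unread questions count."""
    state["questions"].append(command["args"])
    state["unread_count"] += 1

def new_state():
    return {"questions": [], "unread_count": 0}

first = {"command": "send-question",
         "args": ["Is it new?", "seller-1", "buyer-1"]}
second = {"command": "send-question",
          "args": ["Can you ship?", "seller-1", "buyer-2"]}
audit_log = [first, second]  # stands in for the log appended to S3

# Live history: the first command hit the old code, the second the new code.
live = new_state()
handle_before_change(live, first)
handle_after_change(live, second)

# Restore: every logged command is replayed through the new code.
restored = new_state()
for command in audit_log:
    handle_after_change(restored, command)

print(live["unread_count"])      # 1
print(restored["unread_count"])  # 2
```

The recorded questions are identical in both states; only the derived 'unread questions count' differs, because the log says nothing about which version of the consumer handled each command.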