My understanding is that the main feature Cassandra has to offer is linear performance at any scale; meaning that if I know 1 C* node can handle 500 queries or commands per second from my app, then I also can rest assured that 100 C* nodes added to the same cluster will be able to handle 500 * 100 = 50K queries or commands per second.
My understanding of the main tradeoff between RDBMS and NoSQL is that NoSQL systems tend to favor scalability but that requires them (mechanically, due to the implementation) to need to relax their transactability. Hence NoSQL systems like C* typically scale extremely well but can't offer the classic ACID transactions that RDBMS systems like MySQL can.
My understanding is that since scalability and transactability are mutually exclusive of one another, that there aren't any magical NoSQL databases out there that offer C*-like scalability (again: linear performance at any scale) and that offer Java clients with JTA implemented (commit + rollback capabilities) in them.
These are my assumptions heading into this question: if I'm wrong or misguided about any of them, please begin by correcting me!
Assuming I'm more or less correct in all those assumptions, then what does one do when you actually do need both scalability and ACID transactions? My only idea here would be to implement the following:
- Configure the app to write to an RDBMS (like MySQL) using a JDBC/JTA-compliant driver (e.g. using transactions)
- Somehow, configure the RDBMS to replicate (either in real-time or with very low latency) to a highly-scalable DB (like Cassandra). This might be a configuration option that the RDBMS itself offers, or much more likely, will be code you need to write yourself to continuously ETL out of one system and into another. The idea here is that the app would then read from the NoSQL table, and still have very performant reads against a massive amount of data.
- Somehow configure the RDBMS tables with TTLs so that the tables don't grow extremely large and start requiring sharding and other tricks that can slow transactions down. Again, this might be a configuration option that the RDBMS itself offers, or is likely code you need to write yourself.
Are there any better known solutions here? Any pitfalls/caveats to this approach?