I want a trivial example of where MongoDB can scale but a relational database will have trouble

Question

I'm just learning to use MongoDB, and when discussing with other programmers would like a quick example of why NoSQL can be a good choice compared to a traditional RDBMS - however the scenarios I come up with and can find online seem pretty contrived.

E.g. a blog with lots of traffic could be represented relationally, but will require some performance tuning and joins across tables (assuming full denormalization is being used). Whereas MongoDB would allow direct retrieval from one collection to the same effect.

But the response I'm getting from other programmers is "why not just keep it relational and then add some trivial caching later?"

Does anybody have a less contrived example where MongoDB will really shine and a relational db will fall over much quicker? The smaller the project/system the better, because it leaves less room for disagreement.

Something along the lines of the complexity of the blog example would be really useful.

Thanks.

Is this limited to MongoDB or NoSQL in general? I would have some good example for Apache Lucene's faceted search which though no idea if this would apply to MongoDB as well. — thorsten müller, Oct 28 '13 at 17:41
NoSQL in general I suppose. If you already have some examples, I'd love to see them. — Ryan Weir, Oct 28 '13 at 18:04
See http://mongodb-is-web-scale.com/ for an interested (and slightly NSFW) explanation; fwiw you can make anything scale if you approach it right. — Wyatt Barnett, Oct 28 '13 at 22:22

Philipp · Accepted Answer · 2013-10-28T18:05:14.373

6

First, it scales well.

When a MongoDB database is too frequented or too big for a single server, you can easily add more servers by creating a cluster or replica-set of multiple shards. It scales almost linearly. This doesn't work nearly as well with most relational databases. Take a look at MySQL's list of limitations when working as a cluster, for example. Most of the entries in the list are no problem for MongoDB (or don't apply).

Second, it allows heterogeneous data.

Imagine, for example, the product database of a computer hardware store. What properties do products have? All products have a price and a vendor. But CPUs have a clock rate, hard drives and RAM chips have a capacity (and these capacities aren't comparable), monitors have a resolution and so on. How would you design this in a relational database? You would either create a very long productID-property-value table or you would create a very wide and sparse product table with every property you can imagine, but most of them being NULL for most products. Both solutions aren't really elegant. But MongoDB can solve this much better because it allows each document in a collection to have a different set of properties.

edited Oct 28 '13 at 18:05

answered Oct 28 '13 at 17:55

Philipp

23,166
6
61
67

5

'Second, it allows heterogeneous data.' Your example is perfect. Who hasn't had the horrible turn-the-db-into-a-key-value-store pattern emerge in such a system where entities have many possible attributes? Every programmer should be able to relate immediately. – Ryan Weir Oct 28 '13 at 18:20
5

MongoDB has some scaling issues as well. A cluster of more than 12 nodes can't use the default replica set replication mechanism. You have to fall back to the Master-slave setup. Master-slave replication has issues like no auto failover on loss of the master. Whereas Mysql can handle hundreds of nodes in a cluster. – stonemetal Oct 28 '13 at 20:04
1

I don't know that allowing heterogeneous data is a factor in MongoDB's ability to scale. Although I agree that this does simplify a lot of cases where you're using your database as a key/value store, that property alone doesn't help much in saying why MongoDB scales better than an RDBMS – dsw88 Oct 28 '13 at 21:23
@dsw88 scaling and heterogeneous data have nothing to do with each other. Which part of my answer lead you to believe that I wanted to imply that? – Philipp Oct 28 '13 at 21:50
2

Sorry, it wasn't anything in your answer itself. It's just that the title of this question is "I want a trivial example of where MongoDB can scale but a relational database will have trouble". It's doesn't seem like a general "When to use NoSQL over RDBMS" question; instead it seemed targeted exclusively to scaling capabilities of both database types. – dsw88 Oct 28 '13 at 23:12
2

@RyanWeir - agreed. When does a NoSQL database shine? When you realize you've just built a NoSQL database using a SQL RDB as the storage engine! – Carson63000 Oct 29 '13 at 00:27
+1 to "When does a NoSQL database shine? When you realize you've just built a NoSQL database using a SQL RDB as the storage engine!" – Morons Oct 29 '13 at 12:46
`You would either create a very long productID-property-value table or you would create a very wide and sparse product table with every property you can imagine` I'm not a DB expert but isn't the sane solution to add a product type (enum) column to the products table and add a table (with foreign key constraints) for each product type to hold the non-universal properties? – Doval Apr 21 '15 at 12:27
@Philipp You wouldn't need to add a new column for a new product type, only a new product type value and a new table for that product type's properties. You already have the column set up in the main products table. – Doval Apr 21 '15 at 13:29
@Doval That turns ugly when you want to add new product types. It will require to add a new database table which means it needs to be done by a database administrator and can't be done by a normal user. – Philipp Apr 21 '15 at 13:32
@Doval Sorry, I meant table, not column. Adding a new table is just as critical as adding a column and should be done by an administrator and not automatically. – Philipp Apr 21 '15 at 13:33
@Philipp Well, yes. But that comes with the territory. I imagine Amazon.com doesn't trust non-admins with changing what info certain products have. It sounds more like the use-case you're arguing for is user-defined data (in some format that's still usable by the database, not just a string or binary blob) rather than heterogeneous data. – Doval Apr 21 '15 at 13:36
@Doval I am pretty sure those people at Amazon who manage the product catalogues are not certified database administrators. They would be quite overqualified for such a simple job. – Philipp Apr 21 '15 at 13:39
*Second, heterogeneous data*: Postgres as a typical traditional RDBMS allows key value pairs to be stored in [hstore](http://www.postgresql.org/docs/9.0/static/hstore.html) or [json](http://www.postgresql.org/docs/9.4/static/datatype-json.html). They can be indexed as well. – Matthias Kuhn Oct 06 '15 at 13:16

score 3 · Answer 2 · answered Oct 28 '13 at 19:15

Some real world example of a problem I would have no idea how to solve in a reasonable way with SQL and an relational database alone (my fault maybe).

So we have a (common relational) database with about 30,000 products. Nothing big so far. Each of these products has many attributes. There are the common ones like group (cables, antennas, iphone cases... about 80), assortment (somehow similar to groups: car, hifi, mp3, only 15), brand (30).

Then comes the technical data. Each item has many of those like color, cable length, weight, volume. about 200 such value types and thousands of values.

And the most complicated: Many of those products belong either to some car type (or several of them) or some kind of mobile device. Those come in hierarchies in the form like: brand (apple) model (ipad) type (1,2,3,4) and in some cases generation. (for cars it's similar, though instead of generation we have build years)

Problem step One:

We want the amount of products for each of those attributes. How many are red? How many are in the cable group? And so on.

This could partially be solved with SQL. It would be a lot of queries and rather ugly but I think possible. Maybe slow but we could do it even more ugly and keep counters in each table and update at every change. Especially difficult with those attributes where a product can have multiple (like works with iPhone and 12 other mobile phones)

But here comes problem step Two:

When a customer selects one attribute (say he want only see products that are red) we want to update all those counters in real time. This means we would have either extremely complicated queries (not likely fast enough anyway) or keep counters for possible combinations of attributes (billions).

When I started on this project they had given the counter option a try and done this for a very small subset of attributes (group, assortment, brand). The code was ugly, buggy and slow. In addition they now had a table with counters that was far larger then the table of products.

Using Apache Solr's facets was actually the solution. Flatten the tables into a list of Documents (one per product) allowed to get all this data in real time with far simpler queries.

score 2 · Answer 3 · answered Oct 28 '13 at 20:47

You can think of anytime you think a EAV table is the best way to do things (notoriously slow in realtional datbases and hard to query), you might need a nosql database. This is especially true when you have no way of knowing in advance what the fields would be. An example would be storing the details of medical tests. Each new test might have entirely different data that you would need to store. And while you could (in theory) model existing tests (with a lot of time and effort as there are thousnads of them), how would you know what new tests you might get results from for tests (and maybe medical equipment) we haven't even invented yet.

This is a good reason even for something as simple as a contact manager. Everyone wants to track something different. It's no big deal as long as you know what column: Text14 is used for. — JeffO, Feb 11 '14 at 20:28

score 0 · Answer 4 · answered Oct 28 '13 at 17:05

0

The smaller the project/system the better, because it leaves less room for disagreement.

This is hard because NoSQL is only better in large environments. I take it that you mean a Simple Example, and I have a perfect one for you.

Suppose you are making a Travel website and you need to have you users travel from and of the 5,170 US airports destined for any of the other of the (same) 5,170 US airports...

But here is the Kicker, not all flights are direct, you need to tell the user all the stopover options as well, sometimes 2 or 3 stop overs. You also need to tell the user all the options over a 5 hour window! And you need to compute this in under 10 seconds while the user is waiting.

This is the Relational DB Nightmare... In comes NoSql, flight routes are usually set in stone a few weeks in advance, so you can calculate all the Gazillions of possible routs in advance store than in a simple NoSql DB cluster...

NoSql is the clear winner is such a scenario.

answered Oct 28 '13 at 17:05

Morons

14,674
4
37
73

Thanks, I love that example and will make use of it. But if what you're saying is true that 'NoSQL is only better in large environments' then I'm going to have to make a stronger case on the side of faster dev time, better future-proofing of scaling etc. Any other ideas? – Ryan Weir Oct 28 '13 at 17:10
4

@RyanWeir The answers to those questions will have to Application specific. TO be quite honest is sounds like you want to sell NoSql to the team because you want to learn NoSql. But that's a invalid reason, so you are trying to come up with something else. I would just tell them that, "Let's use NoSQL so we can learn it, it's a good skill to have". – Morons Oct 28 '13 at 17:17
1

Why is this a database problem in the first place? If I had to run calculations like this, I'd set it up as a variant on A* that doesn't stop after the first result. Pull all relevant flight data from the database (or have it already cached in memory), build a graph weighted according to the priorities the user has set, and report the first X number of results. – Mason Wheeler Oct 28 '13 at 17:18
@MasonWheeler not sure what you mean by "variant on A*" – Morons Oct 28 '13 at 17:20
@Morons: The [A* pathfinding algorithm,](http://en.wikipedia.org/wiki/A_Star) used for finding the shortest distance between two nodes on a graph. – Mason Wheeler Oct 28 '13 at 17:22
1

@RyanWeir: Morons is right, really. NoSQL is only better in large environments. Unless you're trying to build something on massive scale (ie. Facebook, Flickr, EBay, Amazon, etc) you almost certainly don't need it, and the tradeoffs in dev time become worth it once you get to moderate-to-large scale, which the relational model handles quite well on modern hardware. That's when you really start to appreciate the benefits and guarantees that ACID and the relational model bring. – Mason Wheeler Oct 28 '13 at 17:30
@MasonWheeler that may work, I've never done such a thing so i cant really comment on it. (But keep in mind that this is not a distance traveled thing) I actually spoke to an Travel booking Company (One of the largest in the world) This was the method they used. – Morons Oct 28 '13 at 17:31
@Morons: 'TO be quite honest is sounds like you want to sell NoSql to the team because you want to learn NoSql.' Haha you're right, but that doesn't really invalidate the question IMO. – Ryan Weir Oct 28 '13 at 18:23

I want a trivial example of where MongoDB can scale but a relational database will have trouble

4 Answers4