3

As a personal project to learn key technologies, I am trying to build a spreadsheet on the cloud (something like Google Sheets).

How would you design the storage backend to power something like this? Does it make more sense to use a relational DB, something like a No-SQL-DB or a different kind of storage technology? And how would you map the different spreadsheet elements and operations to database elements / operations in an efficient manner?

Spreadsheets are very flexible in nature. One can add pretty much any kind of table to a spreadsheet. Imagine having a spreadsheet with thousands of rows and hundreds of columns, what kind of a database would be best suited to operations like:

  • adding multiple rows at once (kind of like inserts in a RDBMS)
  • add arbitrary columns (kind of like an add column operation on a column oriented database)
  • edit any cell value in the table
  • quickly perform operations on cell ranges (like sums, averages etc.) - perhaps this can be done on the client side (in the browser?)

If you ignore the front end challenges and think about the fact that the data needs to be stored somewhere, where would be a good place to store it? It goes without saying that when a user interacts with the spreadsheet from the browser, the interactions themselves need to be snappy.

Doc Brown
  • 199,015
  • 33
  • 367
  • 565
cydro
  • 49
  • 1
  • I wouldn't model the spreadsheet in a server-side database at all, and possibly serialize it as an XML blob. At that point the database choice is close to irrelevant – amon Aug 07 '18 at 12:31
  • Come on guys, 4 closing votes because of misinterpreting this question as a 3rd party resource request? This is clearly an architectural question, not asking for a particular DB vendor, just for a suitable storage or database **type**. Of course, it is a little bit broad, but IMHO still on-topic. – Doc Brown Aug 07 '18 at 13:04
  • @amon: this can be a sensible approach, however, it makes it very hard to edit one spreadsheet concurrently by many users. I think it is important to know about this particular requirement. – Doc Brown Aug 07 '18 at 13:06
  • 1
    @DocBrown: It's still asking for an external resource, not asking how to do something yourself. You're distinguishing between "what brand of car should I buy?" (BMW, Mercedes, ...) and "what type of car should I buy?" (pickup truck, sedan, coupe, ...) but both questions violate the rule for the same reason. It is not a substantive question, it simply asks "please fit my needs to an existing product". – Flater Aug 07 '18 at 13:19
  • 1
    @Flater: I think we should look primarily for ways how to improve questions so they don't get closed, not for formal reasons to close them, even if the gist of the question is fine. FWIW, I tried to improve the question the way I interpreted it. Better now? – Doc Brown Aug 07 '18 at 13:45
  • @DocBrown: The gist of the question _isn't_ fine. Your edit didn't change that fact. The rules mention _"finding or recommending products or services, including tools, libraries or packages, **programming languages**, books, scholarly papers, tutorials, articles, or blogs"_ asking which DBMS to use, even if you argue that it's not a particular _brand_ of DMBS, it's the same as asking which programming language to use. – Flater Aug 07 '18 at 13:52
  • @DocBrown: From the link in the help center: _"Questions asking us to find or recommend tools, libraries, programming languages, resources (including books, blogs, tutorials, and examples), or projects to undertake are off-topic here **as they attract opinionated answers that won't have lasting value to others.**"_ It also specifically mentions the pointlessness of questions whose answers will consist of `Try libXYZ.`. Whether it's a type of DBMS or a brand of DBMS really doesn't matter here; it will attract the same type of answer. – Flater Aug 07 '18 at 13:54
  • @Flater: ok, I forgot to change the question title to make it clear it is a design question. Feel free to make a constructive suggestion to improve the question further. – Doc Brown Aug 07 '18 at 14:28

1 Answers1

2

Instead of trying to figure out what technology to use, focus on the data itself.

Maybe you can try to model how you would efficiently communicate the user changes to the backend, and propagate them to other users.

Try to depend on as little technology as possible, and solve the interesting problems first.

For example: I think a list of populated cells would suffice, to start with. Store just their row number, column number and the cell contents. All your operations can than be implemented as updates to one or more cells.

  • 1
    This. Just because a spreadsheet has columns and rows, and a SQL database table also has columns and rows, you don't have to model one with the other – Caleth Aug 07 '18 at 15:34