2

Let's say I create my own website and design a discussion forum in that. Now, I have multiple types of users in my system, and millions of accounts per type of user. Since the website is very popular, 100s of new threads are created every second. People post on these threads (lets limit them to only text for now) rapidly so each thread gets 100+ posts every minute.

So, If I create a single table called thread, and add a new record everytime a new thread is created, and associate it with a posts table such that all posts belonging to that thread use it's ID as foreign key, I believe the database will bend down on it's knees in a matter of weeks.

Someone I asked told me to create a data warehouse and put old threads there and lock them. But what if I don't want to? Am I going wrong conceptually? I would like to know what experts say.

P.S. If you are going to delve into technical details, I am familiar with Ruby on Rails and Postgresql.

Abhi9
  • 129
  • 2
  • 4
    Between launching your website and getting to the situation described, you'll have ***plenty*** of time to figure it out and do some tests. Look at the database schema for some existing forum softwares for some ideas in the meantime. – whatsisname May 18 '16 at 23:03
  • 5
    [No startup has ever written the first version of their software with this kind of scalability in mind.](http://programmers.stackexchange.com/a/313653/1204) – Robert Harvey May 18 '16 at 23:15
  • I am not saying I will be writing my code to handle this level of scalability right now. I know I will figure it out in the **process**, I was just being curious. – Abhi9 May 18 '16 at 23:18
  • 6
    100s of new threads per second means that after 3 hours, there are on the order of 1 million threads. Every one of those threads gets 100+ replies per minute, which means 100 million replies per minute, or in other words, every single one of your users posts a new message every second, without ever taking a break, eating, drinking, peeing, sleeping, or going on vacation. After 1 year, every person on the planet is posting one message per second. Are you sure that's a realistic model? – Jörg W Mittag May 18 '16 at 23:34
  • @Jörg W Mittag I am not planning to implement this model, I am just asking how it is gonna work in terms of scalability using a hypothetical exaggerated scenario. – Abhi9 May 18 '16 at 23:40
  • 1
    [Architecting Massively-Scalable Near-Real-Time Risk Analysis Solutions](http://highscalability.com/blog/2011/12/22/architecting-massively-scalable-near-real-time-risk-analysis.html) – Robert Harvey May 18 '16 at 23:44
  • 1
    With that kind of load, you'll need to be more concerned with how to manage your data center infrastructure, either onsite or in the cloud, than with which language or DBMS you are using. You might find additional guidance on Serverfault.SE. – Pedro May 18 '16 at 23:46
  • @RobertHarvey Thanks... That explains the complexity of the problem at hand. Also, I'd like to ask, how many records can a modern database hold before it shows considerable degradation in performance. There is a subreddit where people just count. The moderators had to ask them to stop since the long comment chain was slowing down the system. Looking at the size and scale of reddit, I don't seem to understand how just one subreddit could slow down entire system. Was it concerned with number of requests rather than number of entries? – Abhi9 May 19 '16 at 00:21
  • 1
    The thing that you have to remember is that every company's architecture is *completely different.* So there are no easy answers to such questions. – Robert Harvey May 19 '16 at 00:44

1 Answers1

4

So, If I create a single table called thread, and add a new record everytime a new thread is created, and associate it with a posts table such that all posts belonging to that thread use it's ID as foreign key, I believe the database will bend down on it's knees in a matter of weeks.

I fail to see the database problem here.

100s of new threads are created every second ... each thread gets 100+ posts every minute.

Here's your problem. Your database will be fine. Your network, however, is experiencing a DDoS attack. Sure in a few days your storage will be at capacity but on day one people at your work site will be complaining that they can't watch kitten videos on their lunch breaks.

You need to limit access to your limited resources. Don't give users a blank check to post how ever much they feel like posting. The whole reason you force them to create accounts is so you can track what they're costing you. Don't let them cost you more than they give you.

As for scalable database design, keep it simple, decoupled, and replaceable. Some day it may need some fancy redesign. Until then just keep it simple so it doesn't take weeks for the redesign team to understand it. Decoupled so the rest of your code doesn't die with it. Replaceable so when the next big DB thing happens you can try it out without much pain.

candied_orange
  • 102,279
  • 24
  • 197
  • 315