Why are websites (even this one) sometimes "Down for Maintenance"?

Question

I have personally never done this, and I don't understand why so many sites do. If you do your development on a development server, why would you ever need to shut down your production site?

I have always wondered about this.

What are they doing during this time, and what makes the downtime necessary?

Keep in mind that the site probably *does* stay up for most updates. Obviously, you only see the ones where it actually *needs* to come offline for a while. — Dean Harding, Apr 26 '11 at 22:32
perhaps one of the things that made Admiral Grace Hopper popular was rediscovered? — setzamora, Apr 27 '11 at 14:05
No one addressed a security reason; there might be a known exploit (i.e. someone published how to exploit a certain website) and the admins take it offline to mitigate abuse from other parties while fixing it. — Francisco Presencia, Nov 27 '16 at 08:51
It occurs to me to ask 'What strategies can I use to achieve zero (planned) downtime in a database-backed web app?' Specifically upgrades that require db schema changes: http://softwareengineering.stackexchange.com/questions/336945/what-strategies-can-i-use-to-achieve-zero-planned-downtime-in-a-database-backe — Stephen, Nov 27 '16 at 11:32

score 62 · Accepted Answer · edited Nov 27 '16 at 14:42

The big kicker for anything at large scale is that if you are changing database schemas in some way, you typically have some big, nasty maintenance scripts to run.

Now, these might take a second or so to run with your development dataset. But when you start measuring data in terabytes and petabytes, even adding a single column to a table can take hours.

So no matter how quick and automated the deployment is, you've still got data maintenance issues to get through. If you plan really well, you can put up a read-only mirror of the site while you are undergoing the process, but for many sites read-only is pointless and thus not worth the effort.
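
As a rough illustration of the read-only idea, here is a minimal sketch of a WSGI wrapper that keeps serving reads and turns away writes while maintenance runs. The names `read_only`, `demo_app` and `status_for` are made up for this example, and a real site would also need the data layer itself to be safe to read during the migration.

```python
# Sketch only: serve safe (read) requests, refuse writes during maintenance.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def read_only(app):
    """Wrap a WSGI app so that only read requests reach it."""
    def middleware(environ, start_response):
        if environ.get("REQUEST_METHOD", "GET") in SAFE_METHODS:
            return app(environ, start_response)
        body = b"The site is read-only during maintenance.\n"
        start_response(
            "503 Service Unavailable",
            [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ],
        )
        return [body]
    return middleware


def demo_app(environ, start_response):
    """Hypothetical stand-in for the real site."""
    body = b"ok\n"
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]


def status_for(app, method):
    """Call a WSGI app directly and return the status line it sent."""
    seen = []

    def start_response(status, headers, exc_info=None):
        seen.append(status)

    list(app({"REQUEST_METHOD": method}, start_response))
    return seen[0]
```

Calling `status_for(read_only(demo_app), "POST")` exercises the refused-write path without starting a server.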

answered Apr 26 '11 at 21:16 by Wyatt Barnett · edited Nov 27 '16 at 14:42 by sandstrom

+1 - a read-only stack overflow wouldn't be much good. There's not going to be much you wouldn't be able to find on google :) – corsiKa Apr 27 '11 at 03:51
@glowcoder: When you search on Google, you find SO answers. – Donal Fellows Apr 27 '11 at 07:32
@Donal that was exactly my point. – corsiKa Apr 27 '11 at 16:28
Google is massive and sure to have a massive database; how come I never ever see "down for maintenance" for google? (Google.com homepage) – alexyorke Jun 12 '12 at 18:36
@alexy13 -- Google is in a special category of scale where they can't have a single database or even a single datacenter. Parts of the system are always down, and they've written the front end to handle it. I would too if you handed me that kind of time and R&D budget. – Wyatt Barnett Jun 12 '12 at 20:42
You can do DB schema changes in multiple phases. First, add the column, with a "zero" value (whatever makes sense as a "not populated", possibly NULL). Then, use transitional code to populate it with expected values. Then, push code that updates the new column. Finally, push code that uses it for both read and write. The first may be a blocking operation, the rest aren't, but they do require coordination and discipline. – Vatine Nov 28 '16 at 10:03
@Vatine certainly that can work, as can a lot of other tricks like feature switches. That said, sometimes it is easier to just turn out the lights for a few hours than to build a complicated deployment process. Really depends on your app, your workload and your requirements. – Wyatt Barnett Nov 29 '16 at 22:16
@WyattBarnett Oh, definitely. If nothing else, it means getting the change out in "one go", rather than in four gos, probably taking weeks from start to end. At the end of the day, the question has to be "is it worth the time, complexity and effort". – Vatine Nov 30 '16 at 09:22
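
The phased approach described in the comments above could look roughly like the sketch below. It uses Python's built-in `sqlite3` module with an in-memory database purely for illustration; the `users` table, the `display_name` column and the batch size are assumptions, and in practice each phase would ship as a separate deployment alongside the matching application code.

```python
import sqlite3


def add_nullable_column(conn):
    # Phase 1: add the new column with a "not populated" value (NULL).
    # This is the step that may block on a large table.
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
    conn.commit()


def backfill_in_batches(conn, batch_size=2):
    # Phase 2: populate the column a few rows at a time so that no single
    # statement holds locks for long. COALESCE keeps the result non-NULL,
    # so every processed row leaves the "IS NULL" work queue.
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE users SET display_name = "
            "COALESCE(first_name, '') || ' ' || COALESCE(last_name, '') "
            "WHERE id IN (SELECT id FROM users "
            "WHERE display_name IS NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (first_name, last_name) VALUES (?, ?)",
        [("Ada", "Lovelace"), ("Grace", "Hopper"), ("Alan", "Turing"),
         ("Edsger", "Dijkstra"), ("Barbara", "Liskov")],
    )
    conn.commit()
    add_nullable_column(conn)
    updated = backfill_in_batches(conn)
    remaining = conn.execute(
        "SELECT COUNT(*) FROM users WHERE display_name IS NULL"
    ).fetchone()[0]
    conn.close()
    return updated, remaining
```

Phases 3 and 4 (code that writes the new column, then code that reads it) are application releases rather than database scripts, which is where the coordination and discipline come in.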

score 9 · Answer 2 · answered Apr 26 '11 at 21:06

There are a number of reasons why you might want to take a site down for maintenance. To name a few:

Database changes
DAL changes
Updating services

Basically, if your site isn't static, you want to take it down when doing a logic update; otherwise people hitting your site may receive errors or see unexpected behavior.

Also, if you will be touching the web.config (in ASP.NET) for your site, you should take it down for maintenance first as it will blow out the session for users. Thus, if they were in the middle of something, it would be lost.

answered Apr 26 '11 at 21:06 by Tyanna

The session would only be lost if you are using "In-Process" session state. With out-of-process session state, the session is not lost when the web.config is changed. – Anthony Apr 26 '11 at 22:18
The last point is only true if you're doing in-process sessions, which I hope you're not on a production site! There's more than just touching the web.config that'll take down the worker process. – Dean Harding Apr 26 '11 at 22:30

score 7 · Answer 3 · answered Apr 26 '11 at 21:06

Well, this is a somewhat abstract question. I have even seen sites that showed "Down for Maintenance" instead of an HTTP 500 error.

Web sites sometimes need an upgrade. For example, if you are changing the database, you don't want any user to touch it during that time. If the database is offline, the site must be gracefully turned off as well, because showing a SqlException is not very nice. Another reason is a hardware or system failure (like leaking resources) that requires an application or even a system reboot.

I once took part in upgrading the internet banking system of one of the biggest banks in my country. The whole process of upgrading the web sites, middle tier and databases took three days, during which the system was offline for customers. It also included a full backup of everything so that, in case of failure, the system could be reverted to the old version.

Isn't HTTP 503 (instead of 500) the correct status code for "down for maintenance"? — Nubok, Nov 27 '16 at 12:25
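
For reference, 503 Service Unavailable is the status defined by the HTTP specification for a temporarily unavailable service, and it can carry a `Retry-After` header. A minimal sketch of such a maintenance page as a WSGI app follows; the one-hour retry value and the names `maintenance_app` and `call_app` are assumptions for the example.

```python
from wsgiref.util import setup_testing_defaults


def maintenance_app(environ, start_response):
    """Answer every request with 503 and a hint about when to retry."""
    body = b"Down for maintenance. Please try again later.\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
        ("Retry-After", "3600"),  # seconds; assumed maintenance window
    ]
    start_response("503 Service Unavailable", headers)
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI app in-process and capture status, headers and body."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

To serve it locally, `wsgiref.simple_server.make_server("", 8000, maintenance_app).serve_forever()` would be enough for a quick check.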

score 5 · Answer 4 · answered Apr 26 '11 at 21:28

Servers need patches to be run, and on many operating systems, those patches require reboots. So that is one category of down time. Many companies schedule reboots from patches for low use times, such as Sunday morning. If there are no patches, they reboot the servers anyway at the regularly scheduled maintenance time (this is a hangover from the NT4 days when certain counters overflowed every week and a half, so rebooting weekly prevented other bugs).

One company I worked for had an e-commerce site back in the late 90s that brought in more than $1,000,000 in sales per month. Someone promoted the wrong tax table to the production database server. The cure was to restore the db server from backup, and apply the transactions since the last backup. This took several hours, during which the website was unavailable to take orders. Since the orders portion and the static sales brochures were running on the same site and were inseparable, both had to come down.

One company I worked for had some wrong text inserted into the wrong place and the CEO flipped out and had the website taken off line "for maintenance" while the layout and text were "fixed" and the appropriate victim blamed and fired.

Even this can be mitigated with proper load balancing — the1dv, Nov 28 '16 at 02:50

score 5 · Answer 5 · answered Apr 26 '11 at 22:24

While the other answers are correct, you can almost always avoid downtime by using the right architecture. But this has a cost, and the cost may not be worth it: an hour of downtime costs Amazon or the infrastructure behind NASDAQ a lot. Stack Overflow? Most likely not so much.

How to avoid downtimes:

shutting down hardware serving pages: if you have proxies in front of your website, you can take the servers behind them offline without any impact on the user
reconfiguring servers: same as above
updating/changing data in databases: you could put your website in read only mode, etc...

Generally, in a layered architecture, the closer to the "top" you are, the harder it becomes to avoid downtime; the same goes for stateful components (web server vs. database).
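
The first two items in the list above amount to a rolling update: take one server out of rotation, change it, put it back, and move on to the next. The sketch below only simulates that loop with plain dictionaries; the server names, the version strings and the `in_rotation` flag are hypothetical stand-ins for whatever a real proxy or load balancer exposes.

```python
def rolling_update(servers, new_version):
    """Upgrade servers one at a time while the rest keep serving.

    `servers` maps a server name to {"version": str, "in_rotation": bool}.
    Returns the smallest number of servers that were in rotation at any
    point during the update.
    """
    min_in_rotation = sum(1 for s in servers.values() if s["in_rotation"])
    for name in list(servers):
        server = servers[name]
        server["in_rotation"] = False      # drain: proxy stops sending traffic
        active = sum(1 for s in servers.values() if s["in_rotation"])
        min_in_rotation = min(min_in_rotation, active)
        server["version"] = new_version    # patch, reconfigure or reboot here
        server["in_rotation"] = True       # put it back behind the proxy
    return min_in_rotation


def make_pool(count, version):
    """Build a hypothetical pool of identical servers."""
    return {
        f"web{i}": {"version": version, "in_rotation": True}
        for i in range(1, count + 1)
    }
```

With a pool of three servers, at least two stay in rotation throughout, which is why users see no outage as long as the remaining capacity can carry the load.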

Doesn't NASDAQ have about 14 hours a day of scheduled downtime? — Peter Taylor, Apr 27 '11 at 12:24

score 3 · Answer 6 · answered Apr 27 '11 at 04:55

A site may schedule regular downtime even if there's nothing to do every time the scheduled downtime comes around. By doing so, they get users used to the idea that the site will be down for a certain amount of time every so often so that when work does need to be done, users won't complain so much.

answered Apr 27 '11 at 04:55 by Barry Brown

there's a cure for that: bring down the complaints system during downtime :) I've actually seen companies do that. An MMO company bringing down the website hosting the downtime announcement as well as the support forums together with the game being down for maintenance is a good example of that. Anyone who didn't catch the announcement during the few hours it was up before maintenance would never know what was going on. – jwenting Apr 27 '11 at 08:07

score 3 · Answer 7 · answered Apr 27 '11 at 14:01

There also is a psychological and marketing side to this. In some of the cases (I dare to say most of the cases but I'm not that bold *g*) reading "Down for maintenance" can also mean "The server has crashed or gone out of service for any other reason".

I've seen this quite frequently. Normally, as a developer, you'll want a "real" error message saying something like "Whoops, we're experiencing a high load right now and not all requests can be handled", but some people from marketing will tell you "dude, you cannot tell the customer that we're having a problem. Tell them that we're on scheduled maintenance - this will look a lot better".

So "Down for maintenance" often is just another term for "out of service".

score 2 · Answer 8 · answered Nov 27 '16 at 13:06

No server NEEDS to go down for maintenance. You can avoid doing so for anything, at any scale, DB change, server updates, etc.

The problem is that a 0-downtime system, at a certain scale, is very costly to create and maintain. You need redundancy everywhere, load balancing everywhere, data replication, synchronization. Those are hard problems.

Basically, you need to reach the level of being able to release the Netflix Chaos Monkey in prod to be sure it works even if part of your system is busy with the update, or just out of sync. This is certainly doable. It's also very expensive, and it requires a lot of time and many experts working on the problem.

Putting a site in maintenance mode can be a middle ground you choose, because you don't want to invest that much just to avoid taking down your site for a little while once in a while.

Economics.

Of course, if you do choose the road of zero downtime, your site will gain more than just availability; it will gain reliability as well, since those best practices serve both purposes.

score 0 · Answer 9 · answered Apr 26 '11 at 22:40

I don't understand why so many sites do. If you do your development on a development server, why would you ever need to shut down your production site?

Shit happens. Unless you are doing some form of mathematical verification of your deliverables (and your specs are valid), no matter how careful you are, shit happens.

Also, there are times when you might have to make a change to a key piece of your infrastructure (say, a change to your database structures) that does require downtime.

Unless you are developing a critical system (say a five-nine or six-nine system), the responsible and cost-effective thing to do is to build a system with the acceptance of down-times as part of reality.

Furthermore, you take that principle further by making down times manageable and amenable to scheduling (or at least detectable) with a clear understanding and procedure for effective recovery.
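
To put the "five-nine or six-nine" figures in perspective, the arithmetic below converts a number of nines into the downtime it allows per year. It is a back-of-the-envelope sketch that assumes a 365-day year; the function name is made up for the example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def allowed_downtime_minutes_per_year(nines):
    """Downtime budget for an availability of `nines` nines.

    Five nines means 99.999% availability, i.e. an unavailable
    fraction of 10**-5.
    """
    unavailable_fraction = 10.0 ** -nines
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        minutes = allowed_downtime_minutes_per_year(n)
        print(f"{n} nines: {minutes:.3f} minutes per year")
```

Five nines leaves a little over five minutes per year and six nines about half a minute, so a multi-hour maintenance window is incompatible with either target.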

answered Apr 26 '11 at 22:40 by luis.espinal · edited Apr 27 '11 at 10:28

Mathematical verification isn't a panacea either; sometimes you find that what you've verified is not what you *wanted* to verify. – Donal Fellows Apr 27 '11 at 07:36
True. But then I'd argue that the problem isn't with formal verification of specifications, but with the validation of those specs. If your specs are invalid, then obviously everything will fall apart from there, but validation of specs (*"are we really building the right thing needed by the intended user for the intended purpose"*) is not the focus of verification (*"given these specs, are we building this thing right, or can it be built?"*), informal or otherwise. I guess I should have put a caveat on that (with respect to the validity of the specs). – luis.espinal Apr 27 '11 at 10:27
I'm not arguing you're wrong to mention it. I just point out that there are limits to what it can do. I used to work on formal verification, and the big problem at the time was how to correctly evolve the *specifications* so as to take into account changing understanding of requirements. Since that's primarily a human problem, secondarily an engineering problem, and only tertiarily a mathematical problem, I don't imagine it's been solved fully yet. – Donal Fellows Apr 27 '11 at 15:37
Oh. I think then we are of like thinking. Changing requirements (and req. validation) are the Achilles' heels of formal methods. Since it is a creative task (due to its human nature), I don't believe it is solvable, not in the way *formalists/purists* would like it to be. I think that has been one of the failed promises of FM; they got oversold (I mean, for example, *formal methods for web development*?) The specs have to be highly scrutinized and not amenable to rapid change (and that's typical of critical systems, not highly malleable ones). The latter are the norm rather than the exception. – luis.espinal Apr 28 '11 at 02:27
99% of user interfaces aren't to do with formal methods, but rather applied psychology. The remaining proofs are obvious (“don't deadlock the UI”) even if not always obvious to prove. But if you've separated the webapp according to best practices, then formal methods will make a lot of sense in the business methods layer (also in the data storage layer, but that's usually where the standard advice of “don't write your own DB” applies anyway. :-)) – Donal Fellows Apr 28 '11 at 14:19
I mean when I said *formal methods for web development*, I wasn't including UIs. What I had in mind was the back-end, business logic and web services. The cost and training of applying formal reasoning to ever changing business methods is something I don't see as justifiable to be honest. – luis.espinal Apr 28 '11 at 15:19
I think it's more justifiable than you might initially guess, but the majority of useful things to prove might turn out to be trivial. Which isn't a bad thing; it means that the knowledge of how to prove them could be embedded in the toolset/dev environment. – Donal Fellows Apr 29 '11 at 16:16

score -2 · Answer 10 · answered Nov 27 '16 at 14:29

Our website was once hacked (an old IIS6 and Windows 2003 server, a few years ago). While we were working on the restoration, we put up an "under maintenance" page for a few hours.

answered Nov 27 '16 at 14:29 by serega
