7

How is 5 nines calculated using less reliable links?

For example, we have 2 x 4 nines connections used in an active/active setup, which supposedly equates to 5 nines, but I don't buy it. How is a greater uptime achieved with lower-availability links?

The two links are diverse (separate exchanges), so I'm assuming there is some statistical/probability element to this, but how is it calculated?

user10021657
  • 3
    So you have two links, I presume *either* can be used for redundancy and they aren't *both* required for an operation. In that case being 'down' requires that both links are down, which is *less* likely. – Jason Goemaat May 22 '20 at 15:54
  • precisely.. although they are used on a primary/secondary basis, but yes either can be used. – user10021657 May 22 '20 at 16:07
  • 3
    You (and practically everyone else) *assumes* the links are diverse. They almost never truly are. (they could be in the same ditch or hanging on the same pole 10ft or 20mi away. It takes a great deal of effort to really have diverse links.) – Ricky May 22 '20 at 18:51
  • 1
    100% agree... I wasn't assuming they're diverse and have dealt first hand with a digger through a fibre optic scenario with both paths gone... or a funny one where a pinch point was discovered in the ISP core when an exchange went on fire 100 miles away and took out both routes... I was more wondering from a mathematical perspective how 2 x 4 nines links made up a better service (probably something I should already know tbh :-D) – user10021657 May 22 '20 at 19:34
  • Your best bet on getting the assumption to hold to make 5 nines out of 2x4 is to use a different ISP for each connection – slebetman May 23 '20 at 05:41

2 Answers

15

4 nines = 99.99 %. That means the probability that a link fails is 0.01 %, or 0.0001 expressed as a probability (on a scale of 0 to 1).

Assuming independence, the probability that both links fail is 0.0001 x 0.0001 = 10^-8, which gives an availability of 99.999999 %.

Yup, that's 8 nines and not 5, but we usually don't consider more than 5 nines.
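
The arithmetic above can be sketched in a few lines of Python. This is only an illustration under the same independence assumption; the function names `combined_unavailability` and `nines` are made up for this example and do not come from any library.

```python
import math

def combined_unavailability(unavailabilities):
    """Probability that ALL redundant links are down at the same time.

    Assumes link failures are statistically independent, so the
    individual down-probabilities simply multiply.
    """
    p_all_down = 1.0
    for u in unavailabilities:
        p_all_down *= u
    return p_all_down

def nines(unavailability):
    """Count the nines in the matching availability (0.0001 down -> 4 nines)."""
    # Round first so tiny floating-point errors do not drop a nine.
    return math.floor(round(-math.log10(unavailability), 9))

link_down = 0.0001  # one 4-nines link is down 0.01 % of the time
both_down = combined_unavailability([link_down, link_down])

print(f"both links down: {both_down:.1e}")
print(f"nines for the pair: {nines(both_down)}")
```

Working with the down-probability rather than the availability avoids subtracting two numbers that are both very close to 1.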

Note that assuming independence is in fact a big assumption; see the excellent answer from @PeterGreen.

JFL
  • makes sense now thank you.... just to expand on this, the original example was for a parallel setup.... say for a serial setup, where 1 link is five nines, but connected to a 4 nines network.... the overall reliability would be the weakest link, 4 nines, is this correct? – user10021657 May 22 '20 at 07:51
  • 1
    Yep a chain is as strong as the weakest element. – JFL May 22 '20 at 08:00
  • This is also why the notion of 'failure domain' is important. – JFL May 22 '20 at 08:01
  • Relying on a four nines and a five nines service simultaneously is a wee bit worse than four nines alone (.9999 * .99999 = .99989). ;-) – Zac67 May 22 '20 at 10:48
  • 3
    Isn't it 4 nines = 0.9999, the probability of both links failing is 0.0001^2 which actually gives you 8 nines... – avakar May 22 '20 at 15:19
  • @avakar and MiloBrandt thanks, was in doubt while writing it, but was confident somebody would correct me in case.. ;) edited. – JFL May 22 '20 at 16:38
  • @JFL serial setup is not "weakest element" but "all must be working" - 0.9999^4 ~ 0.9996. – Alexei Levenkov May 22 '20 at 20:39
  • Of course, 99.9% of the time those "independent" links leave the building through the same conduit... – chrylis -cautiouslyoptimistic- May 23 '20 at 00:21
  • 1
    @chrylis-cautiouslyoptimistic- Which is OK, as long as you estimate the probability of the conduit being severed as less than 0.00001 :) – chepner May 23 '20 at 17:11
14

A link that is 99.99% reliable is down 0.01% or 0.0001 of the time. So if the downtime of the two links is independent, then both links will be simultaneously down 0.00000001 of the time. Your connection is in theory up 99.999999% of the time.

In practice though you don't usually get the full benefits because of other factors.

  • Do you know how independent the links really are? Is your communication provider relying on rerouting to achieve the 4 nines? If so, could that rerouting mean loss of diversity? Is there a common control/admin plane in the provider that, if it fails, could lead to both of your links being severed?
  • When a link fails, how quickly does the network adapt? If there is one incident per year and it takes a minute for the network equipment to adapt, then you are already down to less than 6 nines just from the time your equipment takes to adapt.
  • How reliable is the infrastructure on your own sites that bonds the two links together?
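
To get a feel for how these factors eat into the theoretical figure, here is a rough Python sketch. It is an approximation, not a proper reliability model: the function names are hypothetical, the shared-component figure is an invented placeholder, and the small down-probabilities are simply added together.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def switchover_unavailability(incidents_per_year, minutes_per_switchover):
    """Fraction of the year lost while the network adapts after a link failure."""
    return incidents_per_year * minutes_per_switchover / MINUTES_PER_YEAR

def pair_unavailability(link_down, shared_down, switchover_down):
    """Approximate down-fraction of a redundant pair of identical links.

    Assumptions (all simplifications):
      - the two links fail independently of each other,
      - a shared component (same conduit, provider control plane,
        on-site bonding equipment) takes out both links when it fails,
      - the terms are small enough that adding them is a fair estimate.
    """
    return link_down * link_down + shared_down + switchover_down

# One incident per year with one minute of reconvergence, as in the text.
switch = switchover_unavailability(incidents_per_year=1, minutes_per_switchover=1)

# No shared failure domain at all: the switchover time dominates.
print(f"{pair_unavailability(0.0001, 0.0, switch):.2e}")

# Hypothetical shared component that is down 0.001 % of the time.
print(f"{pair_unavailability(0.0001, 0.00001, switch):.2e}")
```

Even with perfectly diverse links, the single minute of switchover is roughly two hundred times larger than the 0.00000001 term from the links themselves.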
ilkkachu
Peter Green
  • thanks for the response. I fully agree with everything you're saying in regards to it not being taken at face value. I've worked with our assurance guys to factor in MTBF on equipment along with network availability to get a more truthful availability figure on setups.... I just never took much interest in the mathematics of how it's calculated, until I was asked my original question this week (something I should probably know at this stage tbh :-)).... I gave 2 examples of issues I had before on "diverse" links, in the comments of the question. – user10021657 May 22 '20 at 19:53