
A bit of downtime at work let my mind wander. I was designing a single-location application on DigitalOcean (load balancer to app servers to DB, with DNS). All fine. Then I thought: what if the application gets a lot of traffic in America and Japan, so I want data centers in strategic geographic locations (New York, California and Tokyo) to speed up requests, each with the above setup and syncing the data it needs? I understand the WAN connections between them.

What I do not get is how typing in www.awesomesite.com would reach one of those data centers without having to put something like a region in the URL. I assume Facebook, Google and Amazon do something for this, but I am having no luck figuring it out.

EDIT

There appear to be two ways of handling this.

  1. Letting DNS do what it does with "load balancing", aka round-robin DNS, which has some well-documented faults but is a good, cheap solution.

  2. Actual load balancing using something like NGINX or HAProxy, although documentation is sparse out there.

    1. Can this be used for disaster recovery?
    2. For high availability?
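
To make option 1 concrete, here is a minimal sketch of how I understand round-robin DNS to behave. It is a toy model, not a real DNS server: the `RoundRobinZone` class is hypothetical and the addresses are placeholders from the documentation ranges. The point it illustrates is that the rotation ignores where the client is, so a user in Tokyo is just as likely to be handed the New York address first.

```python
from collections import deque


class RoundRobinZone:
    """Toy name server that rotates its A records on every query."""

    def __init__(self, records):
        self._records = deque(records)

    def resolve(self):
        # Return the current ordering, then rotate so the next
        # query sees a different address in first position.
        answer = list(self._records)
        self._records.rotate(-1)
        return answer


# Placeholder addresses standing in for New York, California and Tokyo.
zone = RoundRobinZone(["192.0.2.10", "198.51.100.10", "203.0.113.10"])

# Most clients connect to the first address they are given.
first_choices = [zone.resolve()[0] for _ in range(6)]
print(first_choices)
```

Every query gets all three addresses, only in a different order, which is why this spreads load but does nothing for geographic routing on its own.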

What I am trying to understand better is option 2. More specifically:

  1. Where would such proxies be located?
  2. Is there a single proxy or multiple?
    1. How is it still efficient if there is one global load balancer?
    2. How does DNS know which one to access if there are multiple (is it more round robin)?

These are my current thoughts. Here, A and B are the proxies for internet traffic, and behind them is a DMZ with a proxy for the clusters. This isn't anything real, just something from my head after some reading, so please don't be too harsh...

[Image: diagram of the layout described above]

nerdlyist
  • your edits completely changed the question. And, to be honest, at this point I have no idea what you're asking, but I'm pretty sure that my answer is no longer relevant, so have deleted it. – kdgregory Dec 28 '16 at 22:56
  • You have to distinguish intradatacenter traffic, which would be routed via HAProxy and the like, from interdatacenter traffic, where you would announce different DNS records via BGP to the different providers. – Thomas Junk Dec 29 '16 at 07:30
  • @kdgregory I didn't think it changed that much; your answer was still valid in my opinion. It helped me realize that there were multiple ways to go about this. – nerdlyist Dec 29 '16 at 16:16
  • Your question reminded me of the [Akamai network](http://security.stackexchange.com/a/9672/11518 "what is the Akamai Name Server i see at some big companies"). – Mark Hurd Jan 04 '17 at 13:34
  • @ThomasJunk I think the announcing of DNS records is where I am coming up short. Could you elaborate on that a bit more? – nerdlyist Jan 04 '17 at 14:14

1 Answer


As I wrote in my comment: you have to distinguish between traffic within a datacenter and interactions of users across the internet with one or more of your datacenters, or, as I tried to coin it, »intradatacenter« vs. »interdatacenter«.

The magic sauce for the internet comes from BGP. It is something like a word-of-mouth protocol that routers speak. When you live in a certain area and ask your local ISP where YouTube lives, your ISP will know, because someone told it where to look. Its routers know because other routers told them so, with interesting side effects: How Pakistan knocked YouTube offline

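BGP is used to distribute routing information: which networks announce which IP address ranges, including the addresses that DNS records point to.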

Because BGP is word of mouth, you can use Anycast to announce the same IP address from different servers in different regions.

This makes it possible to route regional traffic to regional datacenters.
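
The idea can be sketched in a few lines of Python. This is a toy model and not real BGP: the topology in `LINKS`, the network names and the `propagate` function are all made up for illustration, and the only selection rule is "keep the shortest path you have heard", whereas real routers also apply local policy. Every datacenter announces the same address, and each network simply ends up with the nearest one.

```python
from collections import deque

# Hypothetical topology: each network and the neighbors it talks to.
LINKS = {
    "dc-newyork": ["isp-us-east"],
    "dc-california": ["isp-us-west"],
    "dc-tokyo": ["isp-japan"],
    "isp-us-east": ["dc-newyork", "isp-us-west"],
    "isp-us-west": ["dc-california", "isp-us-east", "isp-japan"],
    "isp-japan": ["dc-tokyo", "isp-us-west"],
}


def propagate(origins):
    """Spread one announcement by word of mouth from every origin.

    Returns, for each network, the shortest path it heard, written
    as [network, ..., origin].
    """
    best = {origin: [origin] for origin in origins}
    queue = deque(origins)
    while queue:
        node = queue.popleft()
        for neighbor in LINKS[node]:
            candidate = [neighbor] + best[node]
            # A neighbor only adopts a path shorter than the one it has.
            if neighbor not in best or len(candidate) < len(best[neighbor]):
                best[neighbor] = candidate
                queue.append(neighbor)
    return best


# All three datacenters announce the same (anycast) address.
routes = propagate(["dc-newyork", "dc-california", "dc-tokyo"])
print(routes["isp-japan"])    # path ends at the Tokyo datacenter
print(routes["isp-us-east"])  # path ends at the New York datacenter

# If Tokyo stops announcing, Japanese traffic falls back to California.
failover = propagate(["dc-newyork", "dc-california"])
print(failover["isp-japan"])
```

The user never sees any of this: the same hostname resolves to the same IP address everywhere, and the routing layer decides which datacenter actually receives the packets.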

Thomas Junk