3

My understanding is that a socket is an endpoint of a connection for a process, so one socket on a host binds to an IP address and a unique port number for each connection.

But a web server listens (by default on port 80) for connections coming in from multiple clients.

My question is: Does that mean a single socket on the server is listening to multiple clients simultaneously? This would conflict with my understanding of sockets.

Could someone please shed some light on this topic?

td16

2 Answers

3

Sockets are file descriptors with special abilities. While every socket somehow uses a port, they are not the same thing.

A socket is identified by a local address+port and a remote address+port. That means the same local port can be part of multiple sockets if the remote part is different.

A TCP server (such as a web server process) listens on a local port. Here, the local address only controls who can connect to this port: everyone, or only connections from localhost. The remote address of a listening socket is zero, which means no connection. Here I've started a python3 -m http.server on localhost port 7001:

tcp  127.0.0.1:7001   0.0.0.0:*       LISTEN        32143/python3

When I connect to that web server via my web browser, we see two additional sockets:

tcp  127.0.0.1:7001   0.0.0.0:*        LISTEN       32143/python3
tcp  127.0.0.1:50204  127.0.0.1:7001   ESTABLISHED  1658/firefox
tcp  127.0.0.1:7001   127.0.0.1:50204  ESTABLISHED  32143/python3

(data obtained via netstat, and edited for clarity)

The Firefox browser created a socket to connect() to the server. Firefox uses port 50204 in this case, so its socket is identified as local 127.0.0.1:50204 remote 127.0.0.1:7001. When the server accept()ed the connection, this connection got its own socket, which is basically the reverse of the client socket: local 127.0.0.1:7001 remote 127.0.0.1:50204. The local port is the same port the server is listening to.
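The mirroring can be reproduced without a browser. This is only a minimal sketch: it assumes everything runs on localhost and lets the OS pick a free port (port 0) instead of using 7001, so the actual numbers will differ on every run.

```python
import socket

# Listening socket: bound to a local address and port, no remote peer yet.
# Port 0 asks the OS for any free port (an assumption made for this demo).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()
listen_addr = listener.getsockname()

# Client socket: the OS assigns it an ephemeral local port during connect().
client = socket.create_connection(listen_addr)

# accept() returns a *new* socket that represents this one connection.
conn, _ = listener.accept()

client_local, client_remote = client.getsockname(), client.getpeername()
conn_local, conn_remote = conn.getsockname(), conn.getpeername()

print("listening :", listen_addr)
print("client    :", client_local, "->", client_remote)
print("connection:", conn_local, "->", conn_remote)

for s in (client, conn, listener):
    s.close()
```

The "connection" line should show the same local port as the "listening" line, and its remote address should equal the client's local address.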

The client socket and the server's connection socket mirror each other, although in practice the server often sees a different client IP+port than the one the client uses locally, due to network address translation (NAT).

Why can the server use the same port for all connections? Well, every TCP segment carries the source and destination port, and the IP packet around it carries the source and destination address. When the server operating system gets a connection request from a client for some port, the connection will usually be refused unless a server process is listening on that port. If a process is listening, it may accept the connection, and we get a socket representing that connection.

For all subsequently received TCP packets, the OS will look at the addresses and see whether they match an established socket connection. If so, the packet content is stored in a buffer that can be read from the socket file descriptor by the server process. When the server writes to the socket connection file descriptor, the OS knows the local and remote address, and can therefore create a TCP packet with the appropriate metadata.

So the sockets are entries in a lookup table used by the OS to translate between file descriptors and network addresses/ports.
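To make the lookup-table idea concrete, here is a toy model. It is not how any real kernel is implemented: the `Packet` type, the `demux` function, the file descriptor labels, and the second client port 50310 are all made up for illustration. Only the addresses 127.0.0.1:7001 and 127.0.0.1:50204 come from the netstat output above.

```python
from typing import NamedTuple, Optional, Tuple

Address = Tuple[str, int]  # (ip, port)

class Packet(NamedTuple):
    src: Address  # sender
    dst: Address  # receiver

# Established connections, keyed by (local address, remote address).
established = {
    (("127.0.0.1", 7001), ("127.0.0.1", 50204)): "fd 4",
    (("127.0.0.1", 7001), ("127.0.0.1", 50310)): "fd 5",  # hypothetical 2nd client
}

# Listening sockets, keyed by local address only (no remote part).
listening = {("127.0.0.1", 7001): "fd 3"}

def demux(packet: Packet) -> Optional[str]:
    """Pick the socket that should receive an incoming packet."""
    key = (packet.dst, packet.src)  # our local side is the packet's destination
    if key in established:
        return established[key]
    # No established connection: only a listener can take it (a new request).
    return listening.get(packet.dst)

server = ("127.0.0.1", 7001)
print(demux(Packet(src=("127.0.0.1", 50204), dst=server)))  # fd 4
print(demux(Packet(src=("127.0.0.1", 50310), dst=server)))  # fd 5
print(demux(Packet(src=("127.0.0.1", 60000), dst=server)))  # fd 3 (listener)
print(demux(Packet(src=("127.0.0.1", 60000), dst=("127.0.0.1", 9999))))  # None
```

Two packets addressed to the same server port end up at different sockets because the remote half of the key differs. The last case, where nothing matches, corresponds to a refused connection.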

amon
2

In a nutshell, when a POSIX-compliant networking stack is in play, each connection to the socket on port 80 will result in its own socket being created when accept() is called.

So under the hood multiple sockets are created. It doesn't conflict with your understanding of a socket.
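A rough sketch of what that looks like with several clients, assuming a local test where the OS picks the port (port 0) rather than port 80, which normally requires elevated privileges:

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
listener.listen()
server_port = listener.getsockname()[1]

# Three clients connect to the same server port.
clients = [socket.create_connection(("127.0.0.1", server_port)) for _ in range(3)]

# Each accept() call hands back a separate socket for one connection.
accepted = [listener.accept()[0] for _ in clients]

local_ports = {conn.getsockname()[1] for conn in accepted}
remote_ports = {conn.getpeername()[1] for conn in accepted}
descriptors = {conn.fileno() for conn in accepted}

print("server port         :", server_port)
print("local ports in use  :", local_ports)
print("distinct remote ports:", len(remote_ports))
print("distinct descriptors :", len(descriptors))

for s in clients + accepted + [listener]:
    s.close()
```

All three accepted sockets report the same local port, the one the server listens on. What distinguishes them is the remote port and the file descriptor.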

See here for more info: http://www.gnu.org/software/libc/manual/html_node/Accepting-Connections.html#Accepting-Connections

RibaldEddie
  • So the newly created socket would be bound to a unique port number designated by the server's OS, right? For example, it would create a new socket bound to some port number, say 3000, and reply to the client through this socket? – td16 Jun 29 '17 at 04:52
  • This is technically correct but slightly misleading. A new socket (= unique file descriptor) is created for each connection on the server (`accept()` system call). This socket is identified by the local port *and* the remote address/port. The connection socket will not open a new port for each connection, but will send/receive via the same port on the server. – amon Jun 29 '17 at 07:29
  • @amon yes this is clear from the documentation. I think for the OP it helps to be as succinct as possible while still remaining technically correct. – RibaldEddie Jun 29 '17 at 14:36
  • @td16 no, the server still responds over port 80, the port that the client originally connected to. Remember that there is a lot going on under the hood, beyond the web server process. The operating system's networking stack does a lot of work to juggle connections on port 80 and ensure that data is sent back to clients when it's ready. There's a lot of stuff to research here, to get a good understanding. You could look into event-driven programming, and blocking and non-blocking I/O. It may also help you to take a look at the OSI model of networks. Also check out W. Richard Stevens' books. – RibaldEddie Jun 29 '17 at 14:47
  • I think I understand now. So for the purpose of network programming, I can essentially assume that the web server will reply on port 80 (or whichever port the client is connecting to). All the deeper details can be ignored from a network programming perspective, I suppose. – td16 Jul 04 '17 at 03:31