Think select()
In order for a server to maintain that socket, a thread has to block and wait for IO on the server side
That's not true. The operating system can, and does, manage the network connection behind the scenes, and exposes access to it through both synchronous and asynchronous interfaces. Userspace programs, including web servers, never communicate directly with the network hardware. Instead, they invoke system calls that ask the OS kernel to do the work on their behalf.
A synchronous interface makes sense for a lot of applications: it is simple, easy to reason about and debug, and sufficient for many typical networked applications. In that model, the program asks the kernel for an update about the connection and asks the kernel to suspend the program until an update is available.
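For illustration, here is a minimal Python sketch of that synchronous style. It assumes a local `socket.socketpair()` in place of a real network connection, and the helper thread exists only to play the remote peer that sends data after a short delay; the `blocking_read_demo` name and the delay are made up for the example.

```python
import socket
import threading
import time

def blocking_read_demo(delay=0.2):
    # socketpair() gives two connected sockets; it stands in for a network connection.
    server_side, client_side = socket.socketpair()
    try:
        # Hypothetical remote peer: sends data after a delay, from another thread.
        def send_later():
            time.sleep(delay)
            client_side.sendall(b"hello")

        sender = threading.Thread(target=send_later)
        sender.start()

        start = time.monotonic()
        # Sockets are blocking by default: recv() asks the kernel for data,
        # and the kernel suspends this thread until some arrives.
        data = server_side.recv(1024)
        waited = time.monotonic() - start
        sender.join()
        return data, waited
    finally:
        server_side.close()
        client_side.close()

if __name__ == "__main__":
    data, waited = blocking_read_demo()
    print(data, f"after ~{waited:.2f}s")
```

The thread calling `recv()` does nothing at all while it waits; that is the simplicity, and the cost, of the synchronous model.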
As an alternative, the OS also provides a way to ask the kernel "has anything happened on my connection?" without suspending the program. If you have multiple connections, the kernel tells you which of them have data waiting, and you can act on that data. You can therefore write a loop that repeatedly asks the kernel "what have you got for me?" and handles each ready connection in turn, all in a single thread. More sophisticated systems replace this busy-wait polling with a special function that blocks on the whole collection of outstanding network activities and resumes when any of them has an update. See the select() function for an example of how such a system can be used.
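A minimal Python sketch of the select() approach, assuming socketpairs stand in for client connections (the `serve_ready` helper and its parameters are hypothetical). A single thread hands the kernel the whole list of sockets and only touches the ones it reports as readable:

```python
import select
import socket

def serve_ready(num_conns=3, senders=(0, 2)):
    # Each socketpair stands in for one client connection: (our end, peer end).
    pairs = [socket.socketpair() for _ in range(num_conns)]
    ours = [a for a, _ in pairs]
    try:
        # Simulate activity on only some of the connections.
        for i in senders:
            pairs[i][1].sendall(f"request from client {i}".encode())

        handled = {}
        while len(handled) < len(senders):
            # One thread, one call: select() blocks until at least one socket
            # is readable (or the 1-second timeout expires) and returns those.
            readable, _, _ = select.select(ours, [], [], 1.0)
            if not readable:
                break  # timed out: nothing else arrived
            for sock in readable:
                idx = ours.index(sock)
                handled[idx] = sock.recv(1024).decode()
        return handled
    finally:
        for a, b in pairs:
            a.close()
            b.close()

if __name__ == "__main__":
    print(serve_ready())
```

Connection 1 never sends anything, so select() never reports it and the loop never blocks on it.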
Abstractions can be built on top of that "has anything happened" model, where the OS signals the process that something has happened. From there you can build async/await-style abstractions and other models, all of which run in a single thread of execution.
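As a rough sketch of that async/await layer, Python's `asyncio` runs coroutines on an event loop; on Unix-like systems the default loop is built on the same select/epoll/kqueue readiness model. Socketpairs again stand in for connections, and the uppercase "echo" handler is a made-up example. Each handler records the OS thread it ran on, to show that they all share one:

```python
import asyncio
import socket
import threading

async def handle(reader, writer):
    # Each await hands control back to the event loop, which waits on all
    # registered sockets at once and resumes whichever coroutine is ready.
    data = await reader.read(1024)
    writer.write(data.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()
    # Record which OS thread ran this handler.
    return threading.get_ident()

async def main(num_conns=3):
    # (our end, peer end) for each simulated connection.
    pairs = [socket.socketpair() for _ in range(num_conns)]
    tasks = []
    for ours, _ in pairs:
        reader, writer = await asyncio.open_connection(sock=ours)
        tasks.append(asyncio.create_task(handle(reader, writer)))

    # Peers send in reverse order; the loop serves whoever is ready first.
    for i in reversed(range(num_conns)):
        pairs[i][1].sendall(f"hello {i}".encode())

    thread_ids = await asyncio.gather(*tasks)
    replies = []
    for _, peer in pairs:
        replies.append(peer.recv(1024))
        peer.close()
    return set(thread_ids), replies

if __name__ == "__main__":
    thread_ids, replies = asyncio.run(main())
    print(len(thread_ids), replies)
```

All handlers report the same thread ID: concurrency here comes from the event loop interleaving coroutines, not from extra threads.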
The operating system itself doesn't dedicate a thread to each connection either. Using hardware interrupts and other close-to-the-metal mechanisms, it sets aside time to maintain its networking abstractions, and it too handles each connection mostly one at a time. After all, the electrical signals traveling down the wire arrive one at a time (a major over-simplification).
Lastly, worth mentioning:
A modern web server has to handle at least hundreds if not thousands of requests per second
That's also not necessarily true. If you are writing Facebook for cats, sure. However, if I'm writing an online pole-barn ordering system, or something similar, as probably 99% of software developers are, I'm just not going to see that kind of traffic.