Why can't sockets be used to identify individuals instead of cookies?

Question

Another question was asked regarding the use of IP addresses to identify individual clients. I think I understand why an IP address is insufficient. But what about the socket, which has more information and, from what I understand, is stateful? Couldn't that be potentially used in lieu of a cookie?

The socket is stateful, but HTTP doesn't keep the connection open after you have downloaded the webpage. It closes about 15 seconds after the entire webpage is downloaded. This is the entire reason you need cookies to keep state – MTilsted, Mar 26 '17 at 22:09
A socket isn't a piece of data. You can't send it about. Your question doesn't make sense. — user207421, Mar 27 '17 at 01:14
It's like asking why you can't use verbs instead of nouns... — user541686, Mar 27 '17 at 03:00
@EJP: I answered under the assumption that the OP means the (source_ip, source_port, target_ip, target_port) quadruple instead of the socket object. But your interpretation makes sense, too. — Jörg W Mittag, Mar 27 '17 at 03:49
You could try and mimic the way firebase works to manage state or user identity without cookies or session. — JeffO, Mar 27 '17 at 14:27
Note that with NAT the socket seen by the server doesn't even belong to the client, but the NAT device. — pjc50, Mar 27 '17 at 19:19

Jörg W Mittag · Answer 1 · 2017-03-26T16:55:45.747

64

A socket identifies a connection. Cookies are usually used to identify a user. If I open two browser tabs to SE.SE, I will have two connections and thus two sockets. But I want my settings to persist across both of them. (In fact, typically, a browser opens multiple sockets for one page in order to speed up page load time; I believe most browsers have a default maximum value between 4 and 10 sockets per page.)

And the opposite can happen as well: if I close my browser tab, another user on the machine may open a browser tab to SE.SE, and may get the same quadruple of (source_ip, source_port, target_ip, target_port), in which case, he will get all my settings.
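To make the distinction concrete, here is a minimal Python sketch (not part of the original answer) in which one client process opens two connections to a throwaway listener on localhost. The helper name `peer_addresses` is made up for illustration. The server sees the same source IP for both connections but two different source ports, so the quadruple tells connections apart, not users.

```python
import socket

def peer_addresses(n_connections=2):
    """Open n connections from this process to a local listener and
    return the (source_ip, source_port) pair the server sees for each."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(n_connections)
    target = listener.getsockname()

    clients, accepted, peers = [], [], []
    try:
        for _ in range(n_connections):
            # Each connection is kept open, so the OS must give it its own source port.
            clients.append(socket.create_connection(target))
            conn, addr = listener.accept()
            accepted.append(conn)
            peers.append(addr)
    finally:
        for s in clients + accepted + [listener]:
            s.close()
    return peers

if __name__ == "__main__":
    for ip, port in peer_addresses():
        print(ip, port)   # same IP on both lines, different ports
```

Once those sockets are closed, nothing stops the operating system from handing the same source port to a different process or user later, which is the reuse problem described above.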

edited Mar 26 '17 at 16:55

answered Mar 26 '17 at 16:37

Jörg W Mittag


It's worth noting that with HTTP/2 (and HTTP pipelining) you probably won't have two sockets open to SE. Your browser will re-use the same socket. You'd need to have different browsers running. – Matthew Steeples Mar 27 '17 at 08:29
A trivial local check indicates that Chrome _does_ keep one socket open per tab - probably because they're sandboxed and it doesn't want to share state between them - but each tab's socket seems to be persistent and requests are probably pipelined within that tab. – Useless Mar 27 '17 at 09:55
It's also worth noting that you don't know what's going on at the back end either. Many load balancers will terminate a users connection and then open their own to the back end servers meaning that you may have multiple users sharing a single socket. – Dan Mar 27 '17 at 09:57
@Useless IIRC SE does use either WebSockets or long-polling (fallback) to fetch updates (new questions, edits, etc.) in case you're testing here. Other sites might behave differently - there's not much point keeping a socket open indefinitely on a static site. – Bob Mar 27 '17 at 15:22
True, it took me a few tries to find a genuinely-static site to check. For that, Chrome opens multiple sockets per tab, and still doesn't reuse them between tabs, but _does_ close them once the tab has finished loading. They're easily visible though because they spend so long in TIME_WAIT. – Useless Mar 27 '17 at 15:33
@Useless: What site did you use to check? I used a non-static site and my https connection was re-used across all 4 tabs that I was testing with according to TcpView. – Matthew Steeples Mar 27 '17 at 16:49

score 21 · Answer 2 · answered Mar 27 '17 at 04:00

TCP sockets are designed to be stateful so in general they are used to identify sessions. Protocols like SSH and ftp do exactly this.
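As a rough illustration of connection-scoped sessions, the Python sketch below keeps a login name in a local variable for as long as one socket stays open. It is not the real SSH or FTP protocol: the `LOGIN` and `WHOAMI` commands and the function names are invented for this example, and `socket.socketpair()` stands in for a real TCP connection.

```python
import socket
import threading

def serve_connection(conn):
    """Handle one connection; `user` exists only while this socket is open."""
    user = None                                   # per-connection session state
    with conn, conn.makefile("r", encoding="utf-8", newline="\n") as reader:
        for line in reader:                       # loop ends when the peer closes
            command, _, arg = line.strip().partition(" ")
            if command == "LOGIN":
                user = arg
                reply = "OK"
            elif command == "WHOAMI":
                reply = user if user else "ERR not logged in"
            else:
                reply = "ERR unknown command"
            conn.sendall((reply + "\n").encode("utf-8"))

def demo():
    """Send a few commands over one connection and collect the replies."""
    client, server_side = socket.socketpair()     # stand-in for a TCP connection
    worker = threading.Thread(target=serve_connection, args=(server_side,))
    worker.start()
    replies = []
    with client, client.makefile("r", encoding="utf-8", newline="\n") as reader:
        for command in ("WHOAMI", "LOGIN alice", "WHOAMI"):
            client.sendall((command + "\n").encode("utf-8"))
            replies.append(reader.readline().strip())
    worker.join()
    return replies

if __name__ == "__main__":
    print(demo())
```

The server never needs a token from the client: being on the same socket is the identity. That only works because the connection stays open for the whole session.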

HTTP is designed to be stateless and each connection is only associated with a resource to be downloaded. After a resource is downloaded the TCP socket that the HTTP request rides on is closed. The original reason for this was simplicity. But the side-effect is that HTTP servers running modern websites can handle far more users than socket based servers like SSH or ftp.

So sockets can't be used because HTTP will close the socket after downloading the web page.

Of course, saying HTTP will close the socket per resource is oversimplifying things, because HTTP has features like pipelining and persistent connections that can download multiple resources per socket. But that's just optimisation. After everything has been downloaded, your browser will close the socket after some timeout.

HTTP was originally designed as a simple protocol for downloading HTML files. Old browsers could also download HTML files from other protocols like Gopher and ftp. As such, there was no reason for making HTTP stateful because HTML files are just simple text files.

Once web forms were introduced and HTML pages could send data back to the server, web pages started to need sessions. Thus cookies were created to re-introduce state to a stateless protocol that is transmitted via a stateful transfer layer that is transmitted over a stateless network layer. So the full stack of layers is:

Ethernet, Wifi etc. = stateless
IP = stateless
TCP = stateful
HTTP = stateless
HTTP + cookies = stateful
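The last two rows can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `SESSIONS`, `handle_request`, and the cookie name `session_id` are invented here, and the in-memory dictionary stands in for whatever session store a real site would use. Each call is handled independently, and the only link to earlier requests is the identifier the client sends back in its `Cookie` header.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}   # hypothetical server-side store: session id -> per-user data

def handle_request(cookie_header):
    """Handle one request in isolation.

    `cookie_header` is the raw Cookie header value, or None if absent.
    Returns (body, set_cookie), where set_cookie is a Set-Cookie header
    value for new sessions and None otherwise.
    """
    cookie = SimpleCookie(cookie_header or "")
    morsel = cookie.get("session_id")
    session_id = morsel.value if morsel is not None else None

    set_cookie = None
    if session_id not in SESSIONS:
        # Unknown or missing id: start a new session and ask the client to remember it.
        session_id = secrets.token_hex(16)
        SESSIONS[session_id] = {"visits": 0}
        set_cookie = f"session_id={session_id}; HttpOnly"

    SESSIONS[session_id]["visits"] += 1
    return f"visit {SESSIONS[session_id]['visits']}", set_cookie

if __name__ == "__main__":
    body, set_cookie = handle_request(None)       # first request, no cookie yet
    print(body)
    cookie_value = set_cookie.split(";")[0]       # what a browser would send back
    body, _ = handle_request(cookie_value)        # could arrive on any socket
    print(body)
```

Nothing in `handle_request` looks at the connection, so the second request is recognised no matter which socket, tab, or source port it arrives on.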

These days we have WebSockets, which keep a single socket open between your web page and the server. So with WebSockets you can again use the socket to identify a user, because the WebSocket connection itself is stateful. But in most cases you will still need a cookie for the main HTML page that loads the JavaScript that starts the WebSocket.

answered Mar 27 '17 at 04:00

slebetman

"HTTP servers running modern websites can handle far more users than socket based servers like SSH or ftp" [citation needed] – el.pescado - нет войне Mar 27 '17 at 05:37
@el.pescado: It's just logical. Since socket-based servers keep a live connection, they are limited to the maximum number of file descriptors you can open (and on some OSes sockets compete with hard disk for file descriptors). If you don't need to keep connections alive then your limit is just bandwidth. Note that the bandwidth limit is the same whether you keep a connection alive or not. Modern HTTP servers can handle millions of requests per second. If they needed to keep those millions of sockets up while you read a webpage, the server would die – slebetman Mar 27 '17 at 07:07
+1 for "Thus cookies were created to re-introduce state to a stateless protocol that is transmitted via a stateful transfer layer that is transmitted over a stateless network layer". That's beautiful – dirkk Mar 27 '17 at 13:03
I liked this answer. It cuts right to the core of the issue. – Jim W Mar 27 '17 at 15:31
@slebetman HTTP servers *are* socket-based. All of them are. By definition. And cookies do *not* make HTTP servers stateful. In fact by definition they are not, which is why the cookies are transferred by the client every time. – Miles Rout Mar 28 '17 at 21:44
@MilesRout: "stateful" in terms of networking means the ability of servers to identify what state the client is in (logged in, ordering something etc). By definition then cookies make HTTP stateful – slebetman Mar 29 '17 at 02:59
@MilesRout: https://en.wikipedia.org/wiki/Stateless_protocol – slebetman Mar 29 '17 at 03:04
@slebetman That's not what stateless means. Stateless is about *server* state, not *client* state. Of course the client has state. The advantage of a stateless protocol is that you can load-balance across many servers, as you don't have to keep connected to the same server, as you don't keep per-client state on the server (other than caching etc.) – Miles Rout Mar 29 '17 at 20:19
@MilesRout: That's not what stateless means. Remember, you can load balance TCP/IP across many TCP/IP "servers" (we call TCP/IP servers "routers"). Stateless means that each request to the server is independent and does not depend on past requests. Cookies make requests non-independent, since they can depend on past requests (have logged in, have added stuff to shopping cart etc.) – slebetman Mar 30 '17 at 02:48
@slebetman Routers are not servers. You completely misunderstand the problem. You can send different requests in the same user session, in HTTP, to different servers, and that's fine, because every request carries with it the authentication parameters and other state. Cookies do **not** make requests non-independent unless you *choose* to use them to reference mutable server-side resources. They are **client side state**, and that doesn't make HTTP stateful. – Miles Rout Mar 30 '17 at 06:17
@MilesRout: As someone who once wrote code for routers I can assure you that routers are servers. The thing with HTTP servers is the same. HTTP servers are just routers for HTML pages. Without cookies you can load balance HTTP servers just like you can load balance TCP/IP routing. But when you add cookies you can no longer fully load balance. The statelessness ends at the HTTP routers but in the end your database must be synchronised so it's no longer stateless. Without sessions you can fully load balance everything and give each HTTP router its own database – slebetman Mar 30 '17 at 07:56
@MilesRout: Also, they are not ONLY client side state. They are server-side state as well. The server software MUST link client state to server state otherwise you cannot ensure that a logged in user will not access another user's account – slebetman Mar 30 '17 at 07:58
@MilesRout: Come on man, every definition of stateless and stateful networking defines HTTP+cookies as stateful. Why are you trying to redefine a word everyone agrees on? – slebetman Mar 30 '17 at 08:00
@slebetman Because it is stateless, and everybody (except apparently you) knows it is stateless. – Miles Rout Mar 30 '17 at 12:28
@MilesRout: Apparently "everybody" except wikipedia, IBM (https://www.ibm.com/support/knowledgecenter/SSBJG3_2.5.0/com.ibm.gen_busug.doc/c_gws_stateful_services_030.htm), Mozilla Developers (https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies), Oracle (https://docs.oracle.com/javase/tutorial/networking/cookies/definition.html), IETF (https://tools.ietf.org/html/rfc6265), the RFC that specifies cookies (https://www.ietf.org/rfc/rfc2109.txt) etc. Look.. it's starting to look like everybody knows HTTP+cookies is stateful except apparently you – slebetman Mar 30 '17 at 12:38
@MilesRout: Icing on the cake is that the RFC that specifies cookies specifies it as a mechanism to turn stateless HTTP connections into a stateful session. The title of the RFC is not "Cookies" but "HTTP State Management Mechanism" – slebetman Mar 30 '17 at 12:39
@slebetman You're embarrassing yourself. – Miles Rout Mar 31 '17 at 00:41
Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/56338/discussion-between-miles-rout-and-slebetman). – Miles Rout Mar 31 '17 at 03:00
