110

I am currently working on a Ruby on Rails project that shows a list of images.

A must-have for this project is that it shows new posts in real time, without the need to refresh the web page. After searching for a while, I stumbled upon some JavaScript solutions and services such as PubNub; however, none of the proposed solutions made sense to me at all.

In the JavaScript solution (polling) the following happens:

  • User 1 views the list of photos.
  • In the background the JavaScript code is polling an endpoint every second to see if there is a new post.
  • User 2 adds a new photo.
  • There is a delay of up to one second before the next polling cycle is triggered and fetches the new data.
  • The new content is loaded in the DOM.
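
The polling cycle above can be sketched in a few lines of JavaScript. This is a minimal illustration, not code from the project: the `fetchLatest` callback (standing in for a `GET` to some endpoint) and the `{ id, url }` response shape are assumptions.

```javascript
// Minimal polling sketch: ask the server every second whether a newer
// photo exists, and hand it to the caller when one shows up.
// `fetchLatest` and the { id, url } shape are assumed for illustration.
let lastId = 0;

async function pollOnce(fetchLatest) {
  const photo = await fetchLatest();  // e.g. GET /photos/latest
  if (photo && photo.id > lastId) {   // only act when something is new
    lastId = photo.id;
    return photo;                     // caller would insert it into the DOM
  }
  return null;                        // nothing new: the request was wasted
}

function startPolling(fetchLatest, onNew, intervalMs = 1000) {
  return setInterval(async () => {
    const photo = await pollOnce(fetchLatest);
    if (photo) onNew(photo);
  }, intervalMs);
}
```

Every `null` return is a round trip that accomplished nothing, which is exactly the waste being questioned here.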

This seems odd when translated to a real world example:

  • User 1 holds a pile of pictures on his/her desk.
  • Every second, he/she walks over to the photographer and asks if he has a new one.
  • The photographer takes a new photo.
  • The next second, when he/she walks in again, he/she can take the new photo and put it on the pile.

In my opinion, the solution should be as follows:

  • User 1 holds a pile of pictures on his/her desk.
  • The photographer takes a new picture.
  • The photographer walks to the pile and puts it with the rest.

The PubNub solution is basically the same; however, this time there is an intern walking between the two parties to relay the data.

Needless to say, both solutions are very energy-consuming, as they are triggered even when there is no new data to load.

As far as my knowledge goes, there is no (logical) explanation why this way of implementation is used in almost every realtime application.

Whymarrh
dennis
  • 9
    And what alternatives can you imagine? How would they work? – Telastyn Jul 22 '14 at 18:45
  • 197
    Ignoring for a moment that web browsers are not servers that can receive incoming connections... wait, no, let's not ignore that. – GrandmasterB Jul 22 '14 at 18:45
  • 2
    I am not ignoring that fact; I am simply curious why that is and how I might provide a better and more logical solution to it. – dennis Jul 22 '14 at 18:47
  • 17
    @dennis: a stateful, persistent connection between the server and client would probably get rid of the need for polling, but that is not how the Web was designed. – FrustratedWithFormsDesigner Jul 22 '14 at 18:49
  • 3
    @dennis given that a browser can't accept an incoming request, how then would a server initiate such a connection? The connection has to be initiated by the browser. – GrandmasterB Jul 22 '14 at 18:50
  • 1
    Thank you all for the comments. I see that the browser in this example is not suited for incoming requests; however, this example does not only occur in web-based apps, but also in sensor logic that constantly checks its state, for example a light that turns on when it's dark (not by resistance but by software polling). – dennis Jul 22 '14 at 18:55
  • 4
    Take a look at functional reactive programming. Generally speaking, it addresses most of your questions. – Steven Evers Jul 22 '14 at 19:04
  • 59
    How about Websockets? – I.devries Jul 22 '14 at 20:28
  • 25
    Or take a look at long polling. Basically you poll, but the server doesn't respond before it has any new data to show you. – Matsemann Jul 22 '14 at 20:51
  • 4
    I'm reading between the lines "surely there's a better way? anyone know any?" so the answers about websockets are obviously relevant and very useful. but someone said they aren't, so maybe it would help to add those words explicitly, for such literal-minded people. (I'm refraining from actually making such an edit, in case you really didn't want it) – Don Hatch Jul 22 '14 at 23:13
  • You can use a keepalive connection with HTTP, but that takes up limited ports. If you are running an app that requires constant data transmission (games) then that is the way to go, but if you can write your app to _not_ require a constant connection, then each server can (theoretically) handle more clients... at least not limited by ports. – technosaurus Jul 22 '14 at 23:21
  • 4
    @DonHatch For web programming? Websockets are relatively new. Polling was the _only_ way for a long time, so most resources online will point at polling with javascript just because there's been more time for it to accumulate. With web programming, polling is also still simpler to implement.. – Izkata Jul 23 '14 at 00:19
  • 53
    There are many perfectly sensible solutions and algorithms in computer-space that would be completely absurd to do in meatspace. – whatsisname Jul 23 '14 at 02:48
  • 3
    @whatsisname: Indeed, sorting is one good example. In meatspace all the sorting algorithms are pointlessly complicated because "I can obviously see with my eyes which number should be first!" – slebetman Jul 23 '14 at 04:21
  • 10
    @slebetman Yup trivial to quickly sort even four numbers 7829349012837492, 7829349102837492, 7829346012837492, 989346012837492. – Joshua Taylor Jul 23 '14 at 15:14
  • 1
    Typically if you cannot use Websockets you would use what is called a slow/long-poll. This is a poll, but the request will not return immediately, only after a timeout, typically smaller than the usual HTTP timeouts (<2 mins). This reduces the number of requests AND it speeds up responsiveness, as the parked request returns immediately when the event happens. In fact it is so immediate that it can be used as a "push" from higher-level Java APIs (callback). – eckes Jul 25 '14 at 01:32
  • 2
    @dennis For sensor logic, often the power concerns dominate - a polling system can be sleeping 99% of the time, waking up every x minutes to quickly pull updates; a push wouldn't work with such timing since most of the clients would have their network chips powered down most of the time. Actively listening 24/7 takes a lot of battery power. – Peteris Jul 25 '14 at 07:56
  • 2
    Polling is easy and sometimes it is "good enough." – Tony Ennis Jul 26 '14 at 12:55
  • This question just got a signal boost from Arstechnica: http://arstechnica.com/information-technology/2014/11/why-is-polling-accepted-in-web-development/ – user16764 Nov 24 '14 at 01:59

8 Answers

183

Pushing works well for one user, or a limited number of users.

Now change the scenario: one photographer and 1000 users who all want a copy of the picture. The photographer will have to walk to 1000 piles. Some of them might be in a locked office, or spread all over the floor. Or their owner is on vacation and not interested in new pictures at the moment.

The photographer would be busy walking all the time and not take new pictures.

Fundamentally: a pull/poll model scales better to lots of unreliable readers with loose realtime requirements (if a picture arrives on the pile 10 seconds late, what's the big deal?).

That said, a push model is still better in many situations. If you need low latency (you need that new photo within 5 seconds of it being taken), or if updates are rare while requests are frequent and predictable (asking the photographer every 10 seconds when he produces one new picture a day), then pulling is inappropriate. It depends on what you're trying to do. NASDAQ: push. Weather service: pull. Wedding photographer: probably pull. News photo agency: probably push.

Robert Harvey
ptyx
  • 33
    I really like your analogy with 1000 users, some on vacation, some not interested. +1. – riwalk Jul 22 '14 at 18:59
  • 3
    Thank you ptyx, this was exactly the answer I was looking for very well put. – dennis Jul 22 '14 at 19:02
  • 1
    @dennis: There is also a hard limit. All OSes use integers to identify sockets. Therefore all OSes have maximum number of open sockets (on some OSes this number is tuneable but not infinite). A service with long open sockets cannot scale beyond this maximum. Therefore if you are a small blog this strategy makes sense. But if you're Google or Amazon you'd quickly exceed maximum number of open sockets. Therefore you poll - close the open sockets often/immediately so that others can establish a connection to your server. – slebetman Jul 23 '14 at 04:26
  • 1
    +1 but: as far as i know, pushing for mobile phones works by longpolling. So Apple/Google are able to bypass that limit - i guess by using tons of servers – Flo Jul 23 '14 at 07:40
  • @slebetman you are almost correct, but not entirely since most popular server OS today allow for more than one IP for a physical adapter. So by using additional IPs this hard limit is a little softer. – Esben Skov Pedersen Jul 23 '14 at 13:49
  • 4
    @EsbenSkovPedersen: Socket limit is not due to IP address. It's due to maximum open file descriptor. So the maximum number of open socket is independent of how many IP addresses you use. – slebetman Jul 23 '14 at 14:11
  • @EsbenSkovPedersen: Even for a single IP address the maximum number of allowable TCP connections is dictated by source_port+destination_port+session_id. This usually exceeds the maximum number of sockets you can open because the OS will run out of file descriptors long before it runs out of all combinations of TCP session identifiers. – slebetman Jul 23 '14 at 14:14
  • @slebetman I think you misunderstand. Using multiple IPs with multiple machines sidesteps the limit. (And anyway, StackOverflow does use websockets for updating vote counts (and, I think, notifications) while on a page, so they _have_ sidestepped the limit) – Izkata Jul 23 '14 at 14:16
  • 10
    This is a horrible analogy to put it mildly. In order for the push to work, any user's client must maintain an open connection of some sort. In fact, polling is an emulation of a connection. It's not like because *some* clients are polling, that all clients are notified. Similarly, when *some* clients open a connection for push notifications, not all clients are notified. This is very poor advice that invites throwing resources out the window. Being bombarded with 10000 requests per second is *virtually never* cheaper or otherwise better than maintaining 10000 open sockets. – back2dos Jul 23 '14 at 15:00
  • 3
    @back2dos it's a balancing act. 10k requests per second is actually pretty easy to handle (and caches/proxies can help a lot there). With few clients (let's say <10k) and a tight maximum lag (let's say <1s) maintaining the connections open might be better (websockets). But most web apps have much lower expectations (10s is perfectly ok) and need to handle huge spikes (100k - 1M clients) - and it's easier to scale pull than push. Don't forget to account for development cost as well - web standards and infrastructure are mostly designed for pull and that makes development much faster/safer. – ptyx Jul 23 '14 at 15:39
  • 8
    @ptyx: The 1s interval is the one being discussed here. 10k requests per second means 10k TCP handshakes and 10k HTTP requests (each easily reaching 2KB), which gives you multiple orders of magnitude more background noise pounding your server. There is a variety of battle tested libraries that make push subscriptions as easy as putting polling in place. There are even frameworks like meteor.js that completely abstract the whole issue away. Appealing to scalability without any further explanation is also hardly an argument. Anyway, I have voiced my doubts and don't wish to start a discussion ;) – back2dos Jul 23 '14 at 16:48
  • 5
    I agree with back2dos's comment above. If pull scaled better than push, google, stack exchange, facebook, online stock services, etc. would use pull technology. But they don't. Fundamentally, hammering the server instead of setting up a listening station scales terribly. Major services avoid polling. – Travis J Jul 23 '14 at 23:02
  • 3
    -1 for abject denial of zillions of frameworks that solve this allegedly insurmountable problem. – djechlin Jul 24 '14 at 04:25
  • 3
    This just isn't true. Push almost always scales better than pull because push doesn't require any overhead - it only sends data when there is actually data to send. Polling multiplies the total number of active clients by the polling interval, which is far more overhead. The cost of those "open connections" in a WebSocket/SSE implementation is practically nil, and certainly far less than the cost of all those repeated polls. The real reason polling is still done is because people are unable to use push technologies - or, the cynical answer, because they're too lazy/paranoid to learn/use them. – Aaronaught Jul 25 '14 at 16:16
  • 1
    Check this link: http://blog.fogcreek.com/the-trello-tech-stack/ it discusses how trello uses websockets with fallback to ajax polling. – Hoffmann Jul 25 '14 at 16:25
  • 1
    @Aaronaught: push does have overhead. To keep a socket alive, Websocket have to send keepalive/ping-pong frames every now and then. This is essentially the same as hammering the server on pull based system, except that you can't configure ping-pong interval in WebSocket. If your application don't need very low latency update, your server will have to service a larger number of pingpongs than is needed by low frequency polling. Just because you can't see ping pong frames doesn't mean that it doesn't exists. – Lie Ryan Jul 25 '14 at 23:57
  • @Aaronaught: Also a pull system can easily be cached by intermediate caching proxies so the request may not even need to bother your server at all. OTOH, caching proxies in push system is not as straightforward, so all requests will need to reach the server. In mobile devices, push forces the device to wake up and service the message, even when it doesn't need the new message, often using more battery than if the device just pull at times when it's already awake. – Lie Ryan Jul 26 '14 at 00:21
  • This answer ignores the possibility to `push` a small note like "there's something new!" and then have (interested) clients `pull` the actual data. – Raphael Jul 26 '14 at 11:15
  • 1
    @LieRyan: Those keepalives are not even *close* to the overhead of a polling system. We're talking about a few bytes compared to all the overhead of an HTTP request, and a much smaller interval (like once per minute). Polling obviously *can't* be cached without compromising the data integrity - if your polling interval is set shorter then the cache timeout, then you've designed your system wrong. Honestly, at this point you're just grasping at straws. – Aaronaught Jul 26 '14 at 13:58
  • 1
    @Aaronaught: you do realize that a lot services don't need to poll than once every, say, fifteen minutes. Even once every day or so is acceptable for many systems. HTTP specifies a number of headers for caching, expiry, max age, and also how cache can be done by intermediate proxies. To able to handle caching efficiently is the reason why HTTP is designed the way it is, rather than have everyone send data using raw socket. For being so opinionated, your lack of knowledge of HTTP is appalling. – Lie Ryan Jul 26 '14 at 15:15
  • 1
    @LieRyan: You're talking about how HTTP was designed, but HTTP was designed around a *document* model and not an application model. In fact, most messaging frameworks that support fallback to Ajax long-polling will include some cache-busting strategy to *prevent* the exact scenario that you're talking about (Internet Explorer, for example, is notorious for caching things that shouldn't be cached). I can't really imagine what you've based my perceived "lack of knowledge of HTTP" on since I've hardly said anything about HTTP - again, you're grasping at straws. – Aaronaught Jul 26 '14 at 15:52
  • 1
    I also don't think I've *ever* seen a web application polling every 15 minutes, and certainly not as infrequent as once a day. It's a tall order to assume that a user would even stay on the same web page for so long, and certainly [contrary to all the UX research](http://www.nngroup.com/articles/how-long-do-users-stay-on-web-pages/). Even if this were a realistic use case, Ajax long-polling is *still* more efficient than interval polling! – Aaronaught Jul 26 '14 at 15:55
  • 1
    @Aaronaught: there are much more types of web applications than you are thinking of. News feeds are not time sensitive and can tolerate delay and can be aggresively cached, even for feeds that are updated fairly frequently, say every few hours or daily, pull-based news reader can easily scale to a million subscribers with a single machine. Also consumers of the web is not necessarily a browser, many Web APIs for example are consumed by native applications or other servers. You're not seeing the bigger picture. – Lie Ryan Jul 26 '14 at 22:52
  • 1
    @LieRyan: Caching has *nothing* to do with it. It's a non-sequitur. Regardless if you've set up your headers to cache at your own server, a proxy server, or the client, you're just wasting resources if your cache timeout is different from the poll interval. And it's irrelevant because it costs less to maintain a websocket for 3 hours than it does to have the client poll every 10 minutes and get a 304 back. Native Android/iOS apps also support websockets so that's not an argument either. Maybe the scale argument was valid 5 years ago but it isn't anymore. Deal with it. – Aaronaught Jul 26 '14 at 23:28
  • Besides which, trying to run an application with millions of users on a single machine is just a terrible engineering practice, whether or not it manages to scale up to the load. What's your availability SLA? What's your business continuity plan? I don't want to be the one maintaining that system and getting paged in the middle of the night because the one server that powers *everything* died, or because something just happened to slow down or crash and not come back up. That's the "big picture" - not some nonsense about caching and a contrived example about news readers (we all know RSS). – Aaronaught Jul 26 '14 at 23:32
  • 1
    @Aaronaught: and yet, all those questions you posed is worse when you have to maintain a websocket systems. Load balancing and transparently replacing a faulty pull-based system is much easier than transferring an active socket connection between machines. You seem to be over-exaggerating the overhead of HTTP request/response, a minimal HTTP 1.0 request can be as small as one line of text (GET /resource HTTP/1.0), and the minimum response can be just a single line (HTTP/1.0 304 Not Modified). Websocket can probably reduce that to a single byte but you're microoptimizing at that level. – Lie Ryan Jul 27 '14 at 00:35
  • 1
    @Aaronaught: There are valid reasons to set different cache timeouts than polling rate, if your neighbour have requested the same resource recently, your request for the same resource can be served from a caching proxy setup by your ISP. Websocket is great for many applications that requires low-latency like chat programs or collaborative document editor, which was previously impossible to do well. But it's not a panacea, for many applications using push/websockets unnecessarily adds complexity and can limit scalability. – Lie Ryan Jul 27 '14 at 01:22
  • This string of comments is rapidly becoming massive. Please take this to [chat]. –  Jul 27 '14 at 02:36
107

I'm really surprised that only one person has mentioned WebSockets. Support is implemented in basically every major browser.

In fact, PubNub uses them. For your application, the browser would subscribe to a socket that broadcasts whenever a new photo is available. The socket wouldn't send the photo itself, mind you, just a link, so the browser could download it asynchronously.

In your example imagine something like:

  1. The user lets the photographer know that he/she wants to know about all future photos
  2. The photographer announces over the loudspeaker that a new photo is available
  3. The user asks the photographer for the photo

This is somewhat like the solution in your original example. It's more efficient than polling because the client doesn't have to send any data to the server (except maybe heartbeats).

Also, as others have mentioned, there are other methods that are better than simple polling that work in older browsers (longpolling, et al.)

korylprince
  • 4
    WebSockets wasn't mentioned because that's not what the question is about. – Robert Harvey Jul 22 '14 at 22:35
  • 43
    @RobertHarvey how come WebSockets are not related to the question? The question asks whether polling is an acceptable strategy, and nowadays it clearly isn't acceptable (or not optimal at least). WebSockets, Server-sent events and long polling perform much better on virtually every single use case. – Fabrício Matté Jul 22 '14 at 22:58
  • 1
    @FabrícioMatté: You reframed both the question and my reply. – Robert Harvey Jul 22 '14 at 23:00
  • 7
    @RobertHarvey that was just my interpretation, no reframing as far as I can see. Sure, the question asked *why is it still accepted* and not *what is the optimal strategy*, but these are still tightly related imho. – Fabrício Matté Jul 22 '14 at 23:01
  • 5
    seemed obvious to me that this is relevant, although it wasn't asked explicitly. I added a suggestion to the original question to be more explicit. – Don Hatch Jul 22 '14 at 23:15
  • 25
    WebSockets (and the like) are the closest you can get to implementing the OP's "solution", so I think it's very relevant despite him not mentioning it specifically. – korylprince Jul 23 '14 at 02:48
  • 6
    Not to mention, `StackExchange` sites like the one you're on right now (unless you're looking at this webpage cached/saved) use `WebSockets`. This was why I was also wondering why no one until @korylprince mentioned `WebSockets`. – trysis Jul 23 '14 at 05:10
  • 6
    @FabrícioMatté: actually, not every single use case. Long polling requires keeping a socket open for every user, which takes up system resources. For services that aren't very time critical but have lots of users, keeping a socket open is usually more expensive than servicing a short 304 every now and then. For most services, a slight delay is not an issue. A single machine can usually serve more clients with polling than with push. – Lie Ryan Jul 23 '14 at 13:22
  • 1
    This doesn't answer the question ... – svidgen Jul 23 '14 at 13:51
  • I tried to keep it simple. But I agree, websockets are nice because they let you do a push implementation without requiring the server to maintain and register connections to clients. – ptyx Jul 23 '14 at 18:18
  • @LieRyan interesting, I still haven't run into issues with an excess of open sockets, but it is indeed a point to consider. Thanks for the info. `=]` – Fabrício Matté Jul 23 '14 at 20:05
  • 3
    Longpolling is actually a perfect solution, without any drawbacks. The browser polls immediately after receiving an answer or after a timeout, and the server only answers when there is new data. You have low traffic, immediate reaction time, almost no programming overhead or state-keeping on the server, and no lost updates. Underneath, WebSockets also have to send keep-alive signals regularly, because that is how TCP/IP works... So EVERY solution with an active connection is actually low-level polling ;-) – Falco Jul 24 '14 at 08:59
  • Longpolling is not without drawbacks. Some IaaS services only permit one simultaneous connection per node. To scale to n users with longpolling, you'd need n nodes. – corsiKa Jul 24 '14 at 18:18
  • 2
    @RobertHarvey Perhaps the most pedantically accurate answer is: "Polling is acceptable because people don't know how how to do WebSockets" – Calvin Fisher Jul 24 '14 at 18:24
  • 2
    @LieRyan: That's a statement often made and never proven. On a competent web server implementation, keeping a socket open costs almost nothing if nothing is actually crossing the socket, and is far cheaper than even a 204 or 304 response. The only "resource" excuse you could reasonably have is if you're running only a single web server and are literally running out of memory or available connections - but if that's the case, the site won't scale regardless. Nearly every high-scale technology is based on events or interrupts (message pumps, buses/brokers, I/O completion ports...) – Aaronaught Jul 25 '14 at 16:24
42

Sometimes good enough is good enough.

Of all the possible ways to implement a "real-time" communications process, polling is perhaps the simplest way. Polling can be used effectively when the polling interval is relatively long (i.e. seconds, minutes or hours rather than instantaneous), and the clock cycles consumed by checking the connection or resource don't really matter.

Robert Harvey
31

The HTTP protocol is limited in that the client MUST be the one to initiate the request. The server cannot communicate with the client unless responding to a client's request.

So, to adjust your real-world example, add the following constraint:

  • User 2 can ONLY respond to User 1's questions with a single sentence reply, after which User 1 must leave. User 2 has no other way of communicating.

With this new restraint, how would you do it other than polling?

riwalk
  • Thank you for your answer; this made it very clear for web-based applications. However, it is also seen in software where polling a sensor for its state in a while loop is pretty normal, and I was wondering if there would be a better solution. – dennis Jul 22 '14 at 18:59
  • @dennis, ptyx's response gives some very good reasons (and puts it in terms of your analogy too). – riwalk Jul 22 '14 at 19:00
  • 6
    HTTP 2.0 will support server pushes. "Pushing allows servers to send representations to clients without an explicit request being made." http://en.wikipedia.org/wiki/HTTP_2.0 – kaptan Jul 22 '14 at 19:28
  • 5
    @kaptan, that's great, but its not available. Make do with what you've got. – riwalk Jul 22 '14 at 20:36
  • 7
    There is also long-polling which is available right now and simulates a push model using a pull. – Tim B Jul 22 '14 at 21:02
  • @Stargazer712, well SPDY http://en.wikipedia.org/wiki/SPDY#Client_.28browser.29_support_and_usage is supported by major browsers now :) – kaptan Jul 22 '14 at 21:12
  • @kaptan, it seems (in Chrome at least) that you have to turn it on. Also, my crystal ball is telling me that firewalls are going to have a heyday with that feature before everything is said and done. – riwalk Jul 22 '14 at 21:48
  • 24
    @dennis: Having written industrial automation software I'd just like to comment on your polling of sensors example. Polling sensors serves two purposes - the most obvious is to fetch new data. The less obvious is to detect that the sensor is still alive, not crashed due to a bug or burning due to factory fire or melted due to industrial accident. Silence, the fact that you receive no reply, is also valuable data. – slebetman Jul 23 '14 at 04:16
  • 3
    @dennis Sensors often sense much faster than you're interested in the data. Polling allows you to get the sensor value exactly when you want it, without being flooded with updates you don't care about. (Imagine if the OS notified your application every time a file changed anywhere on the disk, instead of your application needing to open and read the file) – user253751 Jul 23 '14 at 11:04
  • 1
    @immibis: bad analogy. it is impossible for a local filesystem to change without the OSes knowledge. If a harddrive autonomously change its data, you need to buy a new harddrive. – Lie Ryan Jul 23 '14 at 13:35
  • 2
    @LieRyan: That's not what immibis said. He said "your application". Not the OS. In this analogy the OS FS driver is a sensor. – slebetman Jul 23 '14 at 17:06
  • 2
    @kaptan: HTTP 2 server push doesn't apply to this situation. Server push allows the server to start sending a *file* to a browser before the browser knows it needs it. If the server pushes a file that the browser would never otherwise request, the browser just ignores the data. In other words, a page cannot set up an event handler to receive pushed data -- that's what web sockets are for. – josh3736 Jul 23 '14 at 17:46
  • @josh3736, i kinda agree that the PUSH mechanism is intended more towards "static resources" but technically you can PUSH "dynamic data". For instance as you can see in the example here http://japhr.blogspot.ca/2011/07/stupid-spdy-tricks.html you can PUSH JS object to the browser. But again, I don't think that I would do it myself like that :D – kaptan Jul 23 '14 at 18:21
  • 1
    @kaptan: That example is fundamentally no different than making an AJAX or JSONP request. The browser still must somehow know to request a particular file. In the example, this is done by having a ` – josh3736 Jul 23 '14 at 18:31
  • @immibis That's called [inotify](http://linux.die.net/man/7/inotify) and it's quite useful, actually. – Michael Hampton Jul 25 '14 at 04:38
  • @MichaelHampton not useful in every scenario. That's why you can tell inotify what you are interested in for notification. So if you build an antivirus, you can request all FS signals to wake you up. But you could simply decide to monitor only one folder or one tree of folders, and only with the event "just before it is written to disk". So I'm of the opinion that the comment from immibis is correct. And when I have implemented push mechanism in "sensors" I often set a threshold under which no updates is sent to listeners. – Huygens Dec 01 '14 at 08:59
13

Why is polling accepted? Because in reality every solution is actually low-level polling!

If the server is to update you as soon as new pictures are available, it usually has to keep a connection to you, because IP addresses change often and you never know whether someone is still interested. So the client has to send some form of keep-alive signal, for example, "I'm still here, I'm not offline."

All stateful connections (for example, TCP/IP) work the same way: since you can only send individual data packets over the Internet, you never know whether the other party is still there.

So every protocol has a timeout: if an entity doesn't answer within X seconds, it is presumed to be dead. Even if you keep an open connection between server and client without sending any data, both sides have to send regular keep-alive packets (this is handled at a low level when you open a connection between them). And how is this, in the end, any different from polling?

So the best approach would probably be longpolling:

The client sends a request immediately after loading the site (for example, asking the photographer, "Tell me if there are any new pictures"), but the server doesn't answer as long as there aren't any. As soon as the request times out, the client asks again.

If the server now has new pictures, it can immediately answer all the clients standing in line for them. So your reaction time after a new picture is even shorter than with push, since the client is already waiting in an open connection for a reply and you don't have to establish a connection to the client first. And the polling requests from the client are not much more traffic than a constant connection between client and server.

Peter Mortensen
Falco
  • I disagree that every solution ends up being low-level polling. You're confusing polling required to send data with polling required to know when a client is lost. Yes, the latter will always end up polling somewhere down the protocol stack, but that can be at a very low frequency (such as once every five minutes) whereas polling for actual data every second is a waste that CAN be avoided with true push notifications that is NOT polling at any level of the stack. – Allon Guralnek Jul 25 '14 at 06:29
  • First, most keepalive packets run at a fairly high frequency, because you want to avoid common timeout intervals, so a few seconds isn't uncommon for TCP/IP, and almost anything not using TCP may be blocked by firewalls. So when I need to send a data packet every X seconds anyway, why not fill it with some data at virtually no cost? – Falco Jul 25 '14 at 07:26
  • 1
    @Guralnek even if you had a connection with a keep-alive interval of 5 minutes, the effective timeout would be higher, since you have to allow for actual delay and lost packets. And the server would keep many connections open for 5 minutes after the clients have disconnected, so overall this would likely cost more server resources while saving only minimal bandwidth – Falco Jul 25 '14 at 07:30
  • 1
    +1 for long polling. Look up Comet http://en.wikipedia.org/wiki/Comet_%28programming%29 – Zan Lynx Nov 24 '14 at 13:57
9

One advantage of polling is that it limits the harm that can be caused if a message goes missing or the state of something gets glitched. If X asks Y for its state once every five seconds, then the loss of a request or a reply will merely result in X's information being ten seconds out of date rather than five. If Y gets rebooted, X can find out about it the next time Y is able to respond to one of X's messages. If X gets rebooted, it might never bother asking Y for anything afterward, but whoever is observing the status of X should recognize that it has been rebooted.

If instead of X polling Y, X relied upon Y to inform it whenever its state changed, then if Y's state changed and it sent a message to X, but for whatever reason that message was not received, X might never become aware of the change. Likewise if Y gets rebooted and never has any reason to send X a message about anything.

In some cases it may be helpful for X to request that Y autonomously send messages with its status, either periodically or when it changes, and only have X poll if it goes too long without hearing anything from Y. Such a design may eliminate the need for X to send most of its messages (typically, X should at least occasionally inform Y that it's still interested in receiving messages, and Y should stop sending messages if it goes too long without any indication of interest). Such a design would, however, require Y to persistently maintain information about X, rather than being able to simply send a reply to whoever polled it and then immediately forget about who that was. If Y is an embedded system, such a simplification may help reduce memory requirements sufficiently to allow the use of a smaller and cheaper controller.
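The "push normally, poll only after too much silence" design described above can be expressed as a small watchdog on X's side. A minimal sketch, assuming an illustrative silence threshold; all names here are hypothetical:

```javascript
// Watchdog sketch: X normally relies on Y's pushed updates, and only
// falls back to an explicit poll when Y has been silent too long.
// now() is injectable so the clock can be faked in tests.
function makeWatchdog(pollFn, silenceMs, now = Date.now) {
  let lastHeard = now();
  return {
    onPush(msg) {            // Y pushed a status update: reset the timer
      lastHeard = now();
      return msg;
    },
    check() {                // called periodically by X
      if (now() - lastHeard > silenceMs) {
        lastHeard = now();   // avoid re-polling on every subsequent tick
        return pollFn();     // silence exceeded: fall back to a poll
      }
      return null;           // Y has been heard from recently: do nothing
    },
  };
}
```

This keeps the steady-state message count near zero while still bounding how long a dead or rebooted Y can go unnoticed.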

Polling can have an additional advantage when using a potentially-unreliable communications medium (e.g. UDP or radio): it can largely eliminate the need for link-layer acknowledgments. If X sends Y a status request Q, Y responds with a status report R, and X hears R, X won't need to hear any sort of link-layer acknowledgment for Q to know that it was received. Conversely, once Y sends R, it doesn't need to know or care whether X received it. If X sends a status request and gets no response, it can send another. If Y sends a report and X doesn't hear it, X will send another request. If each request goes out once and either yields a response or doesn't, neither party needs to know or care whether any particular message was received.

Since sending an acknowledgment may consume almost as much bandwidth as a status request or report, a request-report round trip doesn't cost much more than an unsolicited report plus acknowledgment would. If X sends a few requests without getting replies, it may on some dynamically-routed networks need to enable link-level acknowledgments (and ask in its request that Y do likewise) so that the underlying protocol stack can recognize the delivery problem and search for a new route, but when things are working, a request-report model will be more efficient than using link-level acknowledgments.
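The request-report exchange over an unreliable medium amounts to a simple retry loop on X's side. In this sketch, `sendFn` is a stand-in for the lossy channel (returning `null` when either the request or the report was lost), not a real API:

```javascript
// Request-report sketch over an unreliable channel: X retries the
// status request until a report arrives; neither side ever needs a
// link-layer acknowledgment, because hearing R proves Q got through.
async function requestStatus(sendFn, maxTries) {
  for (let i = 0; i < maxTries; i++) {
    const report = await sendFn('STATUS?'); // Q goes out once
    if (report !== null) return report;     // heard R: done, no ack needed
    // No reply: Q or R was lost somewhere; just ask again.
  }
  // Repeated silence: time to suspect the route, not just the packet.
  throw new Error('peer unreachable');
}
```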

supercat
  • 8,335
  • 22
  • 28
  • The problem you talk about with Y pushing messages to X (second paragraph) can be fixed by having a serial number attached to each message. If a message is lost, X will know because it did not receive that serial. At that point it can take other measures to sync up with Y. DNS master -> slave replication works this way. – korylprince Jul 23 '14 at 02:51
  • @korylprince: Either side can find out about the missing message if the other side has occasion to send something (and does so successfully), or if it has reason to expect something from the other side and never receives it. If one side sends a status update and either doesn't require acknowledgments or gives up after retrying a few times, and the other side isn't expecting scheduled transmissions, the other side won't know that the connection has disappeared. – supercat Jul 23 '14 at 03:42
  • 2
    @korylprince - The problem is, without periodic messages, X may detect the missing message a day late or a year late or 10 years late. To detect missing packet in reasonable time you need to somehow poll. You can "pull" poll or you can "push" poll. The first is called "polling" the second is called "heartbeat" – slebetman Jul 23 '14 at 04:19
  • Both very true. It all depends on the situation. – korylprince Jul 23 '14 at 11:30
  • @slebetman: Without periodic messages, if Y gets rebooted, there may be no mechanism by which X would *ever* discover it. – supercat Jul 23 '14 at 14:34
1

The question is to balance the amount of unnecessary polls vs the amount of unnecessary pushes.

If you poll:

  • You get an answer at this very moment. Good if you only ask occasionally or need the data at this very moment.
  • You might get a "no content" answer, causing pointless load on the line.
  • You put load on the line only when you poll, but always when you poll.

If you push:

  • You deliver the answer right when it is available, which allows an immediate processing on the client side.
  • You might deliver data to clients which are not interested in this data, causing pointless load on the line.
  • You put load on the line every time there is new data, but only when there is new data.

There are several solutions for dealing with the various scenarios and their disadvantages, for example a minimum time between polls, poll-only proxies that take the load off the main system, or - for pushes - a registration scheme where clients specify the data they want and unregister on log-off. Which one fits best cannot be said in general; it depends on the system.

In your example polling is not the most efficient solution, but the most practical one. It is very easy to write a polling system in JavaScript, and it is very easy to implement it on the delivery side as well. A server made to deliver image data should be able to handle the extra requests, and if not, it can be scaled linearly, as the data is mostly static and can therefore be easily cached.

A push method implementing a log-in, a description of the wanted data, and finally a log-off would be most efficient, but is probably too complex for the average "script-kiddy", and it has to deal with the question: what happens if the user simply closes the browser and the log-off is never performed?
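The register/push/unregister bookkeeping such a push method needs might look like the following sketch (all names are illustrative). Note the final accessor: if a client never logs off, its entry simply lingers in the registry, which is exactly the problem with a closed browser:

```javascript
// Sketch of a server-side subscription registry: remember who wants
// which data and push only to matching subscribers.
function makeRegistry() {
  const subs = new Map();                 // clientId -> wanted topic
  return {
    register(clientId, topic) { subs.set(clientId, topic); },
    unregister(clientId) { subs.delete(clientId); },  // explicit log-off
    push(topic, data, deliver) {
      // Load on the line only for clients that asked for this topic.
      for (const [id, wanted] of subs) {
        if (wanted === topic) deliver(id, data);
      }
    },
    size() { return subs.size; },  // grows with clients that never log off
  };
}
```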

Maybe it is better to have more users (as accessing is easy) than to save some bucks on another cache-server?

TwoThe
  • 201
  • 1
  • 2
1

For some reason, these days, all the younger web developers seem to have forgotten the lessons of the past, and why some things have evolved the way they did.

  1. Bandwidth was an issue.
  2. Connections might be intermittent.
  3. Browsers did not have as much computing power.
  4. There were other methods of accessing content. The web is not w3.

In the face of these constraints, you might not have a constant two-way communication channel. And if you look at the OSI model, you'll find that most of its considerations are meant to decouple persistence from the underlying connection.

With that in mind, a polling method of pulling information is a great way to reduce bandwidth and computation on the client side. The rise of push is, for the most part, really just the client doing constant polling, or WebSockets. Personally, if I were everyone else out there, I'd appreciate the regularity of polling as a means of traffic analysis, where an out-of-time GET/POST request would signal some sort of man-in-the-middle situation.