30

It's well known that disk operations are slow, and we know the reasons why. So the question here is: why do we have to wait for I/O? Why is there such a thing as IOWait, etc.?

I've noticed that when I/O tasks run in the background, the computer basically gets a lot slower. I've especially noticed it when using Linux, if you're doing some longer I/O tasks, the OS becomes almost unusable until they are completed.

Indeed, I also found this topic discussed in an article; here's a snippet:

I/O wait is 12.1%. This server has 8 cores (via cat /proc/cpuinfo). This is very close to (1/8 cores = 0.125)

So basically it means it's slowing down the computer a LOT. Why is that? I mean, OK, a normal computer nowadays has at least 2 cores, sometimes 4, or sometimes more because of hyperthreading or something like that. But now the question is why the CPU actually has to stay there, practically not doing anything else than just waiting for IO? I mean the basic idea or architecture of process management (I don't know whether it's the OS that's responsible for that, or whether it comes down to the hardware part): it should be made possible for the CPU to wait or to check regularly, while actually performing lots of other tasks and only going back to the IO process when it's ready.

Indeed, if that's such a difficult task and the CPU would have to wait, why isn't that managed by hardware more efficiently then? Like for instance there could be some kind of mini CPU which would just wait for it and deliver the small part of data to the real CPU as soon as it gets back to the process, and so the process would be repeated and we wouldn't have to practically dedicate a whole CPU core for the data copy process... Or would I be the one who should invent this kind of stuff and get a Nobel Prize for that? :S

Now okay, I'm really putting this from an observer's perspective, and I haven't gone that deep into the topic, but I really don't understand why the CPU has to work at the speed of the HDD when it could just do something else and come back to the HDD once it's ready. The idea is not to speed up the application that needs the I/O operation or the copy process or whatever; the idea is to minimally affect CPU consumption while performing that operation, so that the OS could use the CPU for other processes and the user wouldn't have to feel general computer lag when doing some copying operations...

Arturas M
  • "while it could just do something else" - such as? It needs to work with data. If that data isn't in the CPU L1 cache, it needs to fetch it from the L2 cache. If not in the L2 cache, it needs to fetch from the L3 (if it has one). If it isn't in the on-die caches at all, it needs to access main memory. If not in main memory... it needs to access the HDD. – Oded Sep 08 '15 at 12:50
  • The computer _does_ do something else; the kernel blocks the thread until the IO operation is complete, letting other threads/processes run. But if everything is waiting on disk IO, then there's nothing else to do. – Colonel Thirty Two Sep 08 '15 at 16:09
  • First understand dependency. If any computation depends on data on the disk, that computation must wait, as must any computation waiting on that computation's results. Next understand it's not just the CPU but memory that makes you fast. It doesn't matter if you have 1001 CPUs when you have to swap memory down to the HD. Doing that slows down everything using that memory, multithreaded or not. – candied_orange Sep 08 '15 at 19:10
  • You gotta wait for the programs to reach the I/O tower and send you their frisbees! – Almo Sep 08 '15 at 21:45
  • @immibis Correct! :) – Almo Sep 09 '15 at 12:49
  • Typically modern OSes do what you are complaining that they don't do -- IO operations are dispatched to the appropriate hardware and interrupts are generated by the hardware to signify that the operations are done. Processes waiting on IO are usually blocked while waiting (this can be changed). If many processes are waiting on IO and no other processes have anything for the CPU to do then there's not much to do. You could also end up in mem-swap Hell. Writing programs to efficiently utilize CPU, memory, and IO requires special skills, and what else is running also affects what works best. – nategoose Sep 09 '15 at 13:31

6 Answers

24

It's possible to write asynchronous IO, where you tell the OS to dispatch a disk read/write, go do something else, and later check whether it's done. It's far from new. An older method is to use another thread for the IO.

However, that requires that you have something to do while the read is executing, and you are not allowed to touch the buffer you passed in for the result until it completes.

It's also much easier to program when you assume everything is blocking IO.

When you call a blocking read function, you know it won't return until something has been read, and immediately afterwards you can start processing it.

The typical read loop is a good example:

//variables that the loop uses
char buffer[1024];
size_t bytesRead;
while((bytesRead = fread(buffer, 1, sizeof buffer, file)) > 0){
    //use the first bytesRead bytes of buffer
}

Otherwise you need to save the current function state (usually in the form of a callback + userData pointer) and pass it, plus an identifier for the read operation, back up to a select() type loop. There, when an operation finishes, the loop maps the identifier of the read operation to the callback + data pointer and invokes the callback with information about the completed operation.

void callback(void* buffer, int result, int fd, void* userData){
    if(result <= 0){
        //done or error: free buffer and continue to normal processing
        return;
    }
    //use buffer

    struct ReadState* state = userData; //recover the saved per-read state
    int readID = async_read(fd, buffer, state->buff_size);
    registerCallback(readID, callback, userData);
}

This also means that every function that could end up using that async read would need to be able to handle an async continuation. That is a non-trivial change throughout most programs; just ask people trying to get into async C# about that.


However synchronous IO vs. asynchronous IO isn't the cause of the general slowdown. Swapping pages in is also an operation that needs to wait on IO. The scheduler will simply switch to another program that isn't waiting on IO if there is one (IO wait is time when the processor is idle while an IO operation is pending).

The real problem is that both the hard drive and the CPU use the same channel to communicate with the RAM: the memory bus. And unless you are using RAID, there is only a single disk to get the data from. This is made worse if you are also using a graphics-intensive application; then the communication with the GPU will also interfere.

In other words, the real bottleneck is probably in the hardware rather than the software.

ratchet freak
  • "However synchronous IO vs. asynchronous IO isn't the cause of the general slowdown." So why did you decide to focus on this relatively advanced topic when the question is about the basics? – svick Sep 08 '15 at 23:18
  • You should probably mention something about DMA – Alec Teal Sep 09 '15 at 02:30
  • Fun fact: there's actually a really old mechanism that lets programs do something else while doing I/O without having to deal with callbacks; it's called *threads*. – user253751 Sep 09 '15 at 07:31
  • Good discussion of the pros/cons of sync/async IO. But are you sure that's the reason for the slowdown? Generally I find that slowdowns under heavy IO load are firstly because of poorly-architected software, or when that's not the case then because the system is using a single, _slow_ disk (i.e. non-SSD), and everything is trying to access it concurrently. I'd blame a bottleneck on the disk's ability to service requests before I'd blame it on saturation of the memory bus. You need _really_ high-end storage to saturate a modern memory bus. – aroth Sep 09 '15 at 09:38
21

The I/O schemes you are describing are in current use in computers.

why the CPU actually has to stay there, practically not doing anything else than just waiting for IO?

This is the simplest possible I/O method: programmed I/O. Many embedded systems and low-end microprocessors have only a single input instruction and a single output instruction. The processor must execute an explicit sequence of instructions for every character read or written.
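
A minimal sketch of what that looks like (the register addresses and status bit below are made up for illustration):

#define STATUS_REG ((volatile unsigned char *)0x4000) /* hypothetical device registers */
#define DATA_REG   ((volatile unsigned char *)0x4001)
#define READY_BIT  0x01

unsigned char read_char_polling(void){
    /* busy wait: the CPU spins here, doing nothing useful,
       until the device raises its ready bit */
    while((*STATUS_REG & READY_BIT) == 0)
        ;
    return *DATA_REG; /* one explicit instruction sequence per character */
}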

it should be made possible for the CPU to wait or to check regularly, while actually performing lots of other tasks and only going back to the IO process when it's ready

Many personal computers have other I/O schemes. Instead of waiting in a tight loop for the device to become ready (busy waiting), the CPU starts the I/O device and asks it to generate an interrupt when it's done (interrupt-driven I/O).

Although interrupt-driven I/O is a step forward (compared to programmed I/O), it still requires an interrupt for every character transmitted, and that's expensive...
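
Sketched with the same hypothetical registers as above, the interrupt-driven version hands the waiting over to the hardware, but the handler still runs once per character:

volatile unsigned char received;
volatile int data_ready = 0;

/* invoked by the hardware, once per character */
void device_rx_isr(void){
    received = *DATA_REG; /* fetch the character */
    data_ready = 1;       /* tell the main program it arrived */
}
/* meanwhile the CPU can run other code instead of spinning on STATUS_REG */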

Like for instance there could be some kind of mini CPU which would just wait for it and deliver the small part of data to the real CPU as soon as it gets back to the process, and so the process would be repeated and we wouldn't have to practically dedicate a whole CPU core for the data copy process...

The solution to many problems lies in having someone else do the work! :-)

The DMA (Direct Memory Access) controller/chip allows programmed I/O, but with somebody else doing it!

With DMA the CPU only has to initialize a few registers, and then it's free to do something else until the transfer is finished (when an interrupt is raised).
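
A sketch of that initialization (the register layout and addresses are invented for illustration):

typedef struct {
    volatile unsigned src;  /* device address to read from */
    volatile unsigned dst;  /* RAM address to write to */
    volatile unsigned len;  /* number of bytes to transfer */
    volatile unsigned ctrl; /* start bit, interrupt-enable bit, ... */
} dma_regs;

#define DMA        ((dma_regs *)0x5000)
#define DMA_START  0x1u
#define DMA_IRQ_EN 0x2u

void start_dma_transfer(unsigned device_addr, unsigned ram_addr, unsigned n){
    DMA->src  = device_addr;
    DMA->dst  = ram_addr;
    DMA->len  = n;
    DMA->ctrl = DMA_START | DMA_IRQ_EN;
    /* the CPU is now free until the completion interrupt arrives */
}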

Even DMA isn't totally free: high-speed devices can use many bus cycles for memory references and device references (cycle stealing), and the CPU has to wait (the DMA chip always has higher bus priority).

I/O wait is 12.1%. This server has 8 cores (via cat /proc/cpuinfo). This is very close to (1/8 cores = 0.125)

I think this is from: Understanding Disk I/O - when should you be worried?

Well, it isn't strange: the system (MySQL) must fetch all the rows before manipulating data, and there are no other activities.

There is no computer architecture / OS issue here. It's just how the example is set up.

At most it could be an RDBMS tuning problem or a SQL query problem (missing index, bad query plan, bad query...).

manlio
9

Have faith that the processing of other stuff while waiting for I/O is quite streamlined, close to as streamlined as possible. When you see that your computer is waiting for I/O only 12.1% of the time, it means that it is in fact doing a lot of other things in parallel. If it really had to wait for I/O without doing anything else, it would be waiting 99.9% of the time; that's how slow I/O is (a single hard drive seek takes on the order of 10 ms, during which a modern core could execute tens of millions of instructions).

The only way to do more things in parallel is by predicting what the user might want to do next, and we are not very good at that kind of prediction yet. So, if the user performs an operation which requires a particular sector to be read from the hard drive, and that sector does not already happen to be in the cache, then the OS will start the very long process of reading that sector, and it will try to see if there is anything else to do in the meantime. If another user wants a different sector, it will queue that request too. At some point, all requests have been queued, and there is nothing we can do but wait for the first one of them to be satisfied before we can proceed. It is just a fact of life.

EDIT:

Finding a solution to the problem of how to do other stuff while doing I/O would be an admirable feat, because it would at the same time be a solution to the problem of how to do other stuff while idle. An amazing feat it would be, because it would mean that you had found work for your computer to do when it does not have any.

You see, this is what's happening: your computer is just sitting there 99.99% of the time, doing nothing. When you give it something to do, it goes and does it. If in doing so it has to wait for I/O, it sits there and waits. If it has something else to do while doing I/O, it does that, too. But if it does not have anything else to do besides the I/O, then it has to sit there and wait for the I/O to complete. There is no way to work around that, other than by enrolling in SETI@home.

Mike Nakis
  • Well, the 12.1% example was from a website, and it was taken from a server with 8 cores; the idea there was that almost one whole core was reserved just for those operations. Sure, the other cores were free to do anything, and with 8 cores you're well off, but what if you have only a single core? :/ – Arturas M Sep 08 '15 at 13:49
  • @ArturasM Either you've misunderstood what the website is saying, or the author of the website has misunderstood something. A computer with only a single core would spend _less_ time waiting for I/O (since all the tasks that aren't waiting for IO, that are executing on the other cores while one core sits idle, would all have to execute on the single core). The I/O takes a certain amount of time to happen whether you wait for it or not - having time to wait for it is a symptom of having nothing else to do with that time. – Random832 Sep 08 '15 at 14:32
6

It does depend upon your application code. I'm assuming your code is running on Linux.

You could use multi-threading (e.g. POSIX pthreads) to have compute-bound threads doing some computation while other IO-bound threads are doing the IO (and waiting for it). You could even have your application running several processes communicating with inter-process communication (IPC), see pipe(7), fifo(7), socket(7), unix(7), shm_overview(7), sem_overview(7), mmap(2), eventfd(2) and read Advanced Linux Programming etc....
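
For instance, here is a minimal pthreads sketch (the file name and the "computation" are placeholders) where an IO-bound thread blocks inside fread while the main thread keeps computing; compile with -pthread:

#include <pthread.h>
#include <stdio.h>

/* the IO-bound thread: blocks in fread while other threads keep running */
static void *io_worker(void *arg){
    FILE *f = fopen((const char *)arg, "rb");
    if(!f) return NULL;
    char buf[4096];
    while(fread(buf, 1, sizeof buf, f) > 0){
        /* consume the data */
    }
    fclose(f);
    return NULL;
}

int main(void){
    pthread_t tid;
    pthread_create(&tid, NULL, io_worker, "bigfile.dat"); /* placeholder name */
    /* ... compute-bound work runs here, in parallel with the IO ... */
    pthread_join(tid, NULL);
    return 0;
}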

You could use non-blocking IO, e.g. pass O_NONBLOCK to open(2), etc.; then you'll need to poll(2) and/or use the SIGIO signal, see signal(7)... and handle the EWOULDBLOCK error from read(2), etc.
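
A rough sketch of that style, assuming a FIFO at a made-up path (O_NONBLOCK matters for pipes, FIFOs, sockets and terminals; it has little effect on regular files):

#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

int main(void){
    int fd = open("/tmp/myfifo", O_RDONLY | O_NONBLOCK); /* hypothetical path */
    if(fd < 0) return 1;

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    char buf[4096];

    for(;;){
        if(poll(&pfd, 1, -1) < 0) break; /* sleep until the fd is readable */
        ssize_t n = read(fd, buf, sizeof buf);
        if(n > 0){
            /* use buf */
        } else if(n == 0){
            break;                       /* writer closed: end of stream */
        } else if(errno != EAGAIN && errno != EWOULDBLOCK){
            break;                       /* a real error */
        }                                /* else: not ready yet, poll again */
    }
    close(fd);
    return 0;
}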

You could use POSIX asynchronous IO, see aio(7)
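
A minimal sketch of dispatching one read with POSIX AIO (the file name is a placeholder; on older glibc you may need to link with -lrt):

#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void){
    int fd = open("data.bin", O_RDONLY); /* placeholder file */
    if(fd < 0) return 1;

    static char buf[4096];
    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof buf;
    cb.aio_offset = 0;

    aio_read(&cb);                       /* dispatch; returns immediately */
    while(aio_error(&cb) == EINPROGRESS){
        /* do other useful work here instead of blocking */
    }
    ssize_t n = aio_return(&cb);         /* bytes read, or -1 on error */
    close(fd);
    return n < 0;
}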

For file access, you could give hints to the page cache, e.g. with madvise(2) after mmap(2) and with posix_fadvise(2); see also the Linux-specific readahead(2).
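
For example, a small sketch announcing a sequential scan so the kernel can read ahead aggressively:

#include <fcntl.h>

int open_for_sequential_scan(const char *path){
    int fd = open(path, O_RDONLY);
    if(fd >= 0)
        /* offset 0, len 0 = the whole file will be read sequentially */
        posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    return fd;
}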

But you would eventually reach some hardware bottleneck (the bus, the RAM, etc...). See also ionice(1)

Basile Starynkevitch
6

The OS (unless it's a very low-level embedded system or something similarly exotic) already takes care of this: if your application has to wait for I/O, it will usually block on that I/O, and some other thread or application will become active. The scheduler decides which one.

It's only when there is no other thread or application that could be running that you actually accumulate wait time. In the article you quoted (thanks to @manlio for the link), that is the case: you have 12.1% waiting vs. 87.4% idle, which means one core is waiting for I/O to complete while the rest are not doing anything at all. Give that system something to do, preferably multiple somethings, and the wait percentage should drop.

One of the high goals of today's application design is ensuring that even if there is only a single application running, and even if that single application is at some point waiting for I/O, the application can still continue with some other chunk of work. Threads are one approach to this, non-blocking I/O another, but it very much depends on the kind of work you're doing whether you can actually get something done without the data you're waiting for.

when using Linux, if you're doing some longer I/O tasks, the OS becomes almost unusable until they are completed.

That's typically an indication of an I/O-bound situation. I dare say the system isn't getting slow because it can't do enough CPU processing. More likely it's slow because a number of things depend on data from the HDD, which is busy at that time. These might be applications you want to run but which have to load their executable files, library files, icons, fonts and other resources. They might be applications you already have running which have swapped out part of their memory and now need it swapped in again to proceed. It might be some daemon which for one reason or another thinks it not only has to write a line to a log file, but actually has to flush that log file before answering some request.

You can use tools like iotop to see how I/O capacity is allocated to processes, and ionice to set I/O priorities for processes. For example on a desktop machine, you could classify all bulk data processing to the idle scheduling class, so that the moment some interactive application needs I/O bandwidth, the bulk processing gets suspended until the interactive application is done.

MvG
1

I'll add a viewpoint different from the others, maybe a controversial one:

It's a typical problem of Linux operating systems, lagging specifically (search for "Linux mouse lag"). Windows does not have this problem. I dual-boot Windows 7 and Linux Mint. Even when doing intensive disk operations in Windows, Windows feels smooth and the mouse moves normally. In Linux, by contrast, it does not feel as smooth, and the mouse sometimes lags even during normal web browsing.

It's probably because of the different philosophies and histories of these two systems. Windows was designed from the beginning for ordinary users; it's principally a graphical operating system. For Windows users, non-smooth system behavior and the mouse ceasing to move are a signal that something is wrong, so Microsoft's programmers worked hard to design the whole system to minimize the cases where the system feels slow. In contrast, Linux was not initially a graphical system; the desktop is only a third-party addition there. Linux is principally designed for hackers using the command line, with a get-things-done philosophy. It is simply not designed with smooth behavior in mind; feelings don't matter here.

Note: I am not saying that Windows is better than Linux; I'm saying they simply have different overall philosophies, which in a complex environment can lead to different high-level behavior and feel in these systems.

user3123061
  • Linux mouse lag could probably be avoided or lessened by careful configuration of the system (i.e. using `nice` & `ionice` on hungry processes). And I do use Linux and have almost never experienced that Linux mouse lag (except when overloading my computer...) – Basile Starynkevitch Sep 30 '15 at 10:44
  • BTW, Linux is mostly a server OS. – Basile Starynkevitch Sep 30 '15 at 10:45
  • I'll note that I have experienced UI and mouse lag on Windows 7, even during times when Task Manager and Resource Monitor indicated low memory usage and low CPU and disk activity. – 8bittree Sep 30 '15 at 14:20