
When a virtual memory address outside the range of what is loaded into physical RAM is referenced and a page fault occurs, does the Memory Management Unit rely on DMA (Direct Memory Access) to swap the referenced page into RAM, or does an interrupt routine that uses the CPU copy in the memory in some more mundane way?

Assume the x86 architecture, please.

2 Answers


In general, the MMU merely notifies the CPU of a page fault. It is up to the CPU to resolve the problem and then update the MMU tables so that the access can be restarted.

The CPU usually resolves the problem by initiating one or more I/O operations to the backing storage, which is usually a local disk of some sort, but in some cases can be a remote disk accessed over the network. If the page being replaced is "dirty" (needs to be written out), this needs to occur before the desired page is read in. This I/O may or may not involve DMA. The point is, this process is too complicated to implement directly in hardware.
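
To make the division of labour concrete, here is a minimal sketch in C of that software path, with every function name invented for illustration; a real kernel's handler is considerably more involved:

```c
#include <stdint.h>

/* Hypothetical helpers -- stand-ins for real kernel services. */
struct vm_region;                                   /* a valid range of the address space  */
struct frame;                                       /* a physical page of RAM              */
struct vm_region *find_region(uintptr_t addr);      /* NULL if the address is invalid      */
int  region_allows(struct vm_region *r, int is_write);
struct frame *choose_frame(void);                   /* may pick a victim page to evict     */
int  frame_is_dirty(struct frame *f);
void write_page_out(struct frame *f);               /* I/O to backing store, maybe by DMA  */
void read_page_in(uintptr_t addr, struct frame *f); /* I/O from backing store, maybe DMA   */
void map_page(uintptr_t addr, struct frame *f);     /* update the MMU's translation tables */
void kill_current_process(void);

/* The MMU only reports the fault; everything below is ordinary code
 * running on the CPU, inside the OS's page-fault handler. */
void handle_page_fault(uintptr_t fault_addr, int is_write)
{
    struct vm_region *r = find_region(fault_addr);
    if (r == NULL || !region_allows(r, is_write)) {
        kill_current_process();           /* invalid access */
        return;
    }

    struct frame *f = choose_frame();     /* may evict a page already in RAM    */
    if (frame_is_dirty(f))
        write_page_out(f);                /* a dirty victim must be saved first */

    read_page_in(fault_addr, f);          /* bring in the wanted page           */
    map_page(fault_addr, f);              /* fix the tables; the faulting access
                                             can now be restarted               */
}
```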

It is a key requirement that any processor that supports virtual memory must be able to suspend and restart an access that experiences a page fault. Many early microprocessors (e.g., 68000, 8086) did not have this capability, which made it very difficult, if not impossible, to implement virtual memory on them.

Dave Tweed

As Dave Tweed has explained, the process of handling a page fault is too complex to have done completely in hardware.

The MMU has a very close relationship with the CPU.

The MMU 'interrupts' the CPU in mid-instruction. This is sometimes called a 'page fault exception'. This relationship is much closer than that of normal interrupts. AFAIK, no CPU accepts an external interrupt in mid-instruction.

A page fault exception causes the CPU to either 'roll back' or 'dump state' for the instruction which used the invalid virtual memory address. This is critical because the instruction will need to be 're-run' (i.e. executed again) if the virtual address is valid and its page is loaded into RAM.
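
As a rough illustration for 64-bit x86 (the structure layout and names below are illustrative, not any particular kernel's), the dumped state is exactly what is needed to re-run the instruction later: the saved instruction pointer still points at the faulting instruction, and the faulting data address is latched by the CPU into CR2:

```c
#include <stdint.h>

/* Illustrative only: the exact layout depends on the OS's exception entry stub.
 * The key point is that 'rip' still points at the instruction that faulted,
 * so returning from the exception re-runs it. */
struct page_fault_frame {
    uint64_t general_regs[16]; /* saved by the OS's exception entry code       */
    uint64_t error_code;       /* pushed by the CPU: present/write/user bits   */
    uint64_t rip;              /* address of the faulting instruction          */
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};

/* On x86 the faulting *virtual address* is not in the frame at all;
 * the CPU latches it into control register CR2. */
static inline uint64_t faulting_address(void)
{
    uint64_t addr;
    __asm__ volatile ("mov %%cr2, %0" : "=r"(addr));
    return addr;
}
```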

The first step in handling a page fault is to decide whether the access to the virtual address is valid, and part of the process's address space. The address might be valid, for example it might be to program code, but the access might be a write while the page is execute-only, to protect code from being damaged. In that case the access fails, the process is 'killed' by the OS, and that will be bubbled up to the parent process (e.g. a shell).
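
On x86 the hardware even tells the handler why the access failed, via an error code pushed with the exception. Here is a hedged sketch of this first 'is the access legal?' decision; the error-code bit positions are architectural, but the region lookup and flags are invented for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits (architectural). */
#define PF_PRESENT  (1u << 0)  /* 0: page not present, 1: protection violation */
#define PF_WRITE    (1u << 1)  /* the faulting access was a write              */
#define PF_USER     (1u << 2)  /* the fault happened in user mode              */

/* Invented per-region protection flags and lookup, for illustration. */
struct vm_region { bool readable, writable, executable; };
struct vm_region *find_region(uintptr_t addr);   /* NULL if address is invalid */

bool fault_is_fixable(uintptr_t fault_addr, uint32_t error_code)
{
    struct vm_region *r = find_region(fault_addr);
    if (r == NULL)
        return false;              /* not in the address space: kill the process */
    if ((error_code & PF_WRITE) && !r->writable)
        return false;              /* e.g. a write to execute-only code: kill    */
    return true;                   /* valid, just not in RAM yet: load the page  */
}
```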

Assuming the access is okay and the virtual memory address is valid, the OS chooses a page of RAM to use for the missing page. The OS may decide to 'evict' a page to make room for the missing page because all of RAM is in use. If you are using a computer when this happens, the slow-down can be very noticeable. Worse, the OS may need to choose a page to 'evict' which itself contains data that is not on external storage, a 'data' page. This is avoided as much as practical, because it may require two I/O transfers: one to save the evicted page from RAM to external storage, and one to load the missing page into RAM. Program code is (on normal OSs) read-only, so evicting a read-only code page doesn't require a save, because the code is already on external storage, and so only one I/O operation is needed, to read the missing page.
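
Here is a small sketch (with invented helper names) of the cost difference just described: a clean page, such as read-only program code, can simply be dropped, while a modified data page has to be written out first, doubling the I/O:

```c
#include <stdbool.h>

struct frame;                                   /* a physical page of RAM     */
struct frame *pick_victim(void);                /* page-replacement policy    */
bool  is_dirty(struct frame *f);                /* modified since loaded?     */
void  write_to_backing_store(struct frame *f);  /* slow I/O, possibly via DMA */

/* Free up one physical page for the missing page.
 * Clean pages (e.g. read-only program code) cost no I/O here, because an
 * identical copy already exists on external storage; dirty data pages cost
 * one extra transfer before the page we actually want can be read in. */
struct frame *make_room(void)
{
    struct frame *victim = pick_victim();
    if (is_dirty(victim))
        write_to_backing_store(victim);   /* first I/O: save the evicted page */
    return victim;                        /* caller does the second (or only)
                                             I/O: reading the missing page    */
}
```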

The OS can now start the I/O transfers. On typical disk storage this will take several milliseconds, enough time for the CPU to execute several million instructions, so the OS doesn't wait for the missing page to be loaded. Instead it runs a different process. Typically, the page read from disk, or other external storage, is put into memory using DMA. However, a lot of machine instructions were executed to get to that point.
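
A sketch, again with invented names, of why the handler does not spin for those milliseconds: it starts the read, marks the faulting process as blocked, and lets the scheduler run something else while the DMA transfer proceeds:

```c
#include <stdint.h>

struct process;
struct frame;

void start_disk_read(uintptr_t page_addr, struct frame *dest,
                     struct process *waiter);   /* kicks off a DMA transfer   */
void mark_blocked(struct process *p);           /* not runnable until woken   */
void schedule(void);                            /* pick another runnable task */

void start_page_in(struct process *p, uintptr_t fault_addr, struct frame *f)
{
    /* The transfer takes milliseconds -- millions of instruction times --
     * so the CPU must not busy-wait for it. */
    start_disk_read(fault_addr, f, p);   /* hardware plus DMA do the copying */
    mark_blocked(p);                     /* p sleeps until the I/O is done   */
    schedule();                          /* meanwhile, run someone else      */
}
```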

Eventually the I/O transfer is complete and the page is in RAM; the OS receives an interrupt from the DMA controller to inform it that the transfer is complete. The OS can now 'fix up' the MMU's virtual address tables with the physical address of the newly loaded page. Then the OS can arrange for the process to be restarted at the instruction which was aborted (aborted when the MMU detected the page fault). This time the instruction should complete.
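
The 'fix up' amounts to writing the new physical frame address into the page-table entry, marking it present, discarding any stale TLB entry, and waking the process so it re-runs the faulted instruction. A minimal sketch, assuming a simplified x86-style 64-bit PTE (the present bit really is bit 0 on x86; the helpers are invented):

```c
#include <stdint.h>

#define PTE_PRESENT   (1ull << 0)   /* bit 0 on x86: page is in RAM */
#define PTE_WRITABLE  (1ull << 1)

uint64_t *pte_for(uintptr_t vaddr);      /* invented: walk the page tables      */
void      wake_waiter(uintptr_t vaddr);  /* invented: unblock faulting process  */

/* Called after the DMA-completion interrupt, once the page is in RAM.
 * phys_frame_addr must be the page-aligned physical address of the frame. */
void finish_page_in(uintptr_t fault_vaddr, uint64_t phys_frame_addr)
{
    uint64_t *pte = pte_for(fault_vaddr);
    *pte = phys_frame_addr | PTE_PRESENT | PTE_WRITABLE;

    /* Drop any stale cached translation for this address (x86: INVLPG). */
    __asm__ volatile ("invlpg (%0)" :: "r"(fault_vaddr) : "memory");

    /* When the process next runs, it resumes at the saved instruction
     * pointer, i.e. the instruction that faulted is executed again. */
    wake_waiter(fault_vaddr);
}
```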

Hopefully, it is clear that simple instructions which only access memory once, to load or store data, are much easier to deal with than an instruction which accesses memory more than once.

For example, x86 and the 68000 had instructions which access memory two or more times. Each memory access could cause a page fault. Hence the CPU must either roll back the incomplete instruction and save enough state to re-run the instruction from the start, or save enough state to pick up and continue the incomplete instruction. In either case, the restart might be millions or even billions of instructions later, with other instructions in between also suffering page faults.

Very complex instructions might update several registers and memory in a loop, so there is quite a lot of state which may need to be rolled back or stored. Stored state for incomplete instructions is not needed to support a traditional hardware interrupt. So 'incomplete instruction exceptions' save different state on the stack, and hence have to be handled differently from normal interrupts. I don't think this was why the 68020 lost to x86, but it was added complexity.

Making virtual memory, and hence aborting instructions, easy to implement is one of the reasons RISC architectures have very simple memory-access instructions. A RISC load or store instruction can only cause one page fault, and if the addressing modes have no side effects (they don't change any register values), then the page fault can be treated like an ordinary instruction being interrupted before it starts.
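
To illustrate the contrast (this example is mine, not from the answer): the function below touches two different memory locations. A load/store RISC machine turns it into a separate load and a separate store, each able to fault on at most one page and free of other side effects, whereas a 68000 could encode it as a single memory-to-memory MOVE that can fault part-way through:

```c
/* A RISC compiler turns this into a separate load and a separate store,
 * each touching exactly one address; if either faults, the whole instruction
 * can simply be re-executed from scratch, because it had no other side
 * effects.  A 68000 could instead use a single MOVE (A0),(A1) style
 * instruction that reads one page and writes another, so it can fault in
 * the middle of its work. */
void copy_word(int *dst, const int *src)
{
    int tmp = *src;   /* one load  -> at most one page can fault here */
    *dst = tmp;       /* one store -> at most one page can fault here */
}
```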

(This is all made even more complex by pipelined CPUs)

gbulmer
  • This is a great answer (and David Tweed's answer was very helpful too). Let me see if I can summarize: The MMU detects the page fault, and signals the CPU with a mechanism that is tighter/lower-level than a normal IRQ. The CPU notifies the OS via an IRQ and the OS is in charge of saving off context, finding an appropriate page to replace, persisting the evicted page to secondary storage if it was "dirty", and initiating a read-in via DMA. Other processes proceed until the DMA read-in completes. Then another IRQ inspires the restoration of the initial faulting process. ? – Padawan Learner Oct 12 '14 at 13:54
  • BTW, I'd up vote both answers but I don't have sufficient reputation. – Padawan Learner Oct 12 '14 at 13:59
  • I don't really agree. AFAIK, x86 MMU does not stop in the middle of instructions. The complexity, compared to RISC CPUs is rather related to the use of both segmentation and pagination, the automatic saving of many registers and the complexity related to things like the V86 mode, and other crazy compatibility stuff. If an instruction triggers a page fault, it has no side-effect, so that it can be re-executed after the fault. There are a few tricky cases with misaligned instructions that cross page boundaries. – TEMLIB Oct 12 '14 at 15:56
  • For CPUs that really trap 'in the middle', look for the MC68000 family, particularly MC68020 onwards. The CPU saves into the stack its own internal state, largely undocumented, in order to be able to continue partially executed instructions. – TEMLIB Oct 12 '14 at 15:57
  • @TEMLIB - Yup. I wrote about the 68020 initially, but removed it before publishing; I think it is so long ago that it is irrelevant - dead for 20years. I *think* it was adequately documented, because I remember reading the documentation, and that was before the WWW. – gbulmer Oct 12 '14 at 16:47
  • @PadawanLearner - Your statement "The MMU detects the page fault, ... a normal IRQ." Is correct. However, "The CPU notifies the OS via an IRQ and the OS is in charge of saving off context" isn't. The CPU's hardware traps to an IRQ-like piece of code, which is part of the OS; the OS is running on the CPU, and 'knows' about the page fault because the page fault exception (IRQ) code is running. Then yes, the OS saves context, chooses a page, runs the 'evict' for dirty pages, runs the load. DMA is usually used to handle all external storage reads. The process can be rescheduled after page load. – gbulmer Oct 12 '14 at 16:49
  • @TEMLIB - I will clarify the stuff about x86. IIRC, x86 decided to roll back the whole CPU state, including registers, so that a page fault looks like an ordinary interrupt. I'd need to go back to 80386 to confirm, but I am pretty sure. An issue was memory may actually be changed (e.g. if it is shared memory) before the instruction is re-run; instructions restart needs to be idempotent. IMHO segmentation does not add extra complexity. I don't think an instruction crossing a page boundary has to add much, other than not evicting the previous page !-) – gbulmer Oct 12 '14 at 16:59
  • @gbulmer : Maybe you came across some more detailed stuff. In the usual Motorola documents describing exception stack frames, there were some known places where registers were saved, there were also 'reserved' areas that were kept for saving internal state, or, maybe, as provisions for future versions. (http://www.freescale.com/files/archives/doc/ref_manual/M68000PRM.pdf, page 630+, "INTERNAL REGISTERS"). Anyway, you're right, that's obsolete, it does not matter anymore ;-) – TEMLIB Oct 12 '14 at 17:01
  • @TEMLIB - Maybe I did have more detailed stuff. A friend designed a 68000, then 68020 board, and IIRC ported UNIX to it. He may have had processor manuals, or more documentation than common. I decided the answer would be clearer if there was a specific instance of mid-instruction vs roll-back page faults. So I reinstated the 68000 stuff. I am not going to evict that page :-) – gbulmer Oct 12 '14 at 17:09
  • @gbulmer : x86 : Okay, I should not have added segmentation into the mix. Pagination looks sane on x86, and segmentation is being obsoleted. – TEMLIB Oct 12 '14 at 17:10