As Dave Tweed has explained, the process of handling a page fault is too complex to be done completely in hardware.
The MMU has a very close relationship with the CPU.
The MMU 'interrupts' the CPU in mid-instruction. This is sometimes called a 'page fault exception'. This relationship is much closer than normal interrupts. AFAIK, no CPU accepts an external interrupt in mid-instruction.
A page fault exception causes the CPU to either 'roll-back' or 'dump state' for the instruction which used the invalid virtual memory address. This is critical because the instruction will need to be 're-run' (i.e. executed again) if the virtual address is valid, once its page is loaded into RAM.
The first step in handling a page fault is to decide if the access to the virtual address is valid, and part of the process's address space. The address might be valid (for example, it might point at program code), but the access might be a write while the page is execute-only, to protect code from being damaged. So the access might fail there: the process is 'killed' by the OS, and that is bubbled up to the parent process (e.g. a shell).
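As a rough illustration of that first check, here is a minimal Python sketch. The `Region` type, the `check_access` function and the addresses are all made up for the example; a real OS keeps this information in its own per-process data structures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    start: int   # first virtual address in the region
    end: int     # one past the last virtual address in the region
    perms: str   # allowed access types, a subset of "rwx"

def check_access(regions, addr, access):
    """Classify one faulting access as 'ok', 'protection' or 'unmapped'."""
    for r in regions:
        if r.start <= addr < r.end:
            return "ok" if access in r.perms else "protection"
    return "unmapped"

# Hypothetical address space: execute-only code and read/write data.
address_space = [
    Region(0x1000, 0x5000, "x"),
    Region(0x8000, 0xA000, "rw"),
]

print(check_access(address_space, 0x2000, "x"))  # prints "ok"
print(check_access(address_space, 0x2000, "w"))  # prints "protection"
print(check_access(address_space, 0x7000, "r"))  # prints "unmapped"
```

Only the "ok" case goes on to the page-loading steps; the other two are where the OS would kill the process.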
Assuming the access is okay, and the virtual memory address is valid, then the OS chooses a page of RAM to use for the missing page. The OS may decide to 'evict' a page to make room for the missing page because all of RAM is in use. If you are using a computer where this happens, the slow-down can be very noticeable. Worse, the OS may need to choose a page to 'evict' which itself contains data which is not on external storage, a 'data' page. This is avoided as much as practical, because it may require two I/O transfers, one to save the evicted page from RAM to external storage, and one to load the missing page into RAM. Program code is (on normal OSs) read-only. So evicting a read-only code page doesn't require a save, because the code is already on external storage, and so it only needs one I/O operation, to read the missing page.
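The cost difference between evicting a clean page and a modified one can be sketched like this. The `Frame` type and the 'first clean frame' policy are assumptions for illustration only; real kernels use more elaborate replacement policies.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    number: int
    dirty: bool  # True if the RAM copy differs from the copy on external storage

def choose_victim(frames):
    """Prefer a clean frame; fall back to the first frame if all are dirty."""
    for f in frames:
        if not f.dirty:
            return f
    return frames[0]

def io_transfers(victim):
    """One read for the missing page, plus a write-back if the victim is dirty."""
    return 2 if victim.dirty else 1

frames = [Frame(0, True), Frame(1, False), Frame(2, True)]
victim = choose_victim(frames)
print(victim.number, io_transfers(victim))  # prints "1 1"
```

If every frame were dirty, `choose_victim` would return frame 0 and `io_transfers` would report 2, which is the expensive case described above.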
The OS can now start the I/O transfers. On typical disk storage this will take several milliseconds, enough time for the CPU to execute several million instructions. So the OS doesn't wait for the missing page to be loaded. Instead it runs a different process. Typically, the page read from disk, or external store, is put into memory using DMA. However, a lot of machine instructions were executed to get to that point.
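To put a rough number on 'several million instructions', here is a back-of-envelope calculation. The 5 ms latency, 1 GHz clock and one instruction per cycle are assumed values picked only to make the arithmetic easy, not measurements.

```python
# Assumed, illustrative figures (not measurements).
disk_latency_us = 5_000        # 5 ms for one page transfer, in microseconds
clock_mhz = 1_000              # 1 GHz clock, i.e. 1000 cycles per microsecond
instructions_per_cycle = 1     # simple in-order CPU

instructions = disk_latency_us * clock_mhz * instructions_per_cycle
print(f"{instructions:,}")     # prints "5,000,000"
```

A faster clock, or more than one instruction per cycle, only makes waiting for the disk look worse.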
Eventually the I/O transfer is complete, and the page is in RAM; the OS receives an interrupt from the DMA controller to inform it that the DMA transfer is complete. The OS can now 'fix up' the MMU's virtual address tables with the physical address of the newly loaded page. Then the OS can arrange for the process to be restarted at the instruction which was aborted (aborted by the MMU when it detected the page fault). This time the instruction should complete.
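The whole abort, fix-up and re-run cycle can be modelled in a few lines of Python. This is only a toy: `page_table`, `backing_store`, `handle_fault` and the trivial frame allocator are invented for the sketch, and the 'I/O transfer' is just a dictionary copy.

```python
PAGE_SIZE = 4096

class PageFault(Exception):
    def __init__(self, vpage):
        super().__init__(f"page fault on virtual page {vpage}")
        self.vpage = vpage

page_table = {}                # virtual page -> physical frame (the 'MMU tables')
ram = {}                       # (frame, offset) -> value
backing_store = {3: {8: 42}}   # virtual page -> {offset: value} on 'disk'

def load(vaddr):
    """A single load instruction; the 'MMU' aborts it if the page is missing."""
    vpage, offset = divmod(vaddr, PAGE_SIZE)
    if vpage not in page_table:
        raise PageFault(vpage)
    return ram.get((page_table[vpage], offset), 0)

def handle_fault(fault, free_frame):
    """Stand-in for the I/O transfer followed by the page table fix-up."""
    for offset, value in backing_store.get(fault.vpage, {}).items():
        ram[(free_frame, offset)] = value
    page_table[fault.vpage] = free_frame

def run_load(vaddr):
    """Re-run the aborted instruction until it completes."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return load(vaddr), attempts
        except PageFault as fault:
            handle_fault(fault, free_frame=len(page_table))

print(run_load(3 * PAGE_SIZE + 8))  # prints "(42, 2)"
```

The second attempt succeeds because the table now maps virtual page 3; on a real machine many other instructions from other processes would run between the two attempts.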
Hopefully, it is clear that simple instructions which only access memory once, to load or store data, are much easier to deal with than an instruction which accesses memory more than once.
For example x86 and 68000 had instructions which access memory two or more times. Each memory access could cause a page fault. Hence the CPU must either roll-back the incomplete instruction and save enough state to re-run the instruction from the start, or save enough state to pick up and continue the incomplete instruction. In either case, that might be millions or even billions of instructions later, with other instructions in between also suffering page faults.
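Here is a small sketch of the 'roll back and re-run from the start' option for an instruction with two memory accesses. The `move_mem_to_mem` instruction and the `resident` set are hypothetical, and 'loading the page' is reduced to adding it to that set.

```python
PAGE_SIZE = 4096

class PageFault(Exception):
    def __init__(self, vpage):
        super().__init__(f"page fault on virtual page {vpage}")
        self.vpage = vpage

resident = set()   # virtual pages currently in RAM
memory = {}        # virtual address -> value

def access(vaddr):
    """Translate an address, faulting if its page is not resident."""
    vpage = vaddr // PAGE_SIZE
    if vpage not in resident:
        raise PageFault(vpage)
    return vaddr

def move_mem_to_mem(src, dst):
    """One instruction, two memory accesses: read src, then write dst."""
    value = memory.get(access(src), 0)
    memory[access(dst)] = value

def execute_with_restart(src, dst):
    """Re-run the instruction from the start after every fault."""
    faults = 0
    while True:
        try:
            move_mem_to_mem(src, dst)
            return faults
        except PageFault as fault:
            faults += 1
            resident.add(fault.vpage)  # stand-in for the OS loading the page

memory[0x1000] = 7
print(execute_with_restart(0x1000, 0x5000))  # prints "2"
print(memory[0x5000])                        # prints "7"
```

The one instruction faults twice, once per access, and is started three times before it completes. Re-running is safe here only because nothing was modified before the second access faulted.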
Very complex instructions might update several registers and memory in a loop. So there is quite a lot of state which may need to be rolled-back or stored. Stored state for incomplete instructions is not needed to support a traditional hardware interrupt. So 'incomplete instruction exceptions' save different state on the stack, and hence have to be handled differently from normal interrupts. I don't think this was why the 68020 lost to x86, but it was added complexity.
Making virtual memory, and hence aborting instructions, easy to implement is one of the reasons RISC architectures have very simple memory access instructions. A RISC load or store instruction can only cause one page fault, and if the addressing modes have no side-effects (they don't change any register values), then the page fault can be treated like an ordinary instruction being interrupted before it starts.
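The point about side-effects can be shown with a toy model. Both 'instructions' below are hypothetical, and the post-increment one deliberately updates its base register before the memory access, to show the problem case rather than to model any particular CPU.

```python
PAGE_SIZE = 4096

class PageFault(Exception):
    pass

resident = {2}   # only virtual page 2 is in RAM
memory = {}

def read(vaddr):
    if vaddr // PAGE_SIZE not in resident:
        raise PageFault(vaddr)
    return memory.get(vaddr, 0)

def load_plain(regs, dst, base):
    """RISC-style load: no register changes until the access has succeeded."""
    regs[dst] = read(regs[base])

def load_postinc(regs, dst, base):
    """Hypothetical post-increment load that bumps the base register first."""
    addr = regs[base]
    regs[base] = addr + 4   # side effect happens before the access
    regs[dst] = read(addr)

def state_untouched_after_fault(instr):
    """Run instr against a non-resident page; report whether registers survived."""
    regs = {"r1": 0, "r2": 0x3000}   # r2 points into page 3, which is not resident
    before = dict(regs)
    try:
        instr(regs, "r1", "r2")
    except PageFault:
        pass
    return regs == before

print(state_untouched_after_fault(load_plain))    # prints "True"
print(state_untouched_after_fault(load_postinc))  # prints "False"
```

In the first case the OS can simply re-run the instruction. In the second, either the hardware or the fault handler has to undo the register update first.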
(This is all made even more complex by pipelined CPUs)