FPGAs don't actually have "gates" per se. Their basic logic element is typically a Look-Up Table (LUT), and LUTs are usually implemented as small SRAMs. For instance, Spartan 3 FPGAs use 16-bit SRAMs; that is, four address inputs select one of 16 stored bits, and that bit is the output signal. "Programming" is done by loading the SRAM with a bit pattern representing the truth table of the desired function. For a 2-input XOR, for example, address 00 gives output 0, address 01 gives output 1, address 10 gives output 1, and address 11 gives output 0.
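To make the truth-table idea concrete, here is a minimal software sketch of a 4-input LUT. It is only an illustration, not how any vendor tool works: the names `make_lut` and `lut_read` are made up for this example, and the choice of which input is the least significant address bit is an assumption.

```python
def make_lut(func, n_inputs=4):
    """Build the LUT contents: one stored bit per input combination."""
    table = []
    for address in range(2 ** n_inputs):
        # Unpack the address into individual input bits, LSB first (assumed order).
        bits = [(address >> i) & 1 for i in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table


def lut_read(table, *inputs):
    """Evaluate the LUT: the inputs form an address into the stored bits."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return table[address]


# "Program" a 4-input LUT as a 2-input XOR; inputs c and d are simply ignored.
xor_table = make_lut(lambda a, b, c, d: a ^ b)

print(xor_table[:4])                    # the first four stored bits
print(lut_read(xor_table, 1, 0, 0, 0))  # a=1, b=0
```

Note that the "logic" is nothing but a memory read: changing the function means storing a different 16-bit pattern, not rewiring anything.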
This all means that an FPGA spends far more hardware than a dedicated circuit would need to perform the same logic function: a whole 16-bit SRAM and its addressing logic can end up standing in for a single XOR gate. If you need an FPGA for reprogrammability and rapid prototyping, that trade-off is well worth it. In fact, some people implement a design in an FPGA first, debug it, and then move it to an ASIC, which will be smaller, faster, and consume less power, all while doing the same thing the FPGA does.
Modern microprocessors are also pipelined. For instance, in a simple, unpipelined FPGA design, a very large calculation involving several adds, maybe a few multiplies, and a comparison may be carried out in a single clock cycle. Doing all this work in one clock cycle means the clock cycle must be long. In a pipelined implementation (which is possible in FPGAs too, and is often used there to achieve timing closure), the big calculation is broken down into pieces, and each piece is executed in one much shorter clock cycle. A single calculation still takes about the same amount of time from start to finish, but as soon as the first piece has finished with the first datum and handed its partial result to the second piece, it can begin processing the second datum. The first result still takes many cycles to appear, but after that a new result is completed every clock cycle.
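The latency-versus-throughput point can be checked with a little arithmetic. The sketch below uses made-up numbers (a 40 ns calculation split into four 10 ns stages) and ignores the small overhead that pipeline registers add in real hardware; `total_time_ns` is a hypothetical helper, not part of any tool.

```python
def total_time_ns(n_items, n_stages, clock_period_ns):
    """Time to push n_items through a pipeline of n_stages.

    The first result appears after n_stages cycles; each later result
    appears one cycle after the previous one.
    """
    cycles = n_stages + (n_items - 1)
    return cycles * clock_period_ns


# Hypothetical numbers: a 40 ns calculation, done either in one long cycle
# or split into 4 stages of 10 ns each (register overhead ignored).
first_result_single = total_time_ns(n_items=1, n_stages=1, clock_period_ns=40)
first_result_piped = total_time_ns(n_items=1, n_stages=4, clock_period_ns=10)

single_cycle = total_time_ns(n_items=1000, n_stages=1, clock_period_ns=40)
pipelined = total_time_ns(n_items=1000, n_stages=4, clock_period_ns=10)

print(first_result_single, first_result_piped)  # latency of one calculation
print(single_cycle, pipelined)                  # time for 1000 calculations
```

Under these assumptions the first result takes 40 ns either way, but 1000 results take 40000 ns unpipelined versus 10030 ns pipelined, which is close to the ideal 4x speedup.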
So, in a nutshell: an FPGA has generic logic, while a CPU has specific logic. An FPGA has generic routing, while a CPU has specific routing. An FPGA design may be pipelined, but a modern CPU definitely is.