From M1's open-drain, I suspect that you may be trying to build a very fast response voltage-controlled current sink, in which case all elements of this circuit are a good start.
The push-pull stage is better than a simple op-amp at driving M1's gate, as it can sink and source hundreds of milliamps of gate current, necessary to have M1 respond quickly.
I'm pretty sure that the "current spike" you spoke of is due to exactly that. M1's gate is a capacitor that needs charging and discharging (via Q1 and Q3), as its gate potential rises and falls. There's nothing you can do about that. If you want the gate to change potential quickly, then you have to charge and discharge that capacitor quickly, and that can require hundreds of milliamps (or even amps) of current for a short period. This is exactly why you require this push-pull gate driver in the first place.
Without the driver, the op-amp could provide only a few milliamps, and gate charging and discharging could take 100 times longer.
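To put rough numbers on that, here is a minimal sketch using the approximation t = Q/I. The 20nC gate charge and the 5mA and 500mA drive currents are assumed values chosen for illustration, not figures for your MOSFET or op-amp:

```python
def gate_charge_time(gate_charge_c, drive_current_a):
    """Time to move a gate charge at constant current: t = Q / I."""
    if drive_current_a <= 0:
        raise ValueError("drive current must be positive")
    return gate_charge_c / drive_current_a


# Assumed, illustrative values (not taken from any datasheet).
Q_GATE = 20e-9      # total gate charge, coulombs
I_OPAMP = 5e-3      # op-amp driving the gate alone, amps
I_DRIVER = 500e-3   # push-pull driver, amps

t_opamp = gate_charge_time(Q_GATE, I_OPAMP)
t_driver = gate_charge_time(Q_GATE, I_DRIVER)

print(f"op-amp alone: {t_opamp * 1e6:.2f} us")
print(f"with driver:  {t_driver * 1e9:.0f} ns")
print(f"speed-up:     {t_opamp / t_driver:.0f}x")
```

Real gate current is not constant during the transition, so treat the results as order-of-magnitude estimates only.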
This brings me to the requirement for power-supply decoupling, using capacitors. Those are the capacitors between collector and ground that you mentioned. When a power supply is suddenly asked by Q1 and Q3 to provide a few hundred milliamps, it can't possibly comply instantly, due to inductance in the power path. But Q1 and Q3 need that current now, not in 100ns. That's why you put capacitors right at the collector, to ground, ideally a ceramic 100nF capacitor, and an electrolytic 10uF one (both in parallel). These are short term sources of energy, to handle both fast and slightly prolonged high current demands.
They also serve to protect everything else in the vicinity. Everything around can fall victim to the inevitable dips in supply voltage caused by these sudden spikes of current demand. Power supply decoupling capacitors mitigate these dips, not only for the element causing them, but also for other elements connected to the same supply, directly or indirectly.
These capacitors are not just recommended, they are essential.
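A rough sketch of why they matter, using V = L·dI/dt for the supply path and ΔV = I·Δt/C for the capacitors. The 20nH path inductance, 0.5A step, 10ns rise time and 100ns hold time are all assumptions for illustration, and the capacitors are treated as ideal (no ESR or ESL):

```python
def inductive_dip(inductance_h, current_step_a, rise_time_s):
    """Voltage across supply-path inductance for a current ramp: V = L * dI/dt."""
    return inductance_h * current_step_a / rise_time_s


def capacitor_droop(capacitance_f, current_a, duration_s):
    """Voltage lost by a capacitor supplying constant current: dV = I * dt / C."""
    return current_a * duration_s / capacitance_f


# Assumed, illustrative values.
L_PATH = 20e-9          # supply path inductance, henries
I_STEP = 0.5            # current demanded by Q1/Q3, amps
T_RISE = 10e-9          # how quickly that demand appears, seconds
T_HOLD = 100e-9         # how long the capacitors carry the load, seconds
C_CERAMIC = 100e-9      # farads
C_ELECTROLYTIC = 10e-6  # farads

dip = inductive_dip(L_PATH, I_STEP, T_RISE)
droop_ceramic = capacitor_droop(C_CERAMIC, I_STEP, T_HOLD)
droop_both = capacitor_droop(C_CERAMIC + C_ELECTROLYTIC, I_STEP, T_HOLD)

print(f"dip with no local capacitor:  {dip:.2f} V")
print(f"droop, ceramic only:          {droop_ceramic:.3f} V")
print(f"droop, ceramic + electrolytic: {droop_both * 1e3:.2f} mV")
```

In practice the electrolytic's ESR stops it from responding this quickly, which is why the small ceramic sits alongside it to handle the fast edge.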
A gate resistor is usually a good idea, since it protects elements on both sides. Not only do fast changes in gate potential get coupled capacitively to the drain (your load), but fast transients at the drain get equally coupled back to the gate, which can be problematic for the gate driver itself.
This is a huge topic, too big to cover properly here, but I would definitely advocate for 10Ω or so of resistance between the gate and the push-pull driver output.
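As a first-order feel for the trade-off, this sketch treats the gate as a fixed capacitance behind the resistor. The 10V step and 1nF gate capacitance are assumed values, and Miller effect is ignored entirely:

```python
def gate_drive_limits(gate_resistance_ohm, gate_step_v, gate_capacitance_f):
    """Return (peak current, RC time constant) for a step through a gate resistor."""
    peak_current = gate_step_v / gate_resistance_ohm
    time_constant = gate_resistance_ohm * gate_capacitance_f
    return peak_current, time_constant


# Assumed, illustrative values.
V_STEP = 10.0   # size of the driver output step, volts
C_GATE = 1e-9   # effective gate capacitance, farads

for r_gate in (1.0, 10.0, 100.0):
    i_peak, tau = gate_drive_limits(r_gate, V_STEP, C_GATE)
    print(f"R = {r_gate:5.0f} ohm: peak {i_peak:5.2f} A, tau {tau * 1e9:5.0f} ns")
```

A larger resistor limits the peak current and damps ringing, at the cost of a slower gate.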
Biasing "diodes" Q2 and Q4 are to mitigate cross-over distortion, which can be demonstrated by examining behaviour without them. Here I've removed everything that isn't necessary to show this, and I am driving both transistor bases with a 2V amplitude sinusoid:

(Schematic created using CircuitLab)
The circuit above left has this relationship between IN (blue) and OUT (orange):

Those flats in the output occur when neither transistor is "pushing" or "pulling", because both are off. There's a dead zone, \$-0.7V < V_{IN} < +0.7V\$, where the input is not sufficient to switch on either transistor, and the output is held at zero by load RLa.
Additionally, since these transistors are operating as emitter followers (or common-collector), whenever one is conducting its emitter sits about 0.7V away from its base. When the input goes positive, Q1a switches on, "pulling" the output up in potential, but always 0.7V below the input. When the input goes negative, Q3a "pulls" the output down, but always 0.7V above the input.
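That open-loop behaviour can be captured in a few lines. This is an idealised model that assumes a fixed 0.7V base-emitter drop and a load that holds the output at 0V when both transistors are off; the function name is my own:

```python
import math

V_BE = 0.7  # assumed constant base-emitter drop, volts


def unbiased_push_pull(v_in, v_be=V_BE):
    """Idealised output of an unbiased complementary emitter-follower pair."""
    if v_in > v_be:
        return v_in - v_be   # NPN (Q1a) conducting: output 0.7V below input
    if v_in < -v_be:
        return v_in + v_be   # PNP (Q3a) conducting: output 0.7V above input
    return 0.0               # dead zone: both off, load holds output at 0V


# One cycle of the 2V amplitude sinusoid, sampled at 12 points.
for n in range(12):
    v_in = 2.0 * math.sin(2 * math.pi * n / 12)
    print(f"IN {v_in:+.2f} V -> OUT {unbiased_push_pull(v_in):+.2f} V")
```

Every sample with the input inside ±0.7V prints an output of zero, which is the flat section seen in the plot.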
Above right I have "closed the loop" using an op-amp with negative feedback (a voltage follower) to compensate for this "gap". It adjusts its own output to whatever potential is necessary to equalise its two inputs. The IN-OUT relationship is:

The input is blue, but hidden behind the orange output; this op-amp seems to have completely eradicated cross-over distortion. The most interesting trace is the tan one, the op-amp's own output. As you can see, the op-amp is traversing the "dead-zone" extremely rapidly, compensating for the transistors' requirement to have 0.7V more (or less) than the input to switch on. The output is a perfect copy of the input. It's amazing.
Well, not quite perfect. The op-amp has limited output slew rate, and can't cross that gap instantly. At low frequencies it works well, but at higher frequency, when we ask it to traverse the gap faster than it can, we start to see output artefacts. Here's what you get when input frequency is raised to 50kHz:

Crossover distortion returns somewhat, because the op-amp output is unable to keep up with the input, and can't cross that dead zone quickly enough.
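A back-of-envelope way to see why frequency matters is to compare the time the op-amp needs to slew across the 1.4V gap with the signal period. The 0.5V/µs slew rate below is an assumed figure, not that of any particular op-amp, and the estimate ignores loop dynamics:

```python
DEAD_ZONE_V = 1.4  # gap between -0.7V and +0.7V at the transistor bases


def gap_crossing_time(slew_rate_v_per_s, gap_v=DEAD_ZONE_V):
    """Minimum time for the op-amp output to slew across the dead zone."""
    return gap_v / slew_rate_v_per_s


def fraction_of_cycle_in_gap(frequency_hz, slew_rate_v_per_s):
    """Share of each signal period spent slewing (two crossings per cycle)."""
    return 2 * gap_crossing_time(slew_rate_v_per_s) * frequency_hz


SLEW_RATE = 0.5e6  # assumed slew rate: 0.5 V/us, expressed in V/s

for f in (1e3, 50e3):
    share = fraction_of_cycle_in_gap(f, SLEW_RATE)
    print(f"{f / 1e3:4.0f} kHz: {share * 100:.2f}% of each cycle spent crossing the gap")
```

With these assumed numbers the crossing is a negligible part of a 1kHz cycle but a substantial part of a 50kHz one.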
Even with the loop closed, there will be a tiny "blip" in the output as the op-amp crosses the dead zone, but the output will always settle to the correct value eventually. At low frequency it will be almost imperceptible. That may be completely fine for your application.
If you want an absolutely perfect output, at all frequencies, then you must make it so that the op-amp output doesn't have to jump that gap, and that's the purpose of R3, R4, Q2 and Q4 (and R1, R2). Their job is to keep the transistors switched on at all times. They shift the signal up by 0.7V for one base and down by 0.7V for the other, so that both transistors are permanently biased "on", with \$V_{BE}=0.7V\$ each. Their emitters follow the input nearly exactly, since the 0.7V difference has already been compensated for by "diodes" Q2 and Q4.
R3 and R4 are necessary to pass a permanent current through the biasing diodes Q2 and Q4. Without them, those "diodes" can't develop the 0.7V needed to turn on the transistors.
All this comes at a price: a large current flowing down through Q1 and Q3, all the time, which is mitigated by R1 and R2. This current passes through them, causing them to develop a small voltage, for which 2Ω is sufficient. This in turn reduces the transistors' base-to-emitter potential difference \$V_{BE}\$ just a little, tending to switch them off again.
The system finds some equilibrium in which the transistors are "on" enough to overcome the 0.7V input-to-output difference, but not so "on" that current through them is damaging or excessively wasteful of power. Take a look at this scenario, with and without R1/R2:

On the left, AM1a and AM2a show a ridiculous amount of current flowing, even with zero input. On the right, notice how the presence of R1 and R2 has lowered \$V_{BE}\$ (shown on VM1 & VM2) for Q1 and Q3, to the point where they are just conducting. Quiescent current dropped from over 800mA to under 50mA.
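The way an emitter resistor pulls the quiescent current down can be sketched by solving I = I_S·exp((V_bias − I·R)/V_T) numerically. This is an idealised model, not a reproduction of the simulation: the saturation current, the 0.8V bias and the neglect of base current and temperature are all my assumptions, so the numbers will not match the meters above:

```python
import math

I_S = 1e-14     # assumed transistor saturation current, amps
V_T = 0.02585   # thermal voltage near room temperature, volts


def quiescent_current(v_bias, r_emitter, i_s=I_S, v_t=V_T):
    """Solve I = i_s * exp((v_bias - I * r_emitter) / v_t) by bisection."""
    def excess(i):
        # Positive when the transistor wants to conduct more than i.
        return i_s * math.exp((v_bias - i * r_emitter) / v_t) - i

    lo = 0.0
    hi = i_s * math.exp(v_bias / v_t)  # current with no emitter resistor
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


V_BIAS = 0.8  # assumed voltage supplied by one biasing "diode", volts

i_without = quiescent_current(V_BIAS, 0.0)
i_with = quiescent_current(V_BIAS, 2.0)

print(f"no emitter resistor: {i_without * 1e3:.0f} mA")
print(f"with 2 ohm resistor: {i_with * 1e3:.0f} mA")
```

The resistor case settles at a much smaller current, because any rise in current raises the drop across the resistor and eats into \$V_{BE}\$.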
Remembering that there's no op-amp here, take a look at the IN-OUT relationship:

It's nearly perfect. There's no visible cross-over distortion, but the output differs a little from the input. To get rid of that discrepancy, just close the loop with the op-amp, and you have perfect equality (op-amp imperfections notwithstanding) between output and input.
Don't forget, the purpose of all that biasing is to remove the need for the op-amp to "jump the gap", and this setup achieves that goal.
This biasing is achieved by all components R1, R2, R3, R4, Q2 and Q4. Either they are all there, or none of them are.
While it is true that all this biasing is complicating an otherwise simple system, when the loop is closed, serious non-linearity like cross-over distortion, and delays introduced by transistors recovering from cut-off or saturation, can play merry hell with stability. Biasing like this improves linearity, keeps the transistors in their active region, and will no doubt lead to less oscillation, and better, faster response.
It will also help to keep the system stable in the case where the load on the other side of the MOSFET isn't well behaved (inductive or capacitive).
My advice is to keep all that biasing paraphernalia, since the benefits are good.
As for the diamond buffer, I hadn't heard of this before you mentioned it, so I don't know enough, or have any experience to back up anything I say about it.
Update
I have since studied the diamond buffer. I highly recommend it, but it's probably overkill for your project. Here is what I have to say about it.