The minimum Ethernet frame size was defined for the original, half-duplex variants. With half duplex, you need to reliably detect and propagate collisions while they are taking place. A signal needs to be able to propagate over the longest distance between two stations in a segment, allow collision detection, and let the jamming signal propagate back to the sender while it is still transmitting. Putting everything together, you end up with a minimum frame of 512 bits, or 64 bytes.
For Fast Ethernet (100 Mbit/s), the inherently half-duplex coax cable was abandoned and only full-duplex capable media are used (that is, media with dedicated signal paths per direction). This speeds up collision detection considerably and allows the same minimum frame size to be kept, even though a frame on Fast Ethernet takes far less time to transmit.
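To put numbers on how much shorter that is, here is a small sketch that computes how long a minimum-size frame occupies the wire at each rate. The function name is made up for illustration, and preamble and inter-frame gap are left out.

```python
MIN_FRAME_BITS = 512  # 64 bytes, the minimum frame size discussed above


def frame_time_us(bits, rate_mbit_s):
    """Transmission time in microseconds for `bits` at `rate_mbit_s` Mbit/s."""
    # bits divided by (Mbit/s) gives microseconds directly
    return bits / rate_mbit_s


for rate in (10, 100):
    print(f"{rate} Mbit/s: {frame_time_us(MIN_FRAME_BITS, rate):.2f} us")
# prints:
# 10 Mbit/s: 51.20 us
# 100 Mbit/s: 5.12 us
```

So the same 64-byte frame that lasts 51.2 µs at 10 Mbit/s is over in 5.12 µs at 100 Mbit/s, which is why the timing budget for detecting a collision gets ten times tighter.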
Gigabit Ethernet initially included a half-duplex mode (HDX), which required small Ethernet frames to be followed by an extension field (placed after the FCS field for compatibility) and limited the HDX link length. The extension field is removed again immediately at the MAC layer, so it only ever exists on an HDX medium. Half-duplex GbE wasn't actually used anywhere, and it was officially obsoleted in 2011.
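As a rough illustration of how large that extension gets: half-duplex Gigabit Ethernet uses a slot time of 4096 bit times (512 bytes), so any shorter frame is extended to that length on the wire. The sketch below computes the number of extension bytes. The constant and function names are illustrative only, and the 512-byte figure comes from the standard's MAC parameters, not from the text above.

```python
# Assumption: slot time for 1000 Mbit/s half duplex is 4096 bit times = 512 bytes
GBE_HDX_SLOT_BYTES = 512


def extension_bytes(frame_bytes):
    """Bytes of carrier extension appended after the FCS on a GbE HDX link."""
    # Frames at or above the slot size need no extension
    return max(0, GBE_HDX_SLOT_BYTES - frame_bytes)


print(extension_bytes(64))    # minimum-size frame -> 448
print(extension_bytes(1518))  # full-size frame -> 0
```

A minimum-size frame therefore carries seven times its own length in extension, which shows why the mode was unattractive for small-packet traffic.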
Switched, full-duplex Ethernet makes these considerations obsolete. However, Ethernet is built on compatibility: all physical layer variants can coexist and interact with each other. So the minimum frame size was never changed, and an ancient 10 Mbit/s half-duplex node can still work in a modern multi-gigabit network without much ado.
However, there is a lot of confusion about these details, and much of what gets quoted is wrong. The authoritative reference is IEEE 802.3, Clause 4.4.2, "MAC parameters".