It has to do with the sampling rate, and how the sampling clock (the local oscillator or LO) relates to the signal frequency of interest.
The Nyquist rate is twice the highest frequency (or bandwidth) in the spectrum of a sampled baseband signal; sampling above it prevents aliasing. But in practice, given finite-length signals, which are therefore not perfectly bandlimited in the mathematical sense (as well as the need for physically implementable, non-brick-wall filters), the sampling frequency for DSP has to be higher than twice the highest signal frequency. Thus doubling the number of samples by doubling the sample rate (2X LO) would still be too low. Quadrupling the sample rate (4X LO) would put you nicely above the Nyquist rate, but using that much higher a sample rate would be more expensive in terms of circuit components, ADC performance, DSP data rates, megaflops required, etc.
So IQ sampling is often done with a local oscillator at (or relatively near) the same frequency as the signal or frequency band of interest, which is obviously way too low a sampling frequency (for baseband signals) according to Nyquist. One sample per cycle of a sine wave could be all at the zero crossings, or all at the tops, or at any point in between. You will learn almost nothing about a sinusoidal signal so sampled. But let's call this set of samples, by itself almost useless, the I of an IQ sample set.
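To make the "one sample per cycle" problem concrete, here is a minimal sketch. It assumes an idealized, noise-free sinusoid whose frequency exactly equals the sample rate; the function name and the test values are illustrative, not from any particular SDR.

```python
import math

def sample_once_per_cycle(amplitude, phase, n_cycles=5):
    """Sample A*cos(2*pi*t + phase) at t = 0, 1, 2, ... (exactly one sample per cycle)."""
    return [amplitude * math.cos(2 * math.pi * n + phase) for n in range(n_cycles)]

# Hypothetical signals: different amplitudes and phases.
peaks = sample_once_per_cycle(1.0, 0.0)               # amplitude 1, sampled at its tops
bigger = sample_once_per_cycle(2.0, math.pi / 3)      # amplitude 2, but cos(pi/3) = 0.5
crossings = sample_once_per_cycle(1.0, math.pi / 2)   # amplitude 1, sampled at zero crossings

print([round(v, 6) for v in peaks])
print([round(v, 6) for v in bigger])
print([round(v, 6) for v in crossings])
```

The first two sample streams come out the same constant even though the signals differ in both amplitude and phase, and the third is all (numerically near) zero, so a single sample per cycle cannot separate amplitude from phase.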
But how about increasing the number of samples, not by simply doubling the sample rate, but by taking an additional sample a little bit after the first one each cycle? Two samples per cycle a little bit apart would allow one to estimate the slope or derivative. If one sample were at a zero crossing, the additional sample wouldn't be. So you would be far better off in figuring out the signal being sampled. Two points, plus the knowledge that the signal of interest is roughly periodic at the sample rate (due to band-limiting), are usually enough to start to estimate the unknowns of a canonical sine wave equation (amplitude and phase).
But if you go too far apart with the second sample, to halfway between the first set of samples, you end up with the same problem as 2X sampling (one sample could be at a positive zero crossing, the other at a negative, telling you nothing). It's the same problem as 2X being too low a sample rate.
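A small sketch of that half-cycle case, again assuming an ideal sinusoid at exactly the sample rate (the helper name and the amplitude/phase values are made up for illustration):

```python
import math

def two_samples(amplitude, phase, offset_deg):
    """Return samples of A*cos(theta + phase) taken at theta = 0 and theta = offset."""
    offset = math.radians(offset_deg)
    return amplitude * math.cos(phase), amplitude * math.cos(offset + phase)

# With a 180 degree offset the second sample is just the negative of the first,
# whatever the amplitude and phase are, so it adds no new equation.
pairs = [two_samples(a, p, 180.0) for a, p in [(1.0, 0.3), (2.5, 1.2), (0.7, math.pi / 2)]]
for first, second in pairs:
    print(f"{first:+.4f} {second:+.4f}")
```

Each printed pair is a value and its negation, which is why an evenly spaced second sample leaves amplitude and phase just as ambiguous as before.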
But somewhere between two samples of the first set (the "I" set) there's a sweet spot. Not redundant, as with sampling at the same time, and not evenly spaced (which is equivalent to doubling the sample rate), there's an offset which gives you maximum information about the signal, with the cost being an accurate delay for the additional sample instead of a much higher sample rate. Turns out that that delay is 90 degrees. That gives you a very useful "Q" set of samples, which together with the "I" set, tells you far more about a signal than either alone. Perhaps enough to demodulate AM, FM, SSB, QAM, etc., etc. while complex or IQ sampling at the carrier frequency, or very near, instead of at much higher than 2X.
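One way to see why 90 degrees is the sweet spot is to actually solve for amplitude and phase from two samples at an arbitrary offset. The sketch below assumes the idealized model x(theta) = A*cos(theta + phase) with samples at theta = 0 and theta = offset; `recover` and `noise_gain` are hypothetical helper names. In this model, an error in the second sample is scaled by 1/|sin(offset)| when solving for the unknown component, which is smallest at 90 degrees and blows up near 0 or 180 degrees.

```python
import math

def recover(first, second, offset_deg):
    """Estimate (amplitude, phase) of A*cos(theta + phase) from two samples."""
    delta = math.radians(offset_deg)
    s = math.sin(delta)
    if abs(s) < 1e-9:
        raise ValueError("an offset of 0 or 180 degrees gives no second equation")
    u = first                                    # equals A*cos(phase)
    v = (first * math.cos(delta) - second) / s   # equals A*sin(phase)
    return math.hypot(u, v), math.atan2(v, u)

def noise_gain(offset_deg):
    """Factor by which an error in the second sample is scaled in the solved component."""
    return 1.0 / abs(math.sin(math.radians(offset_deg)))

# Hypothetical signal: amplitude 2.0, phase 0.7 rad, second sample 90 degrees later.
true_amp, true_phase = 2.0, 0.7
first = true_amp * math.cos(true_phase)
second = true_amp * math.cos(math.pi / 2 + true_phase)
amp, phase = recover(first, second, 90.0)
print(round(amp, 6), round(phase, 6))
print(noise_gain(90.0), noise_gain(10.0))
```

The recovered values should come out close to 2.0 and 0.7. With a 90 degree offset the two samples are, up to a sign convention, the I and Q components directly, so the division disappears and amplitude and phase follow from a hypot and an atan2.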
Added:
An exact 90 degree offset for the second set of samples also corresponds nicely to half of the component basis vectors in a DFT. A full set is required to fully represent non-symmetric data. The more efficient FFT algorithm is very commonly used to do a lot of signal processing. Other non-IQ sampling formats might require either pre-processing of the data (e.g. adjusting for any IQ imbalance in phase or gain), or computationally expensive Hilbert transforms, or use of longer FFTs, thus potentially being less efficient for some of the filtering or demodulation commonly done in typical SDR processing of IF data.
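As an illustration of the "non-symmetric data" point, the sketch below uses a naive DFT (written out directly rather than an FFT, to stay self-contained) on a hypothetical 16-point tone. With both I and Q, a tone above the center frequency and one below it land in different bins; with I alone, both bins hold equal magnitude and the two cases cannot be told apart.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitudes of the naive O(N^2) DFT of a real or complex sequence."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def peak_bin(x):
    mags = dft_magnitudes(x)
    return mags.index(max(mags))

N, k0 = 16, 3  # illustrative length and tone bin
i_part = [math.cos(2 * math.pi * k0 * t / N) for t in range(N)]
q_part = [math.sin(2 * math.pi * k0 * t / N) for t in range(N)]

iq_above = [complex(i, q) for i, q in zip(i_part, q_part)]    # tone above center
iq_below = [complex(i, -q) for i, q in zip(i_part, q_part)]   # tone below center

print(peak_bin(iq_above), peak_bin(iq_below))
i_only = dft_magnitudes(i_part)
print(round(i_only[k0], 6), round(i_only[N - k0], 6))
```

The two IQ signals share exactly the same I samples; only the Q samples tell the DFT which side of the center frequency the tone is on.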
Added:
Also note that the waterfall bandwidth of an SDR IQ signal, which might seem wide-band, is typically slightly narrower than the IQ or complex sample rate, even though the pre-complex-heterodyne center frequency might be much higher than the IQ sample rate. So the component rate (2 components per single complex or IQ sample), which is twice the IQ rate, ends up being higher than twice the bandwidth of interest, thus complying with Nyquist sampling.
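The arithmetic can be checked with some made-up numbers. All values below are hypothetical and chosen only for illustration, not taken from any specific SDR.

```python
# Hypothetical SDR settings (illustrative values only).
center_freq_hz = 100.0e6        # RF center frequency before the complex heterodyne
iq_sample_rate_hz = 2.4e6       # complex (IQ) samples per second
usable_bandwidth_hz = 2.0e6     # waterfall width, slightly narrower than the IQ rate

# Each complex sample carries two real components (I and Q).
component_rate_hz = 2 * iq_sample_rate_hz

# Nyquist check against the bandwidth of interest, not the center frequency.
meets_nyquist = component_rate_hz > 2 * usable_bandwidth_hz

print(component_rate_hz, 2 * usable_bandwidth_hz, meets_nyquist)
print(iq_sample_rate_hz < center_freq_hz)
```

Here the real-component rate exceeds twice the usable bandwidth, even though the IQ sample rate is far below the center frequency.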
Added:
You can't create the second quadrature signal yourself by simply delaying the input, because you are looking for the change between the signal and the signal 90 degrees later. You won't see any change if you use the same two values, only if you sample at two different times, slightly offset.