Most generally: a real component has an impedance that varies with frequency, rising and falling over its range.
When that impedance is flat for much of the range, we call it a resistor.
When that impedance is sloped upward for much of the range, we call it an inductor.
When that impedance is sloped downward for much of the range, we call it a capacitor.
Real components don't hold that slope forever, nor perfectly (i.e., the impedance is not exactly proportional to frequency raised to the power -1, 0 or 1). To model these deviations, we add other elements.
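To make the slope picture concrete, here is a minimal sketch, assuming a capacitor modeled as a series R-L-C (ESR plus ESL) with hypothetical values roughly like a 100 µF electrolytic. The helper names `z_series_rlc` and `loglog_slope` are made up for illustration. The slope of log|Z| against log f comes out near -1 well below self-resonance, near 0 at self-resonance (with this fairly large ESR the curve is flat for roughly two decades, i.e. it "looks like a resistor" there), and near +1 well above.

```python
import math

def z_series_rlc(f, r, l, c):
    """Complex impedance of R, L and C in series at frequency f (Hz)."""
    w = 2 * math.pi * f
    return complex(r, w * l - 1 / (w * c))

def loglog_slope(f, r, l, c, ratio=1.01):
    """Local slope d(log|Z|)/d(log f), by central difference around f."""
    z_hi = abs(z_series_rlc(f * ratio, r, l, c))
    z_lo = abs(z_series_rlc(f / ratio, r, l, c))
    return math.log(z_hi / z_lo) / (2 * math.log(ratio))

# Hypothetical electrolytic: 100 uF, 0.1 ohm ESR, 10 nH ESL
R, L, C = 0.1, 10e-9, 100e-6
f_srf = 1 / (2 * math.pi * math.sqrt(L * C))  # self-resonance, about 159 kHz

for f in (100.0, f_srf, 100e6):
    print(f"{f:12.4g} Hz   |Z| = {abs(z_series_rlc(f, R, L, C)):9.4g} ohm   "
          f"slope = {loglog_slope(f, R, L, C):+.2f}")
```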
The element values can then be solved for algorithmically. Since RLC (lumped-element) circuits have rational impedance functions (ratios of polynomials in frequency), we use Padé approximants to fit a circuit to the measured curve.
The accuracy we want from the fit determines the order of the approximation, that is, how many components we use.
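As a much simpler stand-in for a full Padé/rational fit (it is not a Padé approximant itself), here is a sketch that fits a fixed, low-order model \$Z = R + j(\omega L - 1/(\omega C))\$ to impedance samples. Because Re Z depends only on R, and Im Z is linear in L and D = 1/C, ordinary linear least squares solves for the element values directly. The function `fit_series_rlc` is hypothetical, and the "measurement" is synthetic and noise-free, generated from made-up true values.

```python
import math

def fit_series_rlc(freqs, z_meas):
    """Least-squares fit of Z = R + j(wL - 1/(wC)) to complex impedance samples.

    Re(Z) depends only on R, and Im(Z) is linear in L and D = 1/C,
    so both parts reduce to ordinary linear least squares.
    """
    n = len(freqs)
    r = sum(z.real for z in z_meas) / n  # LS estimate of a constant is the mean
    # Im(Z) = L*a + D*b with a = w, b = -1/w; accumulate the 2x2 normal equations
    saa = sab = sbb = say = sby = 0.0
    for f, z in zip(freqs, z_meas):
        w = 2 * math.pi * f
        a, b, y = w, -1 / w, z.imag
        saa += a * a
        sab += a * b
        sbb += b * b
        say += a * y
        sby += b * y
    det = saa * sbb - sab * sab
    l = (say * sbb - sab * sby) / det  # Cramer's rule
    d = (saa * sby - sab * say) / det
    return r, l, 1 / d

# Synthetic, noise-free "measurement" from hypothetical true values
R_TRUE, L_TRUE, C_TRUE = 0.1, 10e-9, 100e-6
freqs = [10 ** (2 + k * 0.25) for k in range(25)]  # 100 Hz .. 100 MHz
z_meas = [complex(R_TRUE, 2 * math.pi * f * L_TRUE - 1 / (2 * math.pi * f * C_TRUE))
          for f in freqs]

r, l, c = fit_series_rlc(freqs, z_meas)
print(f"R = {r:.4g} ohm, L = {l:.4g} H, C = {c:.4g} F")
```

With real, noisy data spanning several decades, the unweighted fit above would be dominated by the largest |Z| points; weighting each sample (e.g. by 1/|Z|) is a common remedy that this sketch leaves out. Higher-order models add more terms but keep the same idea.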
That, in turn, is set by what we need the model for. If we're just sketching something out, the ideal element (pure R, L or C) may do. If we're budgeting power losses in a converter, say, we may need the RL or RC model; if switching harmonics are involved, we might need it valid over several decades of frequency, in which case still more elements may be required.
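One rough way to judge how many elements are "enough" is to ask over what frequency range a simpler model stays within some tolerance of a more complete one. This sketch assumes a hypothetical 100 µF capacitor with 0.1 Ω ESR and 10 nH ESL as the reference, and a 10 % magnitude tolerance (both arbitrary choices); the helper names are made up. It scans a log-spaced grid and reports the highest frequency before the ideal C, and then the series RC, first departs.

```python
import math

R, L, C = 0.1, 10e-9, 100e-6  # hypothetical ESR, ESL, capacitance

def z_full(f):
    """Reference model: series R-L-C."""
    w = 2 * math.pi * f
    return complex(R, w * L - 1 / (w * C))

def z_ideal(f):
    """Pure capacitor."""
    return complex(0.0, -1 / (2 * math.pi * f * C))

def z_rc(f):
    """Capacitor plus ESR, no ESL."""
    return complex(R, -1 / (2 * math.pi * f * C))

def valid_up_to(model, reference, tol=0.10, f_start=1.0, decades=9, per_decade=50):
    """Highest scanned frequency below which |model| stays within tol of |reference|."""
    last_ok = None
    for k in range(decades * per_decade + 1):
        f = f_start * 10 ** (k / per_decade)
        err = abs(abs(model(f)) / abs(reference(f)) - 1)
        if err > tol:
            return last_ok
        last_ok = f
    return last_ok

f_ideal = valid_up_to(z_ideal, z_full)
f_rc = valid_up_to(z_rc, z_full)
print(f"ideal C within 10% up to ~{f_ideal:.3g} Hz")
print(f"series RC within 10% up to ~{f_rc:.3g} Hz")
```

With these particular numbers the ideal capacitor holds up only to several kHz, adding the ESR stretches that to several hundred kHz, and beyond that the ESL has to come in.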
The models shown above are indeed sufficient for many practical purposes, so they're worth documenting as such. They are certainly not exhaustive; indeed, the corresponding diagrams suggest you can ultimately subdivide these components into infinitely many elements to get exact accuracy. Somewhere between one and infinity, there is likely sufficient accuracy for a given application.
As for your particular question, why series or parallel? I'll answer with a question of my own: what if there were an inductor in parallel with the capacitor? What would happen at DC then? Would the impedance still approach infinity (or R_L)?
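A quick numerical check of that DC question, as a sketch with hypothetical values: a 1 µF capacitor with 1 GΩ leakage R_L, with and without a (deliberately unphysical) 1 kH inductor across it. The helper names `parallel` and `z_cap` are made up for illustration.

```python
import math

C = 1e-6        # hypothetical 1 uF capacitor
R_LEAK = 1e9    # leakage resistance R_L, 1 Gohm
L_PAR = 1e3     # hypothetical (and unphysical) 1 kH inductor across the capacitor

def parallel(*zs):
    """Combine complex impedances in parallel."""
    return 1 / sum(1 / z for z in zs)

def z_cap(f, with_parallel_l=False):
    """Impedance of C in parallel with R_LEAK, optionally also with L_PAR."""
    w = 2 * math.pi * f
    branches = [complex(0, -1 / (w * C)), complex(R_LEAK, 0)]
    if with_parallel_l:
        branches.append(complex(0, w * L_PAR))
    return parallel(*branches)

for f in (1e3, 1.0, 1e-3, 1e-6):
    print(f"{f:8.0e} Hz   without L: {abs(z_cap(f)):9.3g} ohm   "
          f"with L: {abs(z_cap(f, True)):9.3g} ohm")
```

At audio frequencies the two are indistinguishable, but toward DC the parallel inductor shorts the capacitor out, so the impedance heads to zero instead of levelling off at R_L.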
A more subtle question is: how much (or how little) parallel inductance could there be while still going unnoticed within experimental error? After all, this is not just an approximation exercise; it is subject to noisy measurements as well.
Or, more generally still: what series RLC combination (or further combinations in turn, recursively for each element), with what inductor value(s), could sit in parallel with the capacitor and still produce the observed impedance curve?
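A simple calculation helps here. Suppose a capacitor exhibits, say, 1 GΩ leakage as measured within 1 hour, and we keep measuring the same value for three months (90 days, or 7.776 Ms). A parallel inductor's reactance \$2 \pi f L\$ must then be no smaller than 1 GΩ at \$f \approx 1/(7.776\,\textrm{Ms})\$, so the minimum possible inductance in parallel is of the order \$ L \ge \frac{(1\,\textrm{G}\Omega) (7.776 \,\textrm{Ms}) }{ 2 \pi} \approx 1.2 \times 10^{15}\,\textrm{H}\$. If it were less than this, the measured impedance would be decreasing noticeably over those months.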
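Likewise, for an inductance in series with R_L, the impedance must stabilize within an hour, so its reactance at \$f \approx 1/(3600\,\textrm{s})\$ can be no larger than 1 GΩ: \$ L \le \frac{(1\,\textrm{G}\Omega) (3600 \,\textrm{s}) }{ 2 \pi} \approx 5.7 \times 10^{11}\,\textrm{H}\$. That bound is itself absurdly large, so any series inductance the capacitor could physically have satisfies it easily. At these time scales its reactance is negligible next to the 1 GΩ resistance, so we have no reason to include it in our model, and can scratch it out as irrelevant.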
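The two bounds are just "reactance equals the leakage resistance at \$f = 1/T\$", solved for L. Here is the same arithmetic as a short sketch; taking "three months" as exactly 90 days is an assumption that matches the 7.776 Ms figure, and `l_bound` is a made-up helper name.

```python
import math

R_LEAK = 1e9                     # 1 Gohm leakage
T_LONG = 90 * 24 * 3600          # "three months" taken as 90 days = 7.776 Ms
T_SHORT = 3600                   # impedance settles within one hour

def l_bound(r, t):
    """Inductance whose reactance equals r at f = 1/t, i.e. 2*pi*(1/t)*L = r."""
    return r * t / (2 * math.pi)

l_parallel_min = l_bound(R_LEAK, T_LONG)   # parallel L must be at least this
l_series_max = l_bound(R_LEAK, T_SHORT)    # series L can be at most this
print(f"parallel L >= {l_parallel_min:.3g} H")
print(f"series   L <= {l_series_max:.3g} H")
```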
Physically realistic values aren't everything, but we do prefer them. There is no physical mechanism for parallel inductance in a capacitor, so we have no reason to try to put it in our model.
That's not to say we should never use such elements. Consider the quartz crystal resonator: a piezoelectric crystal between two metal plates, which therefore have some capacitance to each other (and to their surroundings). Due to the piezoelectric effect of the quartz (or other) material, there is also an exchange between electrical and mechanical energy: voltages applied to the crystal produce mechanical vibrations, and vice versa.
If nothing is [separately] coming in as mechanical vibration, then we're only concerned with how those internal vibrations interact with the electric field, in both directions. We can draw a 2- or 3-terminal component, with a dominant capacitance between its terminals, and model how the electric field drives the mechanical vibration, which in turn acts back on the electric field. When we do this, we find the capacitor has a resonant network (a series RLC branch, usually called the motional arm) in parallel with it, having a rather high impedance: this reflects the mechanical resonance of the crystal itself, as sensed through the piezoelectric effect.
And as it turns out, these resonances can be extremely sharp. That means both that the frequency range over which they act is very narrow, and that they couple to the electric circuit surprisingly well despite the otherwise fairly weak piezoelectric effect (at normal signal levels, the crystal moves perhaps a few nanometers). So we end up with, say, a 4 MHz, ESR = 100 Ω crystal, having motional inductance and capacitance of ~1 H and ~fF (their product is pinned by \$f = 1/(2\pi\sqrt{LC})\$). These are clearly nonphysical values: you can't actually wind a 1 H inductor that still behaves as one anywhere near 4 MHz, let alone with a Q (\$\omega L/R\$) in the hundreds of thousands, and the thing doesn't seem to contain a coil structure at all. But the model is nonetheless a good fit for the measurements, so we accept it as electrically correct, and just smile and nod at the otherwise-ridiculous numbers within.
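For a feel of how such a model behaves, here is a sketch of the standard crystal equivalent circuit: a series R1-L1-C1 motional arm in parallel with a shunt capacitance C0. All four values are hypothetical, picked so the series resonance lands near 4 MHz; `z_crystal` is a made-up helper. The parallel-resonance formula used is the lossless approximation.

```python
import math

# Hypothetical motional-arm and shunt values, chosen to resonate near 4 MHz
L1 = 1.0        # motional inductance, H
C1 = 1.58e-15   # motional capacitance, F
R1 = 100.0      # motional resistance (ESR), ohm
C0 = 5e-12      # shunt (electrode) capacitance, F

def z_crystal(f):
    """Series R1-L1-C1 motional arm in parallel with the shunt capacitance C0."""
    w = 2 * math.pi * f
    z_motional = complex(R1, w * L1 - 1 / (w * C1))
    z_shunt = complex(0, -1 / (w * C0))
    return z_motional * z_shunt / (z_motional + z_shunt)

f_s = 1 / (2 * math.pi * math.sqrt(L1 * C1))   # series resonance
f_p = f_s * math.sqrt(1 + C1 / C0)              # parallel resonance (lossless approx.)
q = 2 * math.pi * f_s * L1 / R1                  # Q of the motional arm

print(f"f_s = {f_s / 1e6:.5f} MHz, f_p - f_s = {f_p - f_s:.0f} Hz, Q = {q:.3g}")
print(f"|Z(f_s)| = {abs(z_crystal(f_s)):.1f} ohm, |Z(f_p)| = {abs(z_crystal(f_p)):.3g} ohm")
```

The two resonances sit only a few hundred Hz apart, with |Z| swinging from about R1 at f_s to hundreds of kΩ near f_p: that is the "extremely sharp" behaviour the absurd-looking L1 and C1 values encode.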