Clearly, the crystal only sees the total capacitance across its terminals; if Cstray is specified from each terminal to GND, then the two strays act in series, and Cs/2 must be used.
The RP2040 doc doesn't specify how their Cs is placed or measured. Judging by how they use it (they add 5pF straight to the total), we can assume they mean 10pF from each side to GND. That may be a little on the high side, or it may be about right for the pin capacitance; it's hard to say without measuring.
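The symmetric case can be sketched in a few lines of Python. This assumes equal stray capacitance on each terminal (the 10pF-per-side reading above) and a hypothetical 12pF crystal; the function names and the 12pF figure are illustrative, not taken from the RP2040 documentation.

```python
# Sketch: symmetric load-capacitor sizing, assuming equal stray capacitance
# Cs from each crystal terminal to GND. Values are hypothetical examples.

def series(ca: float, cb: float) -> float:
    """Two capacitances in series (same form as resistors in parallel)."""
    return ca * cb / (ca + cb)


def load_cap_per_side(c_load: float, c_stray_per_side: float) -> float:
    """Per-side capacitor C such that series(C + Cs, C + Cs) == c_load.

    With both sides equal, the series total is (C + Cs) / 2,
    so C = 2 * c_load - Cs.
    """
    c = 2.0 * c_load - c_stray_per_side
    if c <= 0:
        raise ValueError("stray capacitance alone exceeds the target load")
    return c


# A "5 pF stray" added straight to the total corresponds to 10 pF per side:
print(series(10.0, 10.0))

# Hypothetical 12 pF crystal with 10 pF stray per side:
c = load_cap_per_side(12.0, 10.0)
print(c, series(c + 10.0, c + 10.0))
```

Here two 10pF strays in series contribute 5pF, and the hypothetical 12pF crystal would want 14pF per side.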
There is one catch: they're using a series resistor to set the drive level (presumably the output pin is a full-level logic output; offhand, I don't see this defined anywhere), so that pin's capacitance does not participate. The pin is a low impedance to begin with (normal-strength logic drive is on the order of 40-70 ohms), and the resistor swamps that impedance. Cs on the XOUT side, then, is only the traces in the area, which may be very small indeed, say under 1pF.
And if the capacitances aren't symmetric, a slightly different formula must be used: take the total capacitance to GND on each side, then combine the two sides with the "parallel" formula (capacitors in series combine the way resistors in parallel do) to get the total. That is, Cser = (Ca × Cb) / (Ca + Cb); when Ca = Cb, this reduces to Ca/2.
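A small Python sketch of that calculation follows. The values are hypothetical: 15pF load caps on both sides, about 10pF of pin plus trace capacitance on the XIN side (the per-side reading above), and roughly 1pF of trace on the XOUT side, where the series resistor hides the pin. It compares the naive symmetric assumption with the asymmetric case.

```python
# Sketch: effective crystal load when the two terminals see different
# totals to GND. All capacitor values are hypothetical examples.

def series(ca: float, cb: float) -> float:
    """Capacitors in series combine like resistors in parallel."""
    return ca * cb / (ca + cb)


def effective_load(c1: float, c2: float, stray1: float, stray2: float) -> float:
    """Sum each side's capacitance to GND, then combine the sides in series."""
    return series(c1 + stray1, c2 + stray2)


# Naive: assume 10 pF stray on both sides -> (25 * 25) / 50
naive = effective_load(15.0, 15.0, 10.0, 10.0)
# Asymmetric: XIN side ~10 pF, XOUT side only ~1 pF of trace -> (25 * 16) / 41
asym = effective_load(15.0, 15.0, 10.0, 1.0)
print(f"naive: {naive:.2f} pF, asymmetric: {asym:.2f} pF")
```

With these numbers the naive estimate is 12.5pF while the asymmetric total is about 9.76pF, so the crystal would run slightly higher in frequency than the symmetric calculation suggests.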
As you've noticed, these things are loose enough that they tend to Just Work(TM), even when the values are a bit off. The main downsides are potentially unreliable oscillation (over temperature range and manufacturing spread) and a frequency that is slightly off (e.g. you might want a trimmer on a very accurate RTC, or better(?) yet, fix up the timing in software).
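For the software route, the arithmetic is simple as long as the error stays roughly constant. The sketch below assumes a 32.768kHz RTC crystal and a hypothetical measured frequency; the function names are illustrative, and temperature-dependent drift is not modeled.

```python
# Sketch of a constant (linear) frequency correction for an RTC crystal.
# Assumes the error is steady; temperature drift is NOT modeled.
# The measured frequency below is a hypothetical example.

SECONDS_PER_DAY = 86_400


def ppm_error(f_measured: float, f_nominal: float) -> float:
    """Frequency error in parts per million (positive = running fast)."""
    return (f_measured - f_nominal) / f_nominal * 1e6


def drift_seconds_per_day(ppm: float) -> float:
    """Seconds gained (positive) or lost per day at a constant ppm error."""
    return ppm * 1e-6 * SECONDS_PER_DAY


def corrected_seconds(ticks: int, f_measured: float) -> float:
    """Convert raw oscillator ticks to seconds using the measured
    frequency instead of the nominal one."""
    return ticks / f_measured


f_nom = 32_768.0        # nominal 32.768 kHz watch crystal
f_meas = 32_768.655     # hypothetical measurement, roughly +20 ppm
err = ppm_error(f_meas, f_nom)
print(f"error: {err:.1f} ppm, drift: {drift_seconds_per_day(err):.2f} s/day")

# One true day of ticks: the naive count runs fast, the corrected one does not.
ticks = round(f_meas * SECONDS_PER_DAY)
print(f"naive: {ticks / f_nom:.2f} s, corrected: {corrected_seconds(ticks, f_meas):.2f} s")
```

About 20ppm works out to roughly 1.7 seconds per day, which is why a one-time calibration factor is often good enough when the temperature is reasonably stable.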
Ideally, one evaluates a crystal oscillator by obtaining a representative sample of the manufacturing range of the device in question (close to process min/max; the manufacturer may be able to provide samples for this purpose), and measuring the crystal power (drive) level at both manufacturing and temperature extremes. Include power-up conditions (not just leaving it running while varying temperature) to be sure it's able to start up. Also account for anything else that might contribute, such as variation in your supply voltage or other heat sources in your system.
Most of the time, this level of scrutiny isn't worth it, and product testing during/after assembly, or field reports of marginal operation, provide a loose indication of stability.