Justme and Andy raised an interesting point in the comments, about the meaning of "synchronous" and "asynchronous".
I first went with a strict meaning: "Synchronous means transmitter and receiver run on the same clock, so there is no need for clock recovery in the receiver." That covers "source synchronous", where the source transmits the clock on a dedicated clock trace/wire alongside the data, but there are other options: for example, both transmitter and receiver could be clocked by a common third source.
With a UART, characters are sent... when they're sent. Even in a "continuous" stream of data there can be random pauses between characters which are not integer multiples of the bit duration. So the receiver must synchronize on the start bit of each character and then sample the subsequent bits according to the preset baud rate.
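As a rough illustration of that start-bit resynchronization, here is a minimal Python sketch of a software UART receiver. It assumes 8N1 framing (1 start bit, 8 data bits LSB first, 1 stop bit) and a line sampled at 16 times the baud rate. The names `uart_encode` and `uart_decode` are made up for this example, and real UART hardware adds things this sketch leaves out (majority voting, framing error flags, and so on).

```python
def uart_encode(data, oversample=16, gaps=None):
    """Build an oversampled 8N1 line. gaps[i] is the idle time (in samples)
    inserted after character i; it need not be a multiple of the bit time."""
    samples = [1] * oversample  # idle line (high) before the first character
    for i, byte in enumerate(data):
        bits = [0] + [(byte >> n) & 1 for n in range(8)] + [1]  # start, data, stop
        for b in bits:
            samples += [b] * oversample
        samples += [1] * (gaps[i] if gaps else 0)
    return samples


def uart_decode(samples, oversample=16):
    """Resynchronize on every start bit, then sample each bit at its middle."""
    out = bytearray()
    i = 1
    while i < len(samples):
        # Hunt for a falling edge: idle (1) -> start bit (0).
        if samples[i - 1] == 1 and samples[i] == 0:
            mid = i + oversample // 2          # middle of the start bit
            stop_pos = mid + 9 * oversample    # middle of the stop bit
            if stop_pos >= len(samples):
                break                          # incomplete character at the end
            if samples[mid] != 0:              # too short: a glitch, not a start bit
                i += 1
                continue
            byte = 0
            for k in range(8):                 # data bits, LSB first
                byte |= samples[mid + (k + 1) * oversample] << k
            if samples[stop_pos] == 1:         # keep the byte only if the stop bit is valid
                out.append(byte)
            i = stop_pos                       # resume edge hunting inside the stop bit
        else:
            i += 1
    return bytes(out)


# Pauses of 5, 23 and 40 samples: none is a whole number of 16-sample bit times.
line = uart_encode(b"Hi!", gaps=[5, 23, 40])
decoded = uart_decode(line)
```

Because the receiver restarts its timing on every falling edge, the odd-length pauses between characters do not matter.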
Now when the clock is not explicitly transmitted but instead embedded in the signal (Manchester, SPDIF, USB, etc) and has to be recovered by the receiver, I called this "asynchronous" by mistake, but it's in fact synchronous.
To add a layer of confusion, it is absolutely possible to decode slow synchronous protocols like SPDIF, USB1, Manchester, etc with an asynchronous receiver which samples the signal at a much higher frequency. This gets several samples per actual signal bit; the receiver then looks at the duration of the one and zero levels and/or the transition times, and decides what the data should be without having to recover any clock.
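Here is a small Python sketch of that oversampling idea applied to Manchester coding. It assumes the convention where a 1 is sent as low-then-high and a 0 as high-then-low, and that the capture starts exactly on a bit boundary. The function names and the `nominal_half` parameter (expected samples per half-bit) are invented for the example. The decoder only measures how long each level lasts; it never reconstructs a clock.

```python
from itertools import groupby


def manchester_encode(bits, samples_per_half):
    """Oversampled Manchester line: 1 -> low,high and 0 -> high,low (assumed convention)."""
    out = []
    for b in bits:
        first, second = (0, 1) if b else (1, 0)
        out += [first] * samples_per_half + [second] * samples_per_half
    return out


def manchester_decode(samples, nominal_half):
    """Classify each run of identical samples as one or two half-bits, then pair them up."""
    halves = []
    for level, run in groupby(samples):
        length = len(list(run))
        n_halves = round(length / nominal_half)
        if n_halves not in (1, 2):
            raise ValueError(f"run of {length} samples is not 1 or 2 half-bits")
        halves += [level] * n_halves
    if len(halves) % 2:
        raise ValueError("capture does not contain a whole number of bits")
    bits = []
    for first, second in zip(halves[0::2], halves[1::2]):
        if first == second:
            raise ValueError("missing mid-bit transition")
        bits.append(1 if (first, second) == (0, 1) else 0)
    return bits


payload = [1, 0, 1, 1, 0, 0, 1, 0]
# The transmitter really uses 9 samples per half-bit while the receiver expects 8:
# the two clocks are unrelated, yet the run lengths are still unambiguous.
line = manchester_encode(payload, samples_per_half=9)
decoded = manchester_decode(line, nominal_half=8)
```

This works as long as the rate mismatch and jitter stay small enough that a short run is never mistaken for a long one.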
Soo... "synchronous" would in fact be a property of the transmitted signal, meaning it is synchronized with a clock of constant frequency (no possibility of random duration pauses between characters like for a UART).
async serial: UART
sync serial with separate clock line: SPI, I2C, I2S, HDMI...
sync serial with embedded clock: SATA, USB, SPDIF...
sync parallel: all synchronous RAM interfaces, DDR, most buses (PCI...), PATA, SCSI...
Having a separate clock line avoids the need for clock recovery, but clock and data must arrive at the receiver with proper timing. Synchronous parallel systems usually require all data bits to arrive at about the same time too.
The difference in arrival time between all these signals is "skew", and if there is too much skew, some of the bits will be early, some will be late, some will be in transition when they should be steady, and it makes a mess. So you get wiggly traces on your motherboard between CPU and RAM to make sure all the lines have the same length. PCI bus was the same story, but in hard mode: adding cards in your PC adds capacitance to the bus along with reflections and signal integrity issues due to tapping a transmission line, which means maximum possible skew increases the further away from the bus master you are, and on top of that it depends on what's plugged in. That's why PCI never reached high clock speeds, and PATA required those special snowflake cables to run at high speeds.
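To put a rough number on the skew caused by a trace length mismatch, here is a back-of-the-envelope sketch in Python. The propagation delay of about 6.7 ps per mm is an assumed ballpark for a typical FR4 board, not a figure from any datasheet, and the function names are made up for this example.

```python
def skew_ps(length_mismatch_mm, delay_ps_per_mm=6.7):
    """Arrival-time difference caused by a trace length mismatch.
    delay_ps_per_mm is an assumed ballpark for FR4; real boards vary."""
    return length_mismatch_mm * delay_ps_per_mm


def skew_fraction_of_bit(length_mismatch_mm, bitrate_bps, delay_ps_per_mm=6.7):
    """Skew expressed as a fraction of one bit period on a single pin."""
    bit_period_ps = 1e12 / bitrate_bps
    return skew_ps(length_mismatch_mm, delay_ps_per_mm) / bit_period_ps


# A 10 mm mismatch is negligible at 33 Mbit/s per pin, but not at 3.2 Gbit/s per pin.
slow = skew_fraction_of_bit(10, 33e6)
fast = skew_fraction_of_bit(10, 3.2e9)
```

Under these assumptions the same 10 mm mismatch costs well under 1% of a bit period on the slow bus and roughly a fifth of a bit period on the fast one, which is why trace length matching gets stricter as the bitrate goes up.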
In addition, if it's a bus, it'll be half-duplex, only one master is transmitting at a time. If it's RAM, either it reads, or it writes, but not both. And if it's synchronous, then you have another problem: if the length of clock and data are matched, then when the CPU sends data and clock to the RAM, it will all arrive in sync. But when the RAM sends data to the CPU, then it won't be in sync with the clock sent from the CPU due to roundtrip delay. So you get rather complex systems with PLLs and propagation delay compensation, etc. If it's parallel, the higher the bitrate, the more headache, including per-pin skew compensation, etc.
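That explains the popularity of serial stuff: PCI-e can use many lanes, but each lane carries its own embedded clock and is recovered independently. There is no shared clock that all the data bits must line up against, so the lanes are asynchronous with respect to each other, and skew requirements between lanes are much less of a problem.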
async parallel: hmmm...
That could mean "lots of serial links like PCI-e" but that's not really "parallel".
It's a bit of a paradox to define this last one. If you're controlling a bunch of stuff with GPIOs, it could be argued this is "asynchronous parallel", with each bit having some special meaning to the receiver. However, if it's really a parallel port and the bits are transmitted asynchronously, it means going from "00" to "11" will go through a "10" or "01" state if one signal switches faster than the other. That's only usable if you don't care what happens during the transitions.
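The intermediate state is easy to see in a toy simulation. The Python sketch below is purely illustrative: `bus_states` is a hypothetical helper, time is counted in arbitrary integer steps, and each bit simply flips to its new value once its own delay has elapsed.

```python
def bus_states(old, new, delays, t_end):
    """Return the distinct bus values seen over time, in order.
    Bit k switches from old[k] to new[k] at time delays[k]."""
    seen = []
    for t in range(t_end + 1):
        value = tuple(n if t >= d else o for o, n, d in zip(old, new, delays))
        if not seen or seen[-1] != value:
            seen.append(value)
    return seen


# Both bits go 0 -> 1, but the first bit switches at t=2 and the second at t=5.
skewed = bus_states((0, 0), (1, 1), delays=(2, 5), t_end=10)
# With identical delays there is no intermediate state.
matched = bus_states((0, 0), (1, 1), delays=(3, 3), t_end=10)
```

With unequal delays the bus passes through a value that is neither the old nor the new one, and a receiver that happens to look during that window reads garbage.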