I'll offer an additional reason to the others given (established and widely cross-compatible standards, mature and affordable low-power chipsets, latency, etc.): signal robustness. An analogue video signal can be quite badly degraded before it can no longer be received in some form, or understood well enough for flight control (or review) by a human observer. The colour can drop out, the audio can become completely meaningless, the visual noise can rise until there's barely a 2:1 contrast ratio between meaningful signal and "snow", and the sync can be disrupted badly enough that the image wavers up and down with each frame and side to side with each line - yet so long as the carrier hasn't dropped entirely, you can still generally make some sense of the feed. It's even possible for the video to keep arriving in a recognisable form after the control signal itself has failed.
Digital video, on the other hand... anyone who witnessed the early days of digital OTA broadcasting (or who lives in an area that still has a weak signal) can tell you how rapidly the picture becomes completely useless once strength and quality fall below a certain, not particularly low, level, often without much warning. In contrast to the gradual, "graceful" degradation of analogue TV, the difference between a crystal-clear, perfectly watchable digital signal and one that is either so corrupted as to make no sense at all to a viewer (the encoding scheme means any random disruption to the bitstream can cause massive, unpredictable changes in the decoded image) or fails to decode at all can be quite the knife-edge. Not a characteristic you want to deal with when flying in FPV mode.
In other words, the analogue version gives you plenty of warning, in the form of a gradually degrading picture, that you're approaching the edge of its usable range, so you can turn back in time. A digital one may cross that knife-edge in a couple of seconds, going from clear and smooth to jerky and broken up to leaving you flying completely blind in less time than it takes to react and turn the drone around. The analogue signal may be inherently lower-resolution and noisier than the digital one, even at the point of takeoff, but giving up absolute maximum video quality may prove very much a worthwhile trade-off.
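The contrast between the two failure modes can be sketched as a toy model (this is a deliberately crude illustration with made-up numbers, not real RF engineering): analogue picture quality falls off roughly in step with signal quality, while a digital link stays near-perfect until its decoder's error-correction threshold and then collapses.

```python
# Toy illustration (assumed numbers, not a real RF model) of graceful
# analogue degradation versus the digital "cliff".

def analogue_quality(snr_db: float) -> float:
    """Perceived usefulness, 0..1: degrades gradually with signal quality.
    Assumes ~30 dB SNR gives a clean picture."""
    return max(0.0, min(1.0, snr_db / 30.0))

def digital_quality(snr_db: float, threshold_db: float = 12.0) -> float:
    """Near-perfect above an (assumed) decode threshold, useless below it."""
    return 1.0 if snr_db >= threshold_db else 0.0

for snr in (30, 20, 15, 12, 10, 5):
    print(f"{snr:2d} dB  analogue={analogue_quality(snr):.2f}"
          f"  digital={digital_quality(snr):.0f}")
```

The analogue column shrinks step by step - your warning that range is running out - while the digital column flips from 1 to 0 somewhere between 12 and 10 dB with no intermediate state.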
Another factor is interference from the drone's own motors, control electronics and radios. There's often visible artefacting on the video feed from the other electrical parts of the drone as it runs. On an analogue signal that just causes some visual noise and sync distortion... but it could cause a digital one to fail entirely.
Of course, you can add quite a lot of error-correction coding to the digital version to try to proof it against both of these issues, but that both reduces the bandwidth available for the actual signal (limiting the possible resolution, compression ratio and even framerate) and adds latency (already a problem even without the EC), as typical FEC relies on spreading data out over a longer span of the continuous stream.
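Both of those costs are easy to put rough numbers on. Here's a back-of-envelope sketch (every figure below is an assumption for illustration, not a spec for any real link): a rate-3/4 FEC code means only three of every four transmitted bits are payload, and a block interleaver adds latency because the whole interleaver block must arrive before it can be reordered and decoded.

```python
# Back-of-envelope sketch of what FEC costs a digital video link.
# All numbers are assumptions chosen for illustration.

link_rate_mbps = 10.0    # raw channel bitrate (assumed)
fec_rate = 0.75          # rate-3/4 code: 3 payload bits per 4 sent (assumed)

# Parity overhead eats directly into video bandwidth.
payload_mbps = link_rate_mbps * fec_rate

# A block interleaver can't release any data until the block is complete,
# so its depth translates directly into added latency.
interleaver_block_bits = 200_000    # assumed interleaver depth
interleaver_delay_ms = interleaver_block_bits / (link_rate_mbps * 1e6) * 1e3

print(f"usable video bandwidth: {payload_mbps:.1f} Mb/s")
print(f"added interleaver latency: {interleaver_delay_ms:.0f} ms")
```

With these made-up figures the link loses a quarter of its bandwidth and gains 20 ms of delay before a single pixel of protection is even needed - and deeper interleaving (better burst-error protection) makes the latency strictly worse.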
On top of that, most sufficiently efficient digital video codecs lean heavily on interframe/temporal redundancy, delta coding and motion-compensation techniques to greatly reduce the amount of data that must be transmitted for a given perceptual quality, especially for low-motion content, which makes up the bulk of most TV and recorded video. These techniques add latency to the live video path (not really a problem for TV transmission, where a little delay is accepted, or for stored/streamed video, where data is pre-buffered before decoding starts, but deadly for remote-control/presence applications). They also mean that high-motion content, which is a rather large part of FPV footage, places an unusually heavy load on the data stream and compression engine, and something has to give: either the data rate goes through the roof (a problem for power consumption, chipset complexity, range and reliability of reception...), or the visual quality suffers, with smearing, macroblocking, interframe trails/ghosts and so on. Again, probably not something you want interfering with your vision while remotely piloting a fragile aircraft at speed, and the nature of that corruption would likely mean a greater loss of useful visual information than the more regular, low-amplitude snow of a weak analogue signal.
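To see why motion is the enemy here, consider a simplified GOP model (all frame sizes below are invented for illustration; real codec behaviour is far more complex): P-frames are cheap only while they can reuse most of the previous frame, and fast FPV motion pushes every frame's cost towards that of a full I-frame.

```python
# Rough sketch (invented frame sizes) of why interframe codecs suffer
# under high motion. One intra-coded (I) frame starts each group of
# pictures (GOP); the rest are predicted (P) frames whose size depends
# on how much changed since the previous frame.

fps = 60
gop = 30                       # one I-frame per 30 frames (assumed)
i_frame_kb = 60.0              # assumed I-frame size
p_frame_low_motion_kb = 5.0    # assumed P-frame size, near-static scene
p_frame_high_motion_kb = 40.0  # assumed P-frame size, fast FPV motion

def bitrate_mbps(p_kb: float) -> float:
    """Average bitrate for this GOP structure, in Mb/s."""
    per_gop_kb = i_frame_kb + (gop - 1) * p_kb
    return per_gop_kb * 8 * (fps / gop) / 1000   # kB -> Mb/s

print(f"low motion:  {bitrate_mbps(p_frame_low_motion_kb):.2f} Mb/s")
print(f"high motion: {bitrate_mbps(p_frame_high_motion_kb):.2f} Mb/s")
```

With these assumed figures the same codec at the same quality setting needs roughly six times the bitrate for high-motion footage - and if the link can't deliver that, the encoder must instead throw quality away, which is exactly the smearing and macroblocking described above.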
Of course, the low-latency video codecs used for thin clients and remote game-streaming services show that reasonably high-efficiency compression is possible without enough lag to interfere with twitch reactions, but those mostly belong to an entirely different electronic world. The data links are typically wired end-to-end, or at least until the last few metres, with transceivers that are either mains-powered or running from fairly heavy batteries that aren't also being relied on to carry the machine through the air. The decoder is one small part of a larger, non-trivially powerful piece of computing hardware, and more relevantly the encoder (much like a digital broadcaster's would have been 20 years ago) is particularly beefy, in order to capture the hi-def image, crunch it and fire it off down the network link as fast as possible. Given a few more years it might be possible to build a similarly capable encoder into the lightweight, low-power control and video-sender board of an FPV drone, along with a transmitter that can sustain a high enough bitrate over a meaningful distance... but at the moment there's a heck of a difference between what can be built into a small flying machine and what goes into a high-end server in a datacentre.
For the time being, a simplistic SDTV encoder/transmitter - one that takes up about as much space (and, just as crucially, weight) as a match from a nightclub matchbook (the head being the chip and the stick the antenna) and consumes hardly any power beyond maybe a couple of dozen milliwatts for the transmission itself - can feed a low (but still sufficiently "high") resolution, smooth-framerate, effectively zero-latency image from a similarly dinky burner-phone-derived sensor (both parts being ludicrously cheap) to the FPV headset or handset with enough fidelity to be useful over several hundred metres. That's where the state of the art has got us, and when you think about it, it's already pretty impressive - much as being able to stream a hi-def recording to YouTube or Twitch with a couple of seconds' latency from a handheld computer is. It just needs to advance a bit further to deliver a better-quality image over digital transmission with suitably low latency and high reliability...