Apologies for the delay. I am posting here the answer I developed in-house, which was accepted based on LTspice sims for implementation at a later date. The overall circuit is shown below, made of "jelly-bean" parts:

Brief description of the circuit:
The input voltage is converted to a current (10V peak -> 500uA peak). This current flows into a two-transistor common-base stage (Q1_412, Q2_412) that steers it into separate collector circuits based on polarity. The lower output (Q2) is used for the output since it is referenced to 0V (gnd). The upper output (Q1) is not used, but could be directed to the output (Q2 collector) via an additional current mirror to form a full-wave precision rectifier if so desired (not required in my case). Image below shows the input signal and Q1, Q2; please ignore R276:

The collector current of Q2 is converted to a voltage by resistor [Rgain_412], the value of which is set to the ideal value times a correction factor, viz:
(20k * 1.571) being the ideal value to convert a 10V peak half-sine (500uA peak) to 5Vdc average (1.571 is pi/2, since a half-wave rectified sine averages to peak/pi);
and 1.019 being the correction factor that trims the output to 5.00V+/-20mV at 100% input, needed because of the non-ideal transfer function of the circuit at the AM carrier of 1MHz. The resulting voltage is then filtered by a two-stage RC filter to get the desired ripple and step response.
(Aside: this correction factor suggests the gain reduction at carrier of 1MHz is just 2%.)
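As a quick check of the arithmetic above, here is a minimal Python sketch. The function name `gain_resistor` is just for illustration, and it assumes an ideal half-wave rectified sine current, whose mean is peak/pi.

```python
import math

def gain_resistor(v_out_dc, i_peak, correction=1.0):
    """Resistor that converts a half-wave rectified sine current with peak
    i_peak (amps) into an average voltage v_out_dc (volts), scaled by an
    optional trim factor."""
    i_avg = i_peak / math.pi  # mean of an ideal half-wave rectified sine
    return correction * v_out_dc / i_avg

r_ideal = gain_resistor(5.0, 500e-6)                      # 20k * pi/2
r_trimmed = gain_resistor(5.0, 500e-6, correction=1.019)  # with the 1MHz trim
gain_loss_pct = (1 - 1 / 1.019) * 100                     # loss implied by the trim

print(f"ideal   Rgain = {r_ideal / 1e3:.2f} k")    # 31.42 k
print(f"trimmed Rgain = {r_trimmed / 1e3:.2f} k")  # 32.01 k
print(f"implied gain loss = {gain_loss_pct:.1f} %")  # 1.9 %
```

So the ideal value is about 31.42k, the trimmed value about 32.01k, and the trim corresponds to a gain loss of a little under 2% at the carrier.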
Image below shows the output filter:

Sources of Error, and Mitigation of Same:
There are two major sources of error (or non-linearity, i.e. the degree to which the currents in the Q1 & Q2 collectors deviate from the input signal current).
The first is the AC voltage at the input of the common-base stage. Ideally this should be zero to keep signal current directly proportional to signal voltage, but in reality will vary with Vbe of each transistor Q1 & Q2 as each is exercised over the full range of the carrier signal. Note that collector current of both Q1 & Q2 will vary by 3 to 4 decades as signal amplitude varies from 0 to 100%, so we would expect Vbe for Q1 & Q2 to deviate by at least 180mV to 240mV (using the "rule of thumb for small-signal BJTs": \$\Delta\$Vbe=60mV per decade of collector current).
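The 180mV to 240mV estimate can be reproduced with a short Python sketch. The function names are illustrative only. The second function is an added assumption, not from the sims: it uses the ideal-diode relation (kT/q)*ln(10) to show where the 60mV per decade rule of thumb comes from at roughly room temperature.

```python
import math

def delta_vbe_mv(decades, mv_per_decade=60.0):
    """Vbe change in mV for a collector current change of `decades`,
    using the rule-of-thumb slope."""
    return decades * mv_per_decade

def ideal_mv_per_decade(temp_k=300.0):
    """Ideal-diode slope (kT/q)*ln(10) in mV per decade (assumed model)."""
    k = 1.380649e-23     # Boltzmann constant, J/K
    q = 1.602176634e-19  # electron charge, C
    return 1e3 * (k * temp_k / q) * math.log(10)

for decades in (3, 4):
    print(f"{decades} decades -> {delta_vbe_mv(decades):.0f} mV")
print(f"ideal slope at 300K: {ideal_mv_per_decade():.1f} mV/decade")
```

The ideal slope at 300K comes out just under 60mV per decade, which is why the round figure is used.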
The error is dramatically reduced by a simple error amplifier formed by Q5, and fed by Q3 & Q4. This error amp also sets up the bias conditions. In this case, the emitter of Q2 is biased to about 8V so that Q2 collector has plenty of headroom available for the output voltage required.
Image below shows the error amplifier:

The second source of error is the quiescent Vbe bias of Q1 & Q2, which is set by Qbias_412 configured as a Vbe multiplier. It was found that not having any bias (Q1 & Q2 bases tied together) caused what appeared to be "reverse recovery" current to flow between Q1 & Q2 at the zero-crossings of the input, which distorted the current in the collectors of Q1 & Q2. This "reverse recovery" current turned out to be base current flowing in the reverse direction of the transistor that was turning off, and flowing into the emitter of the incoming transistor. According to the simulation, this was due to the charging & discharging of the B-E junction capacitance during the zero-crossing. Adding a bias voltage reduced the swing of Vbe thus reducing this "reverse recovery" current.
However, there is a trade-off: increasing Vbe bias sets up a quiescent collector current, which then sets up a significant output offset voltage at zero signal, and a dead zone where the output does not respond to the input signal. The bias was adjusted such that the reverse recovery current was largely eliminated, and the Vbe swing for both Q1 & Q2 was reduced to just 250mVp-p for all input signals including 100% (reduced from about 1.22Vp-p swing with no bias). This reduced swing also made the life of the error amp easier, since its output swing (Q5 collector voltage) reduced from 1.22Vp-p to just 0.25Vp-p. Refer to the image below for the bias circuit.

Simulation Results for AM Carrier of 1MHz:
Here are the results of the solution for a carrier of 1MHz, where 100% signal is 10V peak, measured (by simulation) at 0%, 1%, 2%, 5%, 10%, 20%, 50% and 100%. The circuit was trimmed by adjusting the correction factor for [Rgain_412] to 1.019 to give an output of 5.00V+/-20mV for 100% signal (10V peak).
Image below: output (vertical) vs peak of input (horizontal).

Image below: zoom of previous near zero.

Image below: error compared to ideal. We can see that the error from ideal stays within 0.5% for signals above 10%, and within 1% for signals down to 5%. The loss of accuracy at low signal was deemed acceptable for the application.

Image below: The waveforms of Q2 Vbe (brown) and Ib (green) at 100% signal; the cyan trace is for the case where bias is zero volts.

Image below: Waveforms showing "reverse-recovery" effect.
The "reverse-recovery" effect is clearly visible at the start of the pulse, when Q2_Ic first begins to rise from 0. During this period Q1 supplies the current for the signal source, thus robbing it from the input to Q2. Q2 returns the favour at the end of the rectified pulse, when its collector current falls back to 0.
Upper: Signal current (Rsig_412), & Q2 Ic for (a) biased case (Q2_412) and (b) non-biased case (Q2_112).
Lower: Q2 base current for biased case (Q2_412) and non-biased case (Q2_112). Large current spikes at the zero-crossings have been reduced by the Vbe bias.

Image below: response to a step pulse (0% to 100% to 0%).
Upper: input to AM modulator.
Middle: ideal detector response (with same filter), and circuit response.
Lower: error compared to ideal.

Image below: ripple at 100% input, which is 60mV p-p.

Alternative Circuit:
The circuit below is an alternative with almost identical performance; the main change is that the error amplifier is now implemented as a differential pair. C97 & C158 around Q3_414 can be ignored; these are placeholders for bandwidth-limiting caps in case they are needed to ensure stability in the real-world production circuit.
Compared to the previous circuit:
- Total number of transistors is unchanged.
- The bias network for node [EA+_414] was changed so that Q2_414 emitter voltage is the same as for the previous circuit (~8V).
- Bias current for the diff pair is set by [Rdiffbias_414] and was selected so both circuits present the same current drain on the +12V supply at no signal.
- Resistor [RQ3c_414] was selected so collector currents of the diff pair were balanced.
- Gain resistor [Rgain_414] needed a slight adjustment to give the required output of 5.00V+/-20mV at 100% signal.
- Gain of this new error amp is about half that of previous error amp (voltage across its inputs is 26mVp-p vs 11mVp-p for previous error amp, for same Vout of ~250mVp-p), but this has a negligible effect on overall performance of the detector.
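The "about half" figure in the last point can be checked with a small Python sketch. It treats the ratio of output swing to input swing as an effective gain, which is a simplifying assumption for illustration; the three p-p values are the ones quoted above.

```python
def effective_gain(v_out_pp, v_in_pp):
    """Effective gain taken as output swing over input swing (both p-p)."""
    return v_out_pp / v_in_pp

V_OUT_PP = 250e-3  # error amp output swing, same for both circuits

gain_single = effective_gain(V_OUT_PP, 11e-3)  # previous error amp (Q5)
gain_diff = effective_gain(V_OUT_PP, 26e-3)    # differential-pair error amp
ratio = gain_diff / gain_single

print(f"previous error amp: ~{gain_single:.1f}")
print(f"diff-pair error amp: ~{gain_diff:.1f}")
print(f"ratio: {ratio:.2f}")
```

This gives roughly 23 for the previous error amp and roughly 10 for the differential pair, a ratio of about 0.42.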
Image below shows the complete alternative circuit:

Image below shows the error amplifier section:

Images below show the error amplifier input voltage (EA- wrt EA+) at 100% signal (just before the modulating signal drops to 0) for both circuits:


Edit 15-Aug-2023: Frequency Response
Just for completeness, adding here the sim results showing the output of the peak detector as the carrier frequency is increased. The output is down by 10% from ideal at F=5.5MHz. The results suggest this peak detector is usable with input signals up to about 5MHz, depending on acceptable levels of frequency-dependent error.
If the input frequency is kept within a narrow range, then a compensation factor can be applied to the output (via [Rgain]), extending the usable frequency out to 10MHz or possibly even 20MHz; but input sensitivity may be affected.
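A rough Python sketch of what such a compensation factor might look like follows. It rests on assumptions not verified in the sims: that at a fixed carrier frequency the output is a constant fraction of ideal regardless of amplitude, and that scaling [Rgain] does not otherwise disturb the circuit (headroom, filter). The function name `compensation_factor` is hypothetical.

```python
import math

def compensation_factor(output_fraction):
    """Scale factor for Rgain that restores the ideal output, given the
    fraction of ideal output seen at the operating frequency."""
    if not 0 < output_fraction <= 1:
        raise ValueError("output_fraction must be in (0, 1]")
    return 1.0 / output_fraction

R_IDEAL = 5.0 / (500e-6 / math.pi)  # ideal Rgain, 20k * pi/2

# Output is down 10% from ideal at 5.5MHz, i.e. 0.90 of ideal.
factor_5m5 = compensation_factor(0.90)
r_5m5 = R_IDEAL * factor_5m5

print(f"factor at 5.5MHz = {factor_5m5:.3f}")
print(f"Rgain at 5.5MHz  = {r_5m5 / 1e3:.2f} k")
```

Under these assumptions the factor at 5.5MHz would be about 1.11, compared with the 1.019 used at 1MHz. Any real value would still need trimming in simulation, as was done for the 1MHz case.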
Image below shows the response as input frequency is increased. The input to the peak detector is a 10Vpeak sinewave generated by a VCO; the frequency starts at ~10kHz and begins to ramp up from T=100us.
Upper chart: VCO input, 1V==1MHz.
Lower chart: purple trace=output (ignore red and green).
