What do you mean you "know that ID-gm relationship is nonlinear"?
In a MOSFET model, typically used for back-of-the-envelop calculations,
$$
I_D = \frac {\mu _nC_{ox}}{2}{\frac {W}{L}}\left(V_{GS}-V_{th}\right)^{2}\left(1 + \lambda (V_{DS}-V_{DSsat})\right)
$$
, and the transconductance is \$g_m = {{\partial I_D}/{\partial V_{GS}}} = {{2I_D}/(V_{GS}-V_{th})}\$. In this model and while in saturation, \$I_D\$ is a linear function of \$g_m\$, \$I_D = {\frac 1 2} {g_m \left(V_{GS}-V_{th}\right)}\$. Do you discuss a flawed design in which the diffpair and current mirror transistors work out of saturation most of the time, or are you talking about some advanced transistor models?
Nonlinearity arises because \$g_m\$ itself is a function of overdrive voltage (nonlinearity is required in a circuit which is used to multiply signals)
$$
g_m = \mu _nC_{ox}{\frac {W}{L}}\left(V_{GS}-V_{th}\right)\left(1 + \lambda (V_{DS} -V_{DSsat})\right)
$$.
The MOSFET is a square law device, the current is proportional to a squared overdrive voltage. Let \$I_0\$ be the current through each branch of the balanced diffpair (\$V_{in}=0\$). With non-zero \$V_{in}\$, the difference of differential pair's drain currents \$i_{out}\$ can be derived from the MOSFET I-V characteristic, you can find it in textbooks or course notes or better do it yourself. To cut the long story short, I write down the formula, neglecting the channel length modulation effect (\$\lambda = 0\$):
$$
i_{out} = \sqrt{2\beta I_0}V_{in}{\sqrt{1 - {\frac {\beta V_{in}^2} {8 I_0}}}}
\tag{diffpair I-V}
$$
where \$\beta = {\frac {\mu _nC_{ox}}{2}}{\frac {W}{L}}\$. Notice that in your circuit \$i_{out}\$ is translated to an \$V_{outv}\$ output voltage by means of the \$R_a\$ loads. Notice also that the formula is applicable (and required) only when transistors are in saturation, where the radicand is positive; the extension of the formula into all possible \$V_{in}\$ values embraces triod and cutoff operation modes and the extended \$i_{out}\$ is a monotonic function of \$V_{in}\$.
It differs from the bjt diffpair's \$\displaystyle {tanh}\$ curve, but both curves (for MOSFET and for bjt) are nonlinear at large \$V_{in}\$. For MOSFET, the criterion of linearity is \${\beta V_{in}^2} \ll {8 I_0}\$. You can also express \$I_0\$ via the device voltages according to a formula for saturation current \$I_D\$.
I write down the diffpair I-V equation once gain, this time with the voltages and currents at your M5, M6 diffpair
$$
i_{SS} = \sqrt{\beta I_{SS}}V_{ctrl}{\sqrt{1 - {\frac {\beta V_{ctrl}^2} {4 I_{SS}}}}}
\tag{gain control diffpair I-V}
$$
\$I_{SS}\$ is a tail current through M5,M6 transistors, it corresponds to \$2I_0\$ variable of the equation for M1,M2 or M3,M4 pairs; \$i_{SS}\$ is the difference of M5 and M6 drain currents (corresponds to \$i_{out}\$ of the \$\text {diffpair I-V}\$).
You do not specify the parameters of your circuit, but from eqs. \$\text {gain control diffpair I-V}\$ and \$\text {diffpair I-V}\$ one can see that for any diff pair the \$\text {I-V}\$ linearity is controlled by the tail current. For a M5, M6 diffpair this tail current, \$I_{SS}\$, is set by a \$V_b\$ bias voltage. So, for your Gilbert cell with a variable gain network, you have two control inputs: \$V_{ctrl}\$ sets the gain ratio for M1,M2 vs. M3,M4 branches, and \$V_b\$ controls a linearity of \$V_{ctrl}\$ input. And, to be sure, there are also diff signal inputs \$V_{in+}, V_{in-}\$ in your circuit. In the Gilbert cell-based RF mixer, the \$V_{ctrl}\$ input is used on a par with the \$V_{in}\$ input as inputs for signals to be multiplied.
Summing up, the formula \$\text {diffpair I-V}\$ gives you a quantitative tool that you can use to compute and adjust the linearity of diffpair's I-V characteristics.