There are basically two questions here: i) why \$\mathbf{x}\$ is considered a RV, and ii) why it is said that in the presence of gaussian noise (\$\mathbf{z}\$) the capacity is maximised when \$\mathbf{x}\$ is also gaussian.
i) Firstly, the scenario you have given as an example is correct; the only issue is that you are considering transmission over a noiseless channel. Let's extend your example and suppose we are transmitting a sample from the set \$\mathcal{X} = \{0,\Delta, ..., (A-\Delta), A\}\$. The hypothetical transmitter picks a symbol \$x\$ uniformly from \$\mathcal{X}\$ and sends it to the receiver.
In a noisy channel, the received symbol will not be \$x\$, it will be (\$x\$ + noise). The noise \$z\$ is a RV chosen from the set \$\mathcal{Z} = \{0,\Delta, ..., (A-\Delta), A\}\$. If \$z\$ is also picked uniformly from this set, it is clear that the capacity will be zero. In other words, there will be no way of recovering \$x\$ from \$y\$, where \$y = x + z\$.
If, on the other hand, \$z\$ is uniformly distributed only over the set \$\mathcal{Z} = \{0, \Delta, 2\Delta\}\$, then the channel has some capacity. This capacity can be shown to be achieved when the transmitted symbol is uniformly distributed over the subset \$\mathcal{X}_s = \{0,3\Delta, ..., (A-3\Delta), A\}\$ and zero elsewhere. In other words, we just separated the symbols in \$\mathcal{X}\$ by \$3\Delta\$, so any symbol \$x\$ we transmit is perfectly recoverable despite the presence of the noise \$z\$.
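To see why separating the symbols by \$3\Delta\$ works, here is a minimal sketch (the values \$\Delta = 1\$ and \$A = 27\$ are assumptions made for illustration): a receiver that simply rounds \$y\$ down to the nearest multiple of \$3\Delta\$ always recovers \$x\$.

```python
import numpy as np

# Assumed toy values: Delta = 1, codewords spaced 3*Delta apart, up to A = 27
Delta = 1.0
codewords = np.arange(0.0, 27.0 + 3 * Delta, 3 * Delta)    # {0, 3, 6, ..., 27}

for x in codewords:
    for z in (0.0, Delta, 2 * Delta):                      # every possible noise value
        y = x + z                                          # received symbol
        x_hat = 3 * Delta * np.floor(y / (3 * Delta))      # round down to nearest codeword
        assert x_hat == x                                  # x is always recovered exactly

print("every codeword is recovered without error")
```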
Our capacity per symbol is the number of bits we get per symbol received. If the number of symbols in the set \$\mathcal{X}_s\$ from which we pick samples uniformly is \$M\$, then a \$\log_2{M}\$-bit number is the shortest bit sequence that can uniquely represent each of the symbols in \$\mathcal{X}_s\$. Our capacity per symbol is thus \$\log_2{M}\$. \$M\$ is called the cardinality of \$\mathcal{X}_s\$ and is denoted \$|\mathcal{X}_s|\$. The value \$\log_2{|\mathcal{X}_s|}\$ is also the entropy of a uniformly distributed RV over the set \$\mathcal{X}_s\$. The analogous formula for a continuous distribution with density \$p(x)\$ is the differential entropy \$h_{p(x)} = - \int p(x) \log p(x)\$.
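As a quick sanity check of the discrete case (a minimal sketch, with \$M = 16\$ as an assumed example value), the entropy of a uniform distribution over \$M\$ symbols comes out to exactly \$\log_2 M\$:

```python
import numpy as np

M = 16                        # assumed example: |X_s| = 16 symbols
p = np.full(M, 1.0 / M)       # uniform pmf over the M symbols
H = -np.sum(p * np.log2(p))   # entropy in bits: -sum p log2 p

print(H, np.log2(M))          # both print 4.0
```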
If a transmitter has the capability to transmit a symbol from a discrete set \$\mathcal{X}\$, but has to do so over a noisy channel which adds uniform noise from a discrete set \$\mathcal{Z}\$, then the zero-error capacity per sample in this case is
$$
C_s \text{ [bits]} = \max_{\mathcal{X}_s \subseteq \mathcal{X}} \left\{ \log_2(\text{Total Number Of Samples in } \mathcal{X}_s) \right\}
$$
So you basically pick the largest subset \$\mathcal{X}_s\$ of \$\mathcal{X}\$ whose symbols are still perfectly recoverable at the receiver in the presence of the noise \$\mathcal{Z}\$. The distribution \$p(x)\$ over \$\mathcal{X}_s\$ at which this capacity is achieved is the uniform distribution, as we have discussed.
\begin{align}
C_s \text{ [bits]} & = \max_{\mathcal{X}_s \subseteq \mathcal{X}} \left\{ \log_2(\text{Total Number Of Samples in } \mathcal{X}_s) \right\} \\
& = \log_2 |\mathcal{X}_s| \\
& = \log_2 \frac{|\mathcal{X}|}{|\mathcal{Z}|} \\
& = \log_2 |\mathcal{X}| - \log_2 |\mathcal{Z}| \\
& = h(\mathcal X) - h(\mathcal Z)
\end{align}
In other words, the zero-error capacity per sample is \$C_s = \max \left( h(X) - h(X | Y) \right) = \max \left( h(\mathcal{X}) - h(\mathcal{Z}) \right)\$. This relationship can be shown to also hold in the general case, and the quantity \$\left( h(X) - h(X | Y) \right)\$ is called the mutual information, usually denoted \$I(X;Y)\$.
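Here is a small numerical sketch of the chain of equalities above, with assumed values \$\Delta = 1\$ and \$A = 29\$ chosen so that \$|\mathcal{X}|\$ is an exact multiple of \$|\mathcal{Z}|\$:

```python
import numpy as np

Delta, A = 1.0, 29.0                         # assumed toy values
X = np.arange(0.0, A + Delta, Delta)         # X = {0, Delta, ..., A},  |X| = 30
Z = np.array([0.0, Delta, 2 * Delta])        # noise set {0, Delta, 2*Delta},  |Z| = 3

X_s = X[::len(Z)]                            # keep every 3rd symbol: spacing 3*Delta, |X_s| = 10

C = np.log2(len(X_s))                        # zero-error bits per symbol
print(C, np.log2(len(X)) - np.log2(len(Z)))  # log2|X_s| equals log2|X| - log2|Z|  (~3.32 bits)
```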
Now that we have covered some basics, we move on to the question of what distribution \$p(x)\$ over \$X_s\$ achieves the capacity if i) the noise \$\mathcal Z\$ is not uniformly distributed but is instead gaussian distributed, and ii) the average power per symbol to be transmitted is limited to \$P\$. The average power is \$E(X^2) = \Sigma x^2p(x) = \sigma ^2 = P\$, which implies that the variance of \$p(x)\$ is fixed at a value of \$\sigma^2\$. This brings us to your second question.
ii) Let \$p(x)\$ be a zero-mean gaussian distribution over the domain \$x \in (-\infty, +\infty)\$ with variance \$\sigma ^2 = P\$. We will show that there is no other distribution over \$x \in (-\infty, +\infty)\$ with the same fixed variance \$\sigma ^2 = P\$ that has a higher entropy than the gaussian distribution. This will prove that if we want to maximise \$\left( h(\mathcal{X}) - h(\mathcal{Z}) \right)\$ over \$x \in (-\infty, +\infty)\$ with a distribution of fixed variance \$\sigma ^2 = P\$, then there is no distribution better than the gaussian distribution.
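For reference (a standard result, not strictly needed for the proof below), plugging the zero-mean gaussian density into \$h_{p(x)} = - \int p(x) \log p(x)\$ gives
\begin{align}
h_{p(x)} = \frac{1}{2} \log_2 \left( 2 \pi e \sigma^2 \right) \text{ bits,}
\end{align}
so the claim is that no other density with variance \$\sigma^2\$ can exceed this value.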
Proof:
The proof relies on two other facts. The first is that for any two probability distributions \$p(x)\$ and \$g(x)\$, the quantity below, called the relative entropy, is always greater than or equal to zero:
\begin{align}
D(g(x) || p(x)) = \int_{-\infty}^{+\infty}{g(x) \log \frac{g(x)}{p(x)}} \geq 0
\end{align}
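A quick way to see this (Gibbs' inequality, a consequence of Jensen's inequality applied to the concave \$\log\$ function) is
\begin{align}
-D(g(x) || p(x)) = \int g(x) \log \frac{p(x)}{g(x)} \leq \log \int g(x) \frac{p(x)}{g(x)} = \log \int p(x) = \log 1 = 0
\end{align}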
and the second is that if \$p(x)\$ is the gaussian and \$g(x)\$ has the same variance as \$p(x)\$ (i.e. if \$\int x^2 g(x) = \int x^2 p(x) = \sigma^2\$), then
\begin{align}
\int g(x) \log p(x) = \int p(x) \log p(x)
\end{align}
To prove the equation above, we can simply evaluate the difference of the two integrals
\begin{align}
\int g(x) \log p(x) - \int p(x) \log p(x) &= \int (g(x) - p(x)) \log p(x)
\end{align}
Now if \$p(x)\$ is a zero-mean gaussian, then \$p(x)\$ is of the form \$Ae^{Bx^2}\$ where \$A = 1 / \sqrt{2\pi \sigma ^2}\$ and \$B = -1/(2\sigma^2)\$. So \$\log p(x)\$ is of the form \$\log_2 (Ae^{Bx^2}) = (\ln A + Bx^2)/ \ln(2) = C + Dx^2\$, where \$C\$ and \$D\$ are constants.
Our integral is then
\begin{align}
\int (g(x) - p(x)) \log p(x) &= \int (g(x) - p(x)) (C + Dx^2) \\
&= C\int g(x) - C\int p(x) + D\int x^2 g(x) - D\int x^2p(x) \\
&= C(1) - C(1) + D\sigma^2 - D\sigma^2 \\
&= 0
\end{align}
This proves that
\begin{align}
\int g(x) \log p(x) = \int p(x) \log p(x)
\end{align}
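As an aside, here is a quick numerical sanity check of this identity (a sketch; the zero-mean Laplace density used for \$g(x)\$ and the value \$\sigma^2 = 2\$ are arbitrary choices, the only requirement being that \$g(x)\$ has the same variance as \$p(x)\$):

```python
import numpy as np
from scipy.integrate import quad

sigma2 = 2.0                                   # common variance (arbitrary choice)
b = np.sqrt(sigma2 / 2.0)                      # Laplace scale so that var = 2*b^2 = sigma2

p = lambda x: np.exp(-x**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)  # gaussian
g = lambda x: np.exp(-np.abs(x) / b) / (2 * b)                            # Laplace, same variance

lhs, _ = quad(lambda x: g(x) * np.log2(p(x)), -np.inf, np.inf)   # int g log p
rhs, _ = quad(lambda x: p(x) * np.log2(p(x)), -np.inf, np.inf)   # int p log p
print(lhs, rhs)                                # both are approximately -2.547
```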
Now, going back to the relative entropy inequality,
\begin{align}
0 &\leq \int_{-\infty}^{+\infty}{g(x) \log \frac{g(x)}{p(x)}}\\
& = \int g(x) \log g(x) - \int g(x) \log p(x) \\
& = \int g(x) \log g(x) - \int p(x) \log p(x) \text{ (from the earlier proof)} \\
& = -h_{g(x)}(\mathcal X) + h_{p(x)}(\mathcal X) \\
\implies & h_{p(x)}(\mathcal X) \geq h_{g(x)}(\mathcal X)
\end{align}
And because the capacity per sample is \$C_s = \max \left( h(\mathcal{X}) - h(\mathcal{Z}) \right)\$, the distribution that maximises \$h(\mathcal X)\$ will maximise the capacity. The proof above is then sufficient to show that one of the distributions with variance \$\sigma^2 = P\$ that maximises the capacity over \$x \in (-\infty, +\infty)\$ is the gaussian distribution.
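As a final numerical illustration of this conclusion (a sketch comparing the differential entropy, in bits, of three zero-mean densities all scaled to variance 1; the gaussian should come out on top):

```python
import numpy as np
from scipy.integrate import quad

def diff_entropy_bits(pdf, lo=-np.inf, hi=np.inf):
    """Differential entropy -int p(x) log2 p(x) dx over [lo, hi]."""
    return quad(lambda x: -pdf(x) * np.log2(pdf(x)), lo, hi)[0]

# Three zero-mean densities, all with variance 1
gauss   = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
laplace = lambda x: np.exp(-np.abs(x) * np.sqrt(2)) / np.sqrt(2)  # var = 2*b^2 = 1 with b = 1/sqrt(2)
a = np.sqrt(3.0)                                                  # uniform on [-a, a] has var a^2/3 = 1
uniform = lambda x: 1.0 / (2 * a)

print(diff_entropy_bits(gauss))            # ~2.047 bits
print(diff_entropy_bits(laplace))          # ~1.943 bits
print(diff_entropy_bits(uniform, -a, a))   # ~1.792 bits -> the gaussian has the largest entropy
```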
For a wireless channel, an expression for the capacity when the noise is gaussian can thus be calculated as shown in the answer HERE.