Your two questions "why the 2.58V midpoint" and "how to choose capacitances" are probably related.
It takes time for capacitors to charge up to their steady-state DC voltages. The time it takes is governed by a "time constant", \$\tau\$, which is the product of the capacitance and the total resistance in series with it. The steady state that the circuit eventually reaches is called the "DC operating point" or "quiescent state".
The time constant of C? and R3 is \$\tau = R_3C_? = 100k\Omega \times 220\mu F = 22s\$. It will take about 5τ (>100s) for the DC offset voltage across C? to settle at its target quiescent level of 2V. I think perhaps you just didn't wait long enough for it to arrive there.
This pair, C? and R3, forms a high-pass filter with cut-off frequency \$\frac{1}{2\pi R_3C_?} = \text{7mHz}\$. That's ridiculously low, considering that you are amplifying audio, where nothing below 20Hz is of any interest. Assuming you want to keep R3 at 100kΩ and aim for a more reasonable cut-off frequency of, say, 10Hz, an appropriate value for C? will be:
$$ C_? = \frac{1}{2\pi R_3f} = \frac{1}{2\pi \times 100k\Omega \times 10Hz} = 160nF $$
This value for C? will also change the charging rate, having a new time constant of:
$$ \tau = R_3C_? = 100k\Omega \times 160nF = 16ms $$
Now it will take only about \$5 \times 16ms = 80ms\$ for C? to charge to its DC operating point, with 2V average across it.
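As a quick sanity check of the numbers above, here is a minimal Python sketch that computes the time constant, the cut-off frequency and the 5τ settling time for both values of C?. The function names are my own, and the component values are the ones quoted in this answer.

```python
import math

def time_constant(r_ohms, c_farads):
    """RC time constant in seconds."""
    return r_ohms * c_farads

def cutoff_hz(r_ohms, c_farads):
    """-3 dB cut-off frequency of a first-order RC filter, in Hz."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

def settled_fraction(t_seconds, tau_seconds):
    """Fraction of the final voltage reached after t seconds of charging."""
    return 1.0 - math.exp(-t_seconds / tau_seconds)

R3 = 100e3  # ohms, as in the answer

# Original 220 uF capacitor versus the suggested 160 nF replacement
for c in (220e-6, 160e-9):
    tau = time_constant(R3, c)
    print(f"C = {c:.3g} F: tau = {tau:.3g} s, "
          f"f_c = {cutoff_hz(R3, c):.3g} Hz, 5*tau = {5 * tau:.3g} s")

# After 5 time constants the capacitor is within about 1% of its final value
print(f"settled after 5*tau: {settled_fraction(5.0, 1.0):.4f}")
```

The "5τ" rule of thumb comes from the exponential charging curve: after five time constants the voltage is within roughly 1% of its final value.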
The purpose of C10 is to attenuate high frequency (noise) components of the power supply, so they don't appear on the 2V bias signal. By drawing the Thevenin equivalent of the 5V supply and resistors R11 and R12, you can see how this works:

(Schematic: Thevenin equivalent of the 5V supply with R11 and R12, shown as a single source in series with \$R_{TH}\$, feeding C10.)
Now it's clear that you have a low-pass filter, with cut-off frequency \$f = \frac{1}{2\pi R_{TH}C_{10}} = 1.2kHz\$. Here's where you could do with a bigger capacitor, to get that \$f\$ down closer to (or below) the lower end of the audio frequency spectrum, 20Hz. Let's set our sights on 10Hz, as we did for C?:
$$ C_{10} = \frac{1}{2\pi R_{TH}f} = \frac{1}{2\pi \times 1.3k\Omega \times 10Hz} = 12\mu F $$
With C10 at 12μF, C10 and \$R_{TH}\$ will have the same time constant of 16ms, and will also settle to their DC quiescent level after 80ms or so.
Lastly, the output capacitor C8 and load R7 are a high-pass arrangement, and since you want to pass everything above 10Hz, the procedure to find the right capacitance is the same:
$$ C_8 = \frac{1}{2\pi R_7f} = \frac{1}{2\pi \times 10k\Omega \times 10Hz} = 1.6\mu F $$
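The three capacitor calculations above all follow the same formula, so they can be collected into one small helper. This is only a sketch: the function name is mine, and the resistances (R3 = 100kΩ, Rth = 1.3kΩ, R7 = 10kΩ) and the 10Hz target are the values used in this answer.

```python
import math

def capacitor_for_cutoff(r_ohms, f_hz):
    """Capacitance (farads) giving a first-order RC cut-off at f_hz."""
    return 1.0 / (2.0 * math.pi * r_ohms * f_hz)

F_TARGET = 10.0  # Hz, the cut-off chosen in the answer

# Resistance that each capacitor works against
resistances = {
    "C? (with R3)": 100e3,
    "C10 (with R_TH)": 1.3e3,
    "C8 (with R7)": 10e3,
}

for name, r in resistances.items():
    c = capacitor_for_cutoff(r, F_TARGET)
    tau = r * c  # identical for all three, since f is the same
    print(f"{name}: C = {c * 1e6:.3g} uF, tau = {tau * 1e3:.3g} ms")
```

In practice you would round each result to the nearest standard value (for example 150nF or 220nF for C?), which moves the cut-off slightly but not enough to matter for audio.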
The formula given by the datasheet might make more sense if you re-arrange it a bit:
$$
\begin{aligned}
V_O &= (V_{REF}-V_I)\frac{R_4}{R_2} + V_{REF} \\ \\
&= V_{REF}\frac{R_4}{R_2} - V_I\frac{R_4}{R_2} + V_{REF} \\ \\
&= \overbrace{\left( 1 + \frac{R_4}{R_2} \right)}^{\text{non-inverting gain}}V_{REF} + \overbrace{\left(-\frac{R_4}{R_2}\right)}^{\text{inv. gain}}V_I \\ \\
\end{aligned}
$$
You should recognise the left gain term \$1+\frac{R_4}{R_2}\$ as the classic non-inverting amplifier gain expression. It applies to \$V_{REF}\$ since \$V_{REF}\$ is connected to the non-inverting input, and any changes there will result in an output change in the same direction.
The right gain term \$-\frac{R_4}{R_2}\$ applies to input signal \$V_I\$. This time it's the classic inverting amplifier gain expression (note the negative sign, indicating an inversion).
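If you want to convince yourself that the rearranged expression really is the same as the datasheet's, a few lines of Python will do it numerically. The test values below are arbitrary illustrations of mine, not values taken from your circuit.

```python
def vo_datasheet(v_ref, v_i, r4, r2):
    """Output voltage in the form given by the datasheet."""
    return (v_ref - v_i) * r4 / r2 + v_ref

def vo_rearranged(v_ref, v_i, r4, r2):
    """Same output, split into non-inverting and inverting gain terms."""
    non_inverting_gain = 1 + r4 / r2
    inverting_gain = -r4 / r2
    return non_inverting_gain * v_ref + inverting_gain * v_i

# Arbitrary illustrative values (volts and ohms)
for v_ref, v_i, r4, r2 in [(2.0, 1.9, 47e3, 10e3), (2.5, 2.6, 100e3, 100e3)]:
    a = vo_datasheet(v_ref, v_i, r4, r2)
    b = vo_rearranged(v_ref, v_i, r4, r2)
    print(f"datasheet form: {a:.6f} V, rearranged form: {b:.6f} V")
```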
Also, in this form, it's clearer what will happen with your own circuit. If I call the impedance of the capacitor C? \$Z_C\$, you can see that the datasheet's R2 is the combined impedance of R3 and C? in series, in your circuit. Their R4 is your RV1. We can plug these into that equation:
$$
\begin{aligned}
V_O &= \left( 1 + \frac{RV_1}{R_3+Z_C} \right)V_{REF} + \left(-\frac{RV_1}{R_3+Z_C}\right)V_I \\ \\
\end{aligned}
$$
There are two conditions to consider. The first is DC, where we are dealing with a frequency of 0Hz, and capacitors have infinite impedance, so \$Z_C\rightarrow \infty\$. The second is AC, at frequencies well inside the pass band, where the capacitor has very low impedance, \$Z_C \rightarrow 0\$.
At DC, with \$Z_C\rightarrow \infty\$, the expression becomes:
$$
\begin{aligned}
V_{O(DC)} &= \left( 1 + \frac{RV_1}{R_3+\infty} \right)V_{REF} + \left(-\frac{RV_1}{R_3+\infty}\right)V_I \\ \\
&= \left( 1 + 0 \right)V_{REF} + \left(-0\right)V_I \\ \\
&= V_{REF}
\end{aligned}
$$
For frequencies in the pass band, where \$Z_C \rightarrow 0\$:
$$
\begin{aligned}
V_{O(AC)} &= \left( 1 + \frac{RV_1}{R_3+0} \right)V_{REF} + \left(-\frac{RV_1}{R_3+0}\right)V_I \\ \\
&= \left( 1 + \frac{RV_1}{R_3} \right)V_{REF} + \left(-\frac{RV_1}{R_3}\right)V_I \\ \\
\end{aligned}
$$
Since \$V_{REF}\$ is fixed and constant, a DC potential with no components above 0Hz, it can be disregarded at AC, leaving you with:
$$
\begin{aligned}
V_{O(AC)} &= -\frac{RV_1}{R_3}V_I \\ \\
\end{aligned}
$$
In other words, there will be a DC offset of \$V_{REF}\$ at the output, and AC components of the input signal will be inverted, and amplified by a factor of \$\frac{RV_1}{R_3}\$.
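To see how the gain moves between those two extremes, you can evaluate the inverting gain term with the capacitor's actual impedance, \$Z_C = \frac{1}{j2\pi fC}\$. The sketch below assumes C? = 160nF and R3 = 100kΩ as suggested earlier; the RV1 setting of 100kΩ is purely a hypothetical value of mine, since it depends on where your potentiometer is set.

```python
import math

def inverting_gain(f_hz, rv1_ohms, r3_ohms, c_farads):
    """Complex gain applied to V_I: -RV1 / (R3 + Z_C)."""
    if f_hz == 0:
        return 0j  # Z_C is infinite at DC, so the gain is zero
    z_c = 1 / (1j * 2 * math.pi * f_hz * c_farads)
    return -rv1_ohms / (r3_ohms + z_c)

R3 = 100e3   # ohms, from the answer
C = 160e-9   # farads, the value suggested for C?
RV1 = 100e3  # ohms, hypothetical potentiometer setting

for f in (0, 1, 10, 100, 1000, 20000):
    g = inverting_gain(f, RV1, R3, C)
    print(f"f = {f:>5} Hz: |gain| = {abs(g):.4f}")
```

At the cut-off frequency the magnitude is \$\frac{1}{\sqrt{2}}\$ of the pass-band value \$\frac{RV_1}{R_3}\$, and it approaches \$\frac{RV_1}{R_3}\$ as the frequency rises.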