The 2 comes from the need to avoid aliasing. Note that \$\log_2(L)\$ is the sample size in bits, and if we divide the bit rate by the sample size we get the sample rate: \$\frac{2B\log_2(L)}{\log_2(L)}= 2B\$.
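To put numbers on that division, here is a minimal sketch. The values \$B = 3000\$ Hz and \$L = 4\$ are made up for illustration, and the function names are my own.

```python
import math

def nyquist_bit_rate(bandwidth_hz, levels):
    """Bit rate 2*B*log2(L) for bandwidth B and L signal levels."""
    return 2 * bandwidth_hz * math.log2(levels)

def sample_rate_from_bit_rate(bit_rate, levels):
    """Divide the bit rate by the bits per sample, log2(L)."""
    return bit_rate / math.log2(levels)

B = 3000.0  # assumed bandwidth in Hz (illustrative value)
L = 4       # assumed number of levels, i.e. 2 bits per sample

bit_rate = nyquist_bit_rate(B, L)
sample_rate = sample_rate_from_bit_rate(bit_rate, L)

print(bit_rate)     # bits per second
print(sample_rate)  # samples per second, equal to 2*B
```

Whatever \$L\$ you pick, the \$\log_2(L)\$ factor cancels and you are left with \$2B\$ samples per second.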
The formula says that bandwidth B needs a sample rate of at least 2B. You're disagreeing and saying that it only needs a sample rate of B.
But if the bandwidth is B, then the highest-frequency sinusoid in the band has frequency B. If you sample such a signal at a sample rate of B, you will not capture the peaks and valleys of the sinusoid. The data will in fact look flat, like a DC signal! This is because you're taking a sample of the waveform only once per period, at the same spot in its phase.
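You can check the "looks like DC" claim numerically. This is only a sketch: the frequency of 1000 Hz and the starting phase of 0.7 rad are arbitrary choices of mine, and `sample_sinusoid` is a hypothetical helper.

```python
import math

def sample_sinusoid(freq_hz, sample_rate_hz, n_samples, phase=0.0):
    """Return n_samples of sin(2*pi*f*t + phase), taken at t = n / sample_rate."""
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate_hz + phase)
        for n in range(n_samples)
    ]

B = 1000.0  # assumed highest frequency in Hz (illustrative value)

# Sample a sinusoid of frequency B at a sample rate of only B.
samples = sample_sinusoid(B, B, 8, phase=0.7)

# Every sample lands at the same phase, so they all have (nearly) the same value.
spread = max(samples) - min(samples)
print(samples)
print(spread)  # essentially zero, up to floating-point rounding
```

Every sample equals \$\sin(0.7)\$ up to rounding error, so nothing in the data tells you there was a 1000 Hz oscillation at all.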
And if you sample a frequency which is just a little bit less than \$B\$, say \$B - \epsilon\$, then your samples will look like they have frequency \$\epsilon\$, and not \$B - \epsilon\$. For instance, a 9700 Hz waveform sampled at 10 kHz will produce data whose most obvious interpretation is that it is a 300 Hz waveform.
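The 9700 Hz example is easy to verify. The sketch below samples a 9700 Hz cosine and a 300 Hz cosine at 10 kHz and compares them. I use cosines so that the two sample sets match exactly; with sines they would match up to a sign flip. The helper names are mine.

```python
import math

def sample_cosine(freq_hz, sample_rate_hz, n_samples):
    """Return n_samples of cos(2*pi*f*t), taken at t = n / sample_rate."""
    return [
        math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]

def alias_frequency(freq_hz, sample_rate_hz):
    """Apparent frequency after sampling: distance to the nearest multiple of the sample rate."""
    return abs(freq_hz - sample_rate_hz * round(freq_hz / sample_rate_hz))

fs = 10000.0
high = sample_cosine(9700.0, fs, 100)
low = sample_cosine(300.0, fs, 100)

# Largest difference between the two sets of samples.
max_diff = max(abs(a - b) for a, b in zip(high, low))

print(alias_frequency(9700.0, fs))
print(max_diff)  # tiny: the two sampled sequences are indistinguishable
```

The two lists agree to within rounding error, so the sampled data alone cannot tell you which of the two waveforms you started with.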
This is the same effect which allows us to use a strobe light to view the vibration or rotation of a machine as if it were in slow motion. It is also why, in a film, the wheels of a moving car sometimes appear to rotate more slowly than the car's forward motion would suggest, or even backwards. It is a form of aliasing.
Aliasing means that multiple original signals produce the same sampled data, so that the data is ambiguous. There is always some kind of aliasing in a sampled signal, due to the use of discrete variables to represent continuous quantities. Because the amplitude is quantized, there is quantization noise: a continuous range of levels of the original signal is represented by ("aliases to") the same value. Some kinds of aliasing are particularly bad, like when reconstruction synthesizes loud signal frequencies which were not in the original input, and which are in the same frequency band as the signal of interest.
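The amplitude side of this can be sketched the same way. Below is a toy uniform quantizer, assuming an input range of -1 to 1 and 4 levels; both are arbitrary choices of mine, and real converters differ in detail.

```python
def quantize(x, levels, lo=-1.0, hi=1.0):
    """Map x in [lo, hi] to an integer code 0..levels-1 using uniform steps."""
    step = (hi - lo) / levels
    index = int((x - lo) / step)
    # Clamp so that out-of-range inputs still produce a valid code.
    return min(max(index, 0), levels - 1)

# Two different input amplitudes that fall in the same quantization step.
code_a = quantize(0.10, 4)
code_b = quantize(0.40, 4)

# An amplitude that falls in the next step up.
code_c = quantize(0.60, 4)

print(code_a, code_b, code_c)
```

Inputs 0.10 and 0.40 come out as the same code, so after quantization they can no longer be told apart. That lost difference is the quantization noise.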
You might be confusing digital bandwidth (like "2 Mbit/s") with analog bandwidth (like "0 to 2 MHz").