How can I be sure that a multi-bit-per-symbol encoding schema exists?

Question

(I came up with this question while trying to understand bit rate and baud rate.)

Suppose I have some data to transfer, binary encoded as N bits.

If I use 2 symbols to represent the binary data, which means one symbol for 0 and another symbol for 1, then I can only transfer 1 bit of data at a time, and the effective bit rate is the same as the baud rate.

If I use more than 2 symbols to represent the binary data, and each symbol represents multiple bits, then I can transfer the same N bits of effective data much faster. And the symbol change rate on the line (baud rate) is lower than the effective bit rate.

But how can I know such a multiple-bits-per-symbol encoding schema exists for a given data trunk?

ADD 1

I once had some difficulty understanding baud rate and bit rate. I think the difficulty comes from the first impression I got from pictures similar to the one below:

The picture gives me the impression that what gets physically transferred over the wire is the digits 0 and 1, and that each digit is assigned a different voltage level. So there are always only 2 different signal/symbol/voltage types on the wire, and thus the bit rate is always the same as the baud rate.

Now I think this picture just shows the effective result. It is symbols that get transferred over the wire. The number of possible symbols is determined by the physical nature of the channel/medium. When a symbol is transferred, the one or more bits it carries are effectively transferred with it. And how symbols represent bits is a mathematical agreement among the communicating parties.

What type of data trunk? A very popular method is QAM, which is defined to have more than one bit per symbol. — Peter Smith, Mar 25 '17 at 15:03
What do you mean by "a given data trunk"? What would you consider an example of a "data trunk"? — The Photon, Mar 25 '17 at 15:30
@ThePhoton I mean an arbitrary piece of data, which is a stream of bit 0/1. — smwikipedia, Sep 18 '20 at 05:52

Claudio Avi Chami · Answer 1 · 2017-03-25T15:16:17.753

There are three elements of a carrier that you can modulate: the amplitude, the phase and the frequency.

A very popular digital modulation scheme uses one of four possible phases (QPSK). So it can convey two bits on each symbol.

Other commonly used digital modulation schemes combine several amplitudes and phases. For example, 16QAM sends one of 16 possible combinations of phase and amplitude, so each 16QAM symbol can convey four bits.

There are other digital modulation variations similar to those mentioned, like 8PSK or 64QAM, 256QAM, etc.
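The arithmetic behind these figures is that a scheme with M distinguishable symbols carries log2(M) bits per symbol. A minimal sketch of that relation follows; the function name `bits_per_symbol` and the `schemes` table are illustrative only, not part of any standard.

```python
import math

def bits_per_symbol(num_symbols: int) -> float:
    """Bits carried by one symbol chosen from num_symbols distinct states."""
    if num_symbols < 2:
        raise ValueError("need at least two symbols to carry information")
    return math.log2(num_symbols)

# Constellation sizes of the schemes mentioned in this answer.
schemes = {"QPSK": 4, "8PSK": 8, "16QAM": 16, "64QAM": 64, "256QAM": 256}

for name, size in schemes.items():
    print(f"{name}: {bits_per_symbol(size):.0f} bits per symbol")
```

The bit rate is then the symbol rate (baud rate) multiplied by this number.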

Decoding a multi-bit symbol requires a rather complex receiver. Multi-bit-per-symbol protocols therefore need mechanisms for data synchronization, and you have to analyze the path to know whether the SNR is high enough to differentiate each symbol, etc.

That is it in a nutshell; I hope it is clear as an introduction.

Thanks, I am learning EE and your answer is a good introduction. — smwikipedia, Mar 25 '17 at 15:19
You are welcome! Best of luck with your studies. I also studied EE long ago, and real-life examples often help us grasp new concepts. Don't hesitate to ask again, and cheers, you have chosen a great career! — Claudio Avi Chami, Mar 25 '17 at 15:59

score 2 · Answer 2 · answered Mar 25 '17 at 16:25

The number of bits per symbol does not need to be an integer, although this is most convenient (simplest to implement).

In the most general case, you treat your entire message as one big binary (base 2) number. If your channel has N states (symbols), you simply convert that number to base N and transmit it one digit at a time.

In a more practical implementation, you would break the message into fixed-length blocks, converting and transmitting one block at a time, possibly adding additional error detection and correction bits to each block.
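The conversion described above can be sketched in a few lines. This is only an illustration under simple assumptions: the helper names `bits_to_symbols` and `symbols_to_bits` are made up for this example, symbols are represented as integers from 0 to N-1, and both ends are assumed to agree on the block length in bits, since leading zeros are lost when the block is treated as a number.

```python
def bits_to_symbols(bits: str, num_states: int) -> list[int]:
    """Treat a bit string as one base-2 number and rewrite it in base num_states."""
    if num_states < 2:
        raise ValueError("a channel needs at least two states")
    value = int(bits, 2)
    digits = []
    while value > 0:
        value, digit = divmod(value, num_states)
        digits.append(digit)
    return digits[::-1] or [0]  # most significant digit first

def symbols_to_bits(symbols: list[int], num_states: int, bit_length: int) -> str:
    """Rebuild the number from its base-num_states digits, then pad to bit_length."""
    value = 0
    for symbol in symbols:
        value = value * num_states + symbol
    return format(value, f"0{bit_length}b")

message = "0010110111"                      # one 10-bit block
symbols = bits_to_symbols(message, 3)       # channel with 3 states: [2, 0, 2, 1, 0]
recovered = symbols_to_bits(symbols, 3, len(message))
print(symbols, recovered == message)
```

With 3 states each symbol carries log2(3), about 1.58 bits, which is why the number of bits per symbol need not be an integer. A real fixed-block scheme would also pad every block to the same number of symbols.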

So the answer to your fundamental question is that there is always a way to transmit digital data using an arbitrary number of symbols.

So, if I don't consider additional overhead bits, on a binary channel (I mean a channel with only 2 states/symbols), the baud rate is always the same as bit rate. Right? — smwikipedia, Mar 26 '17 at 01:34

neonzeon · Answer 3 · 2017-03-25T20:38:25.577

The other answers probably answered your title request. But as a student and young engineer I struggled with symbol-vs-bit concepts.

If I use more than 2 symbols to represent the binary data, and each symbol represents multiple bits, then I can transfer the same N bits of effective data much faster.

Here is some insight that took me a long time to grasp well.

Listen to Claude Shannon himself at 5:12.

The fundamental answer is NOISE. The universe vibrates, and there is a "base noise level" that we simply cannot avoid in any electronic system.

Noise is unwanted energy from natural (and man-made) sources. Every resistor and every active component in your circuit(s) contribute to this unwanted energy in your communication channel.

Every symbol in your encoding scheme has a specific signal energy (measured in joules), determined by both the power and the duration of the encoded symbol, that competes with the unwanted energy (noise) in the same time slot.

If you encode only one bit per symbol, all the energy in that symbol represents that single bit. But, if you encode N states (log2(N) bits) onto each symbol, each bit effectively gets only a portion of the energy of the symbol.

On the other hand, the noise energy in a symbol does not divide among its bits. This is the key point to grasp. One way to look at it is that all the noise energy in a symbol does battle with every individual bit encoded onto that symbol. Think about this carefully: bit energy divides, noise energy does not.

So, as you encode more bits onto each symbol, you effectively lower the ratio of energy-per-bit/noise-energy-per-bit.

Ultimately, due only to the presence of noise energy, each bandwidth-limited communication channel has a theoretical upper bit rate limit, set by its bandwidth and by the ratio of signal power to the unwanted power (noise and interference) in that channel.
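This limit is usually written as the Shannon-Hartley formula, C = B * log2(1 + S/N), where B is the bandwidth in hertz and S/N is the signal-to-noise power ratio. A minimal sketch, assuming an additive white Gaussian noise channel; the function names and the 3 kHz, 30 dB channel are hypothetical values chosen only for illustration.

```python
import math

def db_to_linear(snr_db: float) -> float:
    """Convert a power ratio from decibels to a plain ratio."""
    return 10.0 ** (snr_db / 10.0)

def shannon_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Upper bound on the error-free bit rate, in bits/s, of an AWGN channel."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)

# Hypothetical channel: 3 kHz of bandwidth with a 30 dB signal-to-noise ratio.
capacity = shannon_capacity(3000.0, db_to_linear(30.0))
print(f"capacity is roughly {capacity:.0f} bits/s")
```

As the noise power goes to zero the ratio S/N grows without bound, and so does the capacity.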

To paraphrase: If it was not for noise, we would enjoy unlimited data rates on every single communication channel.

From the above, one might intuitively feel that it's always better to choose one bit per symbol, because then all the signal energy in that symbol can be assigned to battle the noise energy in that symbol.

This is not the case...

In fact, the opposite is true - simply because, by encoding more bits onto each symbol you effectively allow the symbol duration to be longer, and therefore the energy-per-bit decays slower than the noise-energy-per-bit until the limit is reached. This goes back to the fundamental insight that the signal energy in each symbol does battle with the noise energy in that symbol.

Consequently, modern encoding schemes encode multiple bits onto each symbol, resulting in an effective symbol duration that is much longer than a single bit duration.

The downside of more bits-per-symbol is the additional processing power and complexity required for both encoding and decoding of the bits.

The benefit of more complex encoding is the amazingly fast internet channels we use and enjoy daily at work, in our homes and on our phones.

Also don't forget GPS and deep space and Viterbi!

score 2 · Answer 4 · answered Mar 26 '17 at 03:39

What you need to understand is that you can't just put "0"s and "1"s in the line. You have to encode it somehow and that's called modulation, which is part of the physical layer of any protocol.

So, you have a copper wire, or an optical fiber, or even an electromagnetic field, and you have to somehow transmit bits to the other side. There are many ways to do that, but the basics apply: you usually have an actual physical quantity that can be measured on the other side. For our cases these are, respectively, voltage level (or current), brightness (for each light wavelength) and electromagnetic power.

On the transmitter side, you have to "translate" bits into those physical quantities. Note, however, that the ones I mentioned are continuous quantities: you can "put" 0, 0.5, 1, 5, 20 volts between a pair of wires. The receiver will see those quantities at the other end of the wire pair (plus losses, interference, noise...).

Anyway, think of it like this: if those quantities are continuous, I can divide their range into more discrete states. Where 0 volts used to mean the 0 bit and 1 volt the 1 bit, I can instead let 0 volts mean the bits 00, 0.33 volts the bits 01, 0.67 volts the bits 10, and 1 volt the bits 11. This way a single symbol, which is a single voltage measurement, can mean multiple bits. If you transmit 1 voltage level every 1/1000 of a second, you are transmitting 1000 symbols/s (baud rate) and 2000 bits/s (bit rate). If you want, you can keep dividing further, up to the point where your receiver gets confused by the noise and demodulates your bits with errors (Shannon limit).

The image above, for example, has a carrier; this is called Amplitude Shift Keying (ASK), the digital equivalent of AM (as in AM radio). There are many others, such as FSK, PSK, QAM, and PWM.
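The four-level mapping above can be sketched as follows. This is only an illustration: the names `LEVELS`, `modulate` and `demodulate` are made up for this example, the noise values are arbitrary, and the receiver is assumed to simply pick the nearest nominal voltage.

```python
# Nominal voltage for each pair of bits, as in the paragraph above.
LEVELS = {"00": 0.0, "01": 0.33, "10": 0.67, "11": 1.0}

def modulate(bits: str) -> list[float]:
    """Map each pair of bits to one voltage level (one symbol)."""
    if len(bits) % 2:
        raise ValueError("bit count must be even for 2 bits per symbol")
    return [LEVELS[bits[i:i + 2]] for i in range(0, len(bits), 2)]

def demodulate(voltages: list[float]) -> str:
    """Decide each symbol by choosing the nearest nominal level."""
    pairs = []
    for voltage in voltages:
        pairs.append(min(LEVELS, key=lambda pair: abs(LEVELS[pair] - voltage)))
    return "".join(pairs)

symbol_rate = 1000                          # symbols per second (baud rate)
bit_rate = symbol_rate * 2                  # 2 bits per symbol gives 2000 bits/s

sent = modulate("00011011")
mild_noise = [0.05, -0.10, 0.08, -0.12]     # arbitrary small disturbances
received = [v + n for v, n in zip(sent, mild_noise)]
print(demodulate(received))                 # small noise: bits are recovered

# A disturbance larger than half the level spacing flips the decision.
print(demodulate([0.33 + 0.20]))            # 0.53 V is closer to 0.67 V than to 0.33 V
```

Adding more levels shrinks the spacing between them, so smaller disturbances are enough to cause such errors.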

score 1 · Answer 5 · answered Mar 25 '17 at 15:12

But how can I know such a multiple-bits-per-symbol encoding schema exists for a given data trunk?

By reading the protocol specification. This really should be obvious.

Note that you need to know a lot more about a protocol than just the effective bit rate to actually communicate. There are issues of encoding, knowing where words, packets, etc. start and end, how the bits are encoded, packet wrapping, and many more. All of these should be spelled out in a protocol specification somewhere.
