Can someone explain bit-interleaved coded modulation (BICM) in plain language, focusing on the practical mechanisms and properties rather than the deep analytical details? I found a few papers and articles on the subject, but they were all pitched at people who already have a deep familiarity with these types of schemes. My background is information security and I'm a self-taught electronics hobbyist, so my knowledge of the underpinning concepts is largely restricted to the practical.
To help reduce the scope of an answer, I'll explain my understanding so far.
I'm familiar with the high-level concept of block coding: you split your data into message blocks of some fixed size \$k\$, then you feed each message block into a coding function \$C(m)\$. The coding function is (as far as I know) deterministic, and has an inverse function \$C'(m)\$ for decoding, i.e. \$C'(C(m)) = m\$ for all \$m\$. From what I've read, it seems that the coding is generally referred to as a mapping into an alphabet \$\Sigma^k\$, where \$k\$ indicates the size of that alphabet, from which one could draw a parallel to invertible S-boxes in cryptography. Practically, \$\Sigma^k\$ could be a lookup table (LUT), although if \$k\$ is large it's probably necessary (due to practical constraints) to derive some linear algebraic function that describes \$\Sigma^k\$ instead, and perform those operations - this same issue comes up in cryptography.
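To check that I've got the mapping idea right, here's a rough Python sketch of a block code written purely as a lookup table, using the trivial \$(3,1)\$ repetition code (the names are mine, just for illustration):

    # A toy coding function C(m) expressed as an explicit lookup table:
    # the (3,1) repetition code maps each 1-bit message to a 3-bit codeword.
    ENCODE = {0b0: 0b000, 0b1: 0b111}
    DECODE = {codeword: message for message, codeword in ENCODE.items()}

    for m in (0, 1):
        assert DECODE[ENCODE[m]] == m   # C'(C(m)) = m for every message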
However, I've also seen notation that uses the form \$(n,k,d)_q\$ to describe the properties of a block code, where the alphabet is \$\Sigma^q\$ and \$k\$ is the length of the codeword, which indicates to me that there must be block codes that cannot be expressed as a simple mapping over some alphabet \$\Sigma^k\$. My guess is that block codes where \$C(m)\$ can be expressed simply as mappings over an alphabet (i.e. \$k=q\$) are what I've seen referred to as linear block codes, whereas those which cannot are some other type of block code - perhaps something stateful, with feedback?
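On the "linear algebraic function" point, my (possibly naive) understanding is that for binary linear block codes the encoder really is just a matrix multiplication over GF(2). A rough sketch using the well-known \$(7,4)\$ Hamming code in one common systematic form (numpy used purely for convenience):

    import numpy as np

    # One systematic generator matrix G = [I_4 | P] for the (7,4) Hamming code.
    # Encoding a 4-bit message m is just the product m*G over GF(2).
    G = np.array([[1, 0, 0, 0, 1, 1, 0],
                  [0, 1, 0, 0, 1, 0, 1],
                  [0, 0, 1, 0, 0, 1, 1],
                  [0, 0, 0, 1, 1, 1, 1]])

    def encode(message_bits):            # message_bits: length-4 sequence of 0/1
        return (np.array(message_bits) @ G) % 2

    print(encode([1, 0, 1, 1]))          # -> [1 0 1 1 0 1 0], a 7-bit codeword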
I understand that coding functions are designed such that the coded blocks (codewords) have various desirable properties, such as producing a nearly equal number of 1 and 0 bits in \$C(m)\$ for any \$m\$, to provide DC balance - an example would be 8b/10b coding. I also understand that error detection and correction over noisy channels can be achieved by building redundancy (e.g. parity bits) into the coding.
I know that the rate \$R\$ of a block code is the ratio between the size of the message block and the size of the codeword, i.e. \$R=k/n\$. The lower the rate, the more coded data you must send per unit of message, making it less efficient. There is, then, a balance to be struck between the rate of the block code and the practical properties that can be achieved at a given rate. For example, we could regard an 8-E-1 configuration on a serial port as a block code \$m_{[0\dots7]} \to [0, m, m_0\oplus m_1 \oplus \dots \oplus m_7, 1]\$, i.e. an \$(n,k) = (11,8)\$ code with a rate of \$8/11 \approx 0.727\$, and say that it can detect any single-bit error. Intuitively, the rate can't exceed 1.
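To make that concrete to myself, here's a rough Python sketch of the 8-E-1 frame as a block code (data bits taken LSB-first, as a UART would send them; the function name is my own):

    # 8-E-1 as an (11,8) block code: start bit, 8 data bits, even-parity bit, stop bit.
    def frame_8e1(byte):
        data = [(byte >> i) & 1 for i in range(8)]
        parity = sum(data) % 2            # even parity = XOR of all data bits
        return [0] + data + [parity, 1]   # 11-bit codeword

    codeword = frame_8e1(0xA5)
    print(codeword, "rate =", 8 / len(codeword))   # rate = 8/11 ≈ 0.727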
I've seen the concept of minimum distance \$d\$ described, which is basically the minimum Hamming distance between any two distinct codewords \$C(m_1)\$ and \$C(m_2)\$, evaluated over all pairs of messages (and I guess also all states, if the scheme is stateful). I generally understand the relevance of this metric: the larger the minimum distance, the more bits you'd have to change in any one codeword to produce another valid codeword, so the more bit errors you can detect in a message. From this I presume that a general-case performance metric for any given block code might be \$R\times d\$, since you want to maximise both rate and distance in order to achieve the maximum error-free throughput on a given channel.
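For small codes, I assume the minimum distance can simply be brute-forced; a quick Python check on a toy code (4 data bits plus one even-parity bit) that should give \$d = 2\$:

    from itertools import combinations

    # Brute-force the minimum distance of a tiny single-parity-check code.
    def encode_parity(m):
        bits = [(m >> i) & 1 for i in range(4)]
        return bits + [sum(bits) % 2]     # append even-parity bit

    codewords = [encode_parity(m) for m in range(16)]
    d_min = min(sum(a != b for a, b in zip(c1, c2))
                for c1, c2 in combinations(codewords, 2))
    print("minimum distance:", d_min)     # a single parity bit gives d = 2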
I very roughly grasp the concept of bounds in this context, at least in terms of them being useful metrics for comparing families of block codes, but I'm definitely not familiar with all of the analytical methods that are applied in this area. I suspect that this may be irrelevant for just understanding the actual practical mechanisms in use with BICM.
In terms of the physical layer, it seems that BICM, LDPC, etc. are often applied over QAM, where you have two carriers 90° out of phase with each other and the amplitude of the modulated signal on each carrier can be represented as an axis on an X/Y plot. The in-phase carrier is called \$I\$ and the phase-shifted (quadrature) carrier is called \$Q\$, so it's called an IQ plot. The magnitudes of \$I\$ and \$Q\$ effectively form Cartesian coordinates, although I suppose you could also use polar coordinates instead. If discrete points are chosen on an IQ plot, QAM can be used to transmit the coordinates of those points. If symbols are assigned to those points (e.g. 16 points with values 0 to 15) then the scheme can "point" to symbols. As I understand it, these are called constellations, and the scheme is generally referred to as QAM16 or QAM256 or whatever based on the number of defined points in that constellation.
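Here's how I picture a square 16-QAM constellation in Python, with a Gray-coded 4-bit label on each point (this particular labelling is just one common choice I've seen, not necessarily what any given standard uses):

    # Square 16-QAM: 4 amplitude levels per axis, one 4-bit label per point.
    # Gray coding each axis means adjacent points differ in only one bit.
    levels = [-3, -1, 1, 3]
    gray2 = [0b00, 0b01, 0b11, 0b10]            # 2-bit Gray code

    constellation = {}                          # label -> (I, Q)
    for i_idx, i_amp in enumerate(levels):
        for q_idx, q_amp in enumerate(levels):
            label = (gray2[i_idx] << 2) | gray2[q_idx]
            constellation[label] = (i_amp, q_amp)

    print(constellation[0b0000])                # corner point (-3, -3)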
The more symbols you cram into a constellation, the more information you can convey at a given symbol rate, but this necessitates a closer distance between points and therefore increases the likelihood of an error being conveyed. Apply a block coding scheme with error correction on top of this and you can improve the error rate, albeit at the cost of the block code's rate. The performance of the overall system then becomes largely a matter of balancing/optimising the Euclidean distance between points on the QAM constellation, the \$\frac {dV} {dt}\$ (and thus bandwidth and Nyquist rate) generated by the maximum change in \$I\$ and \$Q\$ magnitudes in a given symbol period, the Hamming distance between codewords, the error detection and correction capabilities of the block code, the rate of the block code, and other properties that affect electrical and RF performance on the physical layer.
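To convince myself of the "closer points" part, a quick comparison of nearest-neighbour spacing for QPSK versus 16-QAM when both are scaled to the same average power (my own normalisation, purely for illustration):

    import itertools, math

    # Minimum Euclidean distance between points, normalised to unit average energy.
    def min_distance(points):
        scale = math.sqrt(sum(i*i + q*q for i, q in points) / len(points))
        return min(math.dist((i1/scale, q1/scale), (i2/scale, q2/scale))
                   for (i1, q1), (i2, q2) in itertools.combinations(points, 2))

    qpsk  = [(i, q) for i in (-1, 1) for q in (-1, 1)]
    qam16 = [(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]
    print(min_distance(qpsk), min_distance(qam16))   # ≈ 1.414 vs ≈ 0.632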
I've seen it described that QAM schemes are beneficial because they allow the effective information rate of the link to be multiplied by the number of bits carried per symbol (i.e. \$\log_2\$ of the number of points in the constellation), thus making for faster data transmission at a particular Nyquist frequency. I've also seen the term "capacity-approaching code" used to refer to block codes that approach the Shannon limit, i.e. the theoretical maximum rate at which information can be transported over a link. The concept seems simple enough but I'm not sure how this is achieved in practice.
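If I've understood it correctly, for an additive-white-Gaussian-noise channel that limit is \$C = B\log_2(1 + S/N)\$ for bandwidth \$B\$ and signal-to-noise ratio \$S/N\$, so for example a 1 MHz channel at 20 dB SNR tops out at roughly \$10^6 \times \log_2(101) \approx 6.66\$ Mbit/s no matter what modulation and coding is used - and "capacity-approaching" presumably means getting the usable rate close to that ceiling.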
I've also read about parity-check matrices, which I guess are just convenient ways of modelling the parity rules of some given block code? So if we pack the codeword bits \$x_i \in \{0,1\}\$ into an integer \$\sum_i x_i 2^i\$, then each row in a parity-check matrix defines which bit positions must sum (mod 2) to zero, i.e. have even parity. Or, programmatically (Python):
    def is_valid_codeword(value, parity_check_rows):
        # Each row is a bitmask selecting the bit positions that must have even parity.
        for row in parity_check_rows:
            if bin(value & row).count("1") % 2 != 0:
                return False
        return True
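For instance, with the first code bit taken as the most significant bit (my own ordering, purely for illustration), the three parity checks of the \$(7,4)\$ Hamming code become:

    # Parity-check rows of the (7,4) Hamming code, written as 7-bit masks.
    hamming_rows = [0b1101100, 0b1011010, 0b0111001]

    print(is_valid_codeword(0b1011010, hamming_rows))   # True: a valid codeword
    print(is_valid_codeword(0b1011011, hamming_rows))   # False: one bit flipped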
The example on the Wikipedia page seems reasonably straightforward. I presume that these come in handy if you're trying to build some kind of combinatorial logic in an FPGA or ASIC that needs to quickly distinguish between valid and invalid symbols, e.g. via a LUT.
At this point we start to reach the limits of my understanding. I've seen reference to turbo codes, LDPC, convolutional codes, recursive codes, iterative decoding, Tanner graphs, belief propagation, non-uniform constellations (NUCs), factor graphs, maximum likelihood decoding, decoder complexity, trellis-coded modulation, and Rayleigh fading channels. I'm quickly getting lost in the minutiae and am struggling to attain a big-picture understanding. The most readable paper on BICM that I could find was "Bit-Interleaved Coded Modulation" (2008) by Guillén i Fàbregas, Martinez, and Caire, but this quickly shot into the realm of formal hieroglyphics and left me behind.
Given this foundation of understanding, can someone explain how BICM works (and some key points about what makes it useful) without going off the deep end with unfamiliar notation and terms?