4

Can someone explain, in their own words, what Double Bit Error Detection is and how to derive it? An example of corrupted data and how to detect the double bit would be appreciated.

I can do Single Bit Error Correction using parity bits as well as correct the flipped bit. Now when I reach Double Bit Error Detection I understand there is an extra DED bit, which is somehow related to the even or odd parity of the bit sequence. However, I am lost.

What I read: http://en.wikipedia.org/wiki/Error_detection_and_correction

Video on Hamming Code: http://www.youtube.com/watch?v=JAMLuxdHH8o

Mike John
  • 157
  • 1
  • 2
  • 7
  • Do you understand Hamming distance http://en.wikipedia.org/wiki/Hamming_distance - it might be worth reading if you don't. Basically in error detection/correction algorithms you add "redundant" bits to your data so that data+redundancy has a hamming distance of at least 4 - this allows one error to make the D+R correctable AND two errors make D+R detectable. 3 errors means you think you can correct but erroneously correct it to a wrong number. Does this make any sense? – Andy aka Jun 02 '13 at 21:47
  • That much I get. However, proving, lets say that 2 out of 21 bits is flipped, is a skill I don't have. – Mike John Jun 02 '13 at 23:40
  • Here's a "simple" version of what Dave and Andy said: Each valid code word is arranged such that there are no other valid code word can be arrived at if ANY N bits in a valid word are flipped. If N=3 then you can flip one bit in any valid code word and not get to a combination that can be arrived at from any other word. If N=3 and you flip 2 bits at random you cannot reach another valid word (as it is at least 3 flips away) BUT two valid words may both be able to have 2 bits flipped to reach this intermediate state - so an N=3 code will allow you to correct 1 bit errors and detect 2 bit one. – Russell McMahon Jun 03 '13 at 02:26
  • If N = 5, so that you must flip ANY 5 bits to arrive at another valid word then 3 flips will get you to a state that MAY be able to be reached by flipping 3 bits in another valied word, but if you flip two bits you will get a result which can only have come from the word you started with, so you can correct 2 bt errors and detect 3 bit errors. The "corrector" can be as simple in this case as a lookup table which takes the input word and returns the only correct code word that could have caused it. Note that for N=5, if you have 4 bit errors the wprd will be "corrected" but wrongly. – Russell McMahon Jun 03 '13 at 02:30
  • 1
    Code 1 = 000. Code 2 = 111. 3 bits MUST be flipped to convert 000 to 111 or vice versa. So 100 010 001 can be corrected to 000. And 011 101 110 can be corrected to 111. BUT a two bit error that changes 000 to 011 will be wrongly "corrected" to 111. – Russell McMahon Jun 03 '13 at 02:33

1 Answers1

9

A Hamming code is a particular kind of error-correcting code (ECC) that allows single-bit errors in code words to be corrected. Such codes are used in data transmission or data storage systems in which it is not feasible to use retry mechanisms to recover the data when errors are detected. This type of error recovery is also known as forward error correction (FEC).

Constructing a Hamming code to protect, say, a 4-bit data word

Hamming codes are relatively easy to construct because they're based on parity logic. Each check bit is a parity bit for a particular subset of the data bits, and they're arranged so that the pattern of parity errors directly indicates the position of the bit error.

It takes three check bits to protect four data bits (the reason for this will become apparent shortly), giving a total of 7 bits in the encoded word. If you number the bit positions of an 8-bit word in binary, you see that there is one position that has no "1"s in its column, three positions that have a single "1" each, and four positions that have two or more "1"s.

If the four data bits are called A, B, C and D, and our three check bits are X, Y and Z, we place them in the columns such that the check bits are in the columns with one "1" and the data bits are in the columns with more than one "1". The bit in position 0 is not used.

Bit position:  7  6  5  4  3  2  1  0
   in binary:  1  1  1  1  0  0  0  0
               1  1  0  0  1  1  0  0
               1  0  1  0  1  0  1  0
         Bit:  A  B  C  X  D  Y  Z  -

The check bit X is set or cleared so that all of the bits with a "1" in the top row — A, B C and X — have even parity. Similarly, the check bit Y is the parity bit for all of the bits with a "1" in the second row (A, B and D), and the check bit Z is the parity bit for all of the bits with a "1" in the third row (A, C and D).

Now all seven bits — the codeword — are transmitted (or stored), usually reordered so that the data bits appear in their original sequence: A B C D X Y Z. When they're received (or retrieved) later, the data bits are put through the same encoding process as before, producing three new check bits X', Y' and Z'. If the new check bits are XOR'd with the received check bits, an interesting thing occurs. If there's no error in the received bits, the result of the XOR is all zeros. But if there's a single bit error in any of the seven received bits, the result of the XOR is a nonzero three-bit number called the "syndrome" that directly indicates the position of the bit error as defined in the table above. If the bit in this position is flipped, then the original 7-bit codeword is perfectly reconstructed.

A couple of examples will illustrate this. Let's assume that the data bits are all zero, which also means that all of the check bits are zero as well. If bit "B" is set in the received word, then the recomputed check bits X'Y'Z' (and the syndrome) will be 110, which is the bit position for B. If bit "Y" is set in the received word, then the recomputed check bits will be "000", and the syndrome will be "010", which is the bit position for Y.

Hamming codes get more efficient with larger codewords. Basically, you need enough check bits to enumerate all of the data bits plus the check bits plus one. Therefore, four check bits can protect up to 11 data bits, five check bits can protect up to 26 data bits, and so on. Eventually you get to the point where if you have 8 bytes of data (64 bits) with a parity bit on each byte, you have enough parity bits to do ECC on the 64 bits of data instead.

Different (but equivalent) Hamming codes

Given a specific number N of check bits, there are 2N equivalent Hamming codes that can be constructed by arbitrarily choosing each check bit to have either "even" or "odd" parity within its group of data bits. As long as the encoder and the decoder use the same definitions for the check bits, all of the properties of the Hamming code are preserved.

Sometimes it's useful to define the check bits so that an encoded word of all-zeros or all-ones is always detected as an error.

What happens when multiple bits get flipped in a Hamming codeword

Multible bit errors in a Hamming code cause trouble. Two bit errors will always be detected as an error, but the wrong bit will get flipped by the correction logic, resulting in gibberish. If there are more than two bits in error, the received codeword may appear to be a valid one (but different from the original), which means that the error may or may not be detected.

In any case, the error-correcting logic can't tell the difference between single bit errors and multiple bit errors, and so the corrected output can't be relied on.

Extending a Hamming code to detect double-bit errors

Any single-error correcting Hamming code can be extended to reliably detect double bit errors by adding one more parity bit over the entire encoded word. This type of code is called a SECDED (single-error correcting, double-error detecting) code. It can always distinguish a double bit error from a single bit error, and it detects more types of multiple bit errors than a bare Hamming code does.

It works like this: All valid code words are (a minimum of) Hamming distance 3 apart. The "Hamming distance" between two words is defined as the number of bits in corresponding positions that are different. Any single-bit error is distance one from a valid word, and the correction algorithm converts the received word to the nearest valid one.

If a double error occurs, the parity of the word is not affected, but the correction algorithm still corrects the received word, which is distance two from the original valid word, but distance one from some other valid (but wrong) word. It does this by flipping one bit, which may or may not be one of the erroneous bits. Now the word has either one or three bits flipped, and the original double error is now detected by the parity checker.

Note that this works even when the parity bit itself is involved in a single-bit or double-bit error. It isn't hard to work out all the combinations.

Dave Tweed
  • 168,369
  • 17
  • 228
  • 393
  • I don't agree with the last couple of paragraphs here. A typical implementation of a \$[2^m, 2^m-1-m]\$ Hamming SECDED code computes the \$(m+1)\$-bit syndrome, and _corrects_ the single error using \$m\$ syndrome bits if the \$(m+1)\$-th syndrome bit (overall parity bit) is 1 (or parity failure) indicating a single error has occurred, and simply indicates that a double error has occurred if the overall parity bit is a \$0\$. Put another way, all the codewords of the SECDED code have _even weight_ (even number of ones in them), and SEC is attempted only if the received word has odd weight. – Dilip Sarwate Jun 03 '13 at 03:03