To solve this you need to build the Huffman tree and work out how many bits are needed to represent each symbol. Then you can compute the total number of bits needed to encode the original string with the Huffman code and divide by the number of characters.
First you map your input string back to characters using the original fixed-length encoding:
00 A
10 B
10 B
01 C
01 C
01 C
00 A
10 B
11 D
01 C
01 C
01 C
01 C
01 C
10 B
01 C
10 B
00 A
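If you want to script this step, here is a minimal sketch; the bit string and the 2-bit table are just the data from the question written out:

```python
# The original fixed-length (2-bit) code table and the input bit string.
ORIGINAL_CODES = {'00': 'A', '01': 'C', '10': 'B', '11': 'D'}
bits = '001010010101001011010101010110011000'

# Split into 2-bit chunks and look each chunk up in the table.
symbols = [ORIGINAL_CODES[bits[i:i + 2]] for i in range(0, len(bits), 2)]
print(''.join(symbols))   # ABBCCCABDCCCCCBCBA
```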
Next you count the number of occurrences of each character:
3 00,A
9 01,C
5 10,B
1 11,D
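In code this is just a frequency count, for example with collections.Counter:

```python
from collections import Counter

symbols = list('ABBCCCABDCCCCCBCBA')   # the decoded characters from the previous step
counts = Counter(symbols)
print(counts)   # Counter({'C': 9, 'B': 5, 'A': 3, 'D': 1})
```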
Now we build a min-priority queue keyed on the occurrence counts, which looks like:
[(1,D), (3,A), (5, B), (9,C)]
Keep applying the Huffman process (http://en.wikipedia.org/wiki/Huffman_coding). First you combine D and A to make a new node 'DA' whose key is 1 + 3 = 4. Put this back into the priority queue:
[(4, DA), (5, B), (9,C)]
Now DA and B combine to give DAB with key 4 + 5 = 9:
[(9, DAB), (9,C)]
Now DAB and C combine to give the root node 'DABC' with key 9 + 9 = 18:
[(18, DABC)]
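The merge loop is easy to run with Python's heapq; this sketch only tracks the labels and the summed counts. (When two counts tie, as with DAB and C at 9, the tie-break order here may differ from the hand-run above, but that does not change the resulting code lengths.)

```python
import heapq

# (count, label) pairs from the frequency table above.
heap = [(1, 'D'), (3, 'A'), (5, 'B'), (9, 'C')]
heapq.heapify(heap)

while len(heap) > 1:
    count1, label1 = heapq.heappop(heap)   # smallest remaining node
    count2, label2 = heapq.heappop(heap)   # next-smallest node
    heapq.heappush(heap, (count1 + count2, label1 + label2))
    print(sorted(heap))                    # show the queue after each merge
```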
Now the process stops and we give each character a new encoding based on how far it is from the root node. 'C' was combined last, so it gets only one bit. Let's say I always use '0' for the second element (of the two picked from the priority queue). The bits inherited from earlier merges are shown in parentheses:
C = 0, DAB = 1
B = (1) 0, DA = (1) 1
A = (11) 0, D = (11) 1
So you get the encoding:
C = 0
B = 10
A = 110
D = 111
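What actually matters for the average is each character's code length, i.e. its depth in the tree. A small sketch that tracks those lengths directly during the merge, under the observation that every symbol below a merged node gets one bit longer:

```python
import heapq

counts = {'A': 3, 'B': 5, 'C': 9, 'D': 1}
code_length = {sym: 0 for sym in counts}

# Heap entries carry the total count and the list of symbols under that node.
heap = [(n, [sym]) for sym, n in counts.items()]
heapq.heapify(heap)

while len(heap) > 1:
    n1, syms1 = heapq.heappop(heap)
    n2, syms2 = heapq.heappop(heap)
    # Every symbol under a merged node moves one level further from the root.
    for sym in syms1 + syms2:
        code_length[sym] += 1
    heapq.heappush(heap, (n1 + n2, syms1 + syms2))

print(code_length)   # {'A': 3, 'B': 2, 'C': 1, 'D': 3}
```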
Encoding the original message:
Total bits needed = 9 * 1 + 5 * 2 + 3 * 3 + 1 * 3   (occurrences of C, B, A, D times their code lengths)
= 9 + 10 + 9 + 3
= 31
Number of Characters = 18
Average bits per character = 31 / 18 ≈ 1.7222
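And the same arithmetic in code, combining the counts and code lengths from above:

```python
counts = {'A': 3, 'B': 5, 'C': 9, 'D': 1}
code_length = {'A': 3, 'B': 2, 'C': 1, 'D': 3}

total_bits = sum(counts[s] * code_length[s] for s in counts)   # 31
total_chars = sum(counts.values())                             # 18
print(total_bits / total_chars)                                # 1.7222...
```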