To solve this you need to build the Huffman tree and work out how many bits are needed to represent each symbol. Then you can compute the total number of bits needed to encode the original string with the Huffman code and divide by the number of characters.
First you map your input string back to characters using the original fixed-length encoding:
00 A
10 B
10 B
01 C
01 C
01 C
00 A
10 B
11 D
01 C
01 C
01 C
01 C
01 C
10 B
01 C
10 B
00 A
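If you want to script this step, here is a minimal sketch; the bit string and the 2-bit table are just the data from the question written out:

```python
# The original fixed-length (2-bit) code table and the input bit string.
ORIGINAL_CODES = {'00': 'A', '01': 'C', '10': 'B', '11': 'D'}
bits = '001010010101001011010101010110011000'

# Split into 2-bit chunks and look each chunk up in the table.
symbols = [ORIGINAL_CODES[bits[i:i + 2]] for i in range(0, len(bits), 2)]
print(''.join(symbols))   # ABBCCCABDCCCCCBCBA
```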
Next you count the number of occurrences of each character:
3 00,A
9 01,C
5 10,B
1 11,D
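In code this is just a frequency count, for example with collections.Counter:

```python
from collections import Counter

symbols = list('ABBCCCABDCCCCCBCBA')   # the decoded characters from the previous step
counts = Counter(symbols)
print(counts)   # Counter({'C': 9, 'B': 5, 'A': 3, 'D': 1})
```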
Now we build a min-priority queue keyed on the occurrence counts, which looks like:
[(1,D), (3,A), (5, B), (9,C)]
Keep applying the Huffman process (http://en.wikipedia.org/wiki/Huffman_coding). First you combine D and A to make a new node 'DA' whose key is 1 + 3 = 4. Put this back into the priority queue:
[(4, DA), (5, B), (9,C)]
Now DA and B combine to give DAB with key 4 + 5 = 9:
[(9, DAB), (9,C)]
Now DAB and C combine to give the root node 'DABC' with key 9 + 9 = 18:
[(18, DABC)]
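The merge loop is easy to run with Python's heapq; this sketch only tracks the labels and the summed counts. (When two counts tie, as with DAB and C at 9, the tie-break order here may differ from the hand-run above, but that does not change the resulting code lengths.)

```python
import heapq

# (count, label) pairs from the frequency table above.
heap = [(1, 'D'), (3, 'A'), (5, 'B'), (9, 'C')]
heapq.heapify(heap)

while len(heap) > 1:
    count1, label1 = heapq.heappop(heap)   # smallest remaining node
    count2, label2 = heapq.heappop(heap)   # next-smallest node
    heapq.heappush(heap, (count1 + count2, label1 + label2))
    print(sorted(heap))                    # show the queue after each merge
```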
Now the process stops and we give each character a new encoding based on how far it is from the root node. 'C' was combined last, so it gets only one bit. Let's say I always use '0' for the second element (of the two picked from the priority queue). The bits inherited from earlier merges are shown in parentheses:
C = 0, DAB = 1
B = (1) 0, DA = (1) 1
A = (11) 0, D = (11) 1
So you get the encoding:
C = 0
B = 10
A = 110
D = 111
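What actually matters for the average is each character's code length, i.e. its depth in the tree. A small sketch that tracks those lengths directly during the merge, under the observation that every symbol below a merged node gets one bit longer:

```python
import heapq

counts = {'A': 3, 'B': 5, 'C': 9, 'D': 1}
code_length = {sym: 0 for sym in counts}

# Heap entries carry the total count and the list of symbols under that node.
heap = [(n, [sym]) for sym, n in counts.items()]
heapq.heapify(heap)

while len(heap) > 1:
    n1, syms1 = heapq.heappop(heap)
    n2, syms2 = heapq.heappop(heap)
    # Every symbol under a merged node moves one level further from the root.
    for sym in syms1 + syms2:
        code_length[sym] += 1
    heapq.heappush(heap, (n1 + n2, syms1 + syms2))

print(code_length)   # {'A': 3, 'B': 2, 'C': 1, 'D': 3}
```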
Encoding the original message:
Total bits needed = 9 * 1 + 5 * 2 + 3 * 3 + 1 * 3   (occurrences of C, B, A, D times their code lengths)
= 9 + 10 + 9 + 3
= 31
Number of Characters = 18
Average bits per character = 31 / 18 ≈ 1.7222
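And the same arithmetic in code, combining the counts and code lengths from above:

```python
counts = {'A': 3, 'B': 5, 'C': 9, 'D': 1}
code_length = {'A': 3, 'B': 2, 'C': 1, 'D': 3}

total_bits = sum(counts[s] * code_length[s] for s in counts)   # 31
total_chars = sum(counts.values())                             # 18
print(total_bits / total_chars)                                # 1.7222...
```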