Huffman Coding is a lossless data compression algorithm that assigns variable-length codes to input characters based on their frequencies.
The idea is to give shorter codes to characters that appear more frequently and longer codes to less frequent characters. This results in compressed data representation.
It is widely used in file compression formats like ZIP and JPEG.
Steps
Build a frequency table of characters in the input.
Build a priority queue (min-heap) using the frequency of each character.
Build a Huffman Tree: Combine the two nodes with the lowest frequency until there is only one node left. This node is the root of the Huffman tree.
Generate Huffman codes: Traverse the tree to assign binary codes to each character.
Encode the input data using the generated Huffman codes.
Decode the data by traversing the Huffman tree.
Time Complexity
Building the frequency table: O(n), where n is the number of characters.
Building the heap: O(n log n), where n is the number of unique characters.
Encoding: O(n), where n is the length of the input.
import heapqfrom collections import defaultdict# Step 1: Build frequency tabledef build_frequency_table(data): freq = defaultdict(int) for char in data: freq[char] += 1 return freq# Step 2: Build the Huffman Treedef build_huffman_tree(freq): heap = [[weight, [char, ""]] for char, weight in freq.items()] heapq.heapify(heap) while len(heap) > 1: lo = heapq.heappop(heap) hi = heapq.heappop(heap) for pair in lo[1:]: pair[1] = '0' + pair[1] for pair in hi[1:]: pair[1] = '1' + pair[1] heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:]) return sorted(heap[0][1:], key=lambda p: (len(p[-1]), p))# Step 3: Encode data using the Huffman Treedef encode(data, huffman_tree): huff_dict = {item[0]: item[1] for item in huffman_tree} return ''.join(huff_dict[char] for char in data)# Example Usagedata = "this is an example for huffman encoding"freq_table = build_frequency_table(data)huffman_tree = build_huffman_tree(freq_table)encoded_data = encode(data, huffman_tree)print(f"Encoded Data: {encoded_data}")