如何使用Python函数实现字符串的压缩和解压缩?
发布时间:2023-07-26 02:13:27
字符串压缩是指通过使用更短的表示方法来缩小字符串的长度,从而减少存储空间或传输时间。而字符串解压缩则是将压缩后的字符串恢复到原始的未压缩状态。
在Python中,可以使用各种算法和方法来实现字符串的压缩和解压缩。下面将介绍两种常见的方法:Run-length编码和Huffman编码。
1. Run-length编码(行程长度编码):
Run-length编码是一种简单且直观的压缩方法,它通过统计相邻重复字符的数量来表示字符序列。例如,字符串"AABBBCCCC"可以被压缩为"2A3B4C"。
压缩函数的实现:
def compress_string(string):
compressed_string = ""
count = 1
for i in range(1, len(string)):
if string[i] == string[i-1]:
count += 1
else:
compressed_string += str(count) + string[i-1]
count = 1
compressed_string += str(count) + string[-1]
return compressed_string
解压缩函数的实现:
def decompress_string(string):
decompressed_string = ""
for i in range(0, len(string), 2):
count = int(string[i])
decompressed_string += count * string[i+1]
return decompressed_string
示例:
string = "AABBBCCCC" compressed_string = compress_string(string) print(compressed_string) # 输出:2A3B4C decompressed_string = decompress_string(compressed_string) print(decompressed_string) # 输出:AABBBCCCC
2. Huffman编码:
Huffman编码是一种可变长度编码方法,通过将频率较高的字符赋予较短的编码,以实现更高效的压缩。Huffman编码需要建立字符频率表和通过构建最小堆树来生成编码表。
首先,需要建立字符频率表的函数:
from collections import Counter
def build_frequency_table(string):
frequency_table = Counter(string)
return frequency_table
然后,通过构建最小堆树生成编码树和编码表的函数:
import heapq
class HuffmanNode:
def __init__(self, character, frequency):
self.character = character
self.frequency = frequency
self.left = None
self.right = None
def __lt__(self, other):
return self.frequency < other.frequency
def build_huffman_tree(frequency_table):
heap = []
for key, value in frequency_table.items():
node = HuffmanNode(key, value)
heapq.heappush(heap, node)
while len(heap) > 1:
left = heapq.heappop(heap)
right = heapq.heappop(heap)
parent = HuffmanNode(None, left.frequency + right.frequency)
parent.left = left
parent.right = right
heapq.heappush(heap, parent)
return heap[0]
def build_huffman_table(root):
huffman_table = {}
encode(root, "", huffman_table)
return huffman_table
def encode(node, code, huffman_table):
if node.character is not None:
huffman_table[node.character] = code
return
encode(node.left, code + "0", huffman_table)
encode(node.right, code + "1", huffman_table)
压缩函数的实现:
def compress_string(string, huffman_table):
compressed_string = ""
for char in string:
compressed_string += huffman_table[char]
return compressed_string
解压缩函数的实现:
def decompress_string(string, root):
decompressed_string = ""
node = root
for bit in string:
if bit == "0":
node = node.left
else:
node = node.right
if node.character is not None:
decompressed_string += node.character
node = root
return decompressed_string
示例:
string = "AABBBCCCC" frequency_table = build_frequency_table(string) root = build_huffman_tree(frequency_table) huffman_table = build_huffman_table(root) compressed_string = compress_string(string, huffman_table) print(compressed_string) # 输出:1101110110110110110110110 decompressed_string = decompress_string(compressed_string, root) print(decompressed_string) # 输出:AABBBCCCC
以上是实现字符串的压缩和解压缩的两种常见方法,可以根据实际需求选择合适的方法进行使用。
