欢迎访问宙启技术站
智能推送

如何使用Python函数实现字符串的压缩和解压缩?

发布时间:2023-07-26 02:13:27

字符串压缩是指通过使用更短的表示方法来缩小字符串的长度,从而减少存储空间或传输时间。而字符串解压缩则是将压缩后的字符串恢复到原始的未压缩状态。

在Python中,可以使用各种算法和方法来实现字符串的压缩和解压缩。下面将介绍两种常见的方法:Run-length编码和Huffman编码。

1. Run-length编码(行程长度编码):

Run-length编码是一种简单且直观的压缩方法,它通过统计相邻重复字符的数量来表示字符序列。例如,字符串"AABBBCCCC"可以被压缩为"2A3B4C"。

压缩函数的实现:

def compress_string(string):
    compressed_string = ""
    count = 1
    for i in range(1, len(string)):
        if string[i] == string[i-1]:
            count += 1
        else:
            compressed_string += str(count) + string[i-1]
            count = 1
    compressed_string += str(count) + string[-1]
    return compressed_string

解压缩函数的实现:

def decompress_string(string):
    decompressed_string = ""
    for i in range(0, len(string), 2):
        count = int(string[i])
        decompressed_string += count * string[i+1]
    return decompressed_string

示例:

string = "AABBBCCCC"
compressed_string = compress_string(string)
print(compressed_string)
# 输出:2A3B4C

decompressed_string = decompress_string(compressed_string)
print(decompressed_string)
# 输出:AABBBCCCC

2. Huffman编码:

Huffman编码是一种可变长度编码方法,通过将频率较高的字符赋予较短的编码,以实现更高效的压缩。Huffman编码需要建立字符频率表和通过构建最小堆树来生成编码表。

首先,需要建立字符频率表的函数:

from collections import Counter

def build_frequency_table(string):
    frequency_table = Counter(string)
    return frequency_table

然后,通过构建最小堆树生成编码树和编码表的函数:

import heapq

class HuffmanNode:
    def __init__(self, character, frequency):
        self.character = character
        self.frequency = frequency
        self.left = None
        self.right = None
    
    def __lt__(self, other):
        return self.frequency < other.frequency

def build_huffman_tree(frequency_table):
    heap = []
    for key, value in frequency_table.items():
        node = HuffmanNode(key, value)
        heapq.heappush(heap, node)

    while len(heap) > 1:
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)

        parent = HuffmanNode(None, left.frequency + right.frequency)
        parent.left = left
        parent.right = right

        heapq.heappush(heap, parent)

    return heap[0]

def build_huffman_table(root):
    huffman_table = {}
    encode(root, "", huffman_table)
    return huffman_table

def encode(node, code, huffman_table):
    if node.character is not None:
        huffman_table[node.character] = code
        return
    encode(node.left, code + "0", huffman_table)
    encode(node.right, code + "1", huffman_table)

压缩函数的实现:

def compress_string(string, huffman_table):
    compressed_string = ""
    for char in string:
        compressed_string += huffman_table[char]
    return compressed_string

解压缩函数的实现:

def decompress_string(string, root):
    decompressed_string = ""
    node = root
    for bit in string:
        if bit == "0":
            node = node.left
        else:
            node = node.right

        if node.character is not None:
            decompressed_string += node.character
            node = root

    return decompressed_string

示例:

string = "AABBBCCCC"
frequency_table = build_frequency_table(string)
root = build_huffman_tree(frequency_table)
huffman_table = build_huffman_table(root)

compressed_string = compress_string(string, huffman_table)
print(compressed_string)
# 输出:1101110110110110110110110

decompressed_string = decompress_string(compressed_string, root)
print(decompressed_string)
# 输出:AABBBCCCC

以上是实现字符串的压缩和解压缩的两种常见方法,可以根据实际需求选择合适的方法进行使用。