Python Concurrent Programming and Its Libraries: Practical Examples

Published: 2023-06-03 11:21:29

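Concurrent programming in Python means making progress on several tasks within the same period of time. It can raise throughput and cut the time a program spends waiting, especially when tasks block on I/O. The two most common approaches are multithreading and multiprocessing.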
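To make the claim about shorter waiting time concrete, here is a minimal sketch that compares running four blocking tasks one after another with running them in threads. The helper names (wait_for_io, run_sequential, run_threaded) are hypothetical, and time.sleep merely stands in for a blocking network or disk call.

```python
import threading
import time

def wait_for_io(seconds):
    # Stand-in for a blocking network or disk call
    time.sleep(seconds)

def run_sequential(n, seconds):
    """Run n blocking tasks one after another and return the elapsed time."""
    start = time.perf_counter()
    for _ in range(n):
        wait_for_io(seconds)
    return time.perf_counter() - start

def run_threaded(n, seconds):
    """Run n blocking tasks in n threads and return the elapsed time."""
    start = time.perf_counter()
    threads = [threading.Thread(target=wait_for_io, args=(seconds,))
               for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

print(f"sequential: {run_sequential(4, 0.2):.2f}s")
print(f"threaded:   {run_threaded(4, 0.2):.2f}s")
```

The sequential run should take roughly the sum of the waits, while the threaded run should take roughly as long as a single wait, because the threads sleep at the same time.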

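Multiprocessing achieves concurrency by creating several processes. Each process has its own memory space and resources, so the processes do not interfere with one another. Python's multiprocessing library makes this style of programming straightforward.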

The following example uses the multiprocessing library to download files concurrently:

import requests
from multiprocessing import Pool

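def download_file(url):
    # Fetch the URL; the timeout keeps a stalled server from hanging the worker
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail on 4xx/5xx instead of saving an error page
    # Use the last path segment as the local file name
    file_name = url.split("/")[-1]
    with open(file_name, "wb") as f:
        f.write(response.content)
    print(f"{file_name} downloaded.")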

if __name__ == "__main__":
    urls = ["http://www.domain.com/file1.txt",
            "http://www.domain.com/file2.txt",
            "http://www.domain.com/file3.txt",
            "http://www.domain.com/file4.txt"]
    with Pool(processes=4) as pool:
        pool.map(download_file, urls)

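In the code above, download_file downloads the file at a given URL and saves it locally under the last segment of the URL path. We then use the Pool class from the multiprocessing library to create a pool of 4 worker processes; pool.map distributes the URLs among them so the downloads run in parallel.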
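pool.map also collects return values, in the same order as the inputs. The sketch below illustrates this with a CPU-bound task, the kind of work where separate processes usually help most in CPython. The count_primes function and the limits list are hypothetical, chosen only for illustration.

```python
from multiprocessing import Pool

def count_primes(limit):
    """Count the primes below limit by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    limits = [10_000, 20_000, 30_000, 40_000]
    with Pool(processes=4) as pool:
        # map blocks until every worker is done and keeps the input order
        results = pool.map(count_primes, limits)
    for limit, total in zip(limits, results):
        print(f"{total} primes below {limit}")
```

Because the results come back in input order, zip(limits, results) pairs each limit with its own count regardless of which worker finished first.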

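Besides multiprocessing, Python's threading library can also be used for concurrent programming. The following example implements a multithreaded web crawler: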

import threading
import requests
from queue import Queue

class WebCrawler:
    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.queue = Queue()
        self.visited_urls = set()
        self.lock = threading.Lock()

    def add_url(self, url):
        with self.lock:
            self.queue.put(url)

    def get_url(self):
        with self.lock:
            if not self.queue.empty():
                return self.queue.get()
            else:
                return None

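    def process_url(self):
        while True:
            url = self.get_url()
            if url is None:
                break
            # Check and mark the URL under the lock so no two threads fetch it
            with self.lock:
                if url in self.visited_urls:
                    continue
                self.visited_urls.add(url)
            try:
                response = requests.get(url, timeout=10)
            except requests.RequestException as exc:
                print(f"Failed to fetch {url}: {exc}")
                continue
            print(f"Fetched {url}.")
            # Tolerate an extractor that returns None
            for link in self.extract_links(response.text) or []:
                self.add_url(link)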

    def start(self):
        threads = []
        for i in range(self.num_threads):
            t = threading.Thread(target=self.process_url)
            t.start()
            threads.append(t)
        for t in threads:
            t.join()

    def extract_links(self, html):
        # Placeholder: should return the list of URLs found in html
        return []

if __name__ == "__main__":
    crawler = WebCrawler(num_threads=10)
    crawler.add_url("http://www.domain.com")
    crawler.start()

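The example above defines a WebCrawler class, a simple crawler that starts from a seed URL, fetches the page, extracts its links, and continues crawling them. Concurrency comes from a fixed set of worker threads that share one queue. add_url and get_url take a lock around queue access, so the check for an empty queue and the get that follows happen as one atomic step (queue.Queue is itself thread-safe for individual put and get calls). Finally, we start 10 threads to run the crawl concurrently. Note that a worker exits as soon as it finds the queue empty, so with a single seed URL some threads may finish before any links have been queued.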
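One way to keep workers alive until the whole crawl is finished is the task_done/join protocol of queue.Queue. The sketch below is an illustration under stated assumptions, not the crawler above: it replaces HTTP requests with a hypothetical in-memory FAKE_SITE dictionary so that it runs without network access, and the crawl function and its parameters are made up for this example.

```python
import threading
from queue import Queue

# Hypothetical in-memory "site": page -> links found on that page
FAKE_SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/"],
    "/c": [],
}

def crawl(start, num_threads=4):
    """Visit every page reachable from start and return the set of visited pages."""
    queue = Queue()
    visited = {start}
    lock = threading.Lock()

    def worker():
        while True:
            page = queue.get()        # blocks until an item is available
            if page is None:          # sentinel: no more work
                queue.task_done()
                return
            try:
                for link in FAKE_SITE.get(page, []):
                    with lock:        # check-and-add must be atomic
                        if link in visited:
                            continue
                        visited.add(link)
                    queue.put(link)
            finally:
                queue.task_done()     # mark this page as fully processed

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    queue.put(start)
    queue.join()                      # returns once every queued page is processed
    for _ in threads:
        queue.put(None)               # wake each worker so it can exit
    for t in threads:
        t.join()
    return visited

print(sorted(crawl("/")))  # → ['/', '/a', '/b', '/c']
```

Each page's links are queued before task_done is called for that page, so queue.join() cannot return while undiscovered work remains. Marking a link as visited before queueing it also ensures that every page is queued at most once.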

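In short, Python offers a rich set of concurrency libraries and functions for building efficient concurrent programs. Choose the approach and library that fit the needs and characteristics of the problem at hand to build an efficient, reliable concurrency model.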