
Python: Logging in to a mailbox with the POP3 client poplib and extracting zip/rar archives

Date: 2023-05-15 08:29:37

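Contents

1. Project background
2. The poplib module
3. Logging in to the mailbox
4. Retrieving mail content
   1. Retrieving basic mail information
   2. Retrieving the attachments of a mail
5. Extracting zip/rar archives
   1. Opening a zip/rar archive
   2. Getting the files in an archive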

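1. Project background

I am currently working on a small project that has to log in to a mailbox, fetch the compressed archives attached to mails, extract the files from those archives, and load the useful data from the files into a database. The data is later processed further and finally used for business risk control.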

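2. The poplib module

Python's standard library includes the poplib module for logging in to a mailbox. The module provides the class POP3_SSL, which connects to a POP3 server using SSL (Secure Sockets Layer, usually placed at the session layer of the OSI model) as the underlying protocol.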
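As a small illustration of the module's layout, the sketch below picks between the SSL client class and the plain one together with the default port constants that poplib defines. The helper name pop3_class_and_port is hypothetical and is not part of poplib or of the original project.

```python
import poplib


def pop3_class_and_port(use_ssl: bool = True):
    """Return the poplib client class and its default port.

    Hypothetical helper for illustration only.
    """
    if use_ssl:
        # POP3 over SSL, default port 995
        return poplib.POP3_SSL, poplib.POP3_SSL_PORT
    # Plain POP3, default port 110
    return poplib.POP3, poplib.POP3_PORT


cls, port = pop3_class_and_port()
print(cls.__name__, port)  # POP3_SSL 995
```

No connection is opened here; the class is only instantiated once a host name is supplied, as in the login code of the next section.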

3. Logging in to the mailbox

The login code:

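import logging
import poplib

logger = logging.getLogger(__name__)

# Create the POP3 client from the host name and port
# (995 is the standard port for POP3 over SSL).
pop = poplib.POP3_SSL(host="", port=995)
try:
    # Log in to the mailbox with the user name and password.
    pop.user("")
    pop.pass_("")
except poplib.error_proto as e:
    logger.error("Login failed: %s", e)
else:
    parse(pop)  # parse() is defined in section 5
finally:
    # Log out of the mailbox.
    pop.quit()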

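Note that the program must always log out of the mailbox at the end, whether it finishes normally or not: quit() commits the session, unlocks the mailbox on the server, and drops the connection.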
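One way to make the logout hard to forget is to wrap the session in a context manager. The following is a minimal sketch under stated assumptions: the name pop3_session and its factory parameter are hypothetical and exist only so the example can be exercised without a real server.

```python
import poplib
from contextlib import contextmanager


@contextmanager
def pop3_session(host, user, password, port=995, factory=poplib.POP3_SSL):
    """Log in, hand the connection to the caller, and always call quit().

    `factory` is an assumption made for testability; by default the
    real poplib.POP3_SSL class is used.
    """
    conn = factory(host, port)
    try:
        conn.user(user)
        conn.pass_(password)
        yield conn
    finally:
        # Runs on normal exit, on a failed login, and on errors in the caller.
        conn.quit()
```

A caller would then write `with pop3_session(host, user, password) as pop:` and do its work inside the block. If the connection itself cannot be created, no quit() is attempted, because there is nothing to close.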

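4. Retrieving mail content

A particular mail can be located in several ways: by sender name, by sender address, by subject, or by mail index. Using the index is the fastest, because POP3 retrieves a mail directly by its number, much like fetching a value from a hash table by its key. The other criteria require downloading and inspecting the mails first.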
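To make index-based access concrete, the sketch below parses the lines returned by `pop.list()[1]`, each of which has the form `b'<number> <size>'`, and picks the highest message numbers. The function names are hypothetical. The assumption that the highest numbers are the most recent mails holds on many servers but is not guaranteed by the protocol.

```python
def parse_listing(lines):
    """Turn POP3 LIST response lines such as b'3 2048' into {3: 2048}."""
    sizes = {}
    for raw in lines:
        number, size = raw.decode("ascii").split()[:2]
        sizes[int(number)] = int(size)
    return sizes


def highest_indexes(lines, count):
    """Return the `count` highest message numbers, largest first."""
    return sorted(parse_listing(lines), reverse=True)[:count]


listing = [b"1 1200", b"2 53000", b"3 870"]
print(highest_indexes(listing, 2))  # [3, 2]
```

Each returned number can be passed straight to `pop.retr(number)`.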

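1. Retrieving basic mail information

After logging in, a lot of basic information about the mails can be retrieved, such as the list of mails in the mailbox, the subject, the body, the sender name and address, and the time the mail was received.

Code: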

from datetime import datetime
from email.parser import Parser
from email.utils import mktime_tz, parseaddr, parsedate_tz


def get_mail_list(pop):
    # Find the archive attachments of the selected mails and yield them one by one.
    emails = pop.list()[1]  # list of mails in the mailbox
    email_num = len(emails)  # number of mails
    for i in [10, 20, 30]:  # indexes of the mails to fetch
        lines = pop.retr(i)[1]  # retrieve the mail with this index
        line_bytes = b'\r\n'.join(lines)  # join the lines into one message
        msg_content = line_bytes.decode('utf-8')  # decode
        msg = Parser().parsestr(msg_content)  # parsed mail
        # Sender name and address
        header, address = parseaddr(msg.get("From", ""))
        name, address = decode_str(header), decode_str(address)
        # Subject
        email_subject = decode_str(msg.get('Subject', ""))
        # Time from the Date header
        date_tuple = parsedate_tz(msg.get("Date", ""))
        date_formatted = datetime.fromtimestamp(mktime_tz(date_tuple)).strftime("%Y-%m-%d %H:%M:%S")
        compressed_files = list()
        for compressed_file in get_attachment(msg, compressed_files):
            yield compressed_file
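The header handling can also be tried without a mail server. The sketch below is one possible variant, not the project's code: it parses the raw bytes directly with `email.message_from_bytes` and decodes encoded headers with `make_header(decode_header(...))`. The function name summarize and the keys of the returned dictionary are hypothetical.

```python
from email import message_from_bytes
from email.header import decode_header, make_header
from email.utils import parseaddr, parsedate_to_datetime


def summarize(raw: bytes) -> dict:
    """Extract sender, subject and date from one raw mail."""
    msg = message_from_bytes(raw)
    name, address = parseaddr(msg.get("From", ""))
    date_header = msg.get("Date")
    return {
        # make_header joins all decoded fragments, not only the first one
        "sender_name": str(make_header(decode_header(name))),
        "sender_address": address,
        "subject": str(make_header(decode_header(msg.get("Subject", "")))),
        # None when the mail has no Date header
        "date": parsedate_to_datetime(date_header) if date_header else None,
    }


raw = (
    b"From: Alice <alice@example.com>\r\n"
    b"Subject: Report\r\n"
    b"Date: Mon, 15 May 2023 08:29:37 +0800\r\n"
    b"\r\n"
    b"body"
)
info = summarize(raw)
print(info["sender_address"], info["subject"])  # alice@example.com Report
```

Parsing bytes avoids having to guess the character set of the whole mail before the headers are read. A malformed Date header still raises an exception here and would need its own handling.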

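2. Retrieving the attachments of a mail

This step collects the zip or rar archives attached to a mail. Note that a single mail may contain several archives, and the code has to handle that case.

Code: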

import base64
from email.header import decode_header


def get_attachment(msg, compressed_files: list):
    if msg.is_multipart():
        # Multipart message: recurse into every part.
        for part in msg.get_payload():
            get_attachment(part, compressed_files)
    else:
        content_type = msg.get_content_type()
        if content_type == 'application/octet-stream':
            for subpart in msg.walk():  # walk the message tree depth-first
                file_name_encoded = subpart.get_filename()
                if not file_name_encoded:
                    continue
                file_name = decode_str(file_name_encoded)  # decode the file name
                # Keep only zip or rar files.
                if file_name.split(".")[-1] not in ['zip', 'rar']:
                    continue
                data = subpart.get_payload(decode=True)  # attachment bytes
                file_data = base64.b64encode(data).decode()  # base64-encode
                compressed_files.append({"file_data": file_data, "name": file_name})  # save to the list
    return compressed_files


def decode_str(s):
    """Decode an encoded mail header value.

    :param s: raw header value
    :return: decoded string
    """
    value, charset = decode_header(s)[0]
    if charset:
        value = value.decode(charset)
    return value
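The case of several archives in one mail can be checked locally with a generated message. The sketch below is an alternative formulation, not the code above: it relies on `Message.walk()` to visit every part and yields each matching attachment as raw bytes. The name iter_archives is hypothetical. It matches on the file name only, ignores the content type, and does not decode RFC 2047 encoded file names.

```python
from email.message import EmailMessage, Message

ARCHIVE_SUFFIXES = (".zip", ".rar")


def iter_archives(msg: Message):
    """Yield (file_name, raw_bytes) for every zip/rar attachment in msg."""
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        name = part.get_filename()
        if name and name.lower().endswith(ARCHIVE_SUFFIXES):
            yield name, part.get_payload(decode=True)


# Build a test mail with two archives and one unrelated attachment.
mail = EmailMessage()
mail.set_content("see attachments")
mail.add_attachment(b"zip-bytes", maintype="application", subtype="zip", filename="a.zip")
mail.add_attachment(b"rar-bytes", maintype="application", subtype="octet-stream", filename="b.RAR")
mail.add_attachment(b"text", maintype="text", subtype="plain", filename="c.txt")

print([name for name, _ in iter_archives(mail)])  # ['a.zip', 'b.RAR']
```

Yielding raw bytes skips the base64 round trip; whether that fits depends on how the data is passed on afterwards.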

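5. Extracting zip/rar archives

Once the binary content of the attachments is available, the zip and rar archives can be extracted with the zipfile module (standard library) and the rarfile module (a third-party package).

1. Opening a zip/rar archive

The attachments obtained in step 4 are processed as follows: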

import base64
import logging
import traceback
import zipfile
from io import BytesIO

import rarfile

logger = logging.getLogger(__name__)


def parse(pop):
    for data in get_mail_list(pop):
        try:
            archive_bytes = BytesIO(base64.b64decode(data['file_data'].encode()))
            if data['name'].endswith(".zip"):  # zip archive
                zip_obj = zipfile.ZipFile(archive_bytes)
                for record in get_files_in_zip(zip_obj):
                    pass  # process the data
            elif data['name'].endswith(".rar"):  # rar archive
                rar_obj = rarfile.RarFile(archive_bytes)
                for record in get_files_in_rar(rar_obj):
                    pass  # process the data
        except Exception:
            logger.error(traceback.format_exc())
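Opening an archive from memory can be tried in isolation with the standard library. The sketch below covers only the zip case and checks the content with `zipfile.is_zipfile` instead of trusting the file extension. The function name open_zip_from_bytes is hypothetical.

```python
import zipfile
from io import BytesIO


def open_zip_from_bytes(data: bytes):
    """Return a ZipFile backed by memory, or None if data is not a zip archive."""
    buffer = BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return None
    buffer.seek(0)
    return zipfile.ZipFile(buffer)


# Build a small archive in memory to stand in for a mail attachment.
raw = BytesIO()
with zipfile.ZipFile(raw, "w") as zf:
    zf.writestr("report.csv", "id,amount\n1,10\n")

archive = open_zip_from_bytes(raw.getvalue())
print(archive.namelist())  # ['report.csv']
```

Nothing is written to disk, which suits attachments that are only read once and then discarded.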

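2. Getting the files in an archive

Once an archive is open, the files inside it can be iterated over. Keep in mind that a zip/rar archive may itself contain archives, for example a zip archive with another zip or rar archive inside, so the extraction has to be recursive.

Code: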

import re
import zipfile
from io import BytesIO

import rarfile


def get_files_in_zip(zip_obj):
    # List the files in the archive, filtering out entries such as empty directories.
    files = [file for file in zip_obj.namelist() if re.search(r"\..*$", file)]
    # Go through every file in the archive.
    for file in files:
        # Archive nested inside the archive
        if file.endswith("zip"):
            inner_zip_obj = zipfile.ZipFile(BytesIO(zip_obj.read(file)))
            yield from get_files_in_zip(inner_zip_obj)
            continue
        elif file.endswith("rar"):
            inner_rar_obj = rarfile.RarFile(BytesIO(zip_obj.read(file)))
            yield from get_files_in_rar(inner_rar_obj)
            continue
        pass  # process the file from the archive


def get_files_in_rar(rar_obj):
    # List the files in the archive, filtering out entries such as empty directories.
    files = [file for file in rar_obj.namelist() if re.search(r"\..*$", file)]
    # Go through every file in the archive.
    for file in files:
        # Archive nested inside the archive
        if file.endswith("rar"):
            inner_rar_obj = rarfile.RarFile(BytesIO(rar_obj.read(file)))
            yield from get_files_in_rar(inner_rar_obj)
            continue
        elif file.endswith("zip"):
            inner_zip_obj = zipfile.ZipFile(BytesIO(rar_obj.read(file)))
            yield from get_files_in_zip(inner_zip_obj)
            continue
        pass  # process the file from the archive
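The recursion can be demonstrated with nested zip archives alone. The sketch below is a simplified variant under stated assumptions: it handles only zip, yields every regular file as a (path, bytes) pair, and prefixes files from an inner archive with the inner archive's name. The name iter_zip_files and this path convention are hypothetical.

```python
import zipfile
from io import BytesIO


def iter_zip_files(archive: zipfile.ZipFile, prefix: str = ""):
    """Yield (path, data) for every file, descending into nested zip archives."""
    for info in archive.infolist():
        if info.is_dir():
            continue  # skip directory entries
        data = archive.read(info)
        path = prefix + info.filename
        if info.filename.lower().endswith(".zip"):
            # Nested archive: open it from memory and recurse.
            with zipfile.ZipFile(BytesIO(data)) as inner:
                yield from iter_zip_files(inner, prefix=path + "/")
        else:
            yield path, data


# Build outer.zip containing a.txt and inner.zip, which contains b.txt.
inner_raw = BytesIO()
with zipfile.ZipFile(inner_raw, "w") as zf:
    zf.writestr("b.txt", "B")
outer_raw = BytesIO()
with zipfile.ZipFile(outer_raw, "w") as zf:
    zf.writestr("a.txt", "A")
    zf.writestr("inner.zip", inner_raw.getvalue())

with zipfile.ZipFile(BytesIO(outer_raw.getvalue())) as outer:
    print(dict(iter_zip_files(outer)))  # {'a.txt': b'A', 'inner.zip/b.txt': b'B'}
```

Using `ZipInfo.is_dir()` to skip directories avoids depending on whether a file name contains a dot. Archives from untrusted senders can be nested very deeply or expand to a very large size, so a depth or size limit is worth considering in practice.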

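After the files have been extracted from the archives, the data can be pulled out of them (here mainly files in pdf, html, excel, and csv format).

For how to parse the different file formats, please refer to my other article.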
