Finding the position of text in an image with Python: a simple image text recognition and translation (OCR) implementation

Date: 2024-06-06 02:39:06

Scenario

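The goal is a translation feature similar to the one in WeChat's Scan: photograph a product whose description is usually in English, Korean, Japanese, or another foreign language, run text recognition on the photo, and then translate the recognized text online.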

The image before recognition and translation

The image after recognition and translation

Step 1: Import the required libraries

from PIL import ImageFont
from PIL import Image
from PIL import ImageDraw
import hashlib
from urllib import parse
from urllib import request
import random
import base64
import json

Step 2: Image text recognition (OCR) with translation

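Text recognition (Optical Character Recognition, OCR) simply means extracting the text contained in an image. It is a deep field (and my own knowledge of it is still limited); if you are interested, take a look at the third-party library OpenCV. Here it is done simply through a third-party API, Youdao Zhiyun; other providers such as Baidu also offer free APIs. I chose Youdao for two reasons: other services generally support translating only one foreign language into Chinese, whereas Youdao Zhiyun can recognize several languages together, and it translates the text into Chinese for me directly. It is fairly simple, so here is the code.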

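# Replace with your application ID
appKey = "29df4hs2342"
# Replace with your application secret
appSecret = "9bPJj8Lh7933hlJHGOLJDSocTRh"

# Parameters
# Open the image file in binary mode, read it, and convert the content to base64
with open(r'd_4.png', 'rb') as f:
    q = base64.b64encode(f.read())
q = q.decode('UTF-8', 'strict')
# Source language
fromLan = "en"
# Target language
to = "zh-CHS"
# Upload type
type = "1"
# Random number; generate your own, a timestamp is recommended
salt = random.randint(1, 65536)
# Signature: MD5 of appKey + q + salt + appSecret
sign = appKey + q + str(salt) + appSecret
m1 = hashlib.md5()
m1.update(sign.encode("utf8"))
sign = m1.hexdigest()

data = {'appKey': appKey, 'q': q, 'from': fromLan, 'to': to, 'type': type, 'salt': str(salt), 'sign': sign}
data = parse.urlencode(data).encode(encoding='UTF8')
# The host part of this URL is missing here; prefix it with the Youdao Zhiyun API host
req = request.Request('/ocrtransapi', data)
response = request.urlopen(req)
res = response.read()
# json.loads() no longer accepts an `encoding` argument (removed in Python 3.9), so decode the bytes first
res = json.loads(res.decode('utf-8'))
resRegions = res['resRegions']
# Print the recognized content
for i in resRegions:
    print(i)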

Step 3: Replace the text in the image according to its position

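This step mainly relies on Python's PIL library, a powerful library for all kinds of image processing. Install it according to your Python version (on Python 3 it is distributed as the Pillow package); there are slight differences between Python 2.x and Python 3.x.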

# Draw one translated region onto the image
def dw(boundingBox, linesCount, lineheight, tranContent):
    # Text box origin x, y, then width and height
    x, y, w, h = boundingBox.split(',')
    x = int(x)
    y = int(y)
    w = int(w)
    h = int(h)
    # Font and font size
    word_size = int(lineheight)
    word_css = "msyh.ttf"
    font = ImageFont.truetype(word_css, word_size)
    # Rendered width of the text; font.getsize() was removed in Pillow 10, so use getbbox()
    left, top, right, bottom = font.getbbox(tranContent)
    W = right - left
    if W > w and int(linesCount) > 1:
        # Break the text roughly every `limit` characters so it fits the box width
        word_len = len(tranContent)
        limit = max(1, int(w / word_size))
        i = limit
        tranContent = list(tranContent)
        while i < word_len:
            # The inserted string appears empty in the source; a line break is assumed here
            tranContent.insert(i, '\n')
            i += limit + 1
        tranContent = ''.join(tranContent)
    X = x + w
    Y = y + h
    # Draw the rectangle, then the translated text on top of it
    draw.rectangle((x, y, X, Y), 'yellowgreen', 'wheat')
    draw.text((x, y), tranContent, 'DimGrey', font=font)


if __name__ == "__main__":
    # `res` and `resRegions` come from the step 2 code in the same script
    im = Image.open('d_4.png')
    textAngle = res['textAngle']
    # Rotate so the text is level, draw, then rotate back
    imNew = im.rotate(float(textAngle))
    draw = ImageDraw.Draw(imNew)
    for resRegion in resRegions:
        boundingBox = resRegion['boundingBox']
        linesCount = resRegion['linesCount']
        lineheight = resRegion['lineheight']
        tranContent = resRegion['tranContent']
        dw(boundingBox, linesCount, lineheight, tranContent)
    imNew = imNew.rotate(-float(textAngle))
    del draw
    # im.save('test.png')
    imNew.show()
    imNew.close()
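The two pure-Python steps inside dw, parsing boundingBox and breaking the text to fit the box width, can be checked without PIL or the API. This is a rough sketch under two assumptions: each translated character is about one font size wide (reasonable for full-width Chinese glyphs), and a line break is what should be inserted at each split point. The helper names and the sample region are hypothetical.

```python
def parse_bounding_box(bounding_box):
    """Turn an 'x,y,w,h' string into (left, top, right, bottom) corners."""
    x, y, w, h = (int(v) for v in bounding_box.split(","))
    return x, y, x + w, y + h


def wrap_by_chars(text, box_width, font_size):
    """Break text every box_width // font_size characters, at least one per line."""
    limit = max(1, box_width // font_size)
    return "\n".join(text[i:i + limit] for i in range(0, len(text), limit))


# Hypothetical region, shaped like the entries of res['resRegions'] used above
region = {"boundingBox": "10,20,30,40", "lineheight": "10", "tranContent": "abcdefgh"}
print(parse_bounding_box(region["boundingBox"]))
print(wrap_by_chars(region["tranContent"], 30, int(region["lineheight"])))
```

Slicing the string into fixed-size chunks avoids the index bookkeeping needed when inserting into a list, at the cost of ignoring the real glyph widths that the font would report.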