
Image Text Recognition (Part 2): Calling Tesseract from Java to Recognize Text in Images

Date: 2018-04-22 06:59:18


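There are two main ways to call Tesseract from Java to recognize the text in an image: the cmd approach and the tess4j approach. This post covers the cmd approach, in which the Java program invokes the command line to run tesseract. It works on the same principle described in the previous post.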
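As a rough illustration of the cmd approach, the sketch below assembles a tesseract command line and hands it to `ProcessBuilder`. The class name `TesseractCommand`, its method names, and the file names `1.png` and `test` are hypothetical. It assumes a `tesseract` executable is on the PATH and that the `chi_sim` and `eng` language data are installed.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original post.
class TesseractCommand {

    // Builds: <executable> <image> <outputBase> [-l lang1+lang2+...]
    static List<String> build(String executable, String image,
                              String outputBase, String... languages) {
        List<String> cmd = new ArrayList<>();
        cmd.add(executable);
        cmd.add(image);
        cmd.add(outputBase); // tesseract appends ".txt" to this base name
        if (languages.length > 0) {
            cmd.add("-l");
            cmd.add(String.join("+", languages)); // languages form one argument
        }
        return cmd;
    }

    // Runs the command in workDir and returns the exit code of the process.
    static int run(List<String> cmd, File workDir)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.directory(workDir);
        pb.inheritIO(); // forward tesseract's messages to this console
        return pb.start().waitFor();
    }

    public static void main(String[] args) {
        List<String> cmd = build("tesseract", "1.png", "test", "chi_sim", "eng");
        System.out.println(String.join(" ", cmd)); // prints: tesseract 1.png test -l chi_sim+eng
    }
}
```

Passing the arguments as a list rather than one concatenated string avoids quoting problems when a path contains spaces.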

Steps:

(1) Import two JAR files: jai_imageio-1.1.1.jar and swingx-1.6.1.jar

(2) Write the ImageIOHelper class, which creates a temporary image file so that the original file is not damaged

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import java.util.Locale;

import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.ImageOutputStream;

import com.sun.media.imageio.plugins.tiff.TIFFImageWriteParam;

/**
 * Creates a temporary TIFF copy of an image so that the original file is never modified.
 */
public class ImageIOHelper {

    // Locale passed to the TIFF write parameters
    private Locale locale = Locale.CHINESE;

    // Constructor with a custom locale
    public ImageIOHelper(Locale locale) {
        this.locale = locale;
    }

    // Default constructor: uses Locale.CHINESE
    public ImageIOHelper() {
    }

    /**
     * Creates a temporary TIFF copy of the image so that the original file is not damaged.
     * @param imageFile   the image to copy
     * @param imageFormat the source format, e.g. png or jpg
     * @return the temporary image file
     */
    public File createImage(File imageFile, String imageFormat) throws IOException {
        // Pick a reader for the source format
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(imageFormat);
        ImageReader reader = readers.next();
        // Open the source file as a stream
        ImageInputStream iis = ImageIO.createImageInputStream(imageFile);
        reader.setInput(iis);
        IIOMetadata streamMetadata = reader.getStreamMetadata();
        // Write parameters: compression disabled
        TIFFImageWriteParam tiffWriteParam = new TIFFImageWriteParam(locale);
        tiffWriteParam.setCompressionMode(ImageWriteParam.MODE_DISABLED);
        // Get a TIFF writer and set its output
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("tiff");
        ImageWriter writer = writers.next();
        BufferedImage bi = reader.read(0);
        IIOImage image = new IIOImage(bi, null, reader.getImageMetadata(0));
        File tempFile = tempImageFile(imageFile);
        ImageOutputStream ios = ImageIO.createImageOutputStream(tempFile);
        writer.setOutput(ios);
        writer.write(streamMetadata, image, tiffWriteParam);
        ios.close();
        iis.close();
        writer.dispose();
        reader.dispose();
        // Hide the temporary file. attrib exists only on Windows, and the file
        // must already exist, so this runs after the image has been written.
        if (System.getProperty("os.name").toLowerCase().startsWith("windows")) {
            Runtime.getRuntime().exec(new String[] {"attrib", "+H", tempFile.getPath()});
        }
        return tempFile;
    }

    /**
     * Derives the temporary file path: appends a suffix to the file name
     * and changes the extension to tif.
     * @param imageFile the original image
     */
    private File tempImageFile(File imageFile) {
        String path = imageFile.getPath();
        StringBuffer strB = new StringBuffer(path);
        strB.insert(path.lastIndexOf('.'), "_text_recognize_temp");
        return new File(strB.toString().replaceFirst("(?<=\\.)(\\w+)$", "tif"));
    }
}
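To make the temporary file naming easier to follow, here is a minimal sketch of that rule in isolation: a suffix is inserted before the last dot, then the extension is swapped for `tif`. The class name `TempNameDemo` and the sample paths are hypothetical, and the sketch assumes the file name contains an extension.

```java
// Hypothetical demo class; it illustrates only the naming rule.
class TempNameDemo {

    static String tempName(String path) {
        // Insert the suffix just before the last dot:
        // "1.png" becomes "1_text_recognize_temp.png"
        String withSuffix = new StringBuilder(path)
                .insert(path.lastIndexOf('.'), "_text_recognize_temp")
                .toString();
        // Replace the trailing extension (word characters after a dot) with "tif"
        return withSuffix.replaceFirst("(?<=\\.)(\\w+)$", "tif");
    }

    public static void main(String[] args) {
        System.out.println(tempName("1.png"));            // prints 1_text_recognize_temp.tif
        System.out.println(tempName("C:/img/photo.jpg")); // prints C:/img/photo_text_recognize_temp.tif
    }
}
```

A path without any dot would make `lastIndexOf` return -1 and the insert fail, so real code would want to check for that case first.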

(3) Create the OCRUtil utility class, which performs the text recognition:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

import org.jdesktop.swingx.util.OS;

/**
 * OCR utility class.
 */
public class OCRUtil {

    private final String LANG_OPTION = "-l"; // lowercase letter l, not the digit 1
    private final String EOL = System.getProperty("line.separator");
    // Tesseract installation directory; change it to match your machine
    private String tessPath = "D://Tesseract//Tsseract-OCR//Tesseract-OCR";

    public OCRUtil(String tessPath, String transFileName) {
        this.tessPath = tessPath;
    }

    // Default constructor: keeps the default tessPath declared above
    public OCRUtil() {
    }

    public String getTessPath() {
        return tessPath;
    }

    public void setTessPath(String tessPath) {
        this.tessPath = tessPath;
    }

    public String getLANG_OPTION() {
        return LANG_OPTION;
    }

    public String getEOL() {
        return EOL;
    }

    /**
     * @param imageFile   the file to recognize
     * @param imageFormat the format of the file
     * @return the recognized text
     */
    public String recognizeText(File imageFile, String imageFormat) throws Exception {
        File tempImage = new ImageIOHelper().createImage(imageFile, imageFormat);
        return ocrImages(tempImage, imageFile);
    }

    // Variant that accepts a custom locale
    public String recognizeText(File imageFile, String imageFormat, Locale locale) throws Exception {
        File tempImage = new ImageIOHelper(locale).createImage(imageFile, imageFormat);
        return ocrImages(tempImage, imageFile);
    }

    /**
     * @param tempImage the temporary file
     * @param imageFile the file to recognize
     * @return the recognized content
     * @throws IOException
     * @throws InterruptedException
     */
    private String ocrImages(File tempImage, File imageFile) throws IOException, InterruptedException {
        // Output file: same directory as the image, base name "test"
        File outputFile = new File(imageFile.getParentFile(), "test");
        StringBuffer strB = new StringBuffer();
        // Build the command line
        List<String> cmd = new ArrayList<String>();
        if (OS.isLinux()) {
            cmd.add("tesseract");
        } else {
            cmd.add(tessPath + "//tesseract");
        }
        cmd.add(tempImage.getName());
        cmd.add(outputFile.getName());
        cmd.add(LANG_OPTION);
        // Simplified Chinese, common math formulas, and English,
        // joined with "+" into a single argument
        cmd.add("chi_sim+equ+eng");
        // Create the operating system process
        ProcessBuilder pb = new ProcessBuilder();
        pb.directory(imageFile.getParentFile()); // working directory of the process
        pb.command(cmd); // the command to execute
        pb.redirectErrorStream(true); // merge error output into standard output
        long startTime = System.currentTimeMillis();
        System.out.println("Start time: " + startTime);
        Process process = pb.start(); // start the process and get its instance
        // Resulting command: tesseract <temp image> test -l chi_sim+equ+eng
        int w = process.waitFor();
        tempImage.delete(); // delete the temporary working file
        if (w == 0) { // 0 means normal exit
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    new FileInputStream(outputFile.getAbsolutePath() + ".txt"), "UTF-8"));
            String str;
            while ((str = in.readLine()) != null) {
                strB.append(str).append(EOL);
            }
            in.close();
            long endTime = System.currentTimeMillis();
            System.out.println("End time: " + endTime);
            System.out.println("Elapsed: " + (endTime - startTime) + " ms");
        } else {
            String msg;
            switch (w) {
                case 1:
                    msg = "Errors accessing files. There may be spaces in your image's filename.";
                    break;
                case 29:
                    msg = "Cannot recognize the image or its selected region.";
                    break;
                case 31:
                    msg = "Unsupported image format.";
                    break;
                default:
                    msg = "Errors occurred.";
            }
            throw new RuntimeException(msg);
        }
        // The output file test.txt is left in place. To remove it, call
        // new File(outputFile.getAbsolutePath() + ".txt").delete();
        return strB.toString().replaceAll("\\s+", "");
    }
}
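The last line of `ocrImages` removes every whitespace character from the result. That suits Chinese text, but it also glues English words together. The sketch below contrasts that behavior with a gentler alternative that collapses runs of whitespace into one space. The class `OcrTextCleanup` and both method names are hypothetical and only illustrate the trade-off.

```java
// Hypothetical helper contrasting two ways of cleaning OCR output.
class OcrTextCleanup {

    // Removes all whitespace, including spaces and line breaks.
    static String stripAll(String text) {
        return text.replaceAll("\\s+", "");
    }

    // Collapses each run of whitespace into one space and trims the ends.
    static String collapse(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String raw = "Hello  OCR\nworld \n";
        System.out.println(stripAll(raw)); // prints HelloOCRworld
        System.out.println(collapse(raw)); // prints Hello OCR world
    }
}
```

For images that mix Chinese and English, the second variant may give more readable output, at the cost of keeping stray spaces between Chinese characters.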

(4) Create the test class Test:

import java.io.File;
import java.io.IOException;

/**
 * @version Created: April 25, 5:09:19 PM
 * Test class.
 */
public class Test {

    public static void main(String[] args) {
        try {
            // Path of the image whose text should be recognized
            File file = new File("C://Users//1_0208150251_x4hzz//1.png");
            String recognizeText = new OCRUtil().recognizeText(file, "png");
            System.out.print(recognizeText + "\t");
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

With this in place, passing in an image is all it takes to recognize the text it contains.
