POI pdf ppt word excel

Time: 2020-07-10 06:08:45

Original address: /blog/225178

Keywords: word, excel, powerpoint, pdf, pdfbox

Office documents are handled with the POI library. PDF files can be handled with PDFBox 0.7.3, which fully supports Chinese. XPDF also works, but PDFBox feels like the better choice, and its author is still updating it. My experience is limited, so corrections are very welcome.

Word:

Java code
import org.apache.lucene.document.Document;
import org.apache.poi.hwpf.extractor.WordExtractor;
import java.io.InputStream;
import com.search.code.Index;

public Document getDocument(Index index, String url, String title, InputStream is) throws DocCenterException {
    String bodyText = null;
    try {
        WordExtractor ex = new WordExtractor(is); // is is the InputStream of the Word file
        bodyText = ex.getText();
        if (!bodyText.equals("")) {
            index.AddIndex(url, title, bodyText);
        }
    } catch (DocCenterException e) {
        throw new DocCenterException("Unable to extract content from this Microsoft Word document", e);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return null;
}
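The listings in this post call index.AddIndex(url, title, text) and throw DocCenterException, but the post never shows either type. The sketch below is one way to stand them in for local experiments. Everything in it is an assumption: only the AddIndex call shape and the two exception constructors are taken from the listings, and InMemoryIndex is a hypothetical name, not part of com.search.code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the DocCenterException used in the listings.
// The post does not show its definition; only the two constructor shapes
// (message, and message plus cause) appear in the code.
class DocCenterException extends Exception {
    DocCenterException(String message) {
        super(message);
    }

    DocCenterException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical stand-in for com.search.code.Index. Modelled as an interface
// here for convenience; the real type may well be a class.
interface Index {
    void AddIndex(String url, String title, String bodyText) throws DocCenterException;
}

// Keeps entries in memory so an extractor can be exercised without Lucene.
class InMemoryIndex implements Index {
    final List<String[]> entries = new ArrayList<String[]>();

    public void AddIndex(String url, String title, String bodyText) throws DocCenterException {
        if (url == null) {
            throw new DocCenterException("url must not be null");
        }
        entries.add(new String[] { url, title, bodyText });
    }
}

class IndexStubDemo {
    public static void main(String[] args) throws DocCenterException {
        InMemoryIndex index = new InMemoryIndex();
        index.AddIndex("/docs/a.doc", "Sample", "extracted text");
        System.out.println(index.entries.size() + " entry indexed");
    }
}
```

With a stub like this, each getDocument method can be called against a test file and the collected entries inspected afterwards.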

Excel:

Java code
import org.apache.lucene.document.Document;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFCell;
import java.io.InputStream;
import com.search.code.Index;

public Document getDocument(Index index, String url, String title, InputStream is) throws DocCenterException {
    StringBuffer content = new StringBuffer();
    try {
        HSSFWorkbook workbook = new HSSFWorkbook(is); // open the Excel workbook
        for (int numSheets = 0; numSheets < workbook.getNumberOfSheets(); numSheets++) {
            HSSFSheet aSheet = workbook.getSheetAt(numSheets); // get one sheet
            if (aSheet == null) {
                continue;
            }
            for (int rowNumOfSheet = 0; rowNumOfSheet <= aSheet.getLastRowNum(); rowNumOfSheet++) {
                HSSFRow aRow = aSheet.getRow(rowNumOfSheet); // get one row
                if (aRow == null) {
                    continue;
                }
                for (short cellNumOfRow = 0; cellNumOfRow <= aRow.getLastCellNum(); cellNumOfRow++) {
                    HSSFCell aCell = aRow.getCell(cellNumOfRow); // get one cell
                    if (aCell != null) {
                        // Intended for string cells; numeric cells may need getNumericCellValue()
                        content.append(aCell.getStringCellValue());
                    }
                }
            }
        }
        // StringBuffer does not override equals(), so test the length instead
        if (content.length() > 0) {
            index.AddIndex(url, title, content.toString());
        }
    } catch (DocCenterException e) {
        throw new DocCenterException("Unable to extract content from this Microsoft Excel document", e);
    } catch (Exception e) {
        System.out.println("xlRead() ran: " + e);
    }
    return null;
}
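One detail in the Excel extractor is worth a small demonstration: a check such as content.equals("") on a StringBuffer never detects an empty buffer, because StringBuffer inherits equals() from Object and compares identity. The sketch below shows the difference using only the JDK. The class and method names are made up for illustration.

```java
class StringBufferEmptyCheck {
    // Returns true when the buffer holds at least one character.
    static boolean hasContent(StringBuffer buffer) {
        return buffer.length() > 0;
    }

    public static void main(String[] args) {
        StringBuffer content = new StringBuffer();

        // Compares a StringBuffer with a String by identity, so this is
        // false even though the buffer is empty.
        System.out.println(content.equals(""));   // prints "false"

        System.out.println(hasContent(content));  // prints "false"
        content.append("cell text");
        System.out.println(hasContent(content));  // prints "true"
    }
}
```

Checking length() (or content.toString().equals("")) keeps empty workbooks out of the index.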

PowerPoint:

Java code
import java.io.InputStream;
import org.apache.lucene.document.Document;
import org.apache.poi.hslf.HSLFSlideShow;
import org.apache.poi.hslf.model.TextRun;
import org.apache.poi.hslf.model.Slide;
import org.apache.poi.hslf.usermodel.SlideShow;
import com.search.code.Index;

public Document getDocument(Index index, String url, String title, InputStream is) throws DocCenterException {
    StringBuffer content = new StringBuffer("");
    try {
        SlideShow ss = new SlideShow(new HSLFSlideShow(is)); // is is the file's InputStream; build the SlideShow
        Slide[] slides = ss.getSlides(); // get every slide
        for (int i = 0; i < slides.length; i++) {
            TextRun[] t = slides[i].getTextRuns(); // text runs hold the slide's text
            for (int j = 0; j < t.length; j++) {
                content.append(t[j].getText()); // add the text to content
            }
            String slideTitle = slides[i].getTitle();
            if (slideTitle != null) { // an untitled slide would otherwise append "null"
                content.append(slideTitle);
            }
        }
        index.AddIndex(url, title, content.toString());
    } catch (Exception ex) {
        System.out.println(ex.toString());
    }
    return null;
}
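When slide text is collected into a StringBuffer, a missing title needs care: appending a null String adds the four characters "null" to the buffer, which would then be indexed as if it were slide text. The sketch below shows the effect and a guard using only the JDK. It assumes a title lookup can return null for a slide without a title, and appendIfPresent is a hypothetical helper name.

```java
class NullSafeAppend {
    // Appends text only when it is non-null, so the literal "null"
    // never ends up in the collected content.
    static void appendIfPresent(StringBuffer buffer, String text) {
        if (text != null) {
            buffer.append(text);
        }
    }

    public static void main(String[] args) {
        String missingTitle = null; // stands in for a slide without a title

        StringBuffer naive = new StringBuffer("body");
        naive.append(missingTitle);
        System.out.println(naive);   // prints "bodynull"

        StringBuffer guarded = new StringBuffer("body");
        appendIfPresent(guarded, missingTitle);
        System.out.println(guarded); // prints "body"
    }
}
```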

PDF:

Java code
import java.io.InputStream;
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.pdfbox.cos.COSDocument;
import org.pdfbox.pdfparser.PDFParser;
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.pdmodel.PDDocumentInformation;
import org.pdfbox.util.PDFTextStripper;
import com.search.code.Index;

public Document getDocument(Index index, String url, String title, InputStream is) throws DocCenterException {
    COSDocument cosDoc = null;
    try {
        cosDoc = parseDocument(is);
    } catch (IOException e) {
        closeCOSDocument(cosDoc);
        throw new DocCenterException("Unable to process this PDF document", e);
    }
    if (cosDoc.isEncrypted()) {
        closeCOSDocument(cosDoc);
        throw new DocCenterException("This PDF document is encrypted and cannot be processed");
    }
    String docText = null;
    try {
        PDFTextStripper stripper = new PDFTextStripper();
        docText = stripper.getText(new PDDocument(cosDoc));
    } catch (IOException e) {
        closeCOSDocument(cosDoc);
        throw new DocCenterException("Unable to process this PDF document", e);
    }
    PDDocument pdDoc = null;
    try {
        pdDoc = new PDDocument(cosDoc);
        PDDocumentInformation docInfo = pdDoc.getDocumentInformation();
        if (docInfo.getTitle() != null && !docInfo.getTitle().equals("")) {
            title = docInfo.getTitle(); // prefer the title stored in the PDF metadata
        }
    } catch (Exception e) {
        System.err.println("Unable to read this PDF document's metadata: " + e.getMessage());
    } finally {
        // finally runs on every path, so the documents are closed here only
        closeCOSDocument(cosDoc);
        closePDDocument(pdDoc);
    }
    // Assumption: the original listing never uses docText; this call is added
    // by analogy with the Word, Excel and PowerPoint extractors.
    index.AddIndex(url, title, docText);
    return null;
}

private static COSDocument parseDocument(InputStream is) throws IOException {
    PDFParser parser = new PDFParser(is);
    parser.parse();
    return parser.getDocument();
}

private void closeCOSDocument(COSDocument cosDoc) {
    if (cosDoc != null) {
        try {
            cosDoc.close();
        } catch (IOException e) {
            // ignored: nothing useful can be done if close fails
        }
    }
}

private void closePDDocument(PDDocument pdDoc) {
    if (pdDoc != null) {
        try {
            pdDoc.close();
        } catch (IOException e) {
            // ignored: nothing useful can be done if close fails
        }
    }
}

Copying the code may introduce errors, but the code itself has been tested and works. The versions used are POI 3.0-rc4 and PDFBox 0.7.3.
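Since the post gives one extractor per format, a caller needs some way to decide which one to run. A minimal sketch of choosing by file extension follows. It is not from the original post: ExtractorChooser, Format and choose are hypothetical names. Only the legacy binary extensions are mapped, because the POI components used above (HWPF, HSSF, HSLF) read the binary .doc, .xls and .ppt formats.

```java
import java.util.Locale;

class ExtractorChooser {
    // Hypothetical labels for the four extractors shown in this post.
    enum Format { WORD, EXCEL, POWERPOINT, PDF, UNKNOWN }

    // Picks a format from the file name's extension, ignoring case.
    static Format choose(String fileName) {
        if (fileName == null) {
            return Format.UNKNOWN;
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Format.UNKNOWN; // no extension to inspect
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        if (ext.equals("doc")) {
            return Format.WORD;
        }
        if (ext.equals("xls")) {
            return Format.EXCEL;
        }
        if (ext.equals("ppt")) {
            return Format.POWERPOINT;
        }
        if (ext.equals("pdf")) {
            return Format.PDF;
        }
        return Format.UNKNOWN; // includes .docx, .xlsx and .pptx
    }

    public static void main(String[] args) {
        System.out.println(choose("report.DOC"));  // prints "WORD"
        System.out.println(choose("slides.pptx")); // prints "UNKNOWN"
    }
}
```

The extension is only a hint. A file with the wrong extension will still fail inside the extractor, so the exception handling in each getDocument method remains necessary.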
