org.apache.poi.xwpf.extractor
Class XWPFWordExtractor

java.lang.Object
  extended by org.apache.poi.POITextExtractor
      extended by org.apache.poi.POIXMLTextExtractor
          extended by org.apache.poi.xwpf.extractor.XWPFWordExtractor
All Implemented Interfaces:
java.io.Closeable

public class XWPFWordExtractor
extends POIXMLTextExtractor

Helper class to extract text from an OOXML Word (.docx) file.


Field Summary
static XWPFRelation[] SUPPORTED_TYPES
           
 
Constructor Summary
XWPFWordExtractor(OPCPackage container)
           
XWPFWordExtractor(XWPFDocument document)
           
 
Method Summary
 void appendBodyElementText(java.lang.StringBuffer text, IBodyElement e)
           
 void appendParagraphText(java.lang.StringBuffer text, XWPFParagraph paragraph)
           
 java.lang.String getText()
          Retrieves all the text from the document.
static void main(java.lang.String[] args)
           
 void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
          Sets whether phonetic runs are concatenated into the extracted text.
 void setFetchHyperlinks(boolean fetch)
          Sets whether hyperlink targets are output along with the text content. By default only the hyperlink label is output, not its target.
 
Methods inherited from class org.apache.poi.POIXMLTextExtractor
checkMaxTextSize, close, getCoreProperties, getCustomProperties, getDocument, getExtendedProperties, getMetadataTextExtractor, getPackage
 
Methods inherited from class org.apache.poi.POITextExtractor
setFilesystem
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SUPPORTED_TYPES

public static final XWPFRelation[] SUPPORTED_TYPES
Constructor Detail

XWPFWordExtractor

public XWPFWordExtractor(OPCPackage container)
                  throws org.apache.xmlbeans.XmlException,
                         OpenXML4JException,
                         java.io.IOException
Throws:
org.apache.xmlbeans.XmlException
OpenXML4JException
java.io.IOException
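
A minimal sketch of how this constructor might be used, assuming the .docx content arrives as an InputStream. The class name PackageExtractSketch and the method extractText are hypothetical names chosen for illustration; OPCPackage.open(InputStream), getText() and close() are existing POI calls.

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.xmlbeans.XmlException;

// Hypothetical helper class, not part of POI.
class PackageExtractSketch {

    // Opens the stream as an OPC package and returns the extracted text.
    static String extractText(InputStream docx)
            throws IOException, OpenXML4JException, XmlException {
        OPCPackage pkg = OPCPackage.open(docx);
        // The extractor is Closeable; closing it also releases the package
        // it was built on, so the package is not closed separately here.
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(pkg)) {
            return extractor.getText();
        }
    }
}
```

The three checked exceptions in the signature mirror those declared by the constructor. If the constructor itself fails, this sketch does not release the package; production code may want to handle that case.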

XWPFWordExtractor

public XWPFWordExtractor(XWPFDocument document)
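
As an illustration of this constructor, the following sketch builds a small document in memory and extracts its text. InMemoryExtractSketch and extract are hypothetical names; the paragraph contents are supplied by the caller.

```java
import java.io.IOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

// Hypothetical helper class, not part of POI.
class InMemoryExtractSketch {

    // Creates one paragraph per argument, then extracts the document text.
    static String extract(String... paragraphs) throws IOException {
        XWPFDocument doc = new XWPFDocument();
        for (String p : paragraphs) {
            doc.createParagraph().createRun().setText(p);
        }
        // The document is local to this method, so it is safe to let
        // try-with-resources close the extractor when we are done.
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            return extractor.getText();
        }
    }
}
```

How the paragraphs are separated in the returned string is implementation-specific, so callers should not rely on an exact layout.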
Method Detail

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Throws:
java.lang.Exception

setFetchHyperlinks

public void setFetchHyperlinks(boolean fetch)
Sets whether hyperlink targets are output along with the text content. By default only the hyperlink label is output, not its target.
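
A short sketch of enabling this option before extraction, assuming the caller already holds an XWPFDocument. HyperlinkExtractSketch and textWithLinkTargets are hypothetical names. The exact format in which a target appears in the output is not specified on this page, so the sketch does not depend on it.

```java
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

// Hypothetical helper class, not part of POI.
class HyperlinkExtractSketch {

    // Returns the document text with hyperlink targets included.
    static String textWithLinkTargets(XWPFDocument doc) {
        XWPFWordExtractor extractor = new XWPFWordExtractor(doc);
        // Must be set before getText(); the default outputs labels only.
        extractor.setFetchHyperlinks(true);
        // The extractor is deliberately left open: the caller still owns
        // the document, and closing the extractor may release its package.
        return extractor.getText();
    }
}
```

For a document without hyperlinks, the result should be the same as with the default setting.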


setConcatenatePhoneticRuns

public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
Sets whether phonetic runs are concatenated into the extracted text. The default is true.

Parameters:
concatenatePhoneticRuns - true to concatenate phonetic runs into the extracted text, false to leave them out

getText

public java.lang.String getText()
Description copied from class: POITextExtractor
Retrieves all the text from the document. How cells, paragraphs, etc. are separated in the text is implementation-specific; see the Javadoc of the specific implementation for details.

Specified by:
getText in class POITextExtractor
Returns:
All the text from the document

appendBodyElementText

public void appendBodyElementText(java.lang.StringBuffer text,
                                  IBodyElement e)

appendParagraphText

public void appendParagraphText(java.lang.StringBuffer text,
                                XWPFParagraph paragraph)
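
Because this method appends to a caller-supplied buffer, it can be used to extract only selected paragraphs. The sketch below collects the first n paragraphs of a document; ParagraphExtractSketch and firstParagraphs are hypothetical names, and the newline separator is added by the sketch itself rather than assumed from the method.

```java
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

// Hypothetical helper class, not part of POI.
class ParagraphExtractSketch {

    // Extracts the text of at most the first n top-level paragraphs.
    static String firstParagraphs(XWPFDocument doc, int n) {
        XWPFWordExtractor extractor = new XWPFWordExtractor(doc);
        StringBuffer text = new StringBuffer();
        int count = 0;
        for (XWPFParagraph paragraph : doc.getParagraphs()) {
            if (count++ >= n) {
                break;
            }
            extractor.appendParagraphText(text, paragraph);
            text.append('\n'); // separator chosen by this sketch
        }
        return text.toString();
    }
}
```

The extractor is not closed here because the caller still owns the document.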


Copyright 2017 The Apache Software Foundation or its licensors, as applicable.