org.apache.poi.xwpf.extractor
Class XWPFWordExtractor
java.lang.Object
org.apache.poi.POITextExtractor
org.apache.poi.POIXMLTextExtractor
org.apache.poi.xwpf.extractor.XWPFWordExtractor
- All Implemented Interfaces:
- java.io.Closeable
public class XWPFWordExtractor
- extends POIXMLTextExtractor
Helper class to extract text from an OOXML Word (.docx) file.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
SUPPORTED_TYPES
public static final XWPFRelation[] SUPPORTED_TYPES
XWPFWordExtractor
public XWPFWordExtractor(OPCPackage container)
throws org.apache.xmlbeans.XmlException,
OpenXML4JException,
java.io.IOException
- Throws:
org.apache.xmlbeans.XmlException
OpenXML4JException
java.io.IOException
XWPFWordExtractor
public XWPFWordExtractor(XWPFDocument document)
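The two constructors can be compared with a minimal sketch. It assumes Apache POI (poi-ooxml) is on the classpath; the class name `ExtractorConstructionSketch` and the helper `sampleDocx` are hypothetical and exist only so the example needs no input file. The extractor is closed with try-with-resources because the class implements java.io.Closeable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.xmlbeans.XmlException;

public class ExtractorConstructionSketch {

    // Hypothetical helper: builds a small .docx in memory, one paragraph per argument.
    public static byte[] sampleDocx(String... paragraphs) throws IOException {
        try (XWPFDocument doc = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            for (String p : paragraphs) {
                doc.createParagraph().createRun().setText(p);
            }
            doc.write(out);
            return out.toByteArray();
        }
    }

    // Constructor taking an OPCPackage: declares XmlException,
    // OpenXML4JException and IOException.
    public static String viaPackage(byte[] docx)
            throws IOException, OpenXML4JException, XmlException {
        OPCPackage pkg = OPCPackage.open(new ByteArrayInputStream(docx));
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(pkg)) {
            return extractor.getText();
        }
    }

    // Constructor taking an already parsed XWPFDocument: declares no
    // checked exceptions of its own.
    public static String viaDocument(byte[] docx) throws IOException {
        XWPFDocument doc = new XWPFDocument(new ByteArrayInputStream(docx));
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] docx = sampleDocx("Hello", "World");
        System.out.println(viaPackage(docx));
        System.out.println(viaDocument(docx));
    }
}
```

Both paths should yield the same text for the same input; the XWPFDocument variant is convenient when the document has already been opened for other processing.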
main
public static void main(java.lang.String[] args)
throws java.lang.Exception
- Throws:
java.lang.Exception
setFetchHyperlinks
public void setFetchHyperlinks(boolean fetch)
- Sets whether hyperlink targets are included along with
the text content. By default only the hyperlink label
is output, not the address it points to.
setConcatenatePhoneticRuns
public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
- Sets whether phonetic runs are concatenated into the
extracted text. Default is true.
- Parameters:
concatenatePhoneticRuns - true to concatenate phonetic runs, false to leave them out
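As a rough illustration of the two setters, the sketch below configures an extractor before calling getText(). It assumes Apache POI (poi-ooxml) is on the classpath; `ExtractorOptionsSketch`, `sampleDocx` and `extract` are hypothetical names. The in-memory sample contains neither hyperlinks nor phonetic runs, so it only demonstrates the call order; a real document with such content would be needed to see the output differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class ExtractorOptionsSketch {

    // Hypothetical helper: builds a small .docx in memory, one paragraph per argument.
    public static byte[] sampleDocx(String... paragraphs) throws IOException {
        try (XWPFDocument doc = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            for (String p : paragraphs) {
                doc.createParagraph().createRun().setText(p);
            }
            doc.write(out);
            return out.toByteArray();
        }
    }

    // Applies both options, then extracts. The setters are called before
    // getText() so that they are in effect for the extraction.
    public static String extract(byte[] docx,
                                 boolean fetchHyperlinks,
                                 boolean concatenatePhoneticRuns) throws IOException {
        XWPFDocument doc = new XWPFDocument(new ByteArrayInputStream(docx));
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            extractor.setFetchHyperlinks(fetchHyperlinks);
            extractor.setConcatenatePhoneticRuns(concatenatePhoneticRuns);
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] docx = sampleDocx("See the project site for details.");
        System.out.println(extract(docx, true, true));
        System.out.println(extract(docx, false, false));
    }
}
```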
getText
public java.lang.String getText()
- Description copied from class:
POITextExtractor
- Retrieves all the text from the document.
How cells, paragraphs, etc. are separated in the text
is implementation-specific; see the javadocs for
a specific project for details.
- Specified by:
getText
in class POITextExtractor
- Returns:
- All the text from the document
appendBodyElementText
public void appendBodyElementText(java.lang.StringBuffer text,
IBodyElement e)
appendParagraphText
public void appendParagraphText(java.lang.StringBuffer text,
XWPFParagraph paragraph)
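When text is needed paragraph by paragraph rather than as one string, appendParagraphText can be called directly with a caller-supplied StringBuffer. The following is a sketch under the assumption that Apache POI (poi-ooxml) is on the classpath; `ParagraphTextSketch`, `sampleDocx` and `paragraphTexts` are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public class ParagraphTextSketch {

    // Hypothetical helper: builds a small .docx in memory, one paragraph per argument.
    public static byte[] sampleDocx(String... paragraphs) throws IOException {
        try (XWPFDocument doc = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            for (String p : paragraphs) {
                doc.createParagraph().createRun().setText(p);
            }
            doc.write(out);
            return out.toByteArray();
        }
    }

    // Collects the extracted text of each body paragraph separately by
    // giving appendParagraphText a fresh buffer per paragraph.
    public static List<String> paragraphTexts(byte[] docx) throws IOException {
        XWPFDocument doc = new XWPFDocument(new ByteArrayInputStream(docx));
        List<String> result = new ArrayList<>();
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            for (XWPFParagraph paragraph : doc.getParagraphs()) {
                StringBuffer buffer = new StringBuffer();
                extractor.appendParagraphText(buffer, paragraph);
                result.add(buffer.toString());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        for (String text : paragraphTexts(sampleDocx("First paragraph", "Second paragraph"))) {
            System.out.println(text);
        }
    }
}
```

The paragraphs passed in should come from the same document the extractor was constructed with, as in the loop above.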
Copyright 2017 The Apache Software Foundation or
its licensors, as applicable.