XWPFWordExtractor (POI API Documentation)

org.apache.poi.xwpf.extractor
Class XWPFWordExtractor

java.lang.Object
  org.apache.poi.POITextExtractor
      org.apache.poi.POIXMLTextExtractor
          org.apache.poi.xwpf.extractor.XWPFWordExtractor

All Implemented Interfaces: java.io.Closeable

public class XWPFWordExtractor
extends POIXMLTextExtractor

Helper class to extract text from an OOXML Word file.
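To make "extract text from an OOXML Word file" concrete, the sketch below uses only the JDK, not Apache POI, and is not how `XWPFWordExtractor` is implemented. It relies on the standard WordprocessingML layout: a .docx file is a ZIP package whose main body is the part `word/document.xml`, paragraphs are `w:p` elements, and visible text lives in `w:t` elements. `DocxTextSketch` and its methods are hypothetical names, and the sketch ignores headers, footers, footnotes, text boxes and everything else a real extractor handles.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical JDK-only sketch of .docx text extraction; not Apache POI code. */
public class DocxTextSketch {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of every w:p in word/document.xml, one paragraph per line. */
    public static String extractText(InputStream docx) throws Exception {
        byte[] xml = null;
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
                if (e.getName().equals("word/document.xml")) {
                    xml = readEntry(zip);
                    break;
                }
            }
        }
        if (xml == null) throw new IOException("word/document.xml not found");

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        // Refuse DOCTYPE declarations so untrusted files cannot trigger XXE.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        // Every w:p in document order, including paragraphs inside table cells.
        // (Paragraphs nested in text boxes would be counted twice; a real extractor avoids that.)
        List<String> paragraphs = new ArrayList<>();
        NodeList ps = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < ps.getLength(); i++) {
            StringBuilder sb = new StringBuilder();
            NodeList ts = ((Element) ps.item(i)).getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < ts.getLength(); j++) sb.append(ts.item(j).getTextContent());
            paragraphs.add(sb.toString());
        }
        return String.join("\n", paragraphs);
    }

    /** Reads the current ZIP entry fully. */
    private static byte[] readEntry(ZipInputStream zip) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        for (int n; (n = zip.read(buf)) != -1; ) out.write(buf, 0, n);
        return out.toByteArray();
    }

    /** Test helper: a ZIP holding only word/document.xml (not a complete, valid .docx). */
    public static byte[] zipWithDocumentXml(String documentXml) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(documentXml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }
}
```

With POI itself, the equivalent entry point is `new XWPFWordExtractor(new XWPFDocument(in)).getText()`, which also handles the parts this sketch skips.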

Field Summary
`static XWPFRelation[]`	`SUPPORTED_TYPES`

Constructor Summary
`XWPFWordExtractor(OPCPackage container)`
`XWPFWordExtractor(XWPFDocument document)`

Method Summary
`void`	`appendBodyElementText(java.lang.StringBuffer text, IBodyElement e)`
`void`	`appendParagraphText(java.lang.StringBuffer text, XWPFParagraph paragraph)`
`java.lang.String`	`getText()` Retrieves all the text from the document.
`static void`	`main(java.lang.String[] args)`
`void`	`setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)` Sets whether phonetic runs are concatenated during extraction.
`void`	`setFetchHyperlinks(boolean fetch)` Sets whether hyperlinks should also be fetched along with the text content. By default, only the hyperlink label is output, not the link target.

Methods inherited from class org.apache.poi.POIXMLTextExtractor
`checkMaxTextSize, close, getCoreProperties, getCustomProperties, getDocument, getExtendedProperties, getMetadataTextExtractor, getPackage`

Methods inherited from class org.apache.poi.POITextExtractor
`setFilesystem`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

SUPPORTED_TYPES

public static final XWPFRelation[] SUPPORTED_TYPES

Constructor Detail

XWPFWordExtractor

public XWPFWordExtractor(OPCPackage container)
                  throws org.apache.xmlbeans.XmlException,
                         OpenXML4JException,
                         java.io.IOException

Throws: org.apache.xmlbeans.XmlException, OpenXML4JException, java.io.IOException

XWPFWordExtractor

public XWPFWordExtractor(XWPFDocument document)

Method Detail

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception

Throws: java.lang.Exception

setFetchHyperlinks

public void setFetchHyperlinks(boolean fetch)

Sets whether hyperlinks should also be fetched along with the text content. By default, only the hyperlink label is output, not the link target.
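In WordprocessingML, a hyperlink's visible label is ordinary run text inside a `w:hyperlink` element, while an external link's target URL is stored separately in the part's relationships file (`word/_rels/document.xml.rels`) and referenced through the `r:id` attribute. The JDK-only sketch below shows that split between label and target. It is not POI code: `HyperlinkSketch` is a hypothetical name, the ` <target>` suffix is this sketch's own formatting choice rather than a documented POI output format, and internal links that use `w:anchor` instead of `r:id` are ignored.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical JDK-only sketch of hyperlink label vs. target; not Apache POI code. */
public class HyperlinkSketch {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    static final String PKG_NS = "http://schemas.openxmlformats.org/package/2006/relationships";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Text of the first w:p; with fetchHyperlinks, each linked label gets a " <target>" suffix. */
    public static String firstParagraphText(String documentXml, String relsXml, boolean fetchHyperlinks)
            throws Exception {
        // Relationship Id -> Target, as listed in word/_rels/document.xml.rels.
        Map<String, String> targets = new HashMap<>();
        NodeList rels = parse(relsXml).getElementsByTagNameNS(PKG_NS, "Relationship");
        for (int i = 0; i < rels.getLength(); i++) {
            Element rel = (Element) rels.item(i);
            targets.put(rel.getAttribute("Id"), rel.getAttribute("Target"));
        }

        Element p = (Element) parse(documentXml).getElementsByTagNameNS(W_NS, "p").item(0);
        StringBuilder sb = new StringBuilder();
        for (Node child = p.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (!(child instanceof Element) || !W_NS.equals(child.getNamespaceURI())) continue;
            if (child.getLocalName().equals("r")) {
                sb.append(runText((Element) child));
            } else if (child.getLocalName().equals("hyperlink")) {
                Element link = (Element) child;
                sb.append(runText(link)); // the visible label is always output
                // getAttributeNS returns "" when r:id is absent, which maps to no target.
                String target = targets.get(link.getAttributeNS(R_NS, "id"));
                if (fetchHyperlinks && target != null) sb.append(" <").append(target).append('>');
            }
        }
        return sb.toString();
    }

    /** Concatenated w:t text below an element. */
    static String runText(Element e) {
        StringBuilder sb = new StringBuilder();
        NodeList ts = e.getElementsByTagNameNS(W_NS, "t");
        for (int i = 0; i < ts.getLength(); i++) sb.append(ts.item(i).getTextContent());
        return sb.toString();
    }
}
```

With POI, the flag is set on the extractor before text is retrieved, e.g. `extractor.setFetchHyperlinks(true);` followed by `extractor.getText()`.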

setConcatenatePhoneticRuns

public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)

Sets whether phonetic runs should be concatenated during extraction. The default is true.

Parameters: concatenatePhoneticRuns - true to concatenate phonetic runs during extraction, false otherwise

getText

public java.lang.String getText()

Description copied from class: POITextExtractor

Retrieves all the text from the document. How cells, paragraphs, etc. are separated in the text is implementation-specific; see the Javadocs for the specific project for details.

Specified by: getText in class POITextExtractor

Returns: all the text from the document

appendBodyElementText

public void appendBodyElementText(java.lang.StringBuffer text,
                                  IBodyElement e)
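`appendBodyElementText` takes an `IBodyElement`, the common type for body-level items such as paragraphs and tables. As a rough, JDK-only illustration of walking body elements in document order and appending each one to a shared `StringBuffer`, the sketch below dispatches on the `w:p` and `w:tbl` children of `w:body`. It is not POI's implementation: `BodyWalkSketch` and `appendBodyElement` are hypothetical names, and the tab-between-cells, newline-per-row layout is an assumption of this sketch, since separators are implementation-specific.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical JDK-only sketch of document-order body traversal; not Apache POI code. */
public class BodyWalkSketch {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Direct child elements of parent in the w: namespace with the given local name. */
    static List<Element> children(Element parent, String localName) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && W_NS.equals(n.getNamespaceURI())
                    && localName.equals(n.getLocalName())) out.add((Element) n);
        }
        return out;
    }

    /** Concatenated w:t text below an element. */
    static String text(Element e) {
        StringBuilder sb = new StringBuilder();
        NodeList ts = e.getElementsByTagNameNS(W_NS, "t");
        for (int i = 0; i < ts.getLength(); i++) sb.append(ts.item(i).getTextContent());
        return sb.toString();
    }

    /** Appends one body-level element: a paragraph becomes one line, a table one line per row. */
    static void appendBodyElement(StringBuffer out, Element e) {
        switch (e.getLocalName()) {
            case "p":
                out.append(text(e)).append('\n');
                break;
            case "tbl":
                for (Element row : children(e, "tr")) {
                    List<String> cells = new ArrayList<>();
                    for (Element cell : children(row, "tc")) cells.add(text(cell));
                    out.append(String.join("\t", cells)).append('\n');
                }
                break;
            default:
                break; // w:sectPr, w:sdt, ... are skipped in this sketch
        }
    }

    /** Walks w:body in document order and returns the accumulated text. */
    public static String walk(String documentXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Element root = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        StringBuffer out = new StringBuffer();
        Element body = children(root, "body").get(0);
        for (Node n = body.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && W_NS.equals(n.getNamespaceURI())) appendBodyElement(out, (Element) n);
        }
        return out.toString();
    }
}
```

With POI, the analogous loop would pass each element of `XWPFDocument.getBodyElements()` to `extractor.appendBodyElementText(buffer, element)`.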

appendParagraphText

public void appendParagraphText(java.lang.StringBuffer text,
                                XWPFParagraph paragraph)
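Within a paragraph, WordprocessingML stores text in runs (`w:r`): `w:t` holds literal text, `w:tab` stands for a tab character, `w:br` for a break, and hyperlink labels are runs nested inside `w:hyperlink`. A `w:tab` inside the paragraph properties (`w:pPr/w:tabs`) only defines a tab stop and is not text. The hypothetical JDK-only sketch below appends a paragraph's runs to a `StringBuffer` in that spirit; it is not POI's `appendParagraphText`, and it skips fields, drawings, comments and other run content a real extractor may handle.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical JDK-only sketch of run-level paragraph text; not Apache POI code. */
public class ParagraphRunsSketch {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Appends the runs of a paragraph (or of a w:hyperlink inside it) to out. */
    static void appendParagraph(StringBuffer out, Element container) {
        for (Node n = container.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element) || !W_NS.equals(n.getNamespaceURI())) continue;
            String name = n.getLocalName();
            if (name.equals("r")) {
                appendRun(out, (Element) n);
            } else if (name.equals("hyperlink")) {
                appendParagraph(out, (Element) n); // hyperlink labels are ordinary runs
            }
            // w:pPr (including its w:tabs tab-stop definitions) and other children are skipped
        }
    }

    /** Appends the text-bearing children of one run. */
    static void appendRun(StringBuffer out, Element run) {
        for (Node n = run.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element) || !W_NS.equals(n.getNamespaceURI())) continue;
            switch (n.getLocalName()) {
                case "t":   out.append(n.getTextContent()); break;
                case "tab": out.append('\t'); break;
                case "br":  out.append('\n'); break;
                default:    break; // w:rPr, drawings, field codes, ... are ignored here
            }
        }
    }

    /** Parses a standalone <w:p> element and returns its run text. */
    public static String paragraphText(String paragraphXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Element p = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(paragraphXml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        StringBuffer out = new StringBuffer();
        appendParagraph(out, p);
        return out.toString();
    }
}
```

With POI, the analogous call is `extractor.appendParagraphText(buffer, paragraph)` for each `XWPFParagraph` returned by `XWPFDocument.getParagraphs()`.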