org.apache.poi.hwpf
Class HWPFDocumentCore

java.lang.Object
  extended by org.apache.poi.POIDocument
      extended by org.apache.poi.hwpf.HWPFDocumentCore
All Implemented Interfaces:
java.io.Closeable
Direct Known Subclasses:
HWPFDocument, HWPFOldDocument

public abstract class HWPFDocumentCore
extends POIDocument

This class holds much of the core of a Word document, but without some of the table structure information. Because it is abstract, you will generally want to work with one of its subclasses, HWPFDocument or HWPFOldDocument.


Field Summary
protected  CHPBinTable _cbt
          Contains formatting properties for text
protected  FileInformationBlock _fib
          The File Information Block (FIB)
protected  FontTable _ft
          Holds fonts for this document.
protected  ListTables _lt
          Holds list tables
protected  byte[] _mainStream
          Main document stream buffer
protected  ObjectPoolImpl _objectPool
          Holds OLE2 objects
protected  PAPBinTable _pbt
          Contains formatting properties for paragraphs
protected  StyleSheet _ss
          Holds styles for this document.
protected  SectionTable _st
          Contains formatting properties for sections.
protected static int FIB_BASE_LEN
          Size of the unencrypted part of the FIB
protected static int RC4_REKEYING_INTERVAL
          [MS-DOC] 2.2.6.2/3 Office Binary Document ...
protected static java.lang.String STREAM_OBJECT_POOL
           
protected static java.lang.String STREAM_TABLE_0
           
protected static java.lang.String STREAM_TABLE_1
           
protected static java.lang.String STREAM_WORD_DOCUMENT
           
 
Constructor Summary
protected HWPFDocumentCore()
           
  HWPFDocumentCore(DirectoryNode directory)
          This constructor loads a Word document from a specific point in a POIFSFileSystem, probably not the default.
  HWPFDocumentCore(java.io.InputStream istream)
          This constructor loads a Word document from an InputStream.
  HWPFDocumentCore(POIFSFileSystem pfilesystem)
          This constructor loads a Word document from a POIFSFileSystem
 
Method Summary
 CHPBinTable getCharacterTable()
           
protected  byte[] getDocumentEntryBytes(java.lang.String name, int encryptionOffset, int len)
          Reads an OLE stream into a byte array; if an EncryptionInfo is available, decrypts the bytes starting at encryptionOffset.
 java.lang.String getDocumentText()
          Returns document text, i.e. text information from all text pieces, including OLE descriptions and field codes.
 EncryptionInfo getEncryptionInfo()
           
 FileInformationBlock getFileInformationBlock()
           
 FontTable getFontTable()
           
 ListTables getListTables()
           
 byte[] getMainStream()
           
 ObjectsPool getObjectsPool()
           
abstract  Range getOverallRange()
          Returns the range that covers all text in the file, including main text, footnotes, headers and comments.
 PAPBinTable getParagraphTable()
           
abstract  Range getRange()
          Returns the range which covers the whole of the document, but excludes any headers and footers.
 SectionTable getSectionTable()
           
 StyleSheet getStyleSheet()
           
abstract  java.lang.StringBuilder getText()
          Internal method to access document text
abstract  TextPieceTable getTextTable()
           
protected  void updateEncryptionInfo()
           
static POIFSFileSystem verifyAndBuildPOIFS(java.io.InputStream istream)
          Takes an InputStream, verifies that it's not RTF or PDF, builds a POIFSFileSystem from it, and returns that.
 
Methods inherited from class org.apache.poi.POIDocument
clearDirectory, close, createInformationProperties, getDirectory, getDocumentSummaryInformation, getEncryptedPropertyStreamName, getPropertySet, getPropertySet, getSummaryInformation, initDirectory, readProperties, replaceDirectory, validateInPlaceWritePossible, write, write, write, writeProperties, writeProperties, writeProperties, writePropertySet
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

STREAM_OBJECT_POOL

protected static final java.lang.String STREAM_OBJECT_POOL
See Also:
Constant Field Values

STREAM_WORD_DOCUMENT

protected static final java.lang.String STREAM_WORD_DOCUMENT
See Also:
Constant Field Values

STREAM_TABLE_0

protected static final java.lang.String STREAM_TABLE_0
See Also:
Constant Field Values

STREAM_TABLE_1

protected static final java.lang.String STREAM_TABLE_1
See Also:
Constant Field Values

FIB_BASE_LEN

protected static final int FIB_BASE_LEN
Size of the unencrypted part of the FIB

See Also:
Constant Field Values

RC4_REKEYING_INTERVAL

protected static final int RC4_REKEYING_INTERVAL
[MS-DOC] 2.2.6.2/3 Office Binary Document ... Encryption: "... The block number MUST be set to zero at the beginning of the stream and MUST be incremented at each 512 byte boundary. ..."

See Also:
Constant Field Values
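
The rekeying rule quoted above can be illustrated with a small sketch. Rc4BlockSketch and its methods are hypothetical names, not part of POI, and the interval of 512 is an assumption taken from the "512 byte boundary" wording in the quote rather than from the constant's documented value.

```java
// Hypothetical helper, not part of POI: maps a stream offset to an RC4 block number.
final class Rc4BlockSketch {
    // Assumed interval, taken from the "512 byte boundary" wording in the [MS-DOC] quote.
    static final int REKEYING_INTERVAL = 512;

    // The block number starts at zero and increases by one at each interval boundary.
    static int blockNumber(long streamOffset) {
        if (streamOffset < 0) {
            throw new IllegalArgumentException("offset must not be negative");
        }
        return (int) (streamOffset / REKEYING_INTERVAL);
    }

    // Position of the offset inside its block.
    static int offsetInBlock(long streamOffset) {
        if (streamOffset < 0) {
            throw new IllegalArgumentException("offset must not be negative");
        }
        return (int) (streamOffset % REKEYING_INTERVAL);
    }

    public static void main(String[] args) {
        long[] offsets = {0, 511, 512, 1300};
        for (long offset : offsets) {
            System.out.println(offset + " -> block " + blockNumber(offset)
                    + ", position " + offsetInBlock(offset));
        }
    }
}
```

Under these assumptions, a cipher would be re-initialised with the new block number each time blockNumber changes while reading through the stream.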

_objectPool

protected ObjectPoolImpl _objectPool
Holds OLE2 objects


_fib

protected FileInformationBlock _fib
The File Information Block (FIB)


_ss

protected StyleSheet _ss
Holds styles for this document.


_cbt

protected CHPBinTable _cbt
Contains formatting properties for text


_pbt

protected PAPBinTable _pbt
Contains formatting properties for paragraphs


_st

protected SectionTable _st
Contains formatting properties for sections.


_ft

protected FontTable _ft
Holds fonts for this document.


_lt

protected ListTables _lt
Holds list tables


_mainStream

protected byte[] _mainStream
Main document stream buffer

Constructor Detail

HWPFDocumentCore

protected HWPFDocumentCore()

HWPFDocumentCore

public HWPFDocumentCore(java.io.InputStream istream)
                 throws java.io.IOException
This constructor loads a Word document from an InputStream.

Parameters:
istream - The InputStream that contains the Word document.
Throws:
java.io.IOException - If there is an unexpected IOException from the passed in InputStream.

HWPFDocumentCore

public HWPFDocumentCore(POIFSFileSystem pfilesystem)
                 throws java.io.IOException
This constructor loads a Word document from a POIFSFileSystem

Parameters:
pfilesystem - The POIFSFileSystem that contains the Word document.
Throws:
java.io.IOException - If there is an unexpected IOException from the passed in POIFSFileSystem.

HWPFDocumentCore

public HWPFDocumentCore(DirectoryNode directory)
                 throws java.io.IOException
This constructor loads a Word document from a specific point in a POIFSFileSystem, probably not the default. Typically used to open embedded documents.

Parameters:
directory - The DirectoryNode that contains the Word document.
Throws:
java.io.IOException - If there is an unexpected IOException from the passed in POIFSFileSystem.
Method Detail

verifyAndBuildPOIFS

public static POIFSFileSystem verifyAndBuildPOIFS(java.io.InputStream istream)
                                           throws java.io.IOException
Takes an InputStream, verifies that it's not RTF or PDF, builds a POIFSFileSystem from it, and returns that.

Throws:
java.io.IOException
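
The "not RTF or PDF" check can be pictured as a look at the first few bytes of the stream. The following is a minimal sketch of that idea only, not POI's implementation: HeaderSniffSketch, Kind, classify and sniff are hypothetical names, and "{\rtf" and "%PDF" are the usual leading bytes of RTF and PDF files.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not POI code: classifies a stream by its leading bytes.
final class HeaderSniffSketch {
    enum Kind { RTF, PDF, OTHER }

    // Classifies the first len bytes of head by their ASCII signature.
    static Kind classify(byte[] head, int len) {
        int n = Math.max(0, Math.min(len, head.length));
        String s = new String(head, 0, n, StandardCharsets.US_ASCII);
        if (s.startsWith("{\\rtf")) return Kind.RTF;
        if (s.startsWith("%PDF")) return Kind.PDF;
        return Kind.OTHER;
    }

    // Peeks at the header and rewinds, so the caller can still read the whole stream.
    static Kind sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] head = new byte[5];
        in.mark(head.length);
        int total = 0;
        while (total < head.length) {
            int n = in.read(head, total, head.length - total);
            if (n < 0) break; // end of stream before the header was complete
            total += n;
        }
        in.reset();
        return classify(head, total);
    }

    public static void main(String[] args) throws IOException {
        byte[] rtf = "{\\rtf1\\ansi".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new ByteArrayInputStream(rtf);
        System.out.println(sniff(in));
        // The stream was rewound, so the next byte read is still the first one.
        System.out.println(in.read() == '{');
    }
}
```

A caller following this pattern would reject the RTF and PDF cases before handing the stream to the OLE2 parser, and wrap streams without mark support in a BufferedInputStream first.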

getRange

public abstract Range getRange()
Returns the range which covers the whole of the document, but excludes any headers and footers.


getOverallRange

public abstract Range getOverallRange()
Returns the range that covers all text in the file, including main text, footnotes, headers and comments.


getDocumentText

public java.lang.String getDocumentText()
Returns document text, i.e. text information from all text pieces, including OLE descriptions and field codes.


getText

@Internal
public abstract java.lang.StringBuilder getText()
Internal method to access document text


getCharacterTable

public CHPBinTable getCharacterTable()

getParagraphTable

public PAPBinTable getParagraphTable()

getSectionTable

public SectionTable getSectionTable()

getStyleSheet

public StyleSheet getStyleSheet()

getListTables

public ListTables getListTables()

getFontTable

public FontTable getFontTable()

getFileInformationBlock

public FileInformationBlock getFileInformationBlock()

getObjectsPool

public ObjectsPool getObjectsPool()

getTextTable

public abstract TextPieceTable getTextTable()

getMainStream

@Internal
public byte[] getMainStream()

getEncryptionInfo

public EncryptionInfo getEncryptionInfo()
                                 throws java.io.IOException
Overrides:
getEncryptionInfo in class POIDocument
Returns:
the encryption info if the document is encrypted, otherwise null
Throws:
java.io.IOException

updateEncryptionInfo

protected void updateEncryptionInfo()

getDocumentEntryBytes

protected byte[] getDocumentEntryBytes(java.lang.String name,
                                       int encryptionOffset,
                                       int len)
                                throws java.io.IOException
Reads an OLE stream into a byte array; if an EncryptionInfo is available, decrypts the bytes starting at encryptionOffset. If encryptionOffset is -1, no decryption is attempted.

Parameters:
name - the name of the stream
encryptionOffset - the offset from which to start decrypting, use -1 for no decryption
len - length of the bytes to be read, use Integer.MAX_VALUE for all bytes
Returns:
the read bytes
Throws:
java.io.IOException - if the stream can't be found
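
The two sentinel values in the parameter list (-1 for "no decryption" and Integer.MAX_VALUE for "all bytes") can be illustrated with a toy model. This is a sketch of the parameter contract only, under loud assumptions: EntryBytesSketch and readEntry are hypothetical names, the stream is an in-memory array, and a fixed-key XOR stands in for whatever cipher the document's EncryptionInfo actually describes.

```java
import java.util.Arrays;

// Hypothetical sketch, not POI code: models the encryptionOffset and len parameters.
final class EntryBytesSketch {
    // Stand-in "cipher" key. Real documents use the scheme named in their EncryptionInfo.
    static final byte KEY = (byte) 0x5A;

    static byte[] readEntry(byte[] stream, int encryptionOffset, int len) {
        if (len < 0) {
            throw new IllegalArgumentException("len must not be negative");
        }
        // Integer.MAX_VALUE (or any len past the end) means "read the whole stream".
        int n = Math.min(len, stream.length);
        byte[] out = Arrays.copyOf(stream, n);
        // -1 means "leave the bytes untouched"; otherwise bytes before the
        // offset stay as read and only the remainder is transformed.
        if (encryptionOffset >= 0) {
            for (int i = encryptionOffset; i < n; i++) {
                out[i] ^= KEY;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] stream = {1, 2, 3, 4};
        System.out.println(Arrays.toString(readEntry(stream, -1, Integer.MAX_VALUE)));
        System.out.println(Arrays.toString(readEntry(stream, 2, Integer.MAX_VALUE)));
        System.out.println(Arrays.toString(readEntry(stream, -1, 2)));
    }
}
```

The split at encryptionOffset mirrors the idea behind FIB_BASE_LEN, where a leading part of the data is stored unencrypted and only the remainder needs decrypting.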


Copyright 2017 The Apache Software Foundation or its licensors, as applicable.