org.apache.poi.xssf.extractor
Class XSSFEventBasedExcelExtractor

java.lang.Object
  extended by org.apache.poi.POITextExtractor
      extended by org.apache.poi.POIXMLTextExtractor
          extended by org.apache.poi.xssf.extractor.XSSFEventBasedExcelExtractor
All Implemented Interfaces:
java.io.Closeable, ExcelExtractor
Direct Known Subclasses:
XSSFBEventBasedExcelExtractor

public class XSSFEventBasedExcelExtractor
extends POIXMLTextExtractor
implements ExcelExtractor

A text extractor for OOXML Excel files that uses SAX event-based parsing.
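
The event-based style mentioned above can be illustrated with a minimal sketch that uses only the JDK's built-in SAX parser. This is not POI's implementation: the class name SaxSheetTextSketch, the method extract, and the simplified sheet XML it expects (cell values in <v> or <t> elements inside <row> elements) are assumptions made for illustration. It shows how text can be collected as parse events arrive, without building an in-memory model of the sheet.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Apache POI.
class SaxSheetTextSketch {

    /**
     * Streams through simplified sheet XML and returns the cell text,
     * with a tab between cells and a newline after each row.
     */
    static String extract(String sheetXml) {
        final StringBuilder out = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder value = new StringBuilder();
            private boolean inValue;
            private boolean firstCellInRow;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("row".equals(qName)) {
                    firstCellInRow = true;
                } else if ("v".equals(qName) || "t".equals(qName)) {
                    // Start buffering the text of a cell value.
                    inValue = true;
                    value.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one element, so append.
                if (inValue) {
                    value.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("v".equals(qName) || "t".equals(qName)) {
                    inValue = false;
                    if (!firstCellInRow) {
                        out.append('\t');
                    }
                    out.append(value);
                    firstCellInRow = false;
                } else if ("row".equals(qName)) {
                    out.append('\n');
                }
            }
        };

        try {
            byte[] bytes = sheetXml.getBytes(StandardCharsets.UTF_8);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(bytes), handler);
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException("Could not parse sheet XML", e);
        }
        return out.toString();
    }
}
```

Because the handler only keeps the text of the cell currently being read, memory use stays roughly constant regardless of sheet size, which is the usual motivation for choosing an event-based extractor over one that loads the whole workbook.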


Nested Class Summary
protected  class XSSFEventBasedExcelExtractor.SheetTextExtractor
           
 
Constructor Summary
XSSFEventBasedExcelExtractor(OPCPackage container)
           
XSSFEventBasedExcelExtractor(java.lang.String path)
           
 
Method Summary
 void close()
          Frees the resources held by the Extractor as soon as it is no longer needed.
 POIXMLProperties.CoreProperties getCoreProperties()
          Returns the core document properties
 POIXMLProperties.CustomProperties getCustomProperties()
          Returns the custom document properties
 POIXMLProperties.ExtendedProperties getExtendedProperties()
          Returns the extended document properties
 boolean getFormulasNotResults()
           
 boolean getIncludeCellComments()
           
 boolean getIncludeHeadersFooters()
           
 boolean getIncludeSheetNames()
           
 boolean getIncludeTextBoxes()
           
 java.util.Locale getLocale()
           
 OPCPackage getPackage()
          Returns the opened OPCPackage container.
 java.lang.String getText()
          Processes the file and returns the text
static void main(java.lang.String[] args)
           
 void processSheet(XSSFSheetXMLHandler.SheetContentsHandler sheetContentsExtractor, StylesTable styles, CommentsTable comments, ReadOnlySharedStringsTable strings, java.io.InputStream sheetInputStream)
          Processes the given sheet
 void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
          Concatenates text from <rPh> (phonetic run) elements in the SharedStringsTable. Default is true.
 void setFormulasNotResults(boolean formulasNotResults)
          Should we return the formula itself, and not the result it produces? Default is false
 void setIncludeCellComments(boolean includeCellComments)
          Should cell comments be included? Default is false
 void setIncludeHeadersFooters(boolean includeHeadersFooters)
          Should headers and footers be included? Default is true
 void setIncludeSheetNames(boolean includeSheetNames)
          Should sheet names be included? Default is true
 void setIncludeTextBoxes(boolean includeTextBoxes)
          Should text from textboxes be included? Default is true
 void setLocale(java.util.Locale locale)
           
 
Methods inherited from class org.apache.poi.POIXMLTextExtractor
checkMaxTextSize, getDocument, getMetadataTextExtractor
 
Methods inherited from class org.apache.poi.POITextExtractor
setFilesystem
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

XSSFEventBasedExcelExtractor

public XSSFEventBasedExcelExtractor(java.lang.String path)
                             throws org.apache.xmlbeans.XmlException,
                                    OpenXML4JException,
                                    java.io.IOException
Throws:
org.apache.xmlbeans.XmlException
OpenXML4JException
java.io.IOException

XSSFEventBasedExcelExtractor

public XSSFEventBasedExcelExtractor(OPCPackage container)
                             throws org.apache.xmlbeans.XmlException,
                                    OpenXML4JException,
                                    java.io.IOException
Throws:
org.apache.xmlbeans.XmlException
OpenXML4JException
java.io.IOException
Method Detail

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Throws:
java.lang.Exception

setIncludeSheetNames

public void setIncludeSheetNames(boolean includeSheetNames)
Should sheet names be included? Default is true

Specified by:
setIncludeSheetNames in interface ExcelExtractor
Parameters:
includeSheetNames - true if the sheet names should be included

getIncludeSheetNames

public boolean getIncludeSheetNames()
Returns:
whether to include sheet names
Since:
3.16-beta3

setFormulasNotResults

public void setFormulasNotResults(boolean formulasNotResults)
Should we return the formula itself, and not the result it produces? Default is false

Specified by:
setFormulasNotResults in interface ExcelExtractor
Parameters:
formulasNotResults - true if the formula itself is returned

getFormulasNotResults

public boolean getFormulasNotResults()
Returns:
whether to include formulas but not results
Since:
3.16-beta3

setIncludeHeadersFooters

public void setIncludeHeadersFooters(boolean includeHeadersFooters)
Should headers and footers be included? Default is true

Specified by:
setIncludeHeadersFooters in interface ExcelExtractor
Parameters:
includeHeadersFooters - true if headers and footers should be included

getIncludeHeadersFooters

public boolean getIncludeHeadersFooters()
Returns:
whether or not to include headers and footers
Since:
3.16-beta3

setIncludeTextBoxes

public void setIncludeTextBoxes(boolean includeTextBoxes)
Should text from textboxes be included? Default is true


getIncludeTextBoxes

public boolean getIncludeTextBoxes()
Returns:
whether or not to extract textboxes
Since:
3.16-beta3

setIncludeCellComments

public void setIncludeCellComments(boolean includeCellComments)
Should cell comments be included? Default is false

Specified by:
setIncludeCellComments in interface ExcelExtractor
Parameters:
includeCellComments - true if cell comments should be included

getIncludeCellComments

public boolean getIncludeCellComments()
Returns:
whether cell comments should be included
Since:
3.16-beta3

setConcatenatePhoneticRuns

public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
Concatenates text from <rPh> (phonetic run) elements in the SharedStringsTable. Default is true.

Parameters:
concatenatePhoneticRuns - true if text from phonetic runs should be concatenated

setLocale

public void setLocale(java.util.Locale locale)
Sets the locale used when formatting cell values.

getLocale

public java.util.Locale getLocale()
Returns:
the locale set on this extractor
Since:
3.16-beta3

getPackage

public OPCPackage getPackage()
Returns the opened OPCPackage container.

Overrides:
getPackage in class POIXMLTextExtractor
Returns:
the opened OPCPackage

getCoreProperties

public POIXMLProperties.CoreProperties getCoreProperties()
Returns the core document properties

Overrides:
getCoreProperties in class POIXMLTextExtractor
Returns:
the core document properties

getExtendedProperties

public POIXMLProperties.ExtendedProperties getExtendedProperties()
Returns the extended document properties

Overrides:
getExtendedProperties in class POIXMLTextExtractor
Returns:
the extended document properties

getCustomProperties

public POIXMLProperties.CustomProperties getCustomProperties()
Returns the custom document properties

Overrides:
getCustomProperties in class POIXMLTextExtractor
Returns:
the custom document properties

processSheet

public void processSheet(XSSFSheetXMLHandler.SheetContentsHandler sheetContentsExtractor,
                         StylesTable styles,
                         CommentsTable comments,
                         ReadOnlySharedStringsTable strings,
                         java.io.InputStream sheetInputStream)
                  throws java.io.IOException,
                         org.xml.sax.SAXException
Processes the given sheet

Throws:
java.io.IOException
org.xml.sax.SAXException

getText

public java.lang.String getText()
Processes the file and returns the text

Specified by:
getText in interface ExcelExtractor
Specified by:
getText in class POITextExtractor
Returns:
All the text from the document

close

public void close()
           throws java.io.IOException
Description copied from class: POITextExtractor
Frees the resources held by the Extractor as soon as it is no longer needed. This may include closing open file handles and freeing memory. The Extractor cannot be used after close has been called.

Specified by:
close in interface java.io.Closeable
Overrides:
close in class POIXMLTextExtractor
Throws:
java.io.IOException
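
Since the class implements java.io.Closeable, the close contract described above fits Java's try-with-resources statement. The following is a minimal sketch of that pattern using a stand-in type: CloseSketch and FakeExtractor are hypothetical names invented for this illustration and are not POI classes. The stand-in rejects calls after close, which is one way to model the rule that the extractor cannot be used once close has been called.

```java
import java.io.Closeable;

// Hypothetical illustration class; not part of Apache POI.
class CloseSketch {

    /** Stand-in for an extractor that refuses to work after close(). */
    static class FakeExtractor implements Closeable {
        private boolean closed;

        String getText() {
            if (closed) {
                throw new IllegalStateException("extractor already closed");
            }
            return "sheet text";
        }

        @Override
        public void close() {
            // A real extractor would release file handles and memory here.
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Reads the text, then closes the extractor even if getText() throws. */
    static String readAndClose(FakeExtractor extractor) {
        try (FakeExtractor ex = extractor) {
            return ex.getText();
        }
    }
}
```

With a real extractor the same try block would also need to handle or declare java.io.IOException, because close is declared to throw it.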


Copyright 2017 The Apache Software Foundation or its licensors, as applicable.