writer2latex.xmerge
Class OfficeDocument

java.lang.Object
  extended by writer2latex.xmerge.OfficeDocument
All Implemented Interfaces:
OutputFile, Document, OfficeConstants

public class OfficeDocument
extends java.lang.Object
implements Document, OfficeConstants

An implementation of Document for StarOffice documents.


Field Summary
private  org.w3c.dom.Document contentDoc
          DOM Document of content.xml.
private  java.lang.String documentName
           
private  java.util.Map embeddedObjects
          Collection to keep track of the embedded objects in the document.
private static javax.xml.parsers.DocumentBuilderFactory factory
          Factory for DocumentBuilder objects.
private  java.lang.String fileName
           
private  org.w3c.dom.Document manifestDoc
          DOM Document of META-INF/manifest.xml.
private  org.w3c.dom.Document metaDoc
          DOM Document of meta.xml.
private  org.w3c.dom.Document settingsDoc
          DOM Document of settings.xml.
private  org.w3c.dom.Document styleDoc
          DOM Document of styles.xml.
private  OfficeZip zip
          OfficeZip object to store zip contents from read InputStream.
 
Fields inherited from interface writer2latex.xmerge.OfficeConstants
ATTRIBUTE_CONFIG_NAME, ATTRIBUTE_CONFIG_TYPE, ATTRIBUTE_DEFAULT_CELL_STYLE, ATTRIBUTE_FO_FONT_FAMILY, ATTRIBUTE_FO_FONT_FAMILY_GENERIC, ATTRIBUTE_MANIFEST_FILE_PATH, ATTRIBUTE_MANIFEST_FILE_TYPE, ATTRIBUTE_OFFICE_CLASS, ATTRIBUTE_SPACE_COUNT, ATTRIBUTE_STYLE_FONT_PITCH, ATTRIBUTE_STYLE_NAME, ATTRIBUTE_TABLE_BASE_CELL_ADDRESS, ATTRIBUTE_TABLE_BOOLEAN_VALUE, ATTRIBUTE_TABLE_CELL_RANGE_ADDRESS, ATTRIBUTE_TABLE_CURRENCY, ATTRIBUTE_TABLE_DATE_VALUE, ATTRIBUTE_TABLE_EXPRESSION, ATTRIBUTE_TABLE_FORMULA, ATTRIBUTE_TABLE_NAME, ATTRIBUTE_TABLE_NUM_COLUMNS_REPEATED, ATTRIBUTE_TABLE_NUM_ROWS_REPEATED, ATTRIBUTE_TABLE_STRING_VALUE, ATTRIBUTE_TABLE_STYLE_NAME, ATTRIBUTE_TABLE_TIME_VALUE, ATTRIBUTE_TABLE_VALUE, ATTRIBUTE_TABLE_VALUE_TYPE, ATTRIBUTE_TEXT_STYLE_NAME, CELLTYPE_BOOLEAN, CELLTYPE_CURRENCY, CELLTYPE_DATE, CELLTYPE_FLOAT, CELLTYPE_PERCENT, CELLTYPE_STRING, CELLTYPE_TIME, STC_MIME_TYPE, STI_MIME_TYPE, STW_MIME_TYPE, SXC_FILE_EXTENSION, SXC_MIME_TYPE, SXC_TYPE, SXD_MIME_TYPE, SXG_MIME_TYPE, SXI_MIME_TYPE, SXM_MIME_TYPE, SXW_FILE_EXTENSION, SXW_MIME_TYPE, SXW_TYPE, TAG_BOOKMARK, TAG_BOOKMARK_START, TAG_CONFIG_ITEM, TAG_CONFIG_ITEM_MAP_ENTRY, TAG_CONFIG_ITEM_MAP_INDEXED, TAG_CONFIG_ITEM_MAP_NAMED, TAG_CONFIG_ITEM_SET, TAG_HEADING, TAG_HYPERLINK, TAG_LINE_BREAK, TAG_LIST_HEADER, TAG_LIST_ITEM, TAG_MANIFEST_FILE, TAG_MANIFEST_ROOT, TAG_NAMED_EXPRESSIONS, TAG_OFFICE_AUTOMATIC_STYLES, TAG_OFFICE_BODY, TAG_OFFICE_DOCUMENT, TAG_OFFICE_DOCUMENT_CONTENT, TAG_OFFICE_DOCUMENT_META, TAG_OFFICE_DOCUMENT_SETTINGS, TAG_OFFICE_DOCUMENT_STYLES, TAG_OFFICE_FONT_DECLS, TAG_OFFICE_FONT_FACE_DECLS, TAG_OFFICE_MASTER_STYLES, TAG_OFFICE_META, TAG_OFFICE_SETTINGS, TAG_OFFICE_STYLES, TAG_ORDERED_LIST, TAG_PARAGRAPH, TAG_SPACE, TAG_SPAN, TAG_STYLE_FONT_DECL, TAG_TAB_STOP, TAG_TABLE, TAG_TABLE_CELL, TAG_TABLE_COLUMN, TAG_TABLE_NAMED_EXPRESSION, TAG_TABLE_NAMED_RANGE, TAG_TABLE_ROW, TAG_TABLE_SCENARIO, TAG_TEXT, TAG_TEXT_AUTHOR_INITIALS, TAG_TEXT_CREATION_TIME, TAG_TEXT_DATE, TAG_TEXT_EXPRESSION, 
TAG_TEXT_PAGE_COUNT, TAG_TEXT_PAGE_NUMBER, TAG_TEXT_PAGE_VARIABLE_GET, TAG_TEXT_SEQUENCE, TAG_TEXT_SUBJECT, TAG_TEXT_TEXT_INPUT, TAG_TEXT_TIME, TAG_TEXT_TITLE, TAG_TEXT_USER_FIELD_GET, TAG_TEXT_VARIABLE_GET, TAG_TEXT_VARIABLE_INPUT, TAG_TEXT_VARIABLE_SET, TAG_UNORDERED_LIST
 
Constructor Summary
OfficeDocument(java.lang.String name)
          Default constructor.
OfficeDocument(java.lang.String name, boolean namespaceAware, boolean validating)
          Constructor with arguments to set namespaceAware and validating flags.
 
Method Summary
 void addEmbeddedObject(EmbeddedObject embObj)
          Adds a new embedded object to the document.
private  org.w3c.dom.Document createDOM(java.lang.String rootName)
          Creates a new DOM Document containing minimum OpenOffice XML tags.
private  org.w3c.dom.Document createSettingsDOM(java.lang.String rootName)
          Creates a new settings DOM Document containing minimum OpenOffice XML tags.
(package private) static byte[] docToBytes(org.w3c.dom.Document doc)
          Write out an org.w3c.dom.Document object into a byte array.
 org.w3c.dom.Document getContentDOM()
          Return a DOM Document object of the content.xml file.
protected  java.lang.String getDocumentMimeType()
          Method to return the MIME type of the document.
 EmbeddedObject getEmbeddedObject(java.lang.String name)
          Returns the embedded object corresponding to the name provided.
 java.util.Iterator getEmbeddedObjects()
          Returns all the embedded objects (graphics, formulae, etc.) present in this document.
protected  java.lang.String getFileExtension()
          Returns the file extension for this type of Document.
 java.lang.String getFileName()
          Return the file name of the Document, possibly with the standard extension.
 org.w3c.dom.Document getMetaDOM()
          Return a DOM Document object of the meta.xml file.
 java.lang.String getName()
          Return the name of the Document.
protected  java.lang.String getOfficeClassAttribute()
          Return the office:class attribute value.
 org.w3c.dom.Document getSettingsDOM()
          Return a DOM Document object of the settings.xml file.
 org.w3c.dom.Document getStyleDOM()
          Return a DOM Document object of the styles.xml file.
private static java.io.Reader hack(java.io.InputStream is)
          Hacked code to filter the <!DOCTYPE> tag before sending the stream to the parser.
 void initContentDOM()
          Initializes a new DOM Document with the content containing minimum OpenOffice XML tags.
private  void initManifestDOM()
          Method to create the initial entries in the manifest.xml file stored in an SX? file (for example SXW or SXC).
 void initSettingsDOM()
          Initializes a new DOM Document with settings containing minimum OpenOffice XML tags.
 void initStyleDOM()
          Initializes a new DOM Document with styles containing minimum OpenOffice XML tags.
 boolean isPackageFormat()
          Returns whether the document is in package (zipped) or flat XML format.
(package private) static org.w3c.dom.Document parse(javax.xml.parsers.DocumentBuilder builder, byte[] bytes)
          Parse given byte array into a DOM Document object using the DocumentBuilder object.
 void read(java.io.InputStream is)
          Read the Office Document from the given InputStream.
 void read(java.io.InputStream is, boolean isZip)
          Read the Office Document from the given InputStream.
private  void readZip(java.io.InputStream is)
           
private static java.io.Reader secondHack(java.io.InputStream is)
          Transform the InputStream to a Reader Stream.
 void setContentDOM(org.w3c.dom.Node newDom)
          Sets the content tree of the document.
 void setMetaDOM(org.w3c.dom.Node newDom)
          Sets the meta tree of the document.
 void setSettingsDOM(org.w3c.dom.Node newDom)
          Sets the settings tree of the document.
 void setStyleDOM(org.w3c.dom.Node newDom)
          Sets the style tree of the document.
private  java.lang.String trimDocumentName(java.lang.String name)
          Removes the file extension from the Document name.
 void write(java.io.OutputStream os)
          Write out Office ZIP file format.
 void write(java.io.OutputStream os, boolean isZip)
          Write out the document in Office ZIP file format or as flat XML, depending on isZip.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

factory

private static javax.xml.parsers.DocumentBuilderFactory factory
Factory for DocumentBuilder objects.


contentDoc

private org.w3c.dom.Document contentDoc
DOM Document of content.xml.


metaDoc

private org.w3c.dom.Document metaDoc
DOM Document of meta.xml.


settingsDoc

private org.w3c.dom.Document settingsDoc
DOM Document of settings.xml.


styleDoc

private org.w3c.dom.Document styleDoc
DOM Document of styles.xml.


manifestDoc

private org.w3c.dom.Document manifestDoc
DOM Document of META-INF/manifest.xml.


documentName

private java.lang.String documentName

fileName

private java.lang.String fileName

zip

private OfficeZip zip
OfficeZip object to store zip contents from read InputStream. Note that this member will still be null if the document was initialized from a template file instead of being read from a StarOffice zipped XML file.


embeddedObjects

private java.util.Map embeddedObjects
Collection to keep track of the embedded objects in the document.

Constructor Detail

OfficeDocument

public OfficeDocument(java.lang.String name)
Default constructor.

Parameters:
name - Document name.

OfficeDocument

public OfficeDocument(java.lang.String name,
                      boolean namespaceAware,
                      boolean validating)
Constructor with arguments to set namespaceAware and validating flags.

Parameters:
name - Document name (may or may not contain extension).
namespaceAware - Value for namespaceAware flag.
validating - Value for validating flag.
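The two flags correspond to standard JAXP parser settings. The sketch below assumes the constructor simply forwards them to a DocumentBuilderFactory; the class name ParserFlagsSketch is hypothetical. Because the factory field is declared static, settings applied this way would presumably be shared by every OfficeDocument instance.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical helper: configures a JAXP factory the way the
// (name, namespaceAware, validating) constructor is assumed to.
final class ParserFlagsSketch {
    static DocumentBuilderFactory configure(boolean namespaceAware, boolean validating) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware); // resolve xmlns prefixes when parsing
        factory.setValidating(validating);         // validate against the DTD when true
        return factory;
    }

    public static void main(String[] args) throws ParserConfigurationException {
        DocumentBuilder builder = configure(true, false).newDocumentBuilder();
        System.out.println(builder.isNamespaceAware()); // true
        System.out.println(builder.isValidating());     // false
    }
}
```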
Method Detail

trimDocumentName

private java.lang.String trimDocumentName(java.lang.String name)
Removes the file extension from the Document name.

Parameters:
name - Full Document name with extension.
Returns:
Name of Document without the extension.
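The exact rule used by trimDocumentName is not documented here. Since the class also exposes getFileExtension(), one plausible approach is to strip only that known extension, ignoring case. A minimal sketch under that assumption (TrimNameSketch and its extension parameter are hypothetical):

```java
// Hypothetical sketch: removes the document type's own extension
// (e.g. the value getFileExtension() might return), ignoring case.
// Names that do not end with that extension are returned unchanged.
final class TrimNameSketch {
    static String trimDocumentName(String name, String extension) {
        int start = name.length() - extension.length();
        // regionMatches returns false when start is negative
        if (name.regionMatches(true, start, extension, 0, extension.length())) {
            return name.substring(0, start);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(trimDocumentName("Report.SXW", ".sxw")); // Report
        System.out.println(trimDocumentName("notes.txt", ".sxw"));  // notes.txt
    }
}
```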

isPackageFormat

public boolean isPackageFormat()
Returns whether the document is in package (zipped) or flat XML format.

Returns:
true if the document is in package format, false if it is flat XML.

getContentDOM

public org.w3c.dom.Document getContentDOM()
Return a DOM Document object of the content.xml file. Note that a content DOM is not created when the constructor is called, so either a read method or the initContentDOM method must be called on this object before calling this method.

Returns:
DOM Document object.

getMetaDOM

public org.w3c.dom.Document getMetaDOM()
Return a DOM Document object of the meta.xml file. Note that a meta DOM is not created when the constructor is called, so a read method must be called on this object first; otherwise this method may return null.

Returns:
DOM Document object.

getSettingsDOM

public org.w3c.dom.Document getSettingsDOM()
Return a DOM Document object of the settings.xml file. Note that a settings DOM is not created when the constructor is called, so either a read method or the initSettingsDOM method must be called on this object before calling this method.

Returns:
DOM Document object.

setContentDOM

public void setContentDOM(org.w3c.dom.Node newDom)
Sets the content tree of the document.

Parameters:
newDom - Node containing the new content tree.

setMetaDOM

public void setMetaDOM(org.w3c.dom.Node newDom)
Sets the meta tree of the document.

Parameters:
newDom - Node containing the new meta tree.

setSettingsDOM

public void setSettingsDOM(org.w3c.dom.Node newDom)
Sets the settings tree of the document.

Parameters:
newDom - Node containing the new settings tree.

setStyleDOM

public void setStyleDOM(org.w3c.dom.Node newDom)
Sets the style tree of the document.

Parameters:
newDom - Node containing the new style tree.

getStyleDOM

public org.w3c.dom.Document getStyleDOM()
Return a DOM Document object of the styles.xml file, or null if there is no style DOM. A style DOM is not created when the constructor is called, and depending on the InputStream, a read method may or may not build one. To create a new style DOM, call the initStyleDOM method first.

Returns:
DOM Document object.

getName

public java.lang.String getName()
Return the name of the Document.

Specified by:
getName in interface Document
Returns:
The name of Document.

getFileName

public java.lang.String getFileName()
Return the file name of the Document, possibly with the standard extension.

Specified by:
getFileName in interface OutputFile
Returns:
The file name of Document.

getFileExtension

protected java.lang.String getFileExtension()
Returns the file extension for this type of Document.

Returns:
The file extension of Document.

getEmbeddedObjects

public java.util.Iterator getEmbeddedObjects()
Returns all the embedded objects (graphics, formulae, etc.) present in this document.

Returns:
An Iterator of EmbeddedObject objects.

getEmbeddedObject

public EmbeddedObject getEmbeddedObject(java.lang.String name)
Returns the embedded object corresponding to the name provided. The name should be stripped of any preceding path characters, such as '/', '.' or '#'.

Parameters:
name - The name of the embedded object to retrieve.
Returns:
An EmbeddedObject instance representing the named object.

addEmbeddedObject

public void addEmbeddedObject(EmbeddedObject embObj)
Adds a new embedded object to the document.

Parameters:
embObj - An instance of EmbeddedObject.
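The embedded-object methods suggest a simple name-keyed collection behind the embeddedObjects field. The sketch below assumes an insertion-ordered map. EmbeddedItem is a hypothetical stand-in for EmbeddedObject, whose API is not shown on this page, and stripPrefix is a hypothetical helper that performs the stripping of leading '/', '.' and '#' characters that getEmbeddedObject expects of its callers.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for EmbeddedObject: only the name is modelled.
final class EmbeddedItem {
    final String name;
    EmbeddedItem(String name) { this.name = name; }
}

final class EmbeddedRegistrySketch {
    // Insertion-ordered, so iteration follows the order objects were added.
    private final Map<String, EmbeddedItem> embeddedObjects = new LinkedHashMap<>();

    void add(EmbeddedItem obj) {
        embeddedObjects.put(obj.name, obj);
    }

    // Drops leading '/', '.' and '#' characters, e.g. "./Object 1" -> "Object 1".
    static String stripPrefix(String name) {
        int i = 0;
        while (i < name.length() && "/.#".indexOf(name.charAt(i)) >= 0) {
            i++;
        }
        return name.substring(i);
    }

    EmbeddedItem get(String name) {
        return embeddedObjects.get(stripPrefix(name)); // null if not present
    }

    Iterator<EmbeddedItem> all() {
        return embeddedObjects.values().iterator();
    }

    public static void main(String[] args) {
        EmbeddedRegistrySketch reg = new EmbeddedRegistrySketch();
        reg.add(new EmbeddedItem("Object 1"));
        reg.add(new EmbeddedItem("Pictures/logo.png"));
        System.out.println(reg.get("./Object 1").name); // Object 1
        for (Iterator<EmbeddedItem> it = reg.all(); it.hasNext(); ) {
            System.out.println(it.next().name);
        }
    }
}
```

Using LinkedHashMap rather than HashMap keeps iteration in the order the objects were added, which tends to make written output reproducible.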

read

public void read(java.io.InputStream is)
          throws java.io.IOException
Read the Office Document from the given InputStream, using simple type detection to determine whether it is in package or flat format (FIX3, HJ).

Specified by:
read in interface Document
Parameters:
is - Office document InputStream.
Throws:
java.io.IOException - If any I/O error occurs.
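The simple type detection mentioned for read(InputStream) could work by peeking at the first bytes of the stream: ZIP archives normally begin with a local file header whose signature is the four bytes 'P', 'K', 0x03, 0x04, whereas a flat document begins with XML text. This is a sketch of that idea, not the method's actual code; FormatSniffer is a hypothetical name, and readNBytes requires Java 9 or later.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: classify a stream as a ZIP package or flat XML by
// peeking at its first four bytes, then rewinding so nothing is consumed.
final class FormatSniffer {
    static boolean looksLikeZip(BufferedInputStream in) throws IOException {
        in.mark(4);                        // remember the current position
        byte[] head = new byte[4];
        int n = in.readNBytes(head, 0, 4); // may read fewer bytes at end of stream
        in.reset();                        // rewind for the real parser
        return n == 4 && head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4;
    }

    public static void main(String[] args) throws IOException {
        InputStream zip = new ByteArrayInputStream(new byte[] {'P', 'K', 3, 4, 0});
        InputStream xml = new ByteArrayInputStream("<?xml version=\"1.0\"?>".getBytes());
        System.out.println(looksLikeZip(new BufferedInputStream(zip))); // true
        System.out.println(looksLikeZip(new BufferedInputStream(xml))); // false
    }
}
```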

readZip

private void readZip(java.io.InputStream is)
              throws java.io.IOException
Throws:
java.io.IOException

read

public void read(java.io.InputStream is,
                 boolean isZip)
          throws java.io.IOException
Read the Office Document from the given InputStream.

Parameters:
is - Office document InputStream.
isZip - true if the input is a zipped (package format) document, false if it is flat XML.
Throws:
java.io.IOException - If any I/O error occurs.

parse

static org.w3c.dom.Document parse(javax.xml.parsers.DocumentBuilder builder,
                                  byte[] bytes)
                           throws org.xml.sax.SAXException,
                                  java.io.IOException
Parse given byte array into a DOM Document object using the DocumentBuilder object.

Parameters:
builder - DocumentBuilder object for parsing.
bytes - byte array for parsing.
Returns:
Resulting DOM Document object.
Throws:
org.xml.sax.SAXException - If any parsing error occurs.
java.io.IOException - If any I/O error occurs.
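parse presumably just wraps the byte array in a stream for the DocumentBuilder. A minimal sketch under that assumption (ParseSketch is a hypothetical name); when parsing from a byte stream, the parser takes the encoding from the XML declaration or defaults to UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Sketch of parse(builder, bytes): expose the bytes as an InputStream and
// let the DocumentBuilder build the DOM.
final class ParseSketch {
    static Document parse(DocumentBuilder builder, byte[] bytes)
            throws SAXException, IOException {
        return builder.parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        byte[] xml = "<office:document-meta xmlns:office=\"http://openoffice.org/2000/office\"/>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(parse(builder, xml).getDocumentElement().getTagName());
    }
}
```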

getDocumentMimeType

protected java.lang.String getDocumentMimeType()
Method to return the MIME type of the document.

Returns:
The document's MIME type.

write

public void write(java.io.OutputStream os)
           throws java.io.IOException
Write out Office ZIP file format.

Specified by:
write in interface OutputFile
Parameters:
os - XML OutputStream.
Throws:
java.io.IOException - If any I/O error occurs.

write

public void write(java.io.OutputStream os,
                  boolean isZip)
           throws java.io.IOException
Write out the document in Office ZIP file format or as flat XML, depending on isZip.

Parameters:
os - XML OutputStream.
isZip - true to write a zipped (package format) file, false to write flat XML.
Throws:
java.io.IOException - If any I/O error occurs.
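For the package case (isZip true), a plausible layout writes each XML stream as a ZIP entry. The sketch below follows the packaging convention later standardized for ODF: a first entry named mimetype, stored uncompressed, so that tools can identify the document type from the start of the file. The method name, its parameters and the entry list are assumptions, and the flat XML branch is not shown.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch of a package writer: mimetype first (STORED),
// then the XML streams (DEFLATED).
final class PackageWriterSketch {
    static void writePackage(OutputStream os, String mimeType,
                             byte[] contentXml, byte[] manifestXml) throws IOException {
        ZipOutputStream zos = new ZipOutputStream(os);

        // STORED entries must declare their size and CRC-32 up front.
        byte[] mime = mimeType.getBytes(StandardCharsets.US_ASCII);
        CRC32 crc = new CRC32();
        crc.update(mime);
        ZipEntry mimeEntry = new ZipEntry("mimetype");
        mimeEntry.setMethod(ZipEntry.STORED);
        mimeEntry.setSize(mime.length);
        mimeEntry.setCrc(crc.getValue());
        zos.putNextEntry(mimeEntry);
        zos.write(mime);
        zos.closeEntry();

        addDeflated(zos, "content.xml", contentXml);
        addDeflated(zos, "META-INF/manifest.xml", manifestXml);
        zos.finish(); // completes the archive but leaves the caller's stream open
    }

    private static void addDeflated(ZipOutputStream zos, String name, byte[] data)
            throws IOException {
        zos.putNextEntry(new ZipEntry(name)); // default method is DEFLATED
        zos.write(data);
        zos.closeEntry();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        writePackage(bos, "application/vnd.sun.xml.writer",
                "<office:document-content/>".getBytes(StandardCharsets.UTF_8),
                "<manifest:manifest/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(bos.size() + " bytes written");
    }
}
```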

docToBytes

static byte[] docToBytes(org.w3c.dom.Document doc)
                  throws java.io.IOException

Write out an org.w3c.dom.Document object into a byte array.

TODO: remove dependency on com.sun.xml.tree.XmlDocument package!

Parameters:
doc - DOM Document object.
Returns:
byte array of DOM Document object.
Throws:
java.io.IOException - If any I/O error occurs.
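The TODO concerns the com.sun.xml.tree.XmlDocument dependency. One way to remove it is the standard JAXP identity transform, which serializes a DOM without vendor-specific classes. A sketch (DocToBytesSketch is a hypothetical name) that keeps the documented IOException contract:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Sketch of docToBytes() built on the JAXP identity transform.
final class DocToBytesSketch {
    static byte[] docToBytes(Document doc) throws IOException {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            // Surface serialization failures as IOException, as documented.
            throw new IOException("Could not serialize DOM", e);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("office:document-meta"));
        System.out.println(new String(docToBytes(doc), StandardCharsets.UTF_8));
    }
}
```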

initContentDOM

public final void initContentDOM()
                          throws java.io.IOException
Initializes a new DOM Document with the content containing minimum OpenOffice XML tags.

Throws:
java.io.IOException - If any I/O error occurs.

initSettingsDOM

public final void initSettingsDOM()
                           throws java.io.IOException
Initializes a new DOM Document with settings containing minimum OpenOffice XML tags.

Throws:
java.io.IOException - If any I/O error occurs.

initStyleDOM

public final void initStyleDOM()
                        throws java.io.IOException
Initializes a new DOM Document with styles containing minimum OpenOffice XML tags.

Throws:
java.io.IOException - If any I/O error occurs.

createSettingsDOM

private final org.w3c.dom.Document createSettingsDOM(java.lang.String rootName)
                                              throws java.io.IOException

Creates a new settings DOM Document containing minimum OpenOffice XML tags.

This method uses the subclass getOfficeClassAttribute method to get the attribute for office:class.

Parameters:
rootName - root name of Document.
Throws:
java.io.IOException - If any I/O error occurs.

createDOM

private final org.w3c.dom.Document createDOM(java.lang.String rootName)
                                      throws java.io.IOException

Creates a new DOM Document containing minimum OpenOffice XML tags.

This method uses the subclass getOfficeClassAttribute method to get the attribute for office:class.

Parameters:
rootName - root name of Document.
Throws:
java.io.IOException - If any I/O error occurs.
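A minimal sketch of what createDOM could produce, assuming a non-namespace-aware DOM in which the office namespace is declared as an ordinary attribute. The namespace URI is the one used by OpenOffice.org 1.x files, but treat it, the class name CreateDomSketch and the officeClass parameter (which stands in for getOfficeClassAttribute()) as assumptions.

```java
import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch of createDOM(rootName): an empty Document whose root
// declares the office namespace and carries the office:class attribute.
final class CreateDomSketch {
    // Assumed OpenOffice.org 1.x office namespace URI.
    static final String OFFICE_NS = "http://openoffice.org/2000/office";

    static Document createDOM(String rootName, String officeClass) throws IOException {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IOException("No DOM builder available", e);
        }
        Element root = doc.createElement(rootName);
        root.setAttribute("xmlns:office", OFFICE_NS);
        // In OfficeDocument this value would come from getOfficeClassAttribute().
        root.setAttribute("office:class", officeClass);
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) throws IOException {
        Document doc = createDOM("office:document-content", "text");
        System.out.println(doc.getDocumentElement().getAttribute("office:class")); // text
    }
}
```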

getOfficeClassAttribute

protected java.lang.String getOfficeClassAttribute()
Return the office:class attribute value.

Returns:
The attribute value.
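getFileExtension, getDocumentMimeType and getOfficeClassAttribute are protected hooks that format-specific subclasses appear intended to override. The sketch below models that template-method arrangement with hypothetical class names; for brevity the base class is abstract here, whereas OfficeDocument itself is concrete. The Writer values are assumptions matching the SXW_* constants and common OpenOffice.org 1.x usage.

```java
// Hypothetical base class: builds derived values from subclass hooks.
abstract class BaseDocSketch {
    private final String name;

    BaseDocSketch(String name) { this.name = name; }

    protected abstract String getFileExtension();
    protected abstract String getDocumentMimeType();
    protected abstract String getOfficeClassAttribute();

    // File name = document name plus the subclass's standard extension.
    String getFileName() {
        return name + getFileExtension();
    }
}

// Hypothetical Writer (.sxw) subclass; values are assumptions.
final class WriterDocSketch extends BaseDocSketch {
    WriterDocSketch(String name) { super(name); }

    @Override protected String getFileExtension() { return ".sxw"; }
    @Override protected String getDocumentMimeType() { return "application/vnd.sun.xml.writer"; }
    @Override protected String getOfficeClassAttribute() { return "text"; }

    public static void main(String[] args) {
        WriterDocSketch doc = new WriterDocSketch("report");
        System.out.println(doc.getFileName());          // report.sxw
        System.out.println(doc.getDocumentMimeType());  // application/vnd.sun.xml.writer
    }
}
```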

hack

private static java.io.Reader hack(java.io.InputStream is)
                            throws java.io.IOException

Hacked code to filter the <!DOCTYPE> tag before sending the stream to the parser.

This hacked code needs to be changed later on.

Issue: with the current JAXP 1.0 parser, there is no way to turn off DTD processing. The current set of DTDs has bugs, and processing them throws exceptions.

This is a simple hack that assumes the whole tag is on a single line. This is sufficient for the XML files currently generated by StarOffice 6.0. Since this hack really needs to go away, I don't want to spend too much time making it perfect.

FIX (HJ): Removed the requirement for the DOCTYPE to be on one line.
FIX (HJ): No longer removes newlines.

Parameters:
is - InputStream to be filtered.
Returns:
Reader over the filtered stream, with the DOCTYPE tag removed.
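A sketch of the DOCTYPE filter, using a regular expression so the declaration may span several lines while all other newlines are kept. It assumes UTF-8 input and a DOCTYPE without an internal subset (no '>' inside it); DoctypeFilterSketch is a hypothetical name, and Java 10 or later is needed for readAllBytes and transferTo.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical sketch of hack(): strip the first <!DOCTYPE ...> declaration.
final class DoctypeFilterSketch {
    // [^>]* also matches line breaks, so a multi-line DOCTYPE is removed whole.
    private static final Pattern DOCTYPE = Pattern.compile("<!DOCTYPE[^>]*>");

    static Reader hack(InputStream is) throws IOException {
        String xml = new String(is.readAllBytes(), StandardCharsets.UTF_8);
        // Only the prolog may contain a DOCTYPE, so the first match is enough.
        return new StringReader(DOCTYPE.matcher(xml).replaceFirst(""));
    }

    public static void main(String[] args) throws IOException {
        String xml = "<?xml version=\"1.0\"?>\n<!DOCTYPE office:document-content PUBLIC\n"
                + "  \"-//sample//EN\" \"office.dtd\">\n<a/>";
        Reader r = hack(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        StringWriter out = new StringWriter();
        r.transferTo(out);
        System.out.print(out);
    }
}
```

With a modern JAXP parser the filter is usually unnecessary: installing an EntityResolver on the DocumentBuilder that returns an empty InputSource stops the external DTD from being loaded.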
Throws:
java.io.IOException - If any I/O error occurs.

secondHack

private static java.io.Reader secondHack(java.io.InputStream is)
                                  throws java.io.IOException

Transform the InputStream to a Reader Stream.

This hacked code needs to be changed later on.

Issue: with the new OASIS input file format, the old input stream handling fails; see #i33702#.

Parameters:
is - InputStream to be filtered.
Returns:
Reader over the given InputStream.
Throws:
java.io.IOException - If any I/O error occurs.

initManifestDOM

private void initManifestDOM()
                      throws java.io.IOException
Method to create the initial entries in the manifest.xml file stored in an SX? file (for example SXW or SXC).

Throws:
java.io.IOException - If any I/O error occurs.
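A sketch of the kind of manifest initManifestDOM might build: one file-entry for the package root, carrying the document MIME type, and one per XML stream. The element and attribute names are assumed to be the values of TAG_MANIFEST_ROOT, TAG_MANIFEST_FILE, ATTRIBUTE_MANIFEST_FILE_TYPE and ATTRIBUTE_MANIFEST_FILE_PATH, and the namespace URI is the one OpenOffice.org 1.x manifests use; ManifestSketch is a hypothetical name.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch of initManifestDOM(): a manifest root with one
// file-entry for the package itself and one per XML stream.
final class ManifestSketch {
    static Document initManifest(String packageMimeType, String... xmlFiles)
            throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("manifest:manifest");
        root.setAttribute("xmlns:manifest", "http://openoffice.org/2001/manifest");
        doc.appendChild(root);
        addEntry(doc, root, packageMimeType, "/"); // the package root entry
        for (String path : xmlFiles) {
            addEntry(doc, root, "text/xml", path);
        }
        return doc;
    }

    private static void addEntry(Document doc, Element root, String type, String path) {
        Element entry = doc.createElement("manifest:file-entry");
        entry.setAttribute("manifest:media-type", type);
        entry.setAttribute("manifest:full-path", path);
        root.appendChild(entry);
    }

    public static void main(String[] args) throws ParserConfigurationException {
        Document m = initManifest("application/vnd.sun.xml.writer", "content.xml", "styles.xml");
        System.out.println(m.getElementsByTagName("manifest:file-entry").getLength()); // 3
    }
}
```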