Document Properties
Kbid: 235S08
Last Modified: 04-Feb-2020
Added to KB: 03-Sep-2012
Public Access: Everyone
Status: Online
Doc Type: Guidelines, Concepts & Cookbooks
Product:
  • ICM 7.6
  • ICM 7.7
  • ICM 7.8

Concept - Impex Framework (valid to 7.8)

1 Introduction

The Intershop 7 import/export framework (impex framework for short) consists of specific Java classes, pipelets, and pipelines for importing and exporting data. The framework is used by the standard impex wizards that guide Intershop 7 users through standard import or export operations. Developers can use the framework to extend the functionality or to customize existing import and export operations. The impex framework is pipeline-oriented. Developers can use the existing pipelines to do the following:

  • Customize the pipelines by replacing single components as required
  • Use existing pipelines to develop import and export processes for custom data sets

The import functionality is based on the Simple API for XML (SAX) parser combined with JAXB for parsing XML files. For standard database imports, Intershop 7 uses the ORM (object-relational mapping) layer to bulk mass data into the database. The export functionality uses the template language ISML to serialize persistent objects.

The following functionality with corresponding API is provided:

  • Logging
  • Configuration
  • Monitoring
  • ISML template execution (used for export processes)
  • Parsing, validation, bulking (used for import processes)
  • File handling
  • Filters
  • Multithreading
  • Security

1.1 Glossary

| Term | Description |
| Impex | Short term for imp(ort) and ex(port). |
| Controller | Central object providing general functionality like logging, statistics, configuration, ... |

2 Overview: Import Process

A typical import process involves the following steps:

  1. Parsing each element in the XML file and creating transient import objects
  2. Validating and complementing the transient import objects
  3. Bulking the data into the database

[Figure: ConceptImpexImportOverview]

There are three thread groups. The first parses the file, the second validates and complements the parsed objects, and the third thread group is responsible for writing the data into the database. Parsing, validating, and bulking run in parallel. Validating and bulking in many cases can be parallelized using multiple threads to ensure high performance during import.
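
The parallel stages described above can be pictured as three threads connected by bounded queues. The following is a minimal Java sketch under that assumption, not the framework's actual implementation: the class `ImportPipelineSketch`, the string-based elements, and the `END` marker are hypothetical, and each stage uses a single thread for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical three-stage pipeline; these are not the impex framework's classes. */
final class ImportPipelineSketch {
    // Marker that tells the next stage that no more elements will arrive.
    private static final String END = "\u0000END";

    /** Runs parse -> validate -> bulk concurrently and returns the "stored" rows. */
    static List<String> run(List<String> rawElements) {
        BlockingQueue<String> parsed = new ArrayBlockingQueue<>(16);
        BlockingQueue<String> validated = new ArrayBlockingQueue<>(16);
        List<String> database = new ArrayList<>();

        Thread parser = new Thread(() -> {
            for (String raw : rawElements) {
                put(parsed, raw.trim());                 // "parse" into a transient object
            }
            put(parsed, END);
        });
        Thread validator = new Thread(() -> {
            for (String e = take(parsed); !e.equals(END); e = take(parsed)) {
                if (!e.isEmpty()) {                      // drop invalid (empty) elements
                    put(validated, e.toUpperCase(Locale.ROOT)); // "complement" the object
                }
            }
            put(validated, END);
        });
        Thread bulker = new Thread(() -> {
            for (String e = take(validated); !e.equals(END); e = take(validated)) {
                database.add(e);                         // stand-in for a database write
            }
        });

        parser.start();
        validator.start();
        bulker.start();
        join(parser);
        join(validator);
        join(bulker);                                    // join makes the writes visible here
        return database;
    }

    private static void put(BlockingQueue<String> queue, String value) {
        try {
            queue.put(value);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
    }

    private static String take(BlockingQueue<String> queue) {
        try {
            return queue.take();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
    }

    private static void join(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
    }
}
```

Calling `ImportPipelineSketch.run(Arrays.asList(" a ", "", "b"))` returns the list `[A, B]`. The bounded queues make a fast parser wait for slower stages instead of filling up memory.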

Some import processes, such as those for products and catalogs, must execute these three steps twice, because:

  • There are circular dependencies between the objects within the import file and/or the database.
  • Elements at the beginning of the import file can reference elements at the end of the import file, and the file is too large to be held transiently in memory.

In the first phase, the raw data without relations are imported. Afterwards, when all objects exist in the database, the import is executed again to import the relations between imported objects.
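
The two phases can be illustrated with a small in-memory example. This is only a sketch of the idea: `TwoPassImportSketch`, its `Element` type, and the map that stands in for the database are hypothetical names, and skipping unknown references is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical two-pass import; a map stands in for the database. */
final class TwoPassImportSketch {
    /** Transient import element: an id plus the id of a referenced element (may be null). */
    static final class Element {
        final String id;
        final String refId;

        Element(String id, String refId) {
            this.id = id;
            this.refId = refId;
        }
    }

    /** Returns the stored relations as "id->refId"; unknown references are skipped. */
    static List<String> importAll(List<Element> file) {
        Map<String, Element> database = new LinkedHashMap<>();
        // Pass 1: store the raw objects only; forward references are not resolved yet.
        for (Element e : file) {
            database.put(e.id, e);
        }
        // Pass 2: every object exists now, so relations can be validated and stored.
        List<String> relations = new ArrayList<>();
        for (Element e : file) {
            if (e.refId != null && database.containsKey(e.refId)) {
                relations.add(e.id + "->" + e.refId);
            }
        }
        return relations;
    }
}
```

With the elements `a` (referencing `c`), `b` (referencing `a`) and `c`, the forward reference from `a` to `c` is resolved in the second pass even though `c` appears last in the file.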

3 Overview: Export Process

Every export process executes the following steps:

  1. Creation of an iterator of objects that need to be exported
  2. Creation of an output stream (normally simple files in shared file system)
  3. Serialization or marshaling of objects in this iterator using ISML into the output stream

[Figure: ConceptImpexOverviewExport]
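
The three export steps can be sketched in plain Java. Note that this example writes XML by hand instead of using ISML, which is a deliberate simplification; `ExportSketch`, the product names, and the element names `products` and `product` are hypothetical.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Iterator;

/** Hypothetical export: iterator in, serialized text out (no ISML involved). */
final class ExportSketch {
    /** Step 3: serialize every object of the iterator into the output stream. */
    static void export(Iterator<String> productNames, Writer out) throws IOException {
        out.write("<products>");
        while (productNames.hasNext()) {
            out.write("<product>" + escape(productNames.next()) + "</product>");
        }
        out.write("</products>");
    }

    /** Steps 2 and 3 with an in-memory writer standing in for the export file. */
    static String exportToString(Iterator<String> productNames) {
        StringWriter out = new StringWriter();
        try {
            export(productNames, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    /** Escapes the characters that are not allowed in XML text content. */
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Because the objects are consumed through an iterator, only one object at a time has to be held in memory while it is written.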

4 Directory Structure for Import/Export Files

All import/export data files, such as sources, export files, configuration files, templates, etc., are stored in the IS_HOME/share/sites/<site>/unit/<unit>/impex directory. When uploading an import file, Intershop 7 transfers it from the source location to the appropriate location in the impex directory. Similarly, when exporting data, you can use the back office wizard to download export target files from the respective impex directory location.

| Subdirectory | Description | Example |
| archive | To store previous import/export files. | n/a |
| config | Configuration files for import/export processes. | DBInit-CatalogImport.properties |
| export | Export target files. | ProductExport.xml |
| loader/* | Obsolete directory for Oracle SQL*Loader. | n/a |
| log | Logs created by parser, validator and bulker are stored in this directory. | ProductImport.log |
| src | Import data source files. | Products.xml |
| temp | Temporary files. | n/a |

The schema files (XSD) for corresponding imports are located in IS_HOME/share/system/impex/schema.
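
The directory layout above can be expressed as a small path helper. This is an illustrative sketch only; `ImpexPaths` and its method names are hypothetical and not part of the framework, and the site and unit names are supplied by the caller.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper that builds impex paths from the layout described above. */
final class ImpexPaths {
    /** IS_HOME/share/sites/<site>/unit/<unit>/impex */
    static Path impexDir(Path isHome, String site, String unit) {
        return isHome.resolve(Paths.get("share", "sites", site, "unit", unit, "impex"));
    }

    /** A subdirectory such as "src", "export", "log", "config", "archive" or "temp". */
    static Path subDir(Path isHome, String site, String unit, String subdirectory) {
        return impexDir(isHome, site, unit).resolve(subdirectory);
    }

    /** IS_HOME/share/system/impex/schema, the location of the XSD files. */
    static Path schemaDir(Path isHome) {
        return isHome.resolve(Paths.get("share", "system", "impex", "schema"));
    }
}
```

For example, `ImpexPaths.subDir(isHome, site, unit, "src").resolve("Products.xml")` points to an import source file for the given site and unit.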

5 Configuration

5.1 Pipeline Configuration

The complete configuration and controlling of import and export pipelines is handled by the Controller object. The Controller is created within the DetermineConfiguration pipelet, which determines the pipelet configuration and stores the Controller object in the pipeline dictionary. The Controller is the only pipeline dictionary object that is passed between pipelets to access their configuration-specific and import/export-specific services (for example, logging, progress notification, interaction, and event polling).

The first action within an import/export pipeline must be the creation of the Controller object by the DetermineConfiguration pipelet. All other import/export-specific pipelets depend on it. Calling an import/export pipelet without an existing Controller causes a severe error.

Configuration values are global or local (pipelet-specific) and can be stored in multiple ways:

  1. Pipelet configuration value in pipeline
    The Controller retrieves the pipelet configuration value for the given key.
  2. Global Controller property
    The Controller retrieves the property from a configuration file when the DetermineConfiguration pipelet is executed. Global means that the looked-up key does not have pipelet-specific extensions.
  3. Local (pipelet-specific) Controller property
    The Controller retrieves the property from a configuration file when the DetermineConfiguration pipelet is executed. Local means that the looked-up key is extended with the pipelet identifier; for example, if the key is LogFacility and the pipelet name is Import, the Import.LogFacility key is looked up.
  4. Global pipeline dictionary value
    The Controller retrieves the property from the pipeline dictionary. Global means that the looked-up key does not have pipelet-specific extensions.
  5. Local (pipelet-specific) pipeline dictionary value
    The Controller retrieves the property from the pipeline dictionary. Local means that the dictionary key is extended with a pipelet identifier, e.g., Import.DefaultImportMode.

When the Controller finds a configuration value in more than one of these places, it applies the order above: values found later in the list supersede earlier ones. Always use Controller methods to access configuration values; doing so allows you to change the configuration at runtime. Configuration values can be a pipelet configuration property value, a Controller property value, or a pipeline dictionary value. To distinguish between different pipelets of the same kind, each pipelet must be configured with a unique pipelet identifier within the pipeline descriptor.
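
The lookup order can be sketched as follows. This is not the Controller's real API: `ConfigLookupSketch` and its `resolve` method are hypothetical, and the three maps merely stand in for the pipelet configuration, the Controller properties, and the pipeline dictionary.

```java
import java.util.Map;

/** Hypothetical lookup that mirrors the five sources listed above. */
final class ConfigLookupSketch {
    /** Returns the effective value for the key, or null if no source defines it. */
    static String resolve(String pipeletId, String key,
                          Map<String, String> pipeletConfig,
                          Map<String, String> controllerProperties,
                          Map<String, String> pipelineDictionary) {
        String localKey = pipeletId + "." + key;   // for example Import.DefaultImportMode
        String[] candidates = {
            pipeletConfig.get(key),                // 1. pipelet configuration value
            controllerProperties.get(key),         // 2. global Controller property
            controllerProperties.get(localKey),    // 3. local Controller property
            pipelineDictionary.get(key),           // 4. global pipeline dictionary value
            pipelineDictionary.get(localKey)       // 5. local pipeline dictionary value
        };
        String result = null;
        for (String candidate : candidates) {
            if (candidate != null) {
                result = candidate;                // later sources supersede earlier ones
            }
        }
        return result;
    }
}
```

If the pipelet configuration sets `DefaultImportMode` to `IGNORE` and the properties file contains `Import.DefaultImportMode` with the value `REPLACE`, the sketch returns `REPLACE` for the pipelet `Import`.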

5.2 Import Modes

The specific processes executed during an import are determined by the selected import mode. If no mode or an invalid mode is set, the mode OMIT is used by default. The following import modes can be set for importing data:

  • IGNORE
    Ignores all objects that already exist in the database; creates records only for new objects and adds them to the database. For example, if a product is specified in the import source and the product is found in the database by the import/export service, it is not modified.
  • INITIAL
    Performs no database query to find existing objects. This allows a quick import but causes an error whenever an object is imported that already exists. This mode is normally used during the DBInit process.
  • UPDATE
    Updates existing objects and creates records for new objects. Attributes and objects that do not exist in the import file are kept untouched.
  • REPLACE
    Replaces existing objects and creates records for new objects. Objects that do not exist in the import file are kept untouched. Attributes of existing objects that are missing from the import file are removed.
  • OMIT
    Does nothing. This can be useful for tests.
  • DELETE
    Deletes the specified objects from the database.
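
The behavior of the modes can be summarized in a small enum. This is a sketch, not the framework's own type: `ImportModeSketch` and the action names are hypothetical, and the handling of DELETE for objects that do not exist is an assumption, since the text above does not specify it.

```java
/** Hypothetical model of the import modes described above. */
enum ImportModeSketch {
    IGNORE, INITIAL, UPDATE, REPLACE, OMIT, DELETE;

    /** Falls back to OMIT if no mode or an invalid mode is given. */
    static ImportModeSketch parse(String value) {
        if (value == null) {
            return OMIT;
        }
        try {
            return valueOf(value.trim());          // mode names are expected in uppercase
        } catch (IllegalArgumentException e) {
            return OMIT;
        }
    }

    /** Returns what happens to one object, depending on whether it already exists. */
    String actionFor(boolean existsInDatabase) {
        switch (this) {
            case IGNORE:
                return existsInDatabase ? "skip" : "create";
            case INITIAL:
                return "create";                   // no lookup; an existing object causes an error
            case UPDATE:
                return existsInDatabase ? "update" : "create";
            case REPLACE:
                return existsInDatabase ? "replace" : "create";
            case DELETE:
                return existsInDatabase ? "delete" : "skip"; // "skip" is an assumption
            default:
                return "skip";                     // OMIT does nothing
        }
    }
}
```

For example, `ImportModeSketch.parse("bogus")` yields `OMIT`, and `ImportModeSketch.UPDATE.actionFor(false)` yields `"create"`.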

5.2.1 Import Modes and Import Performance

The import mode has a significant impact on overall import performance. When deciding on the import mode, take the following considerations into account:

  • The mode INITIAL is the fastest. It should be used whenever the objects to be imported are not already contained in the database.
  • The UPDATE mode is faster than the REPLACE mode.

5.2.2 Setting the Import Mode

There are two ways to set the import mode:

  • In the back office, when setting up the import process
    Selecting the import mode in the back office effectively sets a respective property on the Import pipelet of the respective import pipeline.
  • As an attribute of the business object’s XML representation, which takes precedence over the pipelet property
    To enable mixed mode imports, the import mode may be specified within the XML source file as an attribute of the business object’s root tag and selected child tags. For example, a product can be imported in UPDATE mode while its category assignments are imported in REPLACE mode at the same time to remove outdated assignments to categories as well. The attribute name is import-mode and modes must be specified in uppercase, for example: import-mode = "IGNORE". Check the respective schema definition for details.
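
How a parser might pick up the import-mode attribute can be sketched with the standard SAX API. The element names `product`, `name` and `category-links` are made up for the example and are not taken from the schema; `MixedModeSketch` is hypothetical, and the rule that a child without its own attribute uses the mode of its parent is an assumption.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical reader that determines the effective import mode per element. */
final class MixedModeSketch {
    /** Returns entries of the form "element=MODE" in document order. */
    static List<String> effectiveModes(String xml, String defaultMode) {
        List<String> result = new ArrayList<>();
        Deque<String> modes = new ArrayDeque<>();
        modes.push(defaultMode);                   // mode configured on the pipelet
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                String own = atts.getValue("import-mode");
                String mode = own != null ? own : modes.peek(); // the attribute wins
                modes.push(mode);
                result.add(qName + "=" + mode);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                modes.pop();                       // leave the scope of this element
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse import source", e);
        }
        return result;
    }
}
```

For a `product` element with `import-mode="UPDATE"` that contains a `category-links` element with `import-mode="REPLACE"`, the sketch reports UPDATE for the product and REPLACE for the category assignments.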

6 Logging

Import/export uses Logger objects that you create using the CreateLogger or CreateFileLogger pipelets (you can create as many Logger objects as you need). Once created, a logger is registered at the Controller object with a certain name that you can use to obtain the logger from the controller. A logger also has a log level assigned, e.g. debug, error or warning levels. Log levels can be combined using an OR operator. The base class for loggers and a ready-to-use NativeLogger (logs to stdout) and a FileLogger (logs to a file) are defined in the core framework.
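
The idea of named loggers with OR-combined log levels can be sketched as follows. The class `LoggerRegistrySketch`, the level constants, and their numeric values are hypothetical; the real Logger classes and the Controller API may look different.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical registry of named loggers, as a Controller might keep it. */
final class LoggerRegistrySketch {
    // Hypothetical bit flags, so that levels can be combined with the OR operator.
    static final int DEBUG = 1;
    static final int WARNING = 2;
    static final int ERROR = 4;

    /** Logger that records only messages whose level is enabled. */
    static final class Logger {
        final int levels;
        final List<String> lines = new ArrayList<>();

        Logger(int levels) {
            this.levels = levels;
        }

        void log(int level, String message) {
            if ((levels & level) != 0) {
                lines.add(message);                // stand-in for writing to a file
            }
        }
    }

    private final Map<String, Logger> loggers = new HashMap<>();

    /** Creates a logger and registers it under the given name. */
    Logger create(String name, int levels) {
        Logger logger = new Logger(levels);
        loggers.put(name, logger);
        return logger;
    }

    /** Obtains a previously registered logger, or null if the name is unknown. */
    Logger get(String name) {
        return loggers.get(name);
    }
}
```

A logger created with `WARNING | ERROR` records warnings and errors but drops debug messages.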

7 Progress Notification

To gather statistics for progress notification and to keep track of the current position within the import process, statistics objects of type XMLStatistics are used. The statistics object is accessible through the Controller and, for the "outer world", through the import interactor. In general, the statistics object is created by parsing the XML source and storing the number of elements. By incrementing a counter for a certain element, it is possible to get the current count of processed elements. The statistics object is created in the CollectXMLStatistics pipelet. Usually, the parse content handler is in charge of incrementing the counter.
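
The counting approach can be sketched with the standard SAX API. `ProgressSketch` is a hypothetical stand-in and does not reproduce the XMLStatistics API; the percentage calculation is an assumption about how the counters could be used.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical statistics object: element totals plus processed counters. */
final class ProgressSketch {
    private final Map<String, Integer> totals = new HashMap<>();
    private final Map<String, Integer> processed = new HashMap<>();

    /** First pass: count how often each element name occurs in the source. */
    static ProgressSketch collect(String xml) {
        ProgressSketch stats = new ProgressSketch();
        DefaultHandler counter = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                stats.totals.merge(qName, 1, Integer::sum);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), counter);
        } catch (Exception e) {
            throw new IllegalStateException("cannot collect statistics", e);
        }
        return stats;
    }

    /** Called by the parse content handler after an element has been processed. */
    void increment(String element) {
        processed.merge(element, 1, Integer::sum);
    }

    /** Progress for one element type in percent; 0 if the element never occurs. */
    int percentDone(String element) {
        int total = totals.getOrDefault(element, 0);
        return total == 0 ? 0 : 100 * processed.getOrDefault(element, 0) / total;
    }
}
```

After collecting statistics for a source with four `product` elements, one call to `increment("product")` results in a progress of 25 percent.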

8 Locking

To prevent multiple imports of related objects (e.g. products) into a single unit, an import process can be locked by using the LockImport pipelet. The LockImport pipelet uses the locking framework. Several import resources exist, which can be used to lock a certain import. The main resource for import is named Import. All other import resources use this resource as parent resource.

8.1 Import Specific Resources

The resources are locked in a unit-specific way by configuring the LockImport pipelet accordingly. The following resources are available for import:

  • UserImport
  • CategoryImport
  • ProductImport
  • DiscountImport
  • PriceImport
  • ProductTypeImport
  • VariationTypeImport
  • OrderImport
  • ...

8.2 General Database Resources

Some parts of import pipelines change the database content (e.g. the Import pipelet). Those parts must be locked to prevent concurrent database changes on the same tables (e.g. data replication of products vs. product import, or product imports in different domains). The LockImport pipelet locks the matching database resources for those tasks (that is, children of the Database resource). For example, the product import locks the Products resource before running the Import pipelet for product tables. The Products resource is also used by replication processes, so replication and import processes cannot run concurrently.

8.3 File Locking

The pipelet AcquireFileResource is responsible for locking a file virtually to avoid conflicts with other processes working on the same impex file.

8.4 Unlocking

Each LockImport pipelet must have a corresponding UnlockImport pipelet to release the locked resources.
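
The pairing of lock and unlock corresponds to the familiar try/finally pattern. The sketch below is a strong simplification: `ImportLockSketch` is hypothetical, it does not use the locking framework, and it ignores the parent/child relationship between resources.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical unit-specific resource lock with a guaranteed release. */
final class ImportLockSketch {
    private static final Set<String> LOCKED = ConcurrentHashMap.newKeySet();

    /** Tries to lock the resource for the unit; returns false if it is already locked. */
    static boolean lock(String resource, String unit) {
        return LOCKED.add(unit + ":" + resource);
    }

    /** Releases the resource for the unit. */
    static void unlock(String resource, String unit) {
        LOCKED.remove(unit + ":" + resource);
    }

    /** Runs the job only if the lock was acquired and always releases the lock. */
    static boolean runLocked(String resource, String unit, Runnable importJob) {
        if (!lock(resource, unit)) {
            return false;                          // another import holds the resource
        }
        try {
            importJob.run();
            return true;
        } finally {
            unlock(resource, unit);                // counterpart of the unlock pipelet
        }
    }
}
```

While a job runs inside `runLocked("ProductImport", unit, job)`, a second attempt for the same resource and unit is rejected, whereas the same resource in a different unit can still be locked.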

9 Import

9.1 Import Implementations

There are three different implementations to bulk data into the database:

  • Oracle SQL*Loader
    Used in former releases of Enfinity Suite, Enfinity MultiSite, and Enfinity 1.0-2.2. Due to improved JDBC driver implementations as well as new features in the ORM cartridge, this implementation is obsolete.
  • JDBC
    Also used in former releases, like the SQL*Loader. The improved implementation of the ORM cartridge supersedes the direct usage of JDBC.
  • The ORM layer
    This implementation is used by all standard import processes of Intershop 7.

9.2 High-Level Object Diagram

The standard import processes use the ORMImportMgr to set up the import environment, including queues, threads, and so on. The goal is to write a pipeline calling particular import pipelets with corresponding configurations. Each import process has to configure the business object-specific XML parser (XMLParseContentHandler), validator (ElementValidator), and bulker (ElementBulker). Normally, this is done in the corresponding pipeline. The resulting object diagram looks like this:

[Figure: ConceptImpexImportDetail]

9.3 High-Level Pipeline Design

For every business object that can be imported, a processing pipeline exists. The name of the pipeline follows the pattern Process<BusinessObject>Import, for example ProcessProductImport. Each pipeline has the same basic design:

  • The sub-pipeline Process<BusinessObject>Import-Validate parses the import file.
  • The sub-pipeline Process<BusinessObject>Import-Prepare takes care of the initial data conversion, parsing, and XML validation processes.
  • The sub-pipeline Process<BusinessObject>Import-Import then executes the actual import process.

9.4 Tuning Tips

9.4.1 Configuration of Existing Import Processes

The following properties can be added to the import property file to tune the process:

| Global property key | Import property key (prefix: [DictionaryString,<pipelet id>]) | Description | Default value | Location |
| intershop.import.bulker.orm.batchSize | <prefix>.BatchSize | Number of import elements that are batched together and written to the database. | 100 | |
| intershop.import.bulker.orm.commitSize | <prefix>.CommitSize | Number of import batches (see above) that are committed to the database together. | 100 | |
| | <prefix>.Validator.NumberThreads | The number of validator threads. | 1 | |
| | <prefix>.Bulker.NumberThreads | The number of bulker threads. | 4 | |
| -Xms1024m -Xmx2048m -XX:MaxPermSize=400m -XX:NewRatio=8 | | The size of the JVM. Increasing the size of the JVM may improve cache efficiency. | | IS_HOME/bin/tomcat.sh |
| "-Xloggc:$IS_HOME/log/gc $SERVER_NAME.log -XX:+PrintGCDetails" "-verbose:gc -XX:+PrintGCDetails" | | Tuning of the garbage collector. | | IS_HOME/bin/tomcat.sh |
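
The interplay of batch size and commit size can be illustrated with a small simulation. It assumes, based on the descriptions above, that the commit size counts batches rather than single elements; `BatchCommitSketch` is a hypothetical name and the code does not access a database.

```java
/** Hypothetical simulation of batching and committing during bulking. */
final class BatchCommitSketch {
    /** Returns {number of batches, number of commits} for the given element count. */
    static int[] simulate(int elements, int batchSize, int commitSize) {
        int batches = 0;
        int commits = 0;
        int inBatch = 0;
        int uncommittedBatches = 0;
        for (int i = 0; i < elements; i++) {
            inBatch++;
            if (inBatch == batchSize) {            // batch is full: send it to the database
                batches++;
                inBatch = 0;
                uncommittedBatches++;
                if (uncommittedBatches == commitSize) {
                    commits++;                     // enough batches collected: commit
                    uncommittedBatches = 0;
                }
            }
        }
        if (inBatch > 0) {                         // flush the last, incomplete batch
            batches++;
            uncommittedBatches++;
        }
        if (uncommittedBatches > 0) {              // commit whatever is left
            commits++;
        }
        return new int[] {batches, commits};
    }
}
```

Under this assumption, the default values of 100 and 100 lead to one commit per 10,000 import elements. Smaller values reduce the amount of uncommitted data, while larger values reduce the number of database round trips.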

9.4.2 Mass Data Imports

If the import process needs to bulk mass data into the database, the following aspects need to be considered:

  • A parser that loads the entire XML file in one step (e.g. DOM, standard JAXB) may fail, because the application server JVM may not have enough memory. Therefore, the developer should use a SAX parser to avoid huge memory consumption.
  • The REPLACE mode should avoid DELETE followed by INSERT statements, because they are very expensive in the database. Objects should only be removed if the import file really requires it.
  • Mass data import needs to be split into data and relation import, because XML elements at the beginning of a huge import file may reference elements at the end of the same import file. These references can only be validated if the entire file is parsed at least once. Since not all elements can be cached in memory, the standard import writes them into the database.
  • Avoid changing a database object twice. For this reason, most import processes parse the import file twice to separate data bulking from relation bulking. The relation import should not import general data.
  • If a huge amount of data is imported into an empty table, the import process may slow down, because the database statistics become outdated after, e.g., 10,000 import objects. That is why ElementBulkerORM, the base class for each bulker, automatically triggers an analysis of the database tables during the import process.

10 Export

All standard export processes use the impex framework to provide basic functionality, including ISML templating, logging, and monitoring. Each export process therefore consists of at least a pipeline with the following start nodes:

  • Prepare:
    1. Create the controller object -> pipelet DetermineConfiguration
    2. Create the file logger -> pipelet CreateFileLogger
    3. Create the pageable -> pipelet GetPageable
  • RunExport:
    1. Open the export file -> pipelet OpenFile
    2. Open the filters (e.g. formatting) -> pipelet OpenFilter
    3. ISML template processing -> pipelet Export
    4. Close filters -> pipelet CloseFilter
    5. Close export file -> pipelet CloseFile
  • CleanUp:
    1. Close loggers -> pipelet CloseLoggers

Multithreading does not make sense for export processes: the data is serialized into the file system, and the shared file system is often the bottleneck.

Disclaimer

The information provided in the Knowledge Base may not be applicable to all systems and situations. Intershop Communications will not be liable to any party for any direct or indirect damages resulting from the use of the Customer Support section of the Intershop Corporate Web site, including, without limitation, any lost profits, business interruption, loss of programs or other data on your information handling system.
