The Intershop 7 import/export framework (short form: ImpEx framework) consists of specific Java classes, pipelets, and pipelines for importing and exporting data. The framework is used by the standard ImpEx wizards that guide Intershop 7 users through standard import or export operations. Developers can use the framework to extend the functionality or to customize existing import and export operations. The ImpEx framework is pipeline-oriented. Developers can use the existing pipelines to do the following:
The import functionality is based on the Simple API for XML (SAX) parser combined with JAXB for parsing XML files. For standard database imports, Intershop 7 uses the ORM (object relational mapping) layer to bulk mass data into the database. The export functionality uses the template language ISML to serialize persistent objects.
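The SAX-based parsing style mentioned above can be illustrated with a minimal, generic JDK example (this is plain JAXP, not the Intershop API): a content handler receives element events as the file streams through, so arbitrarily large import files can be read without building a DOM.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Generic SAX sketch: count element occurrences while streaming the XML.
public class SaxCountExample {
    public static Map<String, Integer> countElements(String xml) throws Exception {
        Map<String, Integer> counts = new HashMap<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String local,
                                             String qName, Attributes atts) {
                        // Called once per opening tag as the stream is parsed.
                        counts.merge(qName, 1, Integer::sum);
                    }
                });
        return counts;
    }
}
```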
The following functionality with corresponding API is provided:
Short term for import and export.
Central object providing general functionality such as logging, statistics, and configuration.
A typical import process involves the following steps:
There are three thread groups: the first parses the file, the second validates and complements the parsed objects, and the third writes the data into the database. Parsing, validating, and bulking run in parallel; in many cases, validating and bulking can themselves be parallelized across multiple threads to ensure high import performance.
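The three-stage, queue-connected setup described above can be sketched in plain Java. This is a simplified illustration under assumed names (`ImportPipelineSketch`, string "elements" standing in for parsed business objects), not the Intershop implementation: one parser thread feeds validator threads, which feed bulker threads, with poison pills signaling shutdown.

```java
import java.util.List;
import java.util.concurrent.*;

// Sketch of the parse -> validate -> bulk pipeline with bounded queues.
public class ImportPipelineSketch {
    static final String POISON = "__END__";

    public static int runImport(List<String> rawElements,
                                int validatorThreads, int bulkerThreads) throws Exception {
        BlockingQueue<String> parsed = new ArrayBlockingQueue<>(100);
        BlockingQueue<String> validated = new ArrayBlockingQueue<>(100);
        ConcurrentLinkedQueue<String> database = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newCachedThreadPool();

        // Stage 1: a single parser reads the source and emits raw elements.
        pool.submit(() -> {
            for (String e : rawElements) parsed.put(e);
            for (int i = 0; i < validatorThreads; i++) parsed.put(POISON);
            return null;
        });

        // Stage 2: validators check/complement elements in parallel.
        CountDownLatch validatorsDone = new CountDownLatch(validatorThreads);
        for (int i = 0; i < validatorThreads; i++) {
            pool.submit(() -> {
                String e;
                while (!(e = parsed.take()).equals(POISON)) {
                    validated.put(e.trim()); // stand-in for validation work
                }
                validatorsDone.countDown();
                return null;
            });
        }

        // When all validators have finished, signal the bulkers.
        pool.submit(() -> {
            validatorsDone.await();
            for (int i = 0; i < bulkerThreads; i++) validated.put(POISON);
            return null;
        });

        // Stage 3: bulkers write validated elements to the "database".
        CountDownLatch bulkersDone = new CountDownLatch(bulkerThreads);
        for (int i = 0; i < bulkerThreads; i++) {
            pool.submit(() -> {
                String e;
                while (!(e = validated.take()).equals(POISON)) {
                    database.add(e);
                }
                bulkersDone.countDown();
                return null;
            });
        }

        bulkersDone.await();
        pool.shutdown();
        return database.size();
    }
}
```

All three stages run concurrently; the bounded queues apply backpressure so a fast parser cannot outrun slow bulkers.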
Some import processes, like for products and catalogs, require duplicate execution of these three steps, because:
In the first phase, the raw data without relations are imported. Afterwards, when all objects exist in the database, the import is executed again to import the relations between imported objects.
Every export process executes the following steps:
All import/export data files, such as sources, export files, configuration files, templates, etc., are stored in the IS_HOME/share/sites/<site>/unit/<unit>/impex directory. When uploading an import file, Intershop 7 transfers it from the source location to the appropriate location in the ImpEx directory. Similarly, when exporting data, you can use the back office wizard to download export target files from the respective ImpEx directory location.
Stores previous import/export files.
Configuration files for import/export processes.
Export target files.
Obsolete directory for Oracle SQL*Loader.
Logs created by parser, validator, and bulker are stored in this directory.
Import data source files.
The schema files (XSD) for corresponding imports are located in IS_HOME/share/system/impex/schema.
The complete configuration and controlling of import and export pipelines is handled by the Controller object. The Controller is created within the DetermineConfiguration pipelet, which determines the pipelet configuration and stores the Controller object in the pipeline dictionary. The Controller is the only pipeline dictionary object that is passed between pipelets to access their configuration-specific and import/export-specific services (for example, logging, progress notification, interaction, and event polling). The first action within an import/export pipeline must be the creation of the Controller object by the DetermineConfiguration pipelet; all other import/export-specific pipelets depend on it. Calling an import/export pipelet without an existing Controller causes a severe error.
Configuration values are global or local (pipelet-specific) and can be stored in multiple ways:

Pipelet configuration value: The Controller retrieves the pipelet configuration value for the given key.

Global property file value: The Controller retrieves the property from a configuration file when the DetermineConfiguration pipelet is executed. Global means that the looked-up key does not have pipelet-specific extensions.

Local property file value: The Controller retrieves the property from a configuration file when the DetermineConfiguration pipelet is executed. Local means that the looked-up key is extended with the pipelet identifier; for example, if the key is LogFacility and the pipelet name is Import, the key Import.LogFacility is looked up.

Global dictionary value: The Controller retrieves the property from the pipeline dictionary. Global means that the looked-up key does not have pipelet-specific extensions.

Local dictionary value: The Controller retrieves the property from the pipeline dictionary. Local means that the dictionary key is extended with a pipelet identifier, e.g., Import.LogFacility.

If the Controller encounters a configuration value more than once, it uses the order of precedence shown above to decide which one to use; values found later in this order supersede earlier ones. You should always use the Controller methods to access configuration values; doing so allows you to change the configuration at runtime. A configuration value can be a pipelet configuration property value, a Controller property value, or a pipeline dictionary value. To distinguish between different pipelets of the same kind, each pipelet must be configured with a unique pipelet identifier within the pipeline descriptor.
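The precedence rule above ("later-found values supersede earlier ones") can be sketched as a simple lookup over the candidate sources. The class and method names here are illustrative, not the actual Intershop Controller API; the maps stand in for the pipelet configuration, the property file, and the pipeline dictionary.

```java
import java.util.Map;

// Sketch of the Controller's configuration lookup: sources are consulted in
// precedence order, and a value found in a later source overrides earlier ones.
public class ConfigLookupSketch {
    public static String lookup(String key, String pipeletId,
                                Map<String, String> pipeletConfig,
                                Map<String, String> propertyFile,
                                Map<String, String> dictionary) {
        String localKey = pipeletId + "." + key;  // e.g. "Import.LogFacility"
        String value = null;
        if (pipeletConfig.containsKey(key))     value = pipeletConfig.get(key);     // 1. pipelet configuration
        if (propertyFile.containsKey(key))      value = propertyFile.get(key);      // 2. global property file
        if (propertyFile.containsKey(localKey)) value = propertyFile.get(localKey); // 3. local property file
        if (dictionary.containsKey(key))        value = dictionary.get(key);        // 4. global dictionary
        if (dictionary.containsKey(localKey))   value = dictionary.get(localKey);   // 5. local dictionary
        return value;
    }
}
```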
The specific processes executed during an import are determined by the selected import mode. If no mode or an invalid mode is set, the mode OMIT is used by default. The following import modes can be set for importing data:
The import mode has a significant impact on overall import performance. When deciding on the import mode, take the following considerations into account:
There are two ways to set the import mode:
The import-mode attribute and the modes must be specified in uppercase, for example: import-mode = "IGNORE". Check the respective schema definition for details.
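The fallback behavior (an unset or invalid mode defaulting to OMIT) and the uppercase requirement can be sketched with a small enum. The mode names below are an assumed subset for illustration; check the schema definition for the authoritative list.

```java
// Sketch: resolve the import mode from the import file, falling back to OMIT
// when the mode is missing or invalid. Mode names are assumed, not exhaustive.
public enum ImportModeSketch {
    OMIT, IGNORE, INITIAL, UPDATE, REPLACE, DELETE;

    public static ImportModeSketch resolve(String raw) {
        if (raw == null) return OMIT;            // no mode set: default
        try {
            return valueOf(raw);                 // modes must be uppercase
        } catch (IllegalArgumentException e) {
            return OMIT;                         // invalid mode: default
        }
    }
}
```

Note that a lowercase value such as "ignore" is rejected and falls back to OMIT, which is why the uppercase rule matters.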
Logging is handled by Logger objects that you create using the CreateFileLogger pipelet (you can create as many Logger objects as you need). Once created, a logger is registered at the Controller object under a name that you can use to obtain the logger from the controller. A logger also has a log level assigned, e.g., the debug, error, or warning level. Log levels can be combined using an OR operator. The base class for loggers, a ready-to-use NativeLogger (logs to stdout), and a FileLogger (logs to a file) are defined in the core framework.
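Combining log levels with an OR operator suggests a bit-flag scheme, which can be sketched as follows. The constant values and class shape are illustrative assumptions, not the actual Logger API.

```java
// Sketch of OR-combinable log levels as bit flags.
public class LogLevelSketch {
    public static final int ERROR   = 1;  // 0b001
    public static final int WARNING = 2;  // 0b010
    public static final int DEBUG   = 4;  // 0b100

    private final int levels;

    // levels is a bitwise OR of the accepted levels, e.g. ERROR | WARNING.
    public LogLevelSketch(int levels) { this.levels = levels; }

    // A message is logged only if its level is contained in the mask.
    public boolean accepts(int level) {
        return (levels & level) != 0;
    }
}
```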
To keep track of the current position within the import process and provide progress notification, statistics objects of type XMLStatistics are used. The statistics are accessible through the Controller and, for the outside world, through the import interactor. In general, the statistics are created by parsing the XML source and storing the number of elements. By incrementing a counter for a certain element, it is possible to get the current count of processed elements. The statistics object is created in the CollectXMLStatistics pipelet. Usually, the parse content handler is in charge of incrementing the counter.
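The statistics idea (totals collected up front, a counter incremented per processed element, progress reported as the ratio) can be sketched like this. The class and method names are illustrative, not the XMLStatistics API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: per-element totals vs. processed counters for progress reporting.
public class XmlStatisticsSketch {
    private final Map<String, Long> totals;
    private final Map<String, AtomicLong> processed = new ConcurrentHashMap<>();

    // Totals per element name, as collected by an initial pass over the source.
    public XmlStatisticsSketch(Map<String, Long> totalsPerElement) {
        this.totals = totalsPerElement;
    }

    // Called by the parse content handler for each processed element.
    public void increment(String elementName) {
        processed.computeIfAbsent(elementName, k -> new AtomicLong()).incrementAndGet();
    }

    // Fraction of elements of this name processed so far (0.0 .. 1.0).
    public double progress(String elementName) {
        long total = totals.getOrDefault(elementName, 0L);
        long done = processed.getOrDefault(elementName, new AtomicLong()).get();
        return total == 0 ? 0.0 : (double) done / total;
    }
}
```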
To prevent multiple imports of related objects (e.g., products) into a single unit, an import process can be locked by using the LockImport pipelet. The LockImport pipelet uses the locking framework; see Concept - Locking Framework. Several import resources exist that can be used to lock a certain import. The main resource for import is named Import; all other import resources use this resource as their parent. The resources are locked in a unit-specific way by configuring the LockImport pipelet accordingly. The following resources are available for import:
Some parts of import pipelines change the database content (e.g., the Import pipelet). Those parts must be locked to prevent concurrent database changes on the same tables (e.g., data replication of products vs. product import, or product imports in different domains). The LockImport pipelet locks matching database resources for those tasks (meaning children of the Database resource). For example, the product import locks the Products resource before running the Import pipelet for product tables. The Products resource is also used by replication processes, so no replication and import process can run concurrently.
This pipelet locks the given resources in order to avoid concurrent operations on the same resources. The resources are specified by ResourceList, a semicolon-separated list of resource names that must be available in the table RESOURCEPO. If the parameter IsDomainSpecific is set to true, resources are locked only in the current domain; this makes it possible to start the same pipeline in different domains concurrently. If no resources are specified, the pipelet acquires the Database resource (system-wide), so no other import, staging, or process requiring the Database resource or its sub-resources can run concurrently. If one or more required resources cannot be acquired, the pipelet returns with an error. The import process holding the acquisition is read from the pipeline dictionary; if no process is found, a new process is created. The acquisition made is stored in the pipeline dictionary.
The AcquireFileResource pipelet is responsible for virtually locking a file to avoid conflicts with other processes working on the same ImpEx file. Resources to acquire must be passed in either as a list of resources from the dictionary or as a list of resource names from the configuration or dictionary. The pipelet stores the acquisition result and the acquisition itself in the pipeline dictionary.
Each LockImport pipelet must have a corresponding UnlockImport pipelet to release the locked resources.
There are three different implementations to bulk data into the database:
The standard import processes use the ORMImportMgr to set up the import environment, including queues, threads, and so on. The goal is to write a pipeline calling particular import pipelets with corresponding configurations. Each import process has to configure the business object-specific XML parser (XMLParseContentHandler), validator (ElementValidator), and bulker (ElementBulker). Normally, this is done in the corresponding pipeline. The resulting object diagram looks like this:
For every business object that can be imported, a processing pipeline exists. The name of the pipeline follows the pattern Process<BusinessObject>Import, for example ProcessProductImport. Each pipeline has the same basic design:
Process<BusinessObject>Import-Validate parses the import file.
Process<BusinessObject>Import-Prepare takes care of the initial data conversion, parsing, and XML validation processes.
Process<BusinessObject>Import-Import then executes the actual import process.
The following properties can be added to the import property file to tune the process:
Each tuning property is available under both a global property key and an import-specific property key.
Number of import elements that should be batched together to the database.
Number of import batches (see above) that should be committed in the database.
The number of validator threads.
The number of bulker threads.
The JVM heap size. Increasing it may improve cache efficiency.
Tuning of the garbage collector.
Defines a percentage (%); if the ratio of imported products/offers compared to the current products/offers in the import domain exceeds this threshold, the following actions will be performed:
If the import does not exceed the threshold, only incremental search index updates and cache clearing are performed.
If the import process needs to bulk mass data into the database, the following aspects need to be considered:
The ElementBulkerORM automatically triggers an analysis of the database tables during the import process.
All standard export processes use the ImpEx framework to provide basic functionality, including ISML templating, logging, and monitoring. Thus, each export process consists of at least a pipeline with the following start nodes:
Because the export serializes into the file system, a multi-threading approach does not make sense here; the shared file system is often the bottleneck.