The Intershop Commerce Management (ICM) import/export framework (in short: ImpEx framework) consists of specific Java classes, pipelets, and pipelines for importing and exporting data. The framework is used by the standard ImpEx wizards that guide ICM users through standard import or export operations. Developers can use the framework to extend its functionality or to customize existing import and export operations. The ImpEx framework is pipeline-oriented. Developers can use the existing pipelines to do the following:
The import functionality is based on the Simple API for XML (SAX) parser combined with JAXB for parsing XML files. For standard database imports, ICM uses the ORM (object relational mapping) layer to bulk mass data into the database. The export functionality uses the template language ISML to serialize persistent objects.
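The following minimal sketch illustrates the streaming idea behind such SAX-based parsing. It uses only standard JDK SAX classes, not the actual ICM parser classes; the element name and the file name are placeholders taken from the examples later in this document.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
import java.io.File;

public class StreamingImportSketch {

    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();

        // The handler receives SAX events while the file is streamed,
        // so even very large import files never have to fit into memory.
        parser.parse(new File("Products.xml"), new DefaultHandler() {
            private int productCount;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // "product" is a placeholder element name; a real handler would
                // hand the parsed element (e.g., via JAXB) to the next processing step.
                if ("product".equals(localName)) {
                    productCount++;
                }
            }

            @Override
            public void endDocument() {
                System.out.println("Parsed products: " + productCount);
            }
        });
    }
}
```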
The following functionality with corresponding API is provided:
Term | Description |
---|---|
ImpEx | Short for import and export. |
Controller | Central object providing general functionality like logging, statistics, configuration, ... |
A typical import process involves the following steps:
There are three thread groups. The first parses the file, the second validates and complements the parsed objects, and the third thread group is responsible for writing the data into the database. Parsing, validating, and bulking run in parallel. Validating and bulking in many cases can be parallelized using multiple threads to ensure high performance during import.
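The following self-contained sketch illustrates this parse/validate/bulk pipeline idea with plain java.util.concurrent primitives. It does not use the actual ICM classes; the element type, queue sizes, and single thread per stage are illustrative assumptions (the validator and bulker stages could each use several threads).

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ImportPipelineSketch {

    record Element(String id) {}                      // stand-in for a parsed import element
    static final Element POISON = new Element(null);  // end-of-stream marker

    public static void main(String[] args) throws Exception {
        BlockingQueue<Element> parsed = new ArrayBlockingQueue<>(1000);
        BlockingQueue<Element> validated = new ArrayBlockingQueue<>(1000);

        // Thread group 1: parses the source and feeds the first queue.
        Thread parser = new Thread(() -> {
            for (String id : List.of("A", "B", "C")) {     // stands in for SAX parsing
                putQuietly(parsed, new Element(id));
            }
            putQuietly(parsed, POISON);
        });

        // Thread group 2: validates and complements elements (could be several threads).
        Thread validator = new Thread(() -> {
            Element e;
            while ((e = takeQuietly(parsed)) != POISON) {
                putQuietly(validated, e);                  // real validation happens here
            }
            putQuietly(validated, POISON);
        });

        // Thread group 3: bulks validated elements into the database (could be several threads).
        Thread bulker = new Thread(() -> {
            Element e;
            while ((e = takeQuietly(validated)) != POISON) {
                System.out.println("bulked " + e.id());    // batched INSERT/UPDATE here
            }
        });

        parser.start(); validator.start(); bulker.start();
        parser.join(); validator.join(); bulker.join();
    }

    private static void putQuietly(BlockingQueue<Element> q, Element e) {
        try { q.put(e); } catch (InterruptedException ex) { Thread.currentThread().interrupt(); }
    }

    private static Element takeQuietly(BlockingQueue<Element> q) {
        try { return q.take(); } catch (InterruptedException ex) { Thread.currentThread().interrupt(); return POISON; }
    }
}
```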
Some import processes, such as those for products and catalogs, require these three steps to be executed twice: in the first phase, the raw data without relations is imported; afterwards, when all objects exist in the database, the import is executed again to import the relations between the imported objects.
Every export process executes the following steps:
All import/export data files, such as sources, export files, configuration files, templates, etc., are stored in the IS_HOME/share/sites/<site>/unit/<unit>/impex directory. When uploading an import file, ICM transfers it from the source location to the appropriate location in the ImpEx directory. Similarly, when exporting data, you can use the back office wizard to download export target files from the respective ImpEx directory location.
Subdirectory | Description | Example |
---|---|---|
| | To store previous import/export files. | n/a |
| | Configuration files for import/export processes. | |
| | Export target files. | ProductExport.xml |
| | Obsolete directory for Oracle SQL*Loader. | n/a |
| | Logs created by parser, validator, and bulker are stored in this directory. | ProductImport.log |
| | Import data source files. | Products.xml |
| | Temporary files. | n/a |
Schema files (XSD) location:
The complete configuration and controlling of import and export pipelines is handled by the Controller object. The Controller is created within the DetermineConfiguration pipelet, which determines the pipelet configuration and stores the Controller object in the pipeline dictionary. The Controller is the only pipeline dictionary object passed between pipelets; they use it to access configuration-specific and import/export-specific services (for example, logging, progress notification, interaction, and event polling). The first action within an import/export pipeline must be the creation of the Controller object by the DetermineConfiguration pipelet, because all other import/export-specific pipelets depend on it. Calling an import/export pipelet without an existing Controller causes a severe error.
Configuration values are global or local (pipelet-specific) and can be stored in multiple ways:
* Pipelet configuration value: The Controller retrieves the pipelet configuration value for the given key.
* Global Controller property: The Controller retrieves the property from a configuration file when the DetermineConfiguration pipelet is executed. Global means that the looked-up key has no pipelet-specific extension.
* Local Controller property: The Controller retrieves the property from a configuration file when the DetermineConfiguration pipelet is executed. Local means that the looked-up key is extended with the pipelet identifier; for example, if the key is LogFacility and the pipelet name is Import, the key Import.LogFacility is looked up.
* Global pipeline dictionary value: The Controller retrieves the value from the pipeline dictionary. Global means that the looked-up key has no pipelet-specific extension.
* Local pipeline dictionary value: The Controller retrieves the value from the pipeline dictionary. Local means that the dictionary key is extended with the pipelet identifier, e.g., Import.DefaultImportMode.
When the Controller encounters a configuration value more than once, it uses the order of precedence shown above to decide which one to use; values found later in this order supersede earlier ones. Always use the Controller methods to access configuration values; doing so allows the configuration to be changed at runtime. A configuration value can be a pipelet configuration property value, a Controller property value, or a pipeline dictionary value. To distinguish between different pipelets of the same kind, each pipelet must be configured with a unique pipelet identifier within the pipeline descriptor.
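As a rough illustration of this precedence scheme, the following sketch resolves a value by consulting the sources in the documented order and letting later sources override earlier ones. All class and method names below are hypothetical; this is not the actual Controller API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of precedence-based configuration lookup; not the ICM Controller API. */
public class ConfigLookupSketch {

    private final Map<String, String> pipeletConfig = new LinkedHashMap<>();
    private final Map<String, String> controllerProperties = new LinkedHashMap<>();
    private final Map<String, String> pipelineDictionary = new LinkedHashMap<>();

    /**
     * Resolves a configuration value for the given key and pipelet identifier.
     * Sources are checked in the documented order; a value found in a later
     * source supersedes values found in earlier sources.
     */
    public String getConfigValue(String key, String pipeletId) {
        String localKey = pipeletId + "." + key;                       // e.g., "Import.DefaultImportMode"
        String value = pipeletConfig.get(key);                         // 1. pipelet configuration value
        value = override(value, controllerProperties.get(key));        // 2. global Controller property
        value = override(value, controllerProperties.get(localKey));   // 3. local Controller property
        value = override(value, pipelineDictionary.get(key));          // 4. global dictionary value
        value = override(value, pipelineDictionary.get(localKey));     // 5. local dictionary value
        return value;
    }

    private static String override(String current, String candidate) {
        return candidate != null ? candidate : current;
    }

    public static void main(String[] args) {
        ConfigLookupSketch cfg = new ConfigLookupSketch();
        cfg.pipeletConfig.put("DefaultImportMode", "OMIT");
        cfg.pipelineDictionary.put("Import.DefaultImportMode", "UPDATE");
        // The local dictionary value wins over the pipelet configuration default.
        System.out.println(cfg.getConfigValue("DefaultImportMode", "Import")); // UPDATE
    }
}
```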
The specific processes executed during an import are determined by the selected import mode. If no mode or an invalid mode is set, the mode OMIT
is used by default. The following import modes can be set for importing data:
* IGNORE
* INITIAL
* UPDATE
* REPLACE
* OMIT
* DELETE
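A small sketch of how such a mode selection could be represented, including the documented fallback to OMIT when no mode or an invalid mode is given. The helper itself is hypothetical and not part of the framework; only the mode names and the OMIT default come from this document.

```java
/** Hypothetical helper; mode names and the OMIT fallback follow the list above. */
public enum ImportMode {
    IGNORE, INITIAL, UPDATE, REPLACE, OMIT, DELETE;

    /** Returns the matching mode, or OMIT if the value is missing or invalid. */
    public static ImportMode fromString(String value) {
        if (value == null) {
            return OMIT;
        }
        try {
            return valueOf(value.trim());        // modes are expected in uppercase
        } catch (IllegalArgumentException invalidMode) {
            return OMIT;
        }
    }
}
```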
The import mode has a significant impact on overall import performance. When deciding on the import mode, take the following considerations into account:
There are two ways to set the import mode:

* Globally for the import process, for example, as the default import mode in the pipelet configuration or pipeline dictionary (see DefaultImportMode above).
* Individually per element in the import file, using the import-mode attribute.

The attribute import-mode and the modes must be specified in uppercase, for example: import-mode = "IGNORE". Check the respective schema definition for details.

Import/export uses Logger
objects that you create using the CreateLogger
or CreateFileLogger
pipelets (you can create as many Logger
objects as you need). Once created, a logger is registered at the Controller
object with a certain name that you can use to obtain the logger from the controller. A logger also has a log level assigned, e.g., debug, error or warning levels. Log levels can be combined using an OR
operator. The base class for loggers and a ready-to-use NativeLogger (logs to stdout) and a FileLogger (logs to a file) are defined in the core framework.
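To illustrate the idea of named loggers with OR-combinable log levels, here is a self-contained sketch using plain bit flags. The pipelet and class names mentioned in the text refer to the framework itself, whereas everything in this sketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a logger registry with bit-flag log levels; not the ICM classes. */
public class LoggerRegistrySketch {

    // Log levels as bit flags so they can be combined with a bitwise OR.
    static final int ERROR = 1, WARN = 2, DEBUG = 4;

    interface Logger {
        void log(int level, String message);
    }

    /** Simple stdout logger that only prints messages whose level is enabled. */
    static Logger stdoutLogger(int enabledLevels) {
        return (level, message) -> {
            if ((enabledLevels & level) != 0) {
                System.out.println(message);
            }
        };
    }

    // Loggers are registered under a name, similar to registering them at the Controller.
    private final Map<String, Logger> loggers = new HashMap<>();

    void register(String name, Logger logger) { loggers.put(name, logger); }

    Logger get(String name) { return loggers.get(name); }

    public static void main(String[] args) {
        LoggerRegistrySketch registry = new LoggerRegistrySketch();
        registry.register("ImportLog", stdoutLogger(ERROR | WARN)); // levels combined via OR
        registry.get("ImportLog").log(WARN, "price list is empty"); // printed
        registry.get("ImportLog").log(DEBUG, "parsed element 42");  // suppressed
    }
}
```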
To gather statistics for progress notification and keep track of the current position within the import process, statistics objects of type XMLStatistics are used. The statistics are accessible through the Controller and, for the "outer world", through the import interactor. In general, the statistics are created by parsing the XML source and storing the number of elements; by incrementing a counter for a certain element, the current count of processed elements can be obtained. The statistics object is created in the CollectXMLStatistics pipelet. Usually, the parse content handler is in charge of incrementing the counters.
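A minimal sketch of this two-step idea (count the elements first, then increment a per-element counter while processing). The class and method names below are hypothetical; the real XMLStatistics class may differ.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical progress statistics: totals per element name and processed-element counters. */
public class ImportStatisticsSketch {

    private final Map<String, Long> totalPerElement = new ConcurrentHashMap<>();
    private final Map<String, AtomicLong> processedPerElement = new ConcurrentHashMap<>();

    /** Filled while the XML source is parsed the first time (counting pass). */
    public void setTotal(String elementName, long count) {
        totalPerElement.put(elementName, count);
    }

    /** Called by the parse content handler for every processed element. */
    public void increment(String elementName) {
        processedPerElement.computeIfAbsent(elementName, k -> new AtomicLong()).incrementAndGet();
    }

    /** Progress in percent for one element type, e.g., for a progress display. */
    public double progress(String elementName) {
        long total = totalPerElement.getOrDefault(elementName, 0L);
        long done = processedPerElement.getOrDefault(elementName, new AtomicLong()).get();
        return total == 0 ? 0.0 : 100.0 * done / total;
    }

    public static void main(String[] args) {
        ImportStatisticsSketch stats = new ImportStatisticsSketch();
        stats.setTotal("product", 200);        // determined by the counting pass
        for (int i = 0; i < 50; i++) {
            stats.increment("product");        // incremented while importing
        }
        System.out.printf("product import: %.1f%%%n", stats.progress("product")); // 25.0%
    }
}
```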
To prevent multiple imports of related objects (e.g., products) into a single unit, an import process can be locked by using the LockImport
pipelet. The LockImport
pipelet uses the locking framework, see Concept - Locking Framework.
Several import resources exist that can be used to lock a certain import. The main import resource is named Import; all other import resources use it as their parent resource.
The resources are locked in a unit-specific way by configuring the LockImport
pipelet accordingly. The following resources are available for import:
* UserImport
* CategoryImport
* ProductImport
* DiscountImport
* PriceImport
* ProductTypeImport
* VariationTypeImport
* OrderImport
Some parts of import pipelines change the database content (e.g., the Import pipelet). Those parts must be locked to prevent concurrent database changes on the same tables (e.g., data replication of products vs. product import, or product imports in different domains). The LockImport pipelet locks the matching database resources for those tasks (that is, children of the Database resource). For example, the product import locks the Products resource before running the Import pipelet for the product tables. Since the Products resource is also used by replication processes, a replication process and a product import cannot run concurrently.
This pipelet locks the given resources to avoid concurrent operations on the same resources. The resources are specified by the ResourceList parameter, a semicolon-separated list of resource names that must be available in the table RESOURCEPO. If the parameter IsDomainSpecific is set to true, resources are locked only in the current domain, which makes it possible to start the same pipeline in different domains concurrently. If no resources are specified, the pipelet acquires the Database resource (system-wide), so no other import, staging, or other process requiring the Database resource or one of its sub-resources can run concurrently. If one or more required resources cannot be acquired, the pipelet returns with an error. The import process holding the acquisition is read from the pipeline dictionary; if no process is found, a new process is created. The acquisition made is stored in the pipeline dictionary.
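The following sketch mimics the described behavior (semicolon-separated resource list, optional domain-specific locking, failure if any resource is already taken) with a simple in-memory registry. The real pipelet uses the ICM locking framework and the RESOURCEPO table instead, so all names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for resource locking; the real pipelet uses the locking framework. */
public class ImportLockSketch {

    private final Set<String> acquired = ConcurrentHashMap.newKeySet();

    /**
     * Tries to lock all resources from a semicolon-separated list.
     * If domain-specific, the lock key is qualified with the domain so the same
     * pipeline can run in different domains concurrently.
     * Returns false (and releases anything it grabbed) if a resource is already locked.
     */
    public synchronized boolean lock(String resourceList, boolean domainSpecific, String domain) {
        String[] names = (resourceList == null || resourceList.isBlank())
                ? new String[] {"Database"}                  // fallback: system-wide Database resource
                : resourceList.split(";");
        List<String> taken = new ArrayList<>();
        for (String name : names) {
            String key = domainSpecific ? domain + ":" + name.trim() : name.trim();
            if (!acquired.add(key)) {                        // someone else holds this resource
                acquired.removeAll(taken);                   // roll back partial acquisition
                return false;
            }
            taken.add(key);
        }
        return true;
    }

    public synchronized void unlock(String resourceList, boolean domainSpecific, String domain) {
        for (String name : resourceList.split(";")) {
            acquired.remove(domainSpecific ? domain + ":" + name.trim() : name.trim());
        }
    }

    public static void main(String[] args) {
        ImportLockSketch locks = new ImportLockSketch();
        System.out.println(locks.lock("ProductImport;PriceImport", true, "ChannelA")); // true
        System.out.println(locks.lock("ProductImport", true, "ChannelA"));             // false, already locked
        System.out.println(locks.lock("ProductImport", true, "ChannelB"));             // true, other domain
    }
}
```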
The pipelet AcquireFileResource is responsible for virtually locking a file to avoid conflicts with other processes working on the same ImpEx file. The resources to acquire are passed in either as a list of resources from the pipeline dictionary or as a list of resource names from the configuration or dictionary. The pipelet stores the acquisition result and the acquisition itself in the pipeline dictionary.
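As a rough illustration of such a virtual file lock, here is a plain in-memory registry keyed by file name. This is not the actual pipelet implementation, which stores its acquisitions in the pipeline dictionary; all names below are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical virtual file lock: the first caller wins, everyone else is rejected. */
public class FileResourceSketch {

    private final Map<String, String> acquisitions = new ConcurrentHashMap<>();

    /** Returns true if the ImpEx file was acquired for the given process. */
    public boolean acquire(String fileName, String processId) {
        return acquisitions.putIfAbsent(fileName, processId) == null;
    }

    public void release(String fileName, String processId) {
        acquisitions.remove(fileName, processId);   // only the owner can release
    }

    public static void main(String[] args) {
        FileResourceSketch files = new FileResourceSketch();
        System.out.println(files.acquire("src/Products.xml", "import-1")); // true
        System.out.println(files.acquire("src/Products.xml", "import-2")); // false, already in use
        files.release("src/Products.xml", "import-1");
        System.out.println(files.acquire("src/Products.xml", "import-2")); // true
    }
}
```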
Each LockImport
pipelet must have a corresponding UnlockImport
pipelet to release the locked resources.
There are three different implementations to bulk data into the database:
The standard import processes use the ORMImportMgr to set up the import environment, including queues, threads, and so on. The goal is to write a pipeline that calls the particular import pipelets with the corresponding configurations. Each import process has to configure the business object-specific XML parser (XMLParseContentHandler), validator (ElementValidator), and bulker (ElementBulker). Normally, this is done in the corresponding pipeline. The resulting object diagram looks like this:
For every business object that can be imported, a processing pipeline exists. The name of the pipeline follows this pattern Process<BusinessObject>Import, for example ProcessProductImport. Each pipeline has the same basic design:

* Process<BusinessObject>Import-Validate parses the import file.
* Process<BusinessObject>Import-Prepare takes care of the initial data conversion, parsing, and XML validation processes.
* Process<BusinessObject>Import-Import then executes the actual import process.

The following properties can be added to the import property file to tune the process:
Global property key | Import property key | Description | Default value | Location |
---|---|---|---|---|
| | | Number of import elements that should be batched together to the database. | 100 | |
| | | Number of import batches (see above) that should be committed together in the database. | 100 | |
| | | The number of validator threads. | 1 | |
| | | The number of bulker threads. | 4 | |
| | | The size of the JVM. Increasing the JVM size may improve cache efficiency. | | IS_HOME/bin/tomcat.sh |
| | | Tuning of the garbage collector. | | IS_HOME/bin/tomcat.sh |
| | | Defines a percentage (%); if the ratio of imported products/offers to the current products/offers in the import domain exceeds this threshold, additional actions are performed. If the import does not exceed the threshold, only incremental search index updates and cache clearing are done. | 10 | IS_SHARE/system/config/cluster/appserver.properties |
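To make the two batching parameters above concrete, the following standalone JDBC sketch batches rows to the database and commits after a configurable number of batches. The table name, column name, and connection URL are placeholders (any JDBC connection would do), and the real bulker (ElementBulkerORM) works on ORM objects rather than raw JDBC.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

/** Standalone JDBC sketch of batch size and commit interval; not the ICM bulker implementation. */
public class BatchBulkSketch {

    static final int BATCH_SIZE = 100;       // elements per database batch
    static final int COMMIT_INTERVAL = 100;  // batches per commit

    public static void bulk(Connection connection, List<String> productIds) throws Exception {
        connection.setAutoCommit(false);
        int inBatch = 0, batches = 0;
        try (PreparedStatement insert =
                 connection.prepareStatement("INSERT INTO PRODUCT_STAGE (SKU) VALUES (?)")) {
            for (String sku : productIds) {
                insert.setString(1, sku);
                insert.addBatch();
                if (++inBatch == BATCH_SIZE) {            // flush one batch to the database
                    insert.executeBatch();
                    inBatch = 0;
                    if (++batches == COMMIT_INTERVAL) {   // commit a group of batches
                        connection.commit();
                        batches = 0;
                    }
                }
            }
            insert.executeBatch();                        // flush the remainder
            connection.commit();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder in-memory database URL; replace with your own JDBC connection.
        try (Connection connection = DriverManager.getConnection("jdbc:h2:mem:impex")) {
            connection.createStatement().execute("CREATE TABLE PRODUCT_STAGE (SKU VARCHAR(64))");
            bulk(connection, List.of("10001", "10002", "10003"));
        }
    }
}
```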
If the import process needs to bulk mass data into the database, the following aspects need to be considered:

* ElementBulkerORM automatically triggers analyzing database tables during the import process.

All standard export processes use the ImpEx framework to provide basic functionality, including ISML templating, logging, and monitoring. Each export process therefore consists at least of a pipeline with the following start nodes:
* Prepare:
  * DetermineConfiguration
  * CreateFileLogger
  * GetPageable
* RunExport:
  * OpenFile
  * OpenFilter
  * Export
  * CloseFilter
  * CloseFile
* CleanUp:
  * CloseLoggers
Due to serialization into the file system, the multi-threading approach does not make sense, because the shared file system is often the bottleneck.
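The start node sequence above roughly corresponds to a single-threaded flow of open file, open filter, export each object, close filter, close file. The sketch below mirrors that flow with plain Java I/O; the object type and method names are hypothetical, and no ISML templating or ICM pipelets are involved.

```java
import java.io.BufferedWriter;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical single-threaded export flow: open file, open filter, export, close filter, close file. */
public class ExportFlowSketch {

    record Product(String sku, String name) {}   // stand-in for a pageable collection of persistent objects

    public static void main(String[] args) throws Exception {
        List<Product> pageable = List.of(new Product("10001", "Mouse"), new Product("10002", "Keyboard"));

        Path exportFile = Path.of("ProductExport.xml");           // in ICM, export target files end up in the export subdirectory
        try (Writer file = Files.newBufferedWriter(exportFile);
             BufferedWriter filter = new BufferedWriter(file)) {  // a filter could also compress or encrypt the stream
            filter.write("<products>\n");
            for (Product product : pageable) {                    // serialization; ICM would render an ISML template instead
                filter.write("  <product sku=\"" + product.sku() + "\">" + product.name() + "</product>\n");
            }
            filter.write("</products>\n");
        }                                                          // try-with-resources closes the filter, then the file
        System.out.println("Exported " + pageable.size() + " products to " + exportFile);
    }
}
```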