The Intershop 7 import/export framework (short term: impex framework) consists of specific Java classes, pipelets, and pipelines for importing and exporting data. The framework is used by the standard impex wizards that guide Intershop 7 users through standard import or export operations. Developers can use the framework to extend the functionality or to customize existing import and export operations. The impex framework is pipeline-oriented; developers can use the existing pipelines as a basis for their own import and export processes.
The import functionality is based on the Simple API for XML (SAX) parser combined with JAXB for parsing XML files. For standard database imports, Intershop 7 uses the ORM (object relational mapping) layer to bulk mass data into the database. The export functionality uses the template language ISML to serialize persistent objects.
The following functionality with corresponding API is provided:
| Term | Description |
|---|---|
| Impex | Short term for imp(ort) and ex(port). |
| Controller | Central object providing general functionality such as logging, statistics, and configuration. |
A typical import process involves the following steps:

1. Parsing the import file
2. Validating and complementing the parsed objects
3. Bulking (writing) the data into the database

Each of these steps is handled by its own thread group. Parsing, validating, and bulking run in parallel, and in many cases validating and bulking can additionally be spread across multiple threads to ensure high performance during the import.
Some import processes, such as the product and catalog imports, require these three steps to be executed twice: in the first phase, the raw data is imported without relations. Afterwards, when all objects exist in the database, the import is executed again to import the relations between the imported objects.
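The division of work between the three thread groups can be pictured as a producer/consumer chain. The following sketch is purely illustrative and does not use the actual framework classes; the element type, queue sizes, and thread setup are assumptions made for this example.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative only: models the parser -> validator -> bulker chain with
// blocking queues. The real framework classes and queue handling differ.
public class ImportPipelineSketch {

    record Element(String xml) {}                      // hypothetical parsed element
    static final Element END = new Element(null);      // end-of-stream marker

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Element> parsed = new ArrayBlockingQueue<>(100);
        BlockingQueue<Element> validated = new ArrayBlockingQueue<>(100);

        Thread parser = new Thread(() -> {             // thread group 1: parse the file
            for (String raw : List.of("<product/>", "<category/>")) {
                putQuietly(parsed, new Element(raw));
            }
            putQuietly(parsed, END);
        });

        Thread validator = new Thread(() -> {          // thread group 2: validate and complement
            Element e;
            while ((e = takeQuietly(parsed)) != END) {
                putQuietly(validated, e);              // validation/completion would happen here
            }
            putQuietly(validated, END);
        });

        Thread bulker = new Thread(() -> {             // thread group 3: write into the database
            Element e;
            while ((e = takeQuietly(validated)) != END) {
                System.out.println("bulked " + e.xml());
            }
        });

        parser.start(); validator.start(); bulker.start();
        parser.join(); validator.join(); bulker.join();
    }

    static void putQuietly(BlockingQueue<Element> q, Element e) {
        try { q.put(e); } catch (InterruptedException ex) { Thread.currentThread().interrupt(); }
    }

    static Element takeQuietly(BlockingQueue<Element> q) {
        try { return q.take(); } catch (InterruptedException ex) { Thread.currentThread().interrupt(); return END; }
    }
}
```

In the real framework, the number of validator and bulker threads is configurable (see the tuning properties below), while the sketch uses a single thread per step.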
Every export process executes the following steps:
All import/export data files, such as sources, export files, configuration files, templates, etc., are stored in the IS_HOME/share/sites/<site>/unit/<unit>/impex directory. When uploading an import file, Intershop 7 transfers it from the source location to the appropriate location in the impex directory. Similarly, when exporting data, you can use the back office wizard to download export target files from the respective impex directory location.
| Subdirectory | Description | Example |
|---|---|---|
| | To store previous import/export files. | n/a |
| | Configuration files for import/export processes. | |
| | Export target files. | |
| | Obsolete directory for Oracle SQL*Loader. | n/a |
| | Logs created by the parser, validator, and bulker. | |
| | Import data source files. | |
| | Temporary files. | n/a |
The schema files (XSD) for the corresponding imports are located in the IS_HOME/share/system/impex/schema directory.
The complete configuration and controlling of import and export pipelines is handled by the Controller object. The Controller is created within the DetermineConfiguration pipelet, which determines the pipelet configuration and stores the Controller object in the pipeline dictionary. The Controller is the only pipeline dictionary object that is passed between pipelets to access their configuration-specific and import/export-specific services (for example, logging, progress notification, interaction, and event polling). The first action within an import/export pipeline must therefore be the creation of the Controller object by the DetermineConfiguration pipelet; all other import/export-specific pipelets depend on it, and calling an import/export pipelet without an existing Controller causes a severe error. Configuration values are global or local (pipelet-specific) and can be stored in multiple ways:
1. Pipelet configuration value: the Controller retrieves the pipelet configuration value for the given key.
2. Global Controller property: the Controller retrieves the property from a configuration file when the DetermineConfiguration pipelet is executed. Global means that the looked-up key does not have pipelet-specific extensions.
3. Local Controller property: the Controller retrieves the property from a configuration file when the DetermineConfiguration pipelet is executed. Local means that the looked-up key is extended with the pipelet identifier; for example, if the key is LogFacility and the pipelet name is Import, the key Import.LogFacility is looked up.
4. Global pipeline dictionary value: the Controller retrieves the property from the pipeline dictionary. Global means that the looked-up key does not have pipelet-specific extensions.
5. Local pipeline dictionary value: the Controller retrieves the property from the pipeline dictionary. Local means that the dictionary key is extended with a pipelet identifier, e.g., Import.DefaultImportMode.
When the Controller encounters a configuration value more than once, it uses the above order of precedence to decide which value to use: later sources supersede earlier ones. You should always use the Controller methods to access configuration values; doing so allows the configuration to be changed at runtime. A configuration value can be a pipelet configuration property value, a Controller property value, or a pipeline dictionary value. To distinguish between different pipelets of the same kind, each pipelet must be configured with a unique pipelet identifier within the pipeline descriptor.
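The lookup order and the "later wins" rule can be made concrete with a small model. The following sketch is not the Intershop Controller API; the class, method, and parameter names are invented for illustration only.

```java
import java.util.Map;

// Illustrative model of the configuration lookup order described above.
// All names are invented for this sketch and are not part of the Intershop API.
public class ConfigLookupSketch {

    static String resolve(String key, String pipeletId,
                          Map<String, String> pipeletConfig,
                          Map<String, String> controllerProps,
                          Map<String, Object> dictionary) {
        String value = null;
        String localKey = pipeletId + "." + key;              // e.g. Import.LogFacility

        // 1. pipelet configuration value
        if (pipeletConfig.containsKey(key)) value = pipeletConfig.get(key);
        // 2. global Controller property
        if (controllerProps.containsKey(key)) value = controllerProps.get(key);
        // 3. local Controller property (key extended with the pipelet identifier)
        if (controllerProps.containsKey(localKey)) value = controllerProps.get(localKey);
        // 4. global pipeline dictionary value
        if (dictionary.get(key) instanceof String s) value = s;
        // 5. local pipeline dictionary value, e.g. Import.DefaultImportMode
        if (dictionary.get(localKey) instanceof String s) value = s;

        return value;   // later sources supersede earlier ones
    }

    public static void main(String[] args) {
        String mode = resolve("DefaultImportMode", "Import",
                Map.of("DefaultImportMode", "OMIT"),              // pipelet configuration
                Map.of("Import.DefaultImportMode", "UPDATE"),     // local Controller property
                Map.of());                                        // empty pipeline dictionary
        System.out.println(mode);   // prints UPDATE - the later source wins
    }
}
```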
The specific processes executed during an import are determined by the selected import mode. If no mode or an invalid mode is set, the mode OMIT is used by default. The following import modes can be set for importing data:
IGNORE
INITIAL
UPDATE
REPLACE
OMIT
DELETE
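As noted above, a missing or invalid mode falls back to OMIT. The small helper below only mirrors that documented behavior; it is not part of the framework API, and the enum and method names are invented for this sketch.

```java
// Not framework code: mirrors the documented fallback to OMIT for a missing or
// invalid import mode. Mode names must be given in uppercase, so lowercase
// values are treated as invalid here as well.
enum ImportMode {
    IGNORE, INITIAL, UPDATE, REPLACE, OMIT, DELETE;

    static ImportMode fromString(String value) {
        if (value == null) {
            return OMIT;                    // no mode set
        }
        try {
            return valueOf(value.trim());   // "IGNORE", "UPDATE", ...
        } catch (IllegalArgumentException e) {
            return OMIT;                    // invalid (or lowercase) mode
        }
    }

    public static void main(String[] args) {
        System.out.println(fromString("IGNORE"));   // IGNORE
        System.out.println(fromString("ignore"));   // OMIT (lowercase is invalid)
        System.out.println(fromString(null));       // OMIT
    }
}
```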
The import mode has a significant impact on overall import performance. When deciding on the import mode, take the following considerations into account:
There are two ways to set the import mode: as a default value in the pipeline dictionary (e.g., Import.DefaultImportMode, see above), or via the import-mode attribute in the import file itself. The attribute is named import-mode, and modes must be specified in uppercase, for example: import-mode = "IGNORE". Check the respective schema definition for details.

Import/export uses Logger objects that you create using the CreateLogger or CreateFileLogger pipelets (you can create as many Logger objects as you need). Once created, a logger is registered at the Controller object under a certain name that you can use to obtain the logger from the Controller. A logger also has a log level assigned, e.g. debug, error, or warning; log levels can be combined using an OR operator. The base class for loggers, as well as a ready-to-use NativeLogger (logs to stdout) and a FileLogger (logs to a file), are defined in the core framework.
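Since log levels can be combined with an OR operator, they can be thought of as bit flags. The constants and values below are assumptions made for illustration; they are not the framework's actual level definitions.

```java
// Sketch of combinable log levels as bit flags; names and values are invented.
public class LogLevelSketch {

    static final int LEVEL_ERROR   = 1;
    static final int LEVEL_WARNING = 1 << 1;
    static final int LEVEL_DEBUG   = 1 << 2;

    static boolean isEnabled(int configuredLevels, int level) {
        return (configuredLevels & level) != 0;
    }

    public static void main(String[] args) {
        int levels = LEVEL_ERROR | LEVEL_WARNING;              // combine levels with OR
        System.out.println(isEnabled(levels, LEVEL_DEBUG));    // false
        System.out.println(isEnabled(levels, LEVEL_ERROR));    // true
    }
}
```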
To keep track of the current position within the import process and to provide progress notification, statistics objects of type XMLStatistics are used. The statistics are accessible through the Controller and, for the "outer world", through the import interactor. In general, the statistics are created by parsing the XML source and storing the number of elements; by incrementing a counter for a certain element, the current count of processed elements can be obtained at any time. The statistics object is created in the CollectXMLStatistics pipelet. Usually, the parse content handler is in charge of incrementing the counter.
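Conceptually, the statistics are a set of counters keyed by element name: totals collected up front from the XML source, and processed counts incremented by the content handler. The following sketch models that idea with invented names; it is not the actual XMLStatistics implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Conceptual model only: totals per element name plus a processed counter
// that the parse content handler would increment for each handled element.
public class ImportStatisticsSketch {

    private final Map<String, Long> totals = new ConcurrentHashMap<>();
    private final Map<String, AtomicLong> processed = new ConcurrentHashMap<>();

    void setTotal(String elementName, long count) {
        totals.put(elementName, count);
    }

    void increment(String elementName) {
        processed.computeIfAbsent(elementName, k -> new AtomicLong()).incrementAndGet();
    }

    double progress(String elementName) {
        long total = totals.getOrDefault(elementName, 0L);
        long done = processed.getOrDefault(elementName, new AtomicLong()).get();
        return total == 0 ? 0.0 : (double) done / total;
    }

    public static void main(String[] args) {
        ImportStatisticsSketch stats = new ImportStatisticsSketch();
        stats.setTotal("product", 200);
        stats.increment("product");
        System.out.println(stats.progress("product"));   // 0.005
    }
}
```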
To prevent multiple imports of related objects (e.g. products) into a single unit, an import process can be locked using the LockImport pipelet. The LockImport pipelet uses the locking framework. Several import resources exist that can be used to lock a certain import. The main resource for import is named Import; all other import resources use this resource as their parent resource.

The resources are locked in a unit-specific way by configuring the LockImport pipelet accordingly. The following resources are available for import:
UserImport
CategoryImport
ProductImport
DiscountImport
PriceImport
ProductTypeImport
VariationTypeImport
OrderImport
Some parts of import pipelines change the database content (e.g. the Import pipelet). Those parts must be locked to prevent concurrent database changes on the same tables (e.g. data replication of products vs. product import, or product imports in different domains). The LockImport pipelet locks the matching database resources for those tasks (that is, children of the Database resource). For example, the product import locks the Products resource before running the Import pipelet for the product tables. The Products resource is also used by replication processes, so no replication process and import process can run concurrently.
The pipelet AcquireFileResource is responsible for virtually locking a file to avoid conflicts with other processes working on the same impex file.
Each LockImport pipelet must have a corresponding UnlockImport pipelet to release the locked resources.
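The LockImport/UnlockImport pairing corresponds to the familiar acquire/release pattern. The sketch below illustrates the idea with plain JDK locks and an invented resource name; the real locking framework works with resource and domain objects, not with ReentrantLock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Conceptual illustration of the LockImport / UnlockImport pairing: a named
// resource is acquired before the import runs and is always released afterwards.
public class ImportLockSketch {

    private static final Map<String, ReentrantLock> RESOURCES = new ConcurrentHashMap<>();

    static void runLocked(String resourceName, Runnable importProcess) {
        ReentrantLock lock = RESOURCES.computeIfAbsent(resourceName, k -> new ReentrantLock());
        lock.lock();                       // LockImport
        try {
            importProcess.run();           // e.g. the Import pipelet changing product tables
        } finally {
            lock.unlock();                 // UnlockImport - must always happen
        }
    }

    public static void main(String[] args) {
        runLocked("ProductImport", () -> System.out.println("importing products"));
    }
}
```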
There are three different implementations to bulk data into the database:
The standard import processes use the ORMImportMgr to set up the import environment, including queues, threads, and so on. The goal is to write a pipeline that calls the particular import pipelets with corresponding configurations. Each import process has to configure the business object-specific XML parser (XMLParseContentHandler), validator (ElementValidator), and bulker (ElementBulker). Normally, this is done in the corresponding pipeline. The resulting object diagram looks like this:
For every business object that can be imported, a processing pipeline exists. The name of the pipeline follows the pattern Process<BusinessObject>Import, for example ProcessProductImport. Each pipeline has the same basic design:

- Process<BusinessObject>Import-Validate parses the import file.
- Process<BusinessObject>Import-Prepare takes care of the initial data conversion, parsing, and XML validation processes.
- Process<BusinessObject>Import-Import then executes the actual import process.

The following properties can be added to the import property file to tune the process:
| Global property key | Import property key | Description | Default value | Location |
|---|---|---|---|---|
| | | Number of import elements that should be batched together to the database. | 100 | |
| | | Number of import batches (see above) that should be committed in the database. | 100 | |
| | | The number of validator threads. | 1 | |
| | | The number of bulker threads. | 4 | |
| | | The size of the JVM. Increasing the size of the JVM may improve cache efficiency. | | |
| | | Tuning of the garbage collector. | | |
If the import process needs to bulk mass data into the database, the following aspects need to be considered:

- ElementBulkerORM automatically triggers analyzing of the database tables during the import process.

All standard export processes use the impex framework to provide basic functionality, including ISML templating, logging, and monitoring. Each export process therefore consists at least of a pipeline with the following start nodes (a sketch of the RunExport phase follows the list):
- Prepare:
  - DetermineConfiguration
  - CreateFileLogger
  - GetPageable
- RunExport:
  - OpenFile
  - OpenFilter
  - Export
  - CloseFilter
  - CloseFile
- CleanUp:
  - CloseLoggers
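The following sketch outlines the RunExport phase only: open the target file, write one serialized record per exported object, and close everything again. The real pipeline serializes persistent objects through ISML templates and optional filters; the file name, record format, and object list below are invented for this example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Conceptual outline of OpenFile -> (OpenFilter) -> Export -> CloseFilter/CloseFile.
public class ExportRunSketch {

    public static void main(String[] args) throws IOException {
        List<String> pageable = List.of("Product-1", "Product-2");   // stand-in for GetPageable
        Path target = Path.of("export.xml");                         // OpenFile

        try (BufferedWriter out = Files.newBufferedWriter(target)) { // filters could wrap this writer
            for (String object : pageable) {
                out.write("<object id=\"" + object + "\"/>\n");      // Export: template-based serialization
            }
        }                                                            // CloseFilter / CloseFile
        System.out.println("exported to " + target.toAbsolutePath());
    }
}
```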
Due to serialization into the file system, the multithreading approach does not make sense, because the shared file system is often the bottleneck.