Document Properties
Kbid: 23377Q
Last Modified: 24-Sep-2020
Added to KB: 17-Jul-2012
Public Access: Everyone
Status: Online
Doc Type: Guidelines, Concepts & Cookbooks
Product:
  • ICM 7.6
  • ICM 7.7
  • ICM 7.8
  • ICM 7.9
  • ICM 7.10

Concept - Mass Data Replication



Product Version: 7.0

Status: final

1 Introduction

INTERSHOP 7's Data Replication generally refers to the process of first updating data in a source system and then synchronizing the data with a target system. The replication mechanism makes it possible to develop and maintain content in the background (i.e., in a source system that is not accessible to the public) without disturbing the target system, which is online.

INTERSHOP 7 provides two fundamental ways to update Live system data in a data replication environment: Mass Data Replication, which is intended for high volumes of data, and Business Object Replication, which is intended for fast updates of selected data. Both methods use the same communication channels but differ in the way they collect data in the source system and inject it into the target system.

This Concept deals only with Mass Data Replication.

1.1 Glossary

Staging

Refers to a framework providing basic functionality to transfer database or file system data from a source system to a target cluster.
Often used as a synonym for Data Replication, which is not actually correct.

Data Replication

Data replication is a process to transfer large amounts of data from a source cluster to a target cluster. In a typical scenario, storefront data (such as product data) and other settings are first updated in an editing system and then transferred to a live system. This mechanism makes it possible to develop and maintain large content in the background without significant disturbances to the production system. The mechanism for transferring individual business objects in an ad-hoc manner is called object replication (developer and administrator perspective) or publishing (shop manager perspective).

Editing system

In a data replication environment, the editing system is a dedicated Intershop 7 installation used to prepare or update the storefront data in the background without disturbing the operation of the live system. The term emphasizes the purpose of the system in the Data Replication environment, as seen by a Data Replication Manager.

Source system

Describes an INTERSHOP 7 system used to import and test new data that is then intended to be transferred to another INTERSHOP 7 system by means of Data Replication. Thus, the term is often used as a synonym for editing system in a Data Replication environment.
The term emphasizes the data flow in a Data Replication environment.

Offline system

Often used as a synonym for a source system in a Data Replication environment.
The term emphasizes that the system is not publicly accessible in the Data Replication environment.

Live system

In a data replication environment, the live system is a dedicated Intershop 7 installation that serves the live storefront and receives the data prepared in the editing system. The term emphasizes the purpose of the system in the Data Replication environment, as seen by a Data Replication Manager.

Target system

Describes an INTERSHOP 7 system that is intended to receive data transferred from another INTERSHOP 7 system (the source system) by means of Data Replication. Thus, the term is often used as a synonym for live system in a Data Replication environment.
The term emphasizes the data flow in a Data Replication environment.

Online system

Often used as a synonym for a target system in a Data Replication environment.
The term emphasizes that the system is publicly accessible in the Data Replication environment.

Target system vs. Target cluster

A target system refers to an INTERSHOP 7 cluster that is the receiver (the target) of a data replication process. From a Data Replication perspective, a target system owns one web server address and one database schema, though it may consist of multiple web and application servers.
A target cluster refers to a logical compound of multiple (possibly geographically separated) target systems sharing the same cluster ID, each of them owning its own shared file system, web address, and database schema. All target systems of a target cluster are updated in parallel by the same replication process.


2 (Mass Data) Replication and Staging

INTERSHOP 7's data replication mechanism is based on three different frameworks: the staging, JDBC, and locking frameworks (see figure below).

While the complete replication mechanism provides all-encompassing, business-process-centered handling of data synchronization, staging provides the fundamental data transport mechanism and thus represents the technical perspective.

Figure: Mass Data Replication: Involved Frameworks

2.1 Staging Framework

The staging framework provides the fundamental entities and processes to identify and access the content affected by data replication, to model the assignment of content to replication processes, and to initiate and manage process execution.

The data replication mechanism does not replace the staging framework. It extends the staging framework in order to facilitate the management and execution of staging processes.

Note

The term Staging has often been used as a synonym for Replication; in fact, staging is only one INTERSHOP 7 component involved in a Mass Data Replication process.

2.2 JDBC Framework

The JDBC framework and SQL are used to initiate data transfer between database instances or schemata.

2.3 Locking Framework

The locking framework prevents different processes within an INTERSHOP 7 cluster (such as import processes, jobs, or data replication processes) from accessing the same resources at the same time, e.g., database tables or file system content. Each process therefore has to impose a virtual lock on any resource it is going to access, in order to ensure no other process can concurrently modify the resource.
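The locking rule described above can be illustrated with a small in-memory sketch. The class LockRegistry and its methods are hypothetical and do not reproduce the locking framework's API; the sketch assumes that a process requests all its resources at once and receives either all of them or none.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class LockRegistry:
    """Hypothetical in-memory stand-in for cluster-wide resource locks."""

    # Maps a resource name (e.g. a table or directory) to the ID of the owning process.
    holders: Dict[str, str] = field(default_factory=dict)

    def try_acquire(self, process_id: str, resources: Iterable[str]) -> bool:
        wanted = list(resources)
        # All-or-nothing: refuse if any resource is held by a different process.
        if any(self.holders.get(r, process_id) != process_id for r in wanted):
            return False
        for r in wanted:
            self.holders[r] = process_id
        return True

    def release(self, process_id: str) -> None:
        # Drop every lock owned by the given process.
        self.holders = {r: p for r, p in self.holders.items() if p != process_id}

    def holder(self, resource: str) -> Optional[str]:
        return self.holders.get(resource)
```

For example, while a replication process holds the lock on PRODUCT, an import requesting PRODUCT is refused until the replication process releases its locks.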

3 Basic Architecture and Infrastructure for Data Replication

Basically, the data replication mechanism of INTERSHOP 7 relates two kinds of systems: source systems and target systems.

To provide Multi Data Center support, target systems, which may be situated in different locations, are encapsulated in logical target clusters. The same applies to source systems and logical source clusters.

Note

All target systems of one target cluster may be active at the same time, while only one source system may be active (up and running) at any one time.

For easier understanding, the following figure shows a simplified view with only one editing system and one target system; Multi Data Center functionality is described in more detail in a separate section later.

Figure: Simplified Basic Architecture

One target system includes one or more application servers, the Web server, the Web adapter, and a target database account. The number of application servers, Web servers, and Web adapters is irrelevant to the data replication mechanism; it just has to be sufficient to process incoming requests properly.

One source system also includes one or more application servers, Web servers, Web adapters, and a source database account. Again, the number of application servers, Web servers, and Web adapters is irrelevant to the data replication mechanism. Typically, the sizing requirements for a source system are lower, as the source system does not have to process online requests.

All target systems of a target cluster have to use the identical cluster ID, i.e., the content of the file share/system/config/cluster/cluster.id needs to be identical. All editing systems of the corresponding editing cluster have to use an identical cluster ID as well, which must differ from that of the target cluster.
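A pre-flight check of this rule might look like the following sketch. It assumes that each system's shared file system root is reachable as a local path and that cluster.id lives under system/config/cluster/ relative to that root (consistent with replication.xml residing in <IS.INSTANCE.SHARE>/system/config/cluster); surrounding whitespace in the file is ignored. The function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed location of the cluster ID file relative to a system's shared file system root.
CLUSTER_ID_FILE = Path("system") / "config" / "cluster" / "cluster.id"


def read_cluster_id(share_root: Path) -> str:
    return (share_root / CLUSTER_ID_FILE).read_text(encoding="utf-8").strip()


def cluster_id_problems(target_shares: Iterable[Path], editing_shares: Iterable[Path]) -> List[str]:
    """Return human-readable violations of the cluster ID rules (empty list = OK)."""
    target_ids = {read_cluster_id(p) for p in target_shares}
    editing_ids = {read_cluster_id(p) for p in editing_shares}
    problems = []
    if len(target_ids) > 1:
        problems.append(f"target systems use different cluster IDs: {sorted(target_ids)}")
    if len(editing_ids) > 1:
        problems.append(f"editing systems use different cluster IDs: {sorted(editing_ids)}")
    if target_ids & editing_ids:
        problems.append("editing and target clusters share a cluster ID")
    return problems
```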

A source cluster can be connected to multiple target clusters. However, each data replication process is directed at exactly one target cluster; it is not possible to update multiple target clusters from a source system in one data replication process. All target systems belonging to the target cluster selected for a replication process, however, are updated by the same replication process.

Mass data replication is based on the following fundamental paradigms:

  • One replication process involves exactly one source system and one target cluster. A target cluster consists of at least one target system. Data replication is handled by one subordinate staging process per target system.
  • A replication process is atomic: it finishes successfully only if all subordinate staging processes finish successfully. If even one subordinate staging process fails (i.e., the replication to one of the target systems forming the current target cluster), the whole replication process is considered failed.
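The atomicity rule can be expressed as a small aggregation over the states of the subordinate staging processes. This is a minimal sketch; the State values and the function name are hypothetical, and reporting failure as soon as one target system fails (even while others are still running) is an assumption of the sketch.

```python
from enum import Enum
from typing import Dict


class State(Enum):
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def replication_state(staging_states: Dict[str, State]) -> State:
    """Summarize the states of the per-target-system staging processes."""
    if not staging_states:
        raise ValueError("a target cluster consists of at least one target system")
    states = set(staging_states.values())
    if State.FAILED in states:
        return State.FAILED      # one failed target system fails the whole process
    if states == {State.SUCCEEDED}:
        return State.SUCCEEDED   # success only if every target system succeeded
    return State.RUNNING
```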

Basic mechanism:

  • A replication process is started by the editing system. The editing system calls a Web service via HTTP in each assigned target system to inform it that new data is available and which data it is.
  • Each target system then pulls the announced data from the editing system: file system data is downloaded via HTTP, and database data is pulled through the database by the target system.

4 Mass Data Replication from a Business Point of View

4.1 Data Replication Workflow

From a user's perspective, data replication is separated into two main stages: first defining data replication tasks, and afterwards executing these tasks as data replication processes. Both stages are managed in the editing system and are described in more detail below.

4.1.1 Role Concept

Corresponding to these two stages, two basic user roles for data replication can be distinguished: Data Replication Manager and System Administrator.

Data replication managers operate within the back office of a particular business unit (i.e., enterprise or channel). They don't need any technical knowledge of data replication. They create replication tasks and assign them to the system administrator for execution. For example, the data replication manager could be an editor who maintains product and catalog information of a consumer channel of the source system. The editor then creates the task to replicate the data to the consumer channel of the target system.

The system administrator acts as data replication manager of the system unit (central e-selling administration, i.e., Operations back office). He oversees data replication across the whole system from a technical point of view. His duties encompass receiving the replication tasks from the data replication managers of the individual business units, combining them into data replication processes for execution, assigning the appropriate target cluster, and starting the replication processes.
Additionally, the system administrator can trigger the rollback of publication processes if necessary and monitors the progress of replication processes.

Each business unit (channel, enterprise/sales partner) provides an access privilege Data Replication Manager, which is associated with the permission SLD_MANAGE_DATA_REPLICATION. The Data Replication Tasks module of INTERSHOP 7's back office becomes accessible if the user inherits the access privilege Data Replication Manager for the particular business unit.
The system administrator has the same permission; however, compared to the business unit context, the Data Replication Tasks module is limited to processing the submitted tasks, which is done together with the additional Data Replication Processes module.

4.1.2 Replication Tasks

Data replication tasks determine the content to be replicated. They are defined by the responsible data replication managers individually for each channel in the sales organization or partner back office. For example, the data replication manager of the channel “PrimeTechSpecials” can define data replication tasks for this particular channel, using the consumer channel management plug-in in the sales organization back office.

For each data replication task the data replication manager has to define:

  • Start Date
    The start date sets the earliest time at which a replication task should be executed.
  • Replication Groups
    To each replication task, one or more data replication groups have to be assigned. Replication groups define the kind of data to be replicated from the view of business objects.

Once defined, data replication tasks are submitted to the system administrator for execution.

4.1.2.1 Replication Groups

A data replication group identifies the content to be replicated from a business object's point of view. Thus, a replication group can encapsulate various content types (file content, database content) that are needed to replicate the selected business object. For example, the data replication group “Organization” includes the organization profile, the departments, the users and roles, and all preferences defined for an organization.
Each replication group refers to a certain content domain.

4.1.3 Replication Process

To execute data replication tasks, the system administrator defines data replication processes in the central administration front end.

For each data replication process, the system administrator defines:

  • Target Cluster
    A source system can be connected to multiple target clusters (each consisting of one or more target systems). However, each replication process can address a single target cluster only.
  • Replication Tasks
    Each replication process executes one or more replication tasks as submitted by the responsible data replication managers. Only replication tasks whose start date has been reached can be included in a replication process.
  • Activation Rules
    Data replication processes can be started either manually, or by a scheduled job at predefined times.
  • Data Replication Type
    From a business point of view, the data replication type determines how data is handled during replication. Possible replication types are Data Transfer, Data Publishing, Data Transfer and Publishing, and Undo.
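The constraints listed above (exactly one target cluster, only tasks whose start date has been reached) can be sketched as plain data classes. All class and function names are hypothetical illustrations, not INTERSHOP 7 APIs, and activation rules are left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ReplicationTask:
    name: str
    start_date: datetime        # earliest execution time set by the data replication manager
    groups: Tuple[str, ...]     # replication group IDs (hypothetical values)


@dataclass
class ReplicationProcessDefinition:
    target_cluster: str         # exactly one target cluster per process
    tasks: List[ReplicationTask]
    replication_type: str       # e.g. "Data Transfer"


def eligible_tasks(submitted: Sequence[ReplicationTask], now: datetime) -> List[ReplicationTask]:
    """Submitted tasks whose start date has been reached."""
    return [task for task in submitted if task.start_date <= now]


def define_process(target_cluster: str, tasks: Sequence[ReplicationTask],
                   replication_type: str, now: datetime) -> ReplicationProcessDefinition:
    if not tasks:
        raise ValueError("a replication process needs at least one task")
    not_due = [t.name for t in tasks if t.start_date > now]
    if not_due:
        raise ValueError(f"start date not reached yet: {not_due}")
    return ReplicationProcessDefinition(target_cluster, list(tasks), replication_type)
```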

4.1.3.1 Replication Types

For each replication process, a data replication type is set by the system administrator. From a business point of view, the data replication type determines whether new data is transferred and published in one single process or in separate processes. After a replication process that included a successful publication, a one-step-back undo process can additionally be run.
The following replication types are available:

  • Data Transfer
    This process transfers the data to the target cluster. However, it does not trigger a table or directory switch (publication).
  • Data Publishing
    This process publishes data that have already been transferred to the target cluster. The process triggers all necessary table and directory switches as well as concomitant database commits to persist the changes (publication and cache refresh).

    Note

    Data publishing can only be executed on the results of a process of type Data Transfer executed immediately before.

  • Data Transfer and Publishing
    This process accomplishes a complete replication process.
  • Undo
    An Undo process rolls back a data replication process of type Data Publishing or Data Transfer & Publishing which has been completed successfully. Undo restores the target cluster state prior to executing the data replication task that is rolled back.

    Note

    Undo does not support undoing processes of type Data Transfer. Also, Undo can only roll back the most recent data replication process.
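The ordering rules for Data Publishing and Undo can be captured in a small guard function. This sketch assumes the rules are evaluated against the most recent replication process for the same target cluster and that the preceding Data Transfer must have succeeded; may_start is a hypothetical name.

```python
from typing import Optional, Tuple

TRANSFER = "Data Transfer"
PUBLISHING = "Data Publishing"
TRANSFER_AND_PUBLISHING = "Data Transfer and Publishing"
UNDO = "Undo"
ALL_TYPES = {TRANSFER, PUBLISHING, TRANSFER_AND_PUBLISHING, UNDO}

# (type, succeeded) of the most recent process for the same target cluster, or None.
LastProcess = Optional[Tuple[str, bool]]


def may_start(requested: str, last: LastProcess) -> bool:
    if requested not in ALL_TYPES:
        raise ValueError(f"unknown replication type: {requested!r}")
    if requested in (TRANSFER, TRANSFER_AND_PUBLISHING):
        return True  # no precondition assumed
    if last is None:
        return False  # nothing to publish or undo yet
    last_type, last_succeeded = last
    if requested == PUBLISHING:
        # Only the results of the Data Transfer executed immediately before.
        return last_type == TRANSFER and last_succeeded
    # Undo: one step back, only after a successful publishing process.
    return last_type in (PUBLISHING, TRANSFER_AND_PUBLISHING) and last_succeeded
```

Because an Undo process itself becomes the most recent process, a second Undo in a row is refused, which matches the one-step-back behavior.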

5 Mass Data Replication from a Technical Perspective

5.1 Replication Process Phases

A complete data replication process consists of the following main phases, as described in the figure below:


Figure: Phases of a Replication process

  1. Preparation
    During this phase, the content involved in the current replication process is prepared. For example, the database tables are analyzed to guarantee optimal execution plans for the SQL statements used during the replication process.
    Moreover, index files containing information on the files to be replicated are created and packed into the distribution directory <IS.INSTANCE.SHARE>/dist/staging.
  2. Synchronization
    The replication process merges the content to be replicated (new content of the source system) with content that should not be changed (old content of the target system belonging to other domains).
    During the synchronization phase, the old content of the target system that should not be changed is replicated to physical shadow containers (tables or directories).
  3. Replication
    During the replication phase, the new content is copied to the shadow containers of the target system. Database content and file system content are handled separately.
  4. Publication
    The final step of the data replication process is to publish the replicated content, for example by switching between live and shadow tables (full replication of database content) or between active and inactive directories (replication of file system content). As a result, any new or changed data is available to online users, and deleted data no longer appears in the Web front end.

    Note

    The publication phase is not run if any of the preceding phases has ended with an error.

  5. Cache Refresh
    There are several caches in INTERSHOP 7 to ensure high performance. These caches are refreshed whenever new content has been published.
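The sequential nature of the phases, including the rule that publication is skipped after an error, can be sketched as follows. run_phases and the handler mapping are hypothetical; the sketch assumes that a failing phase stops the process, so Cache Refresh (which only follows newly published content) is skipped as well.

```python
from typing import Callable, Dict, List

PHASES = ["Preparation", "Synchronization", "Replication", "Publication", "Cache Refresh"]


def run_phases(handlers: Dict[str, Callable[[], None]]) -> List[str]:
    """Run the phases in order and return the names of those that completed."""
    completed: List[str] = []
    for phase in PHASES:
        handler = handlers.get(phase)
        if handler is None:
            continue  # phase not part of this replication type
        try:
            handler()
        except Exception:
            # Stop at the first failure: later phases, in particular Publication,
            # never run on top of incomplete content.
            break
        completed.append(phase)
    return completed
```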

The process details for the individual phases differ depending on the content type to be replicated and the staging processor used to execute the replication process.

5.1.1 Replication Types

When preparing a replication process, the system administrator has to set a data replication type. From a technical point of view, the data replication type determines which replication phases are actually performed for the respective data replication tasks. The following replication types are available:

  • Data Transfer
    This process transfers the data to the target cluster, involving the phases preparation, synchronization, and replication. However, it does not trigger a table or directory switch (phase publication).
  • Data Publishing
    This process publishes data that have already been transferred to the target cluster. The process triggers all necessary table and directory switches as well as concomitant database commits to persist the changes (i.e., phases publication and cache refresh).

    Note

    Data publishing can only be executed on the results of a process of type Data Transfer executed immediately before.

  • Data Transfer and Publishing
    This process accomplishes a complete replication process.
  • Undo
    An Undo process rolls back a data replication process of type Data Publishing or Data Transfer & Publishing which has been completed successfully. Undo restores the target cluster state prior to executing the data replication task that is rolled back.

    Note

    Undo does not support undoing processes of type Data Transfer. Also, Undo can only roll back the most recent data replication process.
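From this technical view, the replication types can be written down as a mapping from type to phases. The sketch below is an illustrative configuration only; the names are hypothetical, and Undo is omitted because its phases are not specified here.

```python
from typing import Dict, Tuple

TRANSFER_PHASES = ("Preparation", "Synchronization", "Replication")
PUBLISHING_PHASES = ("Publication", "Cache Refresh")

# Replication type -> phases it runs (Undo intentionally left out).
PHASES_BY_TYPE: Dict[str, Tuple[str, ...]] = {
    "Data Transfer": TRANSFER_PHASES,
    "Data Publishing": PUBLISHING_PHASES,
    "Data Transfer and Publishing": TRANSFER_PHASES + PUBLISHING_PHASES,
}


def phases_for(replication_type: str) -> Tuple[str, ...]:
    try:
        return PHASES_BY_TYPE[replication_type]
    except KeyError:
        raise ValueError(f"no phase list defined for {replication_type!r}") from None
```

Splitting a replication into Data Transfer followed by Data Publishing therefore runs the same phases as a single Data Transfer and Publishing process, only in two separate steps.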

5.2 Mass Data Replication Process Model

5.2.1 Replication and Staging Processes

In the active source system, the following objects are created for each data replication process: a ReplicationProcess object, a StagingProcess object, and at least one additional StagingProcess object (one for each target system of the target cluster assigned to the replication process). All of them are tightly integrated with the locking framework, as shown in the figure below.


Figure: Internal structure of a Mass Data Replication process

  • The ReplicationProcess is mainly used to organize and visualize individual data replication processes. The ReplicationProcess always starts one StagingProcess, representing the target cluster, and additionally one StagingProcess for each target system of this cluster.
  • The StagingProcess stores meta-information associated with a data replication process, such as the name of the target cluster and the data replication type set for the process by the system administrator. The StagingProcess objects are created automatically by the staging framework.
  • The StagingProcess uses additional staging sub-processes (one StagingProcess instance per target system) to keep track of the process state of each target system, while its own process state provides a summarized state for the whole target cluster and thus for the whole staging process.
  • Both ReplicationProcess and StagingProcess are wrapper classes which extend functionality provided by the Process class of the locking framework.

    Note

    The locking framework provides the necessary persistent objects. For example, the wrapper class ReplicationProcess contains the persistent object Process. Replication-specific information of the ReplicationProcess is mapped onto custom attributes of the Process object.

5.2.2 Replication Process Model

A ReplicationProcess consists of ReplicationTask objects and is created and started by the system administrator.

A ReplicationTask is created by the data replication manager of the respective business unit, who also defines its content. A ReplicationTask consists of at least one ReplicationTaskAssignment.

A ReplicationTaskAssignment references exactly one StagingGroup and one Domain, thus embodying a ReplicationGroup.

ReplicationGroups can be selected by the data replication managers in the back office of their business unit.

Figure: Mass Data Replication process model
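The relationships described above can be sketched as Python data classes. The class names mirror the document, but the fields and the staging_components helper (which collapses duplicate staging group-to-domain pairs, loosely corresponding to the StagingProcessComponents of the staging process model) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class ReplicationTaskAssignment:
    staging_group: str  # exactly one StagingGroup ...
    domain: str         # ... and exactly one Domain


@dataclass
class ReplicationTask:
    name: str
    assignments: List[ReplicationTaskAssignment]

    def __post_init__(self) -> None:
        if not self.assignments:
            raise ValueError("a ReplicationTask needs at least one ReplicationTaskAssignment")


@dataclass
class ReplicationProcess:
    target_cluster: str
    tasks: List[ReplicationTask]

    def staging_components(self) -> Set[Tuple[str, str]]:
        """Distinct (domain, staging group) pairs across all tasks of the process."""
        return {(a.domain, a.staging_group) for t in self.tasks for a in t.assignments}
```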

5.2.3 Staging Process Model

A staging process consists of several components describing the content affected by this process.

Figure: Staging process model

Each StagingProcess has a type. The types which a StagingProcess can assume correspond to the data replication types which the system administrator can set for each replication process in the back office (see Replication Types described in the section before):

  • Replication (Data Transfer)
  • Publication
  • ReplicationPublication (Data Transfer and Publishing in a single process)
  • Undo

Each StagingProcess references one or more StagingProcessComponents. A StagingProcessComponent references exactly one domain and one StagingGroup.

5.2.4 Staging Resources (StagingResourceAssignment)

The staging framework uses resource definitions of the locking framework to lock affected resources (e.g., tables) whenever a data replication process is executed. This prevents the respective resources from being changed by other processes (e.g., jobs, imports) while a replication process is underway.

5.3 Mass Data Replication Entity Model

The entity model describes the content components to be transferred by replication processes, making use of fundamental concepts of the staging framework such as StagingGroup, StagingTable and StagingDirectory.

5.3.1 Data Replication Groups and Staging Groups

Data replication groups identify the content to be transferred between source and target system from a business point of view, e.g., catalogs, channels or product prices. Replication groups are configured via an XML configuration file, replication.xml, located in <IS.INSTANCE.SHARE>/system/config/cluster.

Replication groups can be conceived of as staging group-to-domain assignments. Hence, replication groups relate logical data containers (domains) with physical data containers (staging groups, bundling database tables or staging directories).

5.3.1.1 Assignment of Staging Groups to Replication Groups

There is no persistent object representing a data replication group. Replication groups are used at pipeline layer (see below) and at template layer (to visualize the organization of replication processes).

The staging group-to-domain assignment takes place when a data replication group is assigned to a replication task. The pipelines ProcessReplicationGroupAssignment[channelType] are responsible for the staging group-to-domain assignments. These pipelines are channel type specific and perform the following actions:

  • The pipeline's start node Start is called, handing over the replication group ID as defined in replication.xml. The pipeline analyzes this ID and, depending on it, selects a jump node that targets a specific sub-pipeline handling the staging group assignment for this replication group.
  • For each replication group, a specific sub-pipeline is triggered which assigns the required staging groups to the respective domains and adds this assignment to the ReplicationTask to which the replication group is added.

    Responsible for the assignment is the pipelet AddStagingGroupToReplicationTask of cartridge bc_foundation. It is called in the sub-pipeline once for each staging group-to-domain combination required by the specified replication group.

    For example, the sub-pipeline ProcessReplicationGroupAssignment_52-ProductPrices handles the staging group-to-domain assignments when adding the data replication group “Product Prices” of the B2C channel to a data replication task. The pipeline contains two instances of the pipelet AddStagingGroupToReplicationTask, assigning the staging groups “Prices” and “PRICING_PriceScale”, respectively.

Note

It is necessary that all referenced domains exist at the point of assigning the replication group to the replication task for successful assignment of staging groups. For example, a catalog to be replicated has to be created before you add the replication group Catalogs to a replication task.
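The dispatch performed by ProcessReplicationGroupAssignment can be illustrated with a lookup table from replication group IDs to handler functions. This is a sketch, not pipeline code: the group ID "ProductPrices", the function names, and the assumption that both price staging groups are assigned to the same domain are hypothetical; the staging group names are taken from the example above. The domain check reflects the note that referenced domains must already exist.

```python
from typing import Callable, Dict, List, Set, Tuple

Assignment = Tuple[str, str]  # (staging group, domain)


def assign_product_prices(domain: str) -> List[Assignment]:
    # Plays the role of the sub-pipeline with two AddStagingGroupToReplicationTask calls.
    return [("Prices", domain), ("PRICING_PriceScale", domain)]


# Hypothetical replication group ID -> sub-pipeline stand-in (the jump node targets).
SUB_PIPELINES: Dict[str, Callable[[str], List[Assignment]]] = {
    "ProductPrices": assign_product_prices,
}


def assign_replication_group(group_id: str, domain: str, existing_domains: Set[str],
                             task_assignments: List[Assignment]) -> List[Assignment]:
    handler = SUB_PIPELINES.get(group_id)
    if handler is None:
        raise KeyError(f"no sub-pipeline for replication group {group_id!r}")
    new = handler(domain)
    missing = sorted({d for _, d in new} - existing_domains)
    if missing:
        # Referenced domains must exist before the group is added to the task.
        raise LookupError(f"domains do not exist yet: {missing}")
    for assignment in new:
        if assignment not in task_assignments:
            task_assignments.append(assignment)
    return task_assignments
```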

5.3.2 Staging Groups

A staging group consists of several staging entities of the same type and contains the configuration that determines how these entities are replicated (the staging processor).

5.3.2.1 Staging Entities

A staging entity describes an atomic data container for a certain type of content: database tables, materialized database views, or file system content. Accordingly, the following types of staging entities have to be distinguished:

  • Staging Groups bundling Database Tables
    StagingTables represent tables in the database, such as PRODUCT. Staging groups of content type DATABASE contain only staging entities of type StagingTable. For details, see Staging Tables below.
  • Staging Groups bundling File System Content
    StagingDirectories represent directories in the file system including all underlying file system content like files and subdirectories. Staging groups of content type FILE SYSTEM contain only staging entities of type StagingDirectory. For details, see Staging Directories below.
  • Staging Groups bundling Materialized Views
    StagingMViews represent materialized views in the database. Staging groups of content type DATABASE MATVIEW contain only staging entities of type StagingMView. For details, see Staging Materialized Views below.

Figure: Staging Group and Staging Entities

5.3.2.1.1 StagingTable

The staging entity StagingTable represents a database table. A StagingTable can be domain specific or not.

A simple staging table (one that is not domain specific) has to fulfill the following requirements:

  • The table does not contain columns of type LONG or LONG RAW. The SQL statements used for database replication during the staging process do not support tables with these column types. Columns of type BLOB and CLOB, however, are supported.
  • The table possesses a primary key. This is necessary to identify each row unambiguously.
  • The table does not reference other tables that do not belong to the staging content. To avoid inconsistent references, tables connected via foreign keys have to be staged in one go.
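The three requirements can be checked against table metadata before a table is added to a staging group. The TableInfo structure and the function below are a hypothetical sketch; in practice the metadata would come from the database catalog, which is not shown here, and the table names in the usage are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

UNSUPPORTED_COLUMN_TYPES = {"LONG", "LONG RAW"}  # BLOB and CLOB are fine


@dataclass
class TableInfo:
    name: str
    column_types: Dict[str, str]  # column name -> database type
    primary_key: List[str]        # primary key columns (empty = none)
    references: Set[str]          # tables referenced via foreign keys


def staging_table_problems(table: TableInfo, staged_tables: Set[str]) -> List[str]:
    """List violations of the simple staging table requirements (empty list = OK)."""
    problems = []
    for column, column_type in table.column_types.items():
        if column_type.upper() in UNSUPPORTED_COLUMN_TYPES:
            problems.append(f"{table.name}.{column} has unsupported type {column_type}")
    if not table.primary_key:
        problems.append(f"{table.name} has no primary key")
    outside = sorted(table.references - staged_tables)
    if outside:
        problems.append(f"{table.name} references tables outside the staging content: {outside}")
    return problems
```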

Tables containing domain-specific content additionally have to fulfill the following requirement:

  • The table contains a column storing domain identifiers, or references another table containing such a column. This column is used to assign staging content to domains (i.e., units or sites). A table that neither includes nor references this column is automatically assigned to the system domain.
    By default, the column name DOMAINID is assumed. However, a different column name can be specified by modifying the respective definitions in the StagingGroup.properties file used by DBInit or DBMigrate (see section on Staging Group Preparation below).

Tables that are writable in the storefront (and hence will be replicated using a delta replication mechanism) additionally have to fulfill the following requirement:

  • The table contains a column storing the modification time of the respective table row. This column has to be named LASTMODIFIED, has to be of type DATE, and needs to be updated on each change of the corresponding row. A mechanism is provided that sets the current date whenever a row changes.

    Note

    If the column does not exist, it is not possible to track changes.

When creating custom persistent objects using INTERSHOP Studio, the column is generated automatically when setting property ModificationTimeTracking for the respective class to true.
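Why LASTMODIFIED matters for delta replication can be shown with SQLite as a stand-in database. This is only an illustration under assumptions: the table BASKET, the trigger (standing in for the mechanism that stamps changed rows), and the text-encoded timestamps are hypothetical and do not reflect INTERSHOP 7's actual implementation. The query simply finds the rows changed since a given point in time.

```python
import sqlite3
from datetime import datetime
from typing import List


def create_tracked_table(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE BASKET (
            UUID TEXT PRIMARY KEY,
            STATUS TEXT,
            LASTMODIFIED TEXT NOT NULL DEFAULT (datetime('now'))
        );
        -- Stand-in for the mechanism that stamps each changed row.
        CREATE TRIGGER BASKET_TOUCH AFTER UPDATE OF STATUS ON BASKET
        BEGIN
            UPDATE BASKET SET LASTMODIFIED = datetime('now') WHERE UUID = NEW.UUID;
        END;
    """)


def changed_since(conn: sqlite3.Connection, since: datetime) -> List[str]:
    """UUIDs of rows modified after 'since' (same text format as datetime('now'))."""
    rows = conn.execute(
        "SELECT UUID FROM BASKET WHERE LASTMODIFIED > ? ORDER BY UUID",
        (since.strftime("%Y-%m-%d %H:%M:%S"),),
    )
    return [uuid for (uuid,) in rows]
```

Without such a column, there is no way to tell which rows changed, which is why delta replication of storefront-writable tables depends on it.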

5.3.2.1.2 StagingDirectory

The staging entity StagingDirectory represents a directory containing file system content to be replicated. Staging directories reside in the numbered subdirectories of each site directory, and the entire content within these directories can be replicated. The directory tree may look as pictured below:

Figure: Staging Directories in INTERSHOP 7

Note

Data replication can include unit directories in <IS.INSTANCE.SHARE>/sites/<site>/<.active>/units, where <.active> references the currently active directory (1 or 2). Note, furthermore, that unit directories in <IS.INSTANCE.SHARE>/sites/<site>/units cannot be replicated, since they do not contain staging-relevant content.

The .active file, located in the site directory, contains the number of the directory currently used by the application server (either 1 or 2), i.e., it defines the active directory. The other numbered directory stores the changed or new files. Upon publication, the content of the .active file is altered to point to the new active directory. The look-up mechanism of the application server reads this information and uses the specified directory.
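The directory switch driven by the .active file can be sketched as follows. The function names are hypothetical, and writing a temporary file and renaming it over .active (so readers see either the old or the new value, never a partial write) is a design choice of this sketch, not documented INTERSHOP 7 behavior.

```python
from pathlib import Path


def active_dir(site: Path) -> Path:
    """Return the numbered directory (1 or 2) named in the site's .active file."""
    number = (site / ".active").read_text(encoding="utf-8").strip()
    if number not in ("1", "2"):
        raise ValueError(f"unexpected .active content: {number!r}")
    return site / number


def inactive_dir(site: Path) -> Path:
    """The other numbered directory, which receives changed or new files."""
    return site / ("2" if active_dir(site).name == "1" else "1")


def publish(site: Path) -> Path:
    """Point .active at the previously inactive directory and return it."""
    new_active = inactive_dir(site)
    tmp = site / ".active.tmp"
    tmp.write_text(new_active.name, encoding="utf-8")
    tmp.replace(site / ".active")  # rename: readers see the old or new value, never a partial one
    return new_active
```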

5.3.2.1.3 StagingMView

The staging entity StagingMView, together with the MViewStagingProcessor, is used to update materialized views whose original tables were affected by replication processes. The new content of materialized views is published using database synonyms.

Materialized views are refreshed in the background during the replication process.

5.3.2.2 Staging Processors and Staging Decorators

Staging processors provide the core methods for the replication of different content types, such as database content or file system content.

Staging processor decorators provide additional functionality to extend the functionality of staging processors. The decorators perform tasks before or after a state has changed during a data replication process (cf. Replication Process Phases above).

Every StagingGroup is associated with a StagingProcessorConfig. The StagingProcessorConfig determines which staging processor has to be used to replicate the content represented by the staging group, i.e., it defines in which way data is replicated. Each staging group is assigned exactly one staging processor, which may be extended by one or more staging decorators. As a result, a StagingProcessorConfig consists of exactly one StagingProcessor and zero or more decorators.

Figure: StagingProcessorConfig
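The composition of one staging processor with zero or more decorators follows the classic decorator pattern. The sketch below is hypothetical (the class names, the single on_publication hook, and the log list are illustrations); it only shows how decorators can add work before and after the wrapped processor's step.

```python
from dataclasses import dataclass
from typing import List, Sequence, Type


class StagingProcessor:
    """Hypothetical minimal processor interface with a single hook."""

    def on_publication(self, log: List[str]) -> None:
        raise NotImplementedError


class SwitchTablesProcessor(StagingProcessor):
    def on_publication(self, log: List[str]) -> None:
        log.append("switch live/shadow tables")


class Decorator(StagingProcessor):
    """Wraps another processor and adds work before and/or after it."""

    def __init__(self, inner: StagingProcessor) -> None:
        self.inner = inner

    def before(self, log: List[str]) -> None:
        pass

    def after(self, log: List[str]) -> None:
        pass

    def on_publication(self, log: List[str]) -> None:
        self.before(log)
        self.inner.on_publication(log)
        self.after(log)


class AuditDecorator(Decorator):
    def before(self, log: List[str]) -> None:
        log.append("audit: start")

    def after(self, log: List[str]) -> None:
        log.append("audit: done")


class NotifyDecorator(Decorator):
    def after(self, log: List[str]) -> None:
        log.append("notify")


@dataclass
class StagingProcessorConfig:
    processor: StagingProcessor                 # exactly one processor
    decorators: Sequence[Type[Decorator]] = ()  # zero or more decorators

    def build(self) -> StagingProcessor:
        wrapped = self.processor
        for decorator_cls in self.decorators:
            wrapped = decorator_cls(wrapped)  # each decorator wraps the previous result
        return wrapped
```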

According to the content types, different staging processors and decorators exist, implementing various methods to replicate data from editing to target systems. A detailed description of the available standard processors and decorators is given in the next section.

5.4 The Standard Staging Processors and Staging Decorators

5.4.1 Staging Processor Model

All staging processors are derived from the class BasicStagingProcessor. This class defines the signatures of the hook methods called by the pipelets of staging process pipelines. The following figure depicts the class hierarchy of the standard staging processors. All processor classes and all but two of the decorator classes shown are provided by the core cartridge: the RefreshSearchIndexesDecorator is implemented in bc_search, and the ShippingRuleEngineStagingProcessorDecorator comes with bc_shipping.


Figure: Staging Processor Model: Class Hierarchy

For each data replication phase (Preparation, Synchronization, Replication, Publication, Refresh Caches; see Data Replication Phases above), a staging processor provides the following hook functionality:

  • The first method, onPre<Phase>Hook, is called at the beginning of the phase with all staging process components whose content is handled by this processor. It may be used to initialize objects or to acquire system resources such as database connections.
  • The second method, on<Phase>Hook, actually executes the phase (such as replication or synchronization of content).
  • When the second method has been called successfully for all assigned staging process components, the third method, onPost<Phase>Hook, is called to clean up objects or to release system resources.
  • In case of an error, the fourth method, onError<Phase>Hook, is called. It is used to release system resources and to perform error handling.

The staging processor classes provide specific implementations of these hook methods, depending on the type of content and the replication mechanism.
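The calling order of the four hooks for one phase can be sketched in Java as follows. `PhaseHooks` and `PhaseRunner` are hypothetical, simplified types that collapse onPre<Phase>Hook, on<Phase>Hook, onPost<Phase>Hook and onError<Phase>Hook into generic methods; the real framework's signatures and error handling may differ.

```java
import java.util.List;

// Hypothetical, simplified hook contract for a single phase (not the INTERSHOP 7 API).
interface PhaseHooks {
    void onPre(List<String> components);                 // e.g. acquire a database connection
    void on(String component) throws Exception;          // do the actual work for one component
    void onPost(List<String> components);                // clean up, release resources
    void onError(List<String> components, Exception cause); // error handling
}

final class PhaseRunner {
    /** Runs one phase and returns true if it completed without error. */
    static boolean runPhase(PhaseHooks hooks, List<String> components) {
        try {
            hooks.onPre(components);
            for (String component : components) {
                hooks.on(component);
            }
            // Reached only after on(...) succeeded for every component.
            hooks.onPost(components);
            return true;
        } catch (Exception e) {
            hooks.onError(components, e);
            return false;
        }
    }
}
```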

The staging processor objects are created by a factory. The factory uses the default constructor of each processor object for initialization.
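A factory that relies on the default constructor typically instantiates classes reflectively by name. The sketch below shows that pattern with standard Java reflection. `ProcessorFactorySketch` is hypothetical; in practice the class name would presumably come from the staging configuration, which is an assumption here.

```java
import java.lang.reflect.InvocationTargetException;

// Hypothetical factory sketch: creates an object from its class name using the
// public no-argument (default) constructor and checks the expected type.
final class ProcessorFactorySketch {
    static <T> T create(String className, Class<T> expectedType) {
        try {
            Class<?> cls = Class.forName(className);
            Object instance = cls.getDeclaredConstructor().newInstance();
            return expectedType.cast(instance);
        } catch (ClassNotFoundException | NoSuchMethodException | InstantiationException
                | IllegalAccessException | InvocationTargetException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }
}
```

A class without an accessible no-argument constructor cannot be created this way, which is why such factories require processors to provide one.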

5.4.2 StagingProcessors

5.4.2.1 Staging Processors for File System Content

File system content is handled by sub-classes of FileSystemStagingProcessor providing functionality (hooks) for the publication phase of a staging process (switching directories of site content).

INTERSHOP 7 includes the SimpleFileSystemStagingProcessor as the default implementation of the FileSystemStagingProcessor. This processor first creates binary index files in the source system, keeping information on the staging directories in <IS.INSTANCE.SHARE>/sites/<site>/<.active>. The index files are stored in <IS.INSTANCE.SHARE>/dist/staging.
Then the same procedure is executed in the target system. Afterwards, the target system downloads the binary index files from the source system and detects changed file system content by comparing them with its own index files. The target system then downloads the changed files directly into its shadow directory.

There is another implementation of the FileSystemStagingProcessor, the DRPIndexFileSystemStagingProcessor. Instead of binary index files, it uses DRP indexes (XML representations of file system content) of the target and source directories to detect changes of file content. Otherwise, the procedure is basically equivalent to that of the SimpleFileSystemStagingProcessor. For performance and resource reasons (memory usage), it is recommended to use the SimpleFileSystemStagingProcessor for new projects.

Note

The DRPIndexFileSystemStagingProcessor uses a modified DRP index mechanism. In contrast to the standard mechanism, the index file used for file comparison contains the rounded time stamp and the size of each file instead of a checksum, which reduces the time needed to build the DRP index file.
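Comparing two such indexes boils down to a map difference. The Java sketch below assumes each index maps a relative path to its size and rounded time stamp. The class `FileIndexDiff` and the 2-second rounding granularity are hypothetical choices for illustration; the actual index format and granularity are not specified here.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: compares a source and a target file index
// (path -> size + rounded time stamp) to find files to transfer and to delete.
final class FileIndexDiff {
    record IndexEntry(long size, long roundedMillis) {}

    // Assumed rounding granularity; the real value is not documented here.
    static final long GRANULARITY_MS = 2000;

    /** Builds an index entry, rounding the time stamp down to the granularity. */
    static IndexEntry entry(long size, long lastModifiedMillis) {
        return new IndexEntry(size, (lastModifiedMillis / GRANULARITY_MS) * GRANULARITY_MS);
    }

    /** Paths that are new or changed in the source and must be transferred. */
    static Set<String> changed(Map<String, IndexEntry> source, Map<String, IndexEntry> target) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, IndexEntry> e : source.entrySet()) {
            if (!e.getValue().equals(target.get(e.getKey()))) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Paths that exist only in the target and must be deleted there. */
    static Set<String> obsolete(Map<String, IndexEntry> source, Map<String, IndexEntry> target) {
        Set<String> result = new TreeSet<>(target.keySet());
        result.removeAll(source.keySet());
        return result;
    }
}
```

Rounding makes two files with the same size and nearly identical time stamps compare as equal, which avoids computing a checksum over the file content.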

File replication based on FileSystemStagingProcessor involves the following phases:

  1. Preparation
    DRPIndexFileSystemStagingProcessor:
    Generation (or re-use) of DRP indexes of the source and target directories. Both indexes are compared. Changed files are zipped on the source system and then copied, along with a list of files to be deleted, to the source system directory <IS.INSTANCE.SHARE>/dist/staging.
    SimpleFileSystemStagingProcessor:
    Generation of a binary index of the source system’s file content.
  2. Synchronization
    DRPIndexFileSystemStagingProcessor only:
    Old content of the target system that should not be changed is copied from the active directory (e.g., 1) to the shadow directory (e.g., 2).
  3. Replication
    DRPIndexFileSystemStagingProcessor:
    The target system downloads the generated zip archives from the source system. The zip files are extracted into the shadow directory. Files to be deleted are removed from the shadow directory.
    SimpleFileSystemStagingProcessor:
    Generation of a binary index of the target system’s file content. The two indexes are compared. Changed files are downloaded to the target system, and obsolete files are deleted.
  4. Publication
    The .active file is changed to contain the number of the shadow directory.
  5. Cache Refresh
    The change of the .active file is propagated to the application servers of the target system. All affected caches are cleared at the target system (page cache, PO cache).

Figure: Replication of file system content

5.4.2.2 Staging Processors for Database Content

The base class for all staging processors handling database content is the abstract class DatabaseStagingProcessor. It provides methods for transaction, database connection and statement handling. Furthermore, it collects all persistent objects involved in the current replication process.

Database staging processors come in two basic types: full replication processors and delta (partial) replication processors. Both mechanisms are described below.

5.4.2.2.1 Full Replication Processors

In case of full replication, database content is transferred from the source to the target system regardless of changes. Full replication is used for most types of database content, except tables that are writable in the target system (such as promotion codes on a system used as live system).

Performance tests have shown that in most cases it is faster to delete all data from a table and refill it completely, including the changed data, than to update only the changed table entries.

Full data replication is available for global (not domain-specific) and for domain-specific data. Global means that the data to be replicated is not selected based on a DOMAINID. Domain-specific means that the data is selected based on a DOMAINID column.

The full replication mechanism relies on the following basic database objects: for each table to be replicated, there are two tables with the suffixes $1 and $2 added to the original table name (one used as the live, i.e., currently active, table and the other one as the shadow table), plus a database synonym with the original table name that points to the current live table; see also the figure below. The Java code accesses the database table via the synonym.

Full replication involves the following steps:

  1. Preparation
    Database tables are analyzed on source and target system to collect statistical data in order to optimize the replication process.
  2. Synchronization
    The shadow tables are cleared. Subsequently, data that is not replicated (e.g., data of domains not involved in the current replication process) is copied on the target system from the currently active tables into the shadow tables.
  3. Replication
    Data is replicated from the active tables of the source system into the shadow tables of the target system, using either the database link or direct access to the source database schema.
  4. Publication
    Old synonyms of the tables involved in the replication process are dropped. New synonyms are created that map to the former shadow tables, thus making them the active tables, while the former active tables become shadow tables.
  5. Cache Refresh
    All affected caches are cleared at the target system (page cache, PO cache).

Figure: Replication of database content using the Full Replication mechanism
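The publication step of full replication (drop the old synonym, create a new one on the former shadow table) can be sketched as DDL generation in Java. The class `SynonymSwitch` is hypothetical. The table naming (`PRODUCT$1`, `PRODUCT$2`, synonym `PRODUCT`) follows the scheme described above, but the exact statements the framework issues may differ.

```java
import java.util.List;

// Hypothetical sketch: Oracle DDL for switching the synonym of one fully replicated table.
final class SynonymSwitch {

    /** Returns the suffix of the shadow table, given the suffix of the live table. */
    static String shadowSuffix(String liveSuffix) {
        switch (liveSuffix) {
            case "$1": return "$2";
            case "$2": return "$1";
            default: throw new IllegalArgumentException("Unexpected suffix: " + liveSuffix);
        }
    }

    /** Drops the old synonym and recreates it on the former shadow table. */
    static List<String> publicationDdl(String table, String liveSuffix) {
        String newLive = table + shadowSuffix(liveSuffix);
        return List.of(
                "DROP SYNONYM " + table,
                "CREATE SYNONYM " + table + " FOR " + newLive);
    }
}
```

Because the application always accesses the synonym, redirecting it is enough to make the freshly filled table visible; no data has to be copied at publication time.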

5.4.2.2.2 Delta (Partial) Replication Processors

In case of delta replication, only content that has actually changed is transferred from the source to the target system. The replicated content is inserted directly into the live (i.e., active) tables of the target system and published by committing the respective database transaction.
Note that for this reason, no UNDO is possible for replication processes that include content replicated with delta replication processors.

Delta replication is used for database data that is writable in the target system. It is needed whenever data is created or changed independently or concurrently not only in the source system but also in the target system. An example is promotion codes, which are created in the editing system and changed (redeemed) in the target system.

All delta staging processors are derived from the abstract class TransactionalStagingProcessor, which itself is derived from DatabaseStagingProcessor. The TransactionalStagingProcessor provides a method to enable the deletion triggers needed to track the deletion of table rows.

Delta Replication comprises the following steps:

  1. Preparation
    Database tables are analyzed on source and target system to collect statistical data in order to optimize the replication process.
  2. Synchronization
    All changed data of the active tables at the target system is copied to the shadow table.
    In case of any error, the active table is completely copied into the shadow table. This is necessary for the data replication type “undo”.
  3. Replication
    All data that has changed since the last replication process is copied from the active tables of the source system to the active tables of the target system, using either the database link or direct access to the source database schema.
    Changed data records are detected via the LASTMODIFIED column. The replication is carried out in one large transaction.
  4. Publication
    The large transaction is committed.

    Note

    Synonyms of the tables in the target system are not changed in delta replications.

  5. Cache Refresh
    All affected caches are cleared at the target system (page cache, PO cache).

Figure: Replication of database content using the Delta Replication mechanism
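The LASTMODIFIED-based delta step can be modeled in memory with a Java sketch. `DeltaMerge` is a hypothetical simplification that keys rows by a single primary key. It mirrors the condition `d.lastmodified IS NULL OR d.lastmodified < s.lastmodified` used by the MERGE statements listed for the delta processors: a source row is written if it is missing in the live table or strictly newer there.

```java
import java.util.Map;

// Hypothetical in-memory model of the delta (MERGE) step; not the framework implementation.
final class DeltaMerge {
    record Row(String value, long lastModified) {}

    /** Applies missing or newer source rows to the live table in place; returns rows written. */
    static int merge(Map<String, Row> source, Map<String, Row> live) {
        int written = 0;
        for (Map.Entry<String, Row> e : source.entrySet()) {
            Row current = live.get(e.getKey());
            if (current == null || current.lastModified() < e.getValue().lastModified()) {
                live.put(e.getKey(), e.getValue()); // INSERT if not matched, UPDATE if matched
                written++;
            }
        }
        return written;
    }
}
```

With this rule, a row that was changed in the target system after its last edit in the source system (for example, a redeemed promotion code) keeps its newer live state.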

5.4.2.2.3 Default Database Staging Processor Classes

INTERSHOP 7 includes the following database staging processor classes:

Full replication

  • FullStagingProcessor
    This staging processor is used for system content (i.e., not domain specific) such as regional settings, permissions, or roles. With this processor, ORACLE direct load SQL statements are used (TRUNCATE, INSERT /*+ APPEND */) to reduce replication time.
    With this processor, simple SQL statements are used (DELETE, INSERT).
  • FullDomainSpecificStagingProcessor
    This staging processor is used for tables containing domain-specific data. With this processor, ORACLE direct load SQL statements are used (TRUNCATE, INSERT /*+ APPEND */) to reduce replication time.
    With this processor, simple SQL statements are used (DELETE, INSERT).
  • MViewStagingProcessor
    This staging processor is used for updating materialized views in the database whose original tables were affected by data replication processes.

Delta replication

  • MergeDomainSpecificStagingProcessor
    This staging processor is used to replicate database content residing in database tables that are changed in both the source and the target system. For this reason, the replication occurs in one large transaction. It uses the SQL 'MERGE' statement to transfer new and updated content, and relies on deletion tracking via deletion triggers to propagate rows removed in the editing system.

    Note

    The 'MERGE' statement has a restriction: It does not work on tables that have a column with a context index. Only tables with normal indexes are supported.

    Note

    Because the replication runs in one large transaction, the publication phase cannot be started separately. Furthermore, undo is not supported, which saves the time needed to back up the old content of the live system.

  • IncrementalDomainSpecificStagingProcessor
    This staging processor is used to replicate database content residing in database tables that are changed in both the source and the target system. For this reason, the replication occurs in one large transaction. It uses the SQL 'MERGE' statement to transfer new and updated content, based on the LASTMODIFIED column.
    Table rows that do not exist in the editing system are deleted from the live system.

    Note

    The 'MERGE' statement has a restriction: It does not work on tables that have a column with a context index. Only tables with normal indexes are supported.

    Note

    Because the replication runs in one large transaction, the publication phase cannot be started separately. Furthermore, undo is not supported, which saves the time needed to back up the old content of the live system.

  • AppendDomainSpecificStagingProcessor
    This staging processor is used to replicate only new content of domain-specific tables. Existing content of the live system is never overwritten, and rows removed in the editing system are never deleted in the live system.

    Note

    This processor replicates its contents in the publication phase in order to support separated Replication and Publication modes.

  • DeleteAppendDomainSpecificStagingProcessor
    This staging processor is used to replicate new and deleted content of domain-specific tables. Existing content of the live system is never overwritten, while rows removed in the editing system are deleted in the live system.

    Note

    This processor replicates its contents in the publication phase in order to support separated Replication and Publication modes.

5.4.2.2.4 Basic SQL commands for selected staging processors

For each processor, the following overview lists the SQL used for deleting data from shadow tables, for inserting data, and for undoing a replication. Placeholders are written in angle brackets.

FullStagingProcessor

  • Deleting data from shadow tables:
    TRUNCATE TABLE <shadow_table_name> REUSE STORAGE
  • Inserting data:
    INSERT /*+ APPEND */ INTO <shadow_table_name> dst SELECT * FROM <live_synonym_name> src
    and
    INSERT /*+ APPEND */ INTO <shadow_table_name> dst SELECT * FROM <source_table_in_editing_system> src
    or, with parallel hints:
    INSERT /*+ PARALLEL(dst, <nn>) */ INTO <shadow_table_name> dst SELECT /*+ PARALLEL(src, <nn>) */ * FROM <live_synonym_name> src
    and
    INSERT /*+ PARALLEL(dst, <nn>) */ INTO <shadow_table_name> dst SELECT /*+ PARALLEL(src, <nn>) */ * FROM <source_table_in_editing_system> src
  • Undoing replication:
    The SQL statement that saves content which should not be undone is the same as for inserting data.

FullDomainSpecificStagingProcessor

  • Deleting data from shadow tables:
    TRUNCATE TABLE <shadow_table_name> REUSE STORAGE
  • Inserting data:
    INSERT /*+ APPEND */ INTO <shadow_table_name> dst SELECT * FROM <live_synonym_name> src WHERE <column_name_of_DOMAINID> NOT IN (SELECT stagingdomainid FROM stagingprocesscomponent WHERE stagingprocessid = <current_stagingprocess_id> AND staginggroupid = <current_staginggroupid>)
    and
    INSERT /*+ APPEND */ INTO <shadow_table_name> dst SELECT * FROM <source_table_in_editing_system> src WHERE <column_name_of_DOMAINID> IN (SELECT stagingdomainid FROM stagingprocesscomponent WHERE stagingprocessid = <current_stagingprocess_id> AND staginggroupid = <current_staginggroupid>)
    or, with parallel hints:
    INSERT /*+ PARALLEL(dst, <nn>) */ INTO <shadow_table_name> dst SELECT /*+ PARALLEL(src, <nn>) */ * FROM <live_synonym_name> src WHERE <column_name_of_DOMAINID> NOT IN (SELECT stagingdomainid FROM stagingprocesscomponent WHERE stagingprocessid = <current_stagingprocess_id> AND staginggroupid = <current_staginggroupid>)
    and
    INSERT /*+ PARALLEL(dst, <nn>) */ INTO <shadow_table_name> dst SELECT /*+ PARALLEL(src, <nn>) */ * FROM <source_table_in_editing_system> src WHERE <column_name_of_DOMAINID> IN (SELECT stagingdomainid FROM stagingprocesscomponent WHERE stagingprocessid = <current_stagingprocess_id> AND staginggroupid = <current_staginggroupid>)
  • Undoing replication:
    INSERT INTO <shadow_table_name> SELECT * FROM <live_synonym_name> WHERE <column_name_of_DOMAINID> = <domainID>

MViewStagingProcessor

  • Deleting data from shadow tables:
    ddl.drop_materialized_view(<mview_name>);
  • Inserting data:
    SELECT query FROM user_mviews WHERE mview_name=<mview_name> UNION ALL SELECT query FROM user_synonyms s JOIN user_mviews v ON (s.table_name=v.mview_name) WHERE synonym_name=<mview_name>
  • Undoing replication:
    Same as inserting data.

AppendDomainSpecificStagingProcessor

  • Deleting data from shadow tables:
    None.
  • Inserting data:
    INSERT INTO <live_table_name> SELECT * FROM <source_table_in_editing_system> src WHERE NOT EXISTS (SELECT * FROM <live_table_name> dst WHERE src.<primary_key>=dst.<primary_key> AND (<column_name_of_DOMAINID> IN (SELECT stagingdomainid FROM stagingprocesscomponent WHERE stagingprocessid=<current_stagingprocess_id> AND staginggroupid=<current_staginggroupid>)))
  • Undoing replication:
    Transactional.

DeleteAppendDomainSpecificStagingProcessor

  • Deleting data:
    DELETE FROM <live_table_name> dst WHERE (<primary_keys_of_table>) IN (SELECT <primary_keys_of_table> FROM <live_table_name> dst WHERE <column_name_of_DOMAINID> IN (SELECT stagingdomainid FROM stagingprocesscomponent WHERE stagingprocessid=<current_stagingprocess_id> AND staginggroupid=<current_staginggroupid>) MINUS SELECT <primary_keys_of_table> FROM <source_table_in_editing_system> src WHERE <column_name_of_DOMAINID> IN (SELECT stagingdomainid FROM stagingprocesscomponent WHERE stagingprocessid=<current_stagingprocess_id> AND staginggroupid=<current_staginggroupid>))
  • Inserting data:
    INSERT INTO <live_table_name> SELECT * FROM <source_table_in_editing_system> src WHERE (<primary_keys_of_table>) IN (SELECT <primary_keys_of_table> FROM <source_table_in_editing_system> src WHERE <column_name_of_DOMAINID> IN (SELECT stagingdomainid FROM stagingprocesscomponent WHERE stagingprocessid=<current_stagingprocess_id> AND staginggroupid=<current_staginggroupid>) MINUS SELECT <primary_keys_of_table> FROM <live_table_name> dst WHERE <column_name_of_DOMAINID> IN (SELECT stagingdomainid FROM stagingprocesscomponent WHERE stagingprocessid=<current_stagingprocess_id> AND staginggroupid=<current_staginggroupid>))
  • Undoing replication:
    Transactional.

IncrementalDomainSpecificStagingProcessor

  • Deleting data:
    DELETE FROM <live_table_name> WHERE ((<column_name_of_DOMAINID>=<domainid_of_current_component>) AND (<primary_keys_of_table>) NOT IN (SELECT <primary_keys_of_table> FROM <source_table_in_editing_system> WHERE <column_name_of_DOMAINID>=<domainid_of_current_component>))
  • Inserting data:
    MERGE INTO <live_table_name> dst USING (SELECT s.* FROM <source_table_in_editing_system> s LEFT OUTER JOIN <live_table_name> d ON (s.<comparison_key_of_table>=d.<comparison_key_of_table>) WHERE s.<column_name_of_DOMAINID>=<domainid_of_current_component> AND (d.lastmodified IS NULL OR d.lastmodified<s.lastmodified)) src ON (src.<primary_key_of_table>=dst.<primary_key_of_table>) WHEN MATCHED THEN UPDATE SET dst.<assigned_column_names>=src.<assigned_column_names> WHEN NOT MATCHED THEN INSERT (<column_names>) VALUES (src.<column_names>)
  • Undoing replication:
    Transactional.

MergeDomainSpecificStagingProcessor

  • Deleting data:
    DELETE FROM <live_table_name> WHERE (<primary_keys_of_table>) IN (SELECT <primary_keys_of_table> FROM <source_deletion_table_in_editing_system> WHERE (<column_name_of_DOMAINID>=<domainid_of_current_component>))
  • Inserting data:
    MERGE INTO <live_table_name> dst USING (SELECT s.* FROM <source_table_in_editing_system> s LEFT OUTER JOIN <live_table_name> d ON (s.<comparison_key_of_table>=d.<comparison_key_of_table>) WHERE (s.<column_name_of_DOMAINID>=<domainid_of_current_component>) AND (d.lastmodified IS NULL OR d.lastmodified<s.lastmodified)) src ON (src.<comparison_key_of_table>=dst.<comparison_key_of_table>) WHEN MATCHED THEN UPDATE SET dst.<assigned_column_names>=src.<assigned_column_names> WHEN NOT MATCHED THEN INSERT (<column_names>) VALUES (src.<column_names>)
  • Undoing replication:
    Transactional.
  • Cleaning up the deletion table (transactional):
    DELETE FROM <source_deletion_table_in_editing_system> WHERE (<primary_keys_of_table>) IN (SELECT <primary_keys_of_table> FROM <live_deletion_table>)

5.4.3 StagingDecorators

Staging process decorators add special functionality to a staging processor. All staging processor decorators are derived from the abstract class StagingProcessorDecorator, which itself is derived from BasicStagingProcessor. It is possible to use more than one staging processor decorator for a staging process.

Like the staging processors themselves, staging processor decorators are specific to the content type (file system content or database content).

5.4.3.1 Staging Processor Decorators for File System Content

The base class for all staging processor decorators handling file system content is the abstract class StagingProcessorDecorator.

Staging processor decorators for file system content extend the pure transportation of files provided by the FileSystemStagingProcessor classes, for example by reloading replicated files in the target system(s).

5.4.3.1.1 Default File System Staging Processor Decorator Classes

INTERSHOP 7 provides the following file system staging processor decorator classes:

  • RefreshLocalizationsDecorator
    The RefreshLocalizationsDecorator is provided by cartridge core and is used to refresh the localization set in the target system(s) for domains included in the current staging process (onPreRefreshCacheHook).
  • RefreshSearchIndexesDecorator
    The RefreshSearchIndexesDecorator is provided by cartridge bc_search and is used to reload the search indexes in the target system(s) for domains included in the current staging process (onPreRefreshCacheHook).
  • RuleRepositoryStagingProcessorDecorator
    The RuleRepositoryStagingProcessorDecorator is provided by cartridge bc_ruleengine and is used to reload the rules in the target system(s) for domains included in the current staging process, after the rules have been transferred (onPreRefreshCacheHook).
  • ABTestStatisticsStagingProcessorDecorator
    The ABTestStatisticsStagingProcessorDecorator is provided by cartridge bc_marketing and is used to create new, empty ABTestStatistics for new ABTestGroups after ABTests have been transferred. ABTestStatistics are not part of the replication process, so that live and editing systems keep separate statistics (onPostPublicationHook).

5.4.3.2 Staging Processor Decorators for Database Content

The base class for all staging processor decorators handling database content is the abstract class DatabaseStagingProcessorDecorator, which itself is derived from StagingProcessorDecorator.

Database staging processor decorators should be used to handle table statistics, indexes, or constraints. They can also execute additional database queries before or after a staging process.

5.4.3.2.1 Default Database Staging Processor Decorator Classes

INTERSHOP 7 provides the following database staging processor decorator classes:

  • AnalyzeTablesDecorator
    The AnalyzeTablesDecorator is provided by cartridge core and is used to analyze tables of the source and target systems during the replication process. It collects statistical data about the tables for the Oracle Cost Based Optimizer (CBO). In the source system, tables are analyzed in the onPrePreparationHook, in the target system in the onPostReplicationHook.
  • DisableConstraintsDecorator
    The DisableConstraintsDecorator is provided by cartridge core and is used to disable all constraints on shadow tables of the target system before the synchronization phase starts (onPreSynchronizationHook). After the replication phase, the constraints are enabled again, depending on the staging.live.enable_foreignkeys property in staging.properties (onPostReplicationHook for constraints in the then-shadow tables, onPostPublicationHook for foreign keys referencing the then-live tables).
  • RebuildIndexesStagingProcessorDecorator
    The RebuildIndexesStagingProcessorDecorator is provided by cartridge core. In the onPostReplicationHook, it rebuilds all indexes of tables that are assigned to the selected staging processor and belong to the given staging components.
  • UnusableIndexesStagingProcessorDecorator
    The UnusableIndexesStagingProcessorDecorator is provided by cartridge core and sets all indexes of shadow tables that are assigned to the staging processor referenced by this decorator to unusable (onPreSynchronizationHook).

    Note

    This decorator requires the RebuildIndexesStagingProcessorDecorator described above.

  • RemoveCatalogDecorator
    The RemoveCatalogDecorator is provided by cartridge bc_mvc and is used to mark catalog domains that were removed by the replication process as deleted (onPostPublicationHook). In case of an UNDO replication process, the restored catalog domains are enabled again.
  • ExecuteQueryDecorator
    The ExecuteQueryDecorator is provided by cartridge core. It relies on the FullStagingProcessor switching the $1 and $2 tables in the publication phase. Furthermore, it uses the INTERSHOP 7 Query Framework to execute query files in each staging hook to perform the replication.
    The staging queries to be executed have to follow the syntax requirements of the Query Framework and have to reside in the directory queries/staging. By convention, they have to be named <tablename>_<hookname>, with <hookname> being

    on[Pre||Post|Error][Preparation|Synchronization|Replication|Publication|RefreshCache]Hook.query

    where the empty alternative between Pre and Post stands for the plain on<Phase>Hook.

    E.g.: PRODUCT_onErrorReplicationHook.query.

  • ShippingRuleEngineStagingProcessorDecorator
    The ShippingRuleEngineStagingProcessorDecorator is provided by cartridge bc_shipping and is used to reload the shipping rules in the target system(s) for domains included in the current staging process, after the rules of cartridge bc_ruleengine have been transferred (onPostRefreshCacheHook).
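The ExecuteQueryDecorator naming convention can be expanded mechanically. The Java sketch below enumerates the query file names allowed for one table, assuming a name consists of the table name, an underscore, "on", an optional prefix (Pre, Post, Error, or none), the phase name, and "Hook.query". The class `StagingQueryNames` is hypothetical; it only illustrates the convention and does not locate or execute queries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds query file names following the ExecuteQueryDecorator convention.
final class StagingQueryNames {
    // "" stands for the plain on<Phase>Hook without prefix.
    static final List<String> PREFIXES = List.of("Pre", "", "Post", "Error");
    static final List<String> PHASES = List.of(
            "Preparation", "Synchronization", "Replication", "Publication", "RefreshCache");

    static String fileName(String table, String prefix, String phase) {
        if (!PREFIXES.contains(prefix) || !PHASES.contains(phase)) {
            throw new IllegalArgumentException("Unknown hook: " + prefix + "/" + phase);
        }
        return table + "_on" + prefix + phase + "Hook.query";
    }

    /** All 20 file names a table could provide (4 prefixes x 5 phases). */
    static List<String> allFor(String table) {
        List<String> names = new ArrayList<>();
        for (String phase : PHASES) {
            for (String prefix : PREFIXES) {
                names.add(fileName(table, prefix, phase));
            }
        }
        return names;
    }
}
```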

5.4.4 Configuring Staging Processors

Staging processors are configured in the global staging configuration file staging.properties, which is located in <IS.INSTANCE.SHARE>/system/config/cluster.

Each staging processor configuration entry consists of

  • A key that represents the staging processor
    The key is used later on to assign the staging processor to a staging group (see Assigning Staging Processors to Staging Groups later below).
  • The staging processor class
    You can use custom processor classes or any of the staging processor classes discussed in Staging Processors for File System Content and Staging Processors for Database Content.
  • The (optional) staging processor decorators to execute with the staging processor
    You can use custom decorator classes or any of the processor decorator classes discussed in Staging Processor Decorators.

For a detailed description see the section Replication Configuration below.
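To illustrate how such an entry (key, processor class, optional decorators) might be read, here is a Java sketch based on java.util.Properties. The key layout `staging.processor.<key>.class` / `staging.processor.<key>.decorators` is invented purely for this example; the actual syntax of staging.properties is described in the Replication Configuration section.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical reader; the property key layout used here is an assumption, not the real format.
final class ProcessorConfigReader {
    record ProcessorEntry(String key, String processorClass, List<String> decoratorClasses) {}

    static ProcessorEntry read(Properties props, String key) {
        String prefix = "staging.processor." + key;
        String cls = props.getProperty(prefix + ".class");
        if (cls == null) {
            throw new IllegalArgumentException("No processor class configured for " + key);
        }
        // Decorators are optional: a missing or empty value yields an empty list.
        List<String> decorators = new ArrayList<>();
        for (String d : props.getProperty(prefix + ".decorators", "").split(",")) {
            if (!d.isBlank()) {
                decorators.add(d.trim());
            }
        }
        return new ProcessorEntry(key, cls.trim(), decorators);
    }
}
```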

5.5 Communication Between Replication Systems

As stated before, a mass data replication is initiated by the editing system, which informs each assigned target system about a new replication process. Each target system then pulls the announced data from the editing system.

See figure below for clarification of the communication channels between source system and target system(s) during a Mass Data Replication.

TBD: Figure: Communication channels in a Replication system.

5.5.1 Command Flow Direction

Communication between the application servers of source and target system(s) is based on a web service (SOAP) and HTTP. The direction of the command communication flow is from source to target system: the source system sends SOAP requests to the Web server of the target system(s), which then forwards these requests to an application server belonging to the server group configured to handle Replication.

5.5.2 Data Flow Direction

File system content in data replication is retrieved by the target system via HTTP from the source system.

To allow the database content to be replicated from the editing system to the target system(s), an additional communication channel connects the database schema of the source system with the database schema of each involved target system. Two basic replication scenarios can be distinguished here:

  • Local Database Replication
    The local database replication scenario assumes a single database instance both for the source and the target system. Source and target system use different users/schemata, with both users/schemata working on separate table sets.
  • Remote Database Replication
    In a remote database replication scenario, two different database instances are used, possibly residing on different hosts.

Note

Local data replication can have significant performance advantages over remote data replication. Use local data replication whenever possible.

In case of remote database replication, the connection is enabled by means of a database link from the target to the source system, as shown in Figure 1.
In case target and source system use database schemata in the same database instance (local database replication), the source system can grant access to certain tables to the target system.

5.5.3 Authentication of the Source System in a Target System: StagingIdentification

A special identification mechanism prevents the target system from performing data replication tasks triggered by other systems than the source system.

After getting a SOAP call from the naming service in the target system, the following steps are performed in order to uniquely identify the source system:

  1. In the source system, the createPermissionID() method in the StagingMgr is called.
  2. The StagingMgr thus creates a new UUID and inserts it into the STAGINGIDENTIFICATION table of the source system using JDBC.
  3. Subsequently, an arbitrary method of the StagingService is called in the target system, whereby this UUID is passed as a parameter.
  4. In the target system, the StagingService calls the checkPermissionID() method in the StagingMgr, providing the UUID.
  5. The StagingMgr of the target system searches for this UUID in the STAGINGIDENTIFICATION table of the source system database, using the database link or the direct access to the source database schema.

If the StagingMgr finds the UUID, it accepts the call, as the source system is now unambiguously identified. After processing the call, the target system's StagingMgr deletes the UUID from the STAGINGIDENTIFICATION table in the source system database, again using the database link or the direct access to the source database schema.
If the UUID is not found, the StagingMgr denies the access and throws an IdentificationException.
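The handshake amounts to a one-time token stored where only the genuine source system can write it. The Java sketch below models it in memory: a concurrent set stands in for the STAGINGIDENTIFICATION table, which the real target system reads through the database link or direct schema access. The class and its `release` method are hypothetical; only `createPermissionID`, `checkPermissionID` and `IdentificationException` are names taken from the text, and their real signatures may differ.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Simplified in-memory model of the StagingIdentification handshake (not the real StagingMgr).
final class StagingIdentificationSketch {
    static final class IdentificationException extends Exception {
        IdentificationException(String message) {
            super(message);
        }
    }

    // Stand-in for the STAGINGIDENTIFICATION table of the source system.
    private final Set<String> identificationTable = ConcurrentHashMap.newKeySet();

    /** Source side: store a fresh UUID and return it for the SOAP call. */
    String createPermissionID() {
        String id = UUID.randomUUID().toString();
        identificationTable.add(id);
        return id;
    }

    /** Target side: accept the call only if the UUID exists in the source table. */
    void checkPermissionID(String id) throws IdentificationException {
        if (!identificationTable.contains(id)) {
            throw new IdentificationException("Unknown staging permission ID: " + id);
        }
    }

    /** Target side, after processing the call: delete the UUID so it cannot be reused. */
    void release(String id) {
        identificationTable.remove(id);
    }
}
```

Deleting the UUID after use means that a captured SOAP request cannot simply be replayed later.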

6 System Preparation for Data Replication

Before database content or file system directories can be replicated, some preparations are required by the staging processors in order to create the environment. Preparing the environment for data replication is the task of the preparer classes StagingGroupPreparer, StagingEnvironmentPreparer and related preparer classes (e.g., StagingTablePreparer), which are executed on DBInit.

Note

The Staging framework depends on an identical structure of the tables to be replicated. Moreover, it depends on identical UUIDs of all domains in the database and of all staging configuration data.

Note

There is no automatic process that initially copies the database content from the editing system to the target system(s). Instead, aligning the databases is a task of the installation and deployment process.

The easiest and highly recommended way to ensure this is to execute a DBInit in the editing system, then to export the database with ant export (in the editing system) and to import the resulting database dump into the target system(s) using ant import.

Another way is to use DBMigrate on both the editing and the target systems. In this case, all relevant UUIDs have to be predefined.

The following subsections describe the default preparers provided with INTERSHOP 7. Please refer to the corresponding JavaDoc and the configuration examples in the respective Cookbook for more detailed information on configuring these preparers.

6.1 Preparers to Create the Database Staging Configuration - DBInit

6.1.1 The StagingGroupPreparer (DBInit)

The StagingGroupPreparer class is the first preparer class called when preparing the database for data replication. It prepares all staging groups, staging tables, staging materialized views and staging directories and stores their configuration data in the corresponding STAGINGGROUP, STAGINGTABLE, STAGINGMVIEW and STAGINGDIRECTORY tables. Prepared staging groups can then be used by the pipeline ProcessReplicationGroupAssignment when assigning data replication groups to data replication tasks (see Assignment of staging groups to replication groups).

Note

The StagingGroupPreparer has to be executed before the StagingEnvironmentPreparer is executed.

To prepare staging groups, the StagingGroupPreparer usually uses the property files StagingGroup.properties (staging group preparation) and StagingGroupInformation.properties (staging processor-to-staging group assignment), which are part of the sub-package dbinit.data.staging (included in the dbinit.jar) of each cartridge.

6.1.2 The ResourceAssignmentPreparer (DBInit)

The Staging framework uses the Locking framework to ensure exclusive access to the affected resources (i.e., database tables, files) during a replication process. This prevents inconsistent data caused by jobs, imports, etc. running in parallel.

Staging resource assignments are usually defined in ResourceAssignments.properties. They map staging groups (the key) to one or more resource definitions of the Locking framework (the value).
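The effect of these assignments can be illustrated with a minimal Python sketch. It assumes that the staging group to resource mapping from ResourceAssignments.properties has already been loaded into a dictionary (the file syntax is not covered here), and all group and resource names are hypothetical. Two processes compete for the same locks exactly when the resources assigned to their staging groups overlap.

```python
from typing import Dict, Iterable, Set

# Hypothetical staging group -> locking resource mapping, as it might be
# loaded from ResourceAssignments.properties (names are illustrative only).
ASSIGNMENTS: Dict[str, Set[str]] = {
    "PRODUCTS": {"ProductResource", "CatalogResource"},
    "CATALOGS": {"CatalogResource"},
    "PROMOTIONS": {"PromotionResource"},
}


def resources_for(groups: Iterable[str], assignments: Dict[str, Set[str]]) -> Set[str]:
    """Collect every locking resource assigned to the given staging groups."""
    needed: Set[str] = set()
    for group in groups:
        needed |= assignments.get(group, set())
    return needed


def conflicting_resources(groups_a: Iterable[str], groups_b: Iterable[str],
                          assignments: Dict[str, Set[str]]) -> Set[str]:
    """Resources both processes need; a non-empty result means the locking
    framework cannot grant both processes exclusive access at the same time."""
    return resources_for(groups_a, assignments) & resources_for(groups_b, assignments)


print(conflicting_resources(["PRODUCTS"], ["CATALOGS"], ASSIGNMENTS))    # {'CatalogResource'}
print(conflicting_resources(["PRODUCTS"], ["PROMOTIONS"], ASSIGNMENTS))  # set()
```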

6.1.3 The StagingEnvironmentPreparer (DBInit)

The StagingEnvironmentPreparer creates the environment (such as special database tables or views) which is necessary to replicate database tables.

The StagingEnvironmentPreparer

  • reads all configured staging groups of the current cartridge
  • gets the assigned staging processor for each staging group
  • gets the StagingTablePreparer associated with each staging processor.

The retrieved staging table preparer actually creates the necessary database structures. Since staging processors may impose different requirements on their database environment, each staging processor invokes its own StagingTablePreparer.
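The dispatch described above can be sketched in Python as follows. This is only a model of the control flow: the real preparers are Java classes, and the class names, method names and the processor-to-preparer mapping below are hypothetical stand-ins. The point is that the table preparer is chosen per staging processor, not per table.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StagingTablePreparer:
    """Hypothetical stand-in for a processor-specific table preparer."""
    kind: str
    prepared: List[str] = field(default_factory=list)

    def prepare(self, table: str) -> None:
        # A real preparer would create the $1/$2 tables, synonyms, views, etc.
        self.prepared.append(table)


@dataclass
class StagingGroup:
    name: str
    processor: str      # staging processor name assigned to the group
    tables: List[str]


def prepare_environment(groups: List[StagingGroup],
                        preparers: Dict[str, StagingTablePreparer]) -> None:
    """Walk the staging groups of a cartridge and let the table preparer
    belonging to each group's staging processor prepare its tables."""
    for group in groups:
        preparer = preparers[group.processor]
        for table in group.tables:
            preparer.prepare(table)


full = StagingTablePreparer("full")
delta = StagingTablePreparer("delta")
preparers = {"FullDomainSpecificStagingProcessor": full,
             "DeltaDomainSpecificStagingProcessor": delta}
groups = [StagingGroup("GROUP_A", "FullDomainSpecificStagingProcessor", ["foobar", "foobar_AV"]),
          StagingGroup("GROUP_B", "DeltaDomainSpecificStagingProcessor", ["userdata"])]
prepare_environment(groups, preparers)
print(full.prepared, delta.prepared)  # ['foobar', 'foobar_AV'] ['userdata']
```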

Figure: StagingEnvironmentPreparer and StagingTablePreparer

6.1.3.1 Database Environment for Full Replication

The following figure shows the environment which the StagingTablePreparer creates for database tables (foobar and foobar_AV in the sample below) that are replicated via full replication (see Full Replication Processors).

The preparer

  1. Renames the source table to the live table, e.g., table foobar$1.
  2. Creates a synonym for the live table, e.g., synonym foobar.
  3. Copies the live table structure to the shadow table, e.g., table foobar$2.
  4. Creates a source view, e.g., view foobar$S.

The resulting database structure is shown here:

Figure: Database Environment: Full Replication

The created database objects and their purposes are:

  • TABLE: Database tables contain the actual data (live table foobar$1, shadow table foobar$2).
  • SYNONYM: The Java application servers access table data via synonyms (synonym foobar).
  • VIEW: Views give the staging process access to the table content in its domain context, even if the accessed object itself does not have a domain ID. For example, the view foobar_AV$S joins the synonyms foobar_AV and foobar to get the domain ID from the table foobar$1. See also Foreign Key Definitions below.
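The naming convention for full replication can be captured in a small Python sketch. The object names follow the foobar example above. The switch function is an assumption: the staging processor descriptions in the staging.properties section only state that the $1 and $2 tables are switched in the publication phase, and re-pointing the synonym is one plausible way to model that switch, not a description of the actual SQL.

```python
from typing import Dict


def full_replication_objects(table: str) -> Dict[str, str]:
    """Names of the objects created for a table replicated via full
    replication, following the foobar -> foobar$1/$2/$S convention."""
    return {
        "live_table": f"{table}$1",    # the renamed original table
        "synonym": table,              # what the application servers query
        "shadow_table": f"{table}$2",  # same structure, receives the new data
        "source_view": f"{table}$S",   # read by the staging process
    }


def switch(synonym_target: str, table: str) -> str:
    """Assumed publication step: re-point the synonym from $1 to $2 or back."""
    live, shadow = f"{table}$1", f"{table}$2"
    if synonym_target not in (live, shadow):
        raise ValueError(f"{synonym_target} is not a staging table of {table}")
    return shadow if synonym_target == live else live


objects = full_replication_objects("foobar_AV")
print(objects["source_view"])                      # foobar_AV$S
print(switch(objects["live_table"], "foobar_AV"))  # foobar_AV$2
```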

6.1.3.2 Database Environment for Delta Replication

For tables replicated via delta replication, a more complex environment is required due to the change tracking mechanism used. Changes are tracked in each staging table within a time frame defined by the last successful staging process and the current time. Inserts and updates are detected by the values in the LASTMODIFIED column of each staging table.

Note

Each persistent object is responsible for setting the LASTMODIFIED column before or after inserts and updates. If the persistent object is generated using jGen, this functionality is created automatically.
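The change detection window can be sketched in Python. Rows are modelled as dictionaries with a LASTMODIFIED value; the UUID column and the timestamps are hypothetical. Treating the window as exclusive of the last successful run and inclusive of the current time is an assumption of this sketch, since the exact boundary handling is not specified here.

```python
from datetime import datetime
from typing import Dict, Iterable, List


def changed_rows(rows: Iterable[Dict], last_success: datetime, now: datetime) -> List[Dict]:
    """Rows inserted or updated in the window (last successful staging, now].
    The boundary handling is an assumption of this sketch."""
    return [row for row in rows if last_success < row["LASTMODIFIED"] <= now]


rows = [
    {"UUID": "a", "LASTMODIFIED": datetime(2020, 9, 1, 10, 0)},  # before the last run
    {"UUID": "b", "LASTMODIFIED": datetime(2020, 9, 2, 9, 30)},  # changed since then
    {"UUID": "c", "LASTMODIFIED": datetime(2020, 9, 3, 12, 0)},  # after "now", next run
]
last_run, now = datetime(2020, 9, 2, 0, 0), datetime(2020, 9, 3, 0, 0)
print([r["UUID"] for r in changed_rows(rows, last_run, now)])  # ['b']
```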

Deletions are tracked using a deletion trigger and a special deletion table. The deletion trigger and deletion table for each $1 and $2 table are created by the DeletionTrackingStagingTablePreparer (see figure below). The deletion table stores

  • the primary key and domain identifier of the source table and
  • the LASTMODIFIED column containing the deletion time.

The deletion trigger establishes the deletion tracking mechanism by copying the primary key and domain identifier from the source table into the deletion table and setting the LASTMODIFIED column to the current database date.
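What the deletion trigger records can be modelled in memory with a short Python sketch. The class and the column names UUID and DOMAINID are hypothetical, and an injectable clock stands in for the database date so that the result is deterministic.

```python
from datetime import datetime
from typing import Callable, Dict, List


class TrackedTable:
    """In-memory stand-in for a $1 or $2 table together with its deletion
    trigger (T$...) and deletion table (D$...)."""

    def __init__(self, clock: Callable[[], datetime]) -> None:
        self.rows: Dict[str, Dict] = {}   # primary key -> row
        self.deletions: List[Dict] = []   # plays the role of the D$ table
        self._clock = clock               # stands in for the database date

    def delete(self, uuid: str) -> None:
        row = self.rows.pop(uuid)
        # What the trigger does: keep primary key and domain ID, stamp the time.
        self.deletions.append({"UUID": uuid,
                               "DOMAINID": row["DOMAINID"],
                               "LASTMODIFIED": self._clock()})


table = TrackedTable(clock=lambda: datetime(2020, 9, 24, 8, 0))
table.rows["p1"] = {"DOMAINID": "d1", "NAME": "Example"}
table.delete("p1")
print(table.deletions)
```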

To prepare tables for delta replication, resulting in the structure shown in the next figure, the preparer:

  1. Renames the source table to the live table, e.g., table foobar$1.
  2. Creates a synonym for the live table, e.g., synonym foobar.
  3. Copies the live table structure to the shadow table, e.g., table foobar$2.
  4. Creates deletion tracking for the live table, e.g., deletion trigger T$foobar$1 and deletion table D$foobar$1.
  5. Creates deletion tracking for the shadow table, e.g., deletion trigger T$foobar$2 and deletion table D$foobar$2.
  6. Creates a live deletion synonym for the live deletion table, e.g., synonym D$foobar.
  7. Creates a source view, e.g., view foobar$S.
  8. Creates a source deletion view, e.g., view D$foobar$S.
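With this environment, the staging process has two inputs per table: changed rows (via the source view) and tracked deletions (via the source deletion view). The following Python sketch shows one way such a delta could be applied to a target table modelled as a dictionary. It is a simplification, not the processor's SQL: both inputs are assumed to be restricted to the current time window already, the key column UUID is hypothetical, and the case where the same key is both changed and deleted in one window is ignored (a real implementation would have to compare the LASTMODIFIED values).

```python
from typing import Dict, Iterable


def apply_delta(target: Dict[str, Dict],
                changed: Iterable[Dict],
                deleted: Iterable[Dict]) -> Dict[str, Dict]:
    """Apply one delta to a target table modelled as {primary key: row}.
    'changed' are rows from the source view, 'deleted' are entries from the
    source deletion view; the input dictionary is left unchanged."""
    result = dict(target)
    for row in changed:              # MERGE-like: insert or update by key
        result[row["UUID"]] = row
    for entry in deleted:            # remove rows deleted in the source system
        result.pop(entry["UUID"], None)
    return result


target = {"a": {"UUID": "a", "NAME": "old"}, "b": {"UUID": "b", "NAME": "gone"}}
changed = [{"UUID": "a", "NAME": "new"}, {"UUID": "c", "NAME": "added"}]
deleted = [{"UUID": "b"}]
print(sorted(apply_delta(target, changed, deleted)))  # ['a', 'c']
```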

Figure: Database Environment: Delta Replication

6.2 Preparers to Change the Database Staging Configuration - DBMigrate

6.2.1 Change Staging Group Configurations

AddStagingGroupsInformationPreparer
This preparer adds or updates the staging processor configuration, the (optionally) assigned domain, and the localized staging group information (display name, description) of staging groups that already exist in the database.

Note

This preparer does not add new staging groups.

Note

When changing the database staging processor of a staging group, you first have to remove the staging environment from all staging entities of the staging group (using DeleteStagingEntitiesEnvironmentPreparer) and then re-create the staging environment for the new staging processor (using MigrateStagingEnvironment).

6.2.2 Change Staging Group and Staging Entities Configurations

AddStagingGroupsPreparer
This preparer is used to add new staging groups AND the corresponding staging entities of these staging groups.

Note

This preparer does not update existing staging groups.

UpdateStagingGroupsPreparer
This preparer is used to update the attributes (group configuration) AND re-create the corresponding staging entities of staging groups belonging to the current cartridge.

Note

This preparer cannot add new staging groups.

RemoveStagingGroupsWithEntitiesPreparer
This preparer is used to remove given staging groups AND all their assigned staging entities. Additionally, the replication task assignments and the staging group resource assignments of the respective staging groups are removed.

Note

This preparer does not remove the staging environment (i.e., the $1, $2, $S etc. objects) from the staging tables to be removed. Call the DeleteStagingEntitiesEnvironmentPreparer beforehand to strip the staging environment from the staging entities to be removed.

6.2.3 Change Staging Entities Configurations

DeleteStagingEntitiesEnvironmentPreparer
This preparer is used to remove the staging environment from staging entities (staging tables).

DeleteStagingEntitiesPreparer
This preparer is used to delete existing staging entities from staging groups.

Note

This preparer does not remove staging groups, even if the staging group would become empty.

Note

This preparer does not remove the staging environment (i.e., the $1, $2, $S etc. objects) from the staging tables to be removed. Call the DeleteStagingEntitiesEnvironmentPreparer beforehand to strip the staging environment from the staging entities to be removed.

UpdateStagingEntitiesPreparer
This preparer is used to add or update staging entities of a single staging group.

Note

This preparer can neither add new staging groups nor remove staging entities.

6.2.4 Change Staging Resource Assignment

AddResourceAssignmentsPreparer
This preparer is used to add additional resource assignments to staging groups.

6.2.5 Change Staging Environment

MigrateStagingEnvironment
This preparer is used to migrate the staging environment of the current cartridge. It is normally used after staging groups or staging entities have been changed or added.
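The ordering rules from the notes in this section can be written down as a small Python check for a planned DBMigrate sequence. Only the preparer names and the ordering rules come from the notes above; the helper itself and the list-of-names representation are hypothetical, and how a migration sequence is actually configured is not covered here. The processor-change sequence follows the note on changing a database staging processor.

```python
from typing import List

ENV_REMOVAL = "DeleteStagingEntitiesEnvironmentPreparer"
ENV_MIGRATION = "MigrateStagingEnvironment"
# Preparers whose notes require the staging environment to be stripped first.
NEED_ENV_REMOVED_FIRST = {"DeleteStagingEntitiesPreparer",
                          "RemoveStagingGroupsWithEntitiesPreparer"}

# Sequence for switching a staging group to another database staging processor.
CHANGE_PROCESSOR = [ENV_REMOVAL, "AddStagingGroupsInformationPreparer", ENV_MIGRATION]


def order_problems(steps: List[str]) -> List[str]:
    """Report preparers scheduled before the environment removal they depend on."""
    problems = []
    for index, step in enumerate(steps):
        if step in NEED_ENV_REMOVED_FIRST and ENV_REMOVAL not in steps[:index]:
            problems.append(f"{step} is scheduled before {ENV_REMOVAL}")
    return problems


print(order_problems(CHANGE_PROCESSOR))  # []
print(order_problems(["RemoveStagingGroupsWithEntitiesPreparer", ENV_REMOVAL]))
```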

6.3 Replication Configuration

(Mass) Data Replication uses several configuration files:

  • staging.properties provides basic settings like the system role (editing, live) of the current INTERSHOP 7 system, timeout settings and staging processor settings.
  • replication-clusters.xml provides the communication settings an editing system uses to connect to the live system(s).
  • replication.xml defines the replication groups that are usable in the back office. Moreover, recurring mass data replication processes can be defined here.

All of these files reside in share/system/config/cluster. They are described in more detail below.

6.3.1 staging.properties

This file is used in both the source (editing) and the target (live) systems. The configurable settings are:

General settings:
staging.system.type
Default (development): editing (ESL 6.x); none (IS 7.x)
Type: String
Range: editing, live, none
Description: Defines the type of staging system.
  • editing: The system is used to import, add, update or delete staging content.
  • live: The system processes storefront requests. It receives new content from the editing system via the staging process.
  • none: The system does not use staging mechanisms.

staging.system.name (only ESL 6.x)
Default (development): host (Editing System)
Type: String
Description: The name of the staging system. Arbitrary names are supported.

staging.statement.analyze.table (only up to ESL 6.4)
Default (development): inactive, begin gather_table_stats(?) ; end;
Type: String
Description: This SQL statement is used by the decorator c.i.b.c.c.s.p.AnalyzeTablesDecorator, which uses it to analyze staging tables during each staging process, in the editing as well as in the live system. The placeholder '{0}' is replaced by the corresponding table name. If '?' is used, the table name is passed as a bind value.
Note: In releases after ESL 6.4, the AnalyzeTablesDecorator uses the general configuration from database.properties.

staging.prepareOnDBInit (only up to ESL 6.4)
Default (development): false
Type: Boolean
Description: Defines whether the staging environment is prepared during DBInit (true) or during the first staging process (false).

staging.suppressInitialReplication (only up to ESL 6.4)
Default (development): false
Type: Boolean
Description: If the databases of the source and target systems were filled from the same database dump, you can set this property to 'true'. This skips the call of the stored procedure 'sp_copy_db_content(..)' during the initial replication process and thus reduces its duration. This has been tested with 'true' after an initial DBInit.

staging.WebServerURL
Default (development): inactive, empty
Type: URL
Description: The web server URL used by staging processes (optional). In the live system, the property configures the URL of the SOAP staging service. In the editing system, it configures the web server from which the files are downloaded. If no value is set, the standard web server URL configured in 'appserver.properties' is used.

Database communication settings:
staging.database.connection.factory
Default (development): com.intershop.beehive.core.capi.staging.OracleDriverConnectionFactory
Type: String
Description: Defines the database connection factory used during the staging process.
  • com.intershop.beehive.core.capi.staging.OracleDriverConnectionFactory: does not use the configured JDBC pool (avoids problems caused by a known Oracle bug when using DB links)
  • com.intershop.beehive.core.capi.staging.DefaultConnectionFactory: uses the standard JDBC pool (e.g., UCP)

Parallelism section:

These properties should be set in both the live and the editing system. They configure the parallelism behavior during a staging process.

staging.process.NrOfParallelProcessors
Default (development): 2
Type: Integer
Description: The number of staging processors executed in parallel.

staging.process.EntityParallelism
Default (development): 3
Type: Integer
Description: The number of entities replicated in parallel per staging processor.
Note: Currently, only the FullDomainSpecificStagingProcessor supports this setting.

staging.process.StatementParallelism
Default (development): 1
Type: Integer
Description: The number of parallel threads within the database executing an SQL statement.
See: the PARALLEL hint in Oracle (it works only with Oracle Enterprise Edition).

staging.process.MinRowsForStatementParallelism
Default (development): 1000000
Type: Integer
Description: The minimum number of rows a table must have for its content to be replicated with the parallel SQL hint configured by staging.process.StatementParallelism.

Timeout section:

These properties should be set in the live system. If a timeout is reached, the staging process continues its execution, and a corresponding error is written to the error log file.

WARNING: If a timeout is reached, the page cache may contain inconsistent data.
FIX: Restart all application servers that did not respond, and clear the page cache.

staging.locking.acquisition.timeout
Default (development): 1200 (= 20 min)
Type: Integer
Description: The maximum time (in s) the staging process waits for resources.

staging.timeout.cacheRefresh
Default (development): 600 (= 10 min)
Type: Integer
Description: The maximum time (in s) the staging process waits for each application server to refresh the cache of the persistence layer.

staging.timeout.switchDirectories
Default (development): 600 (= 10 min)
Type: Integer
Description: The maximum time (in s) the staging process waits for each application server to switch its directories.

staging.timeout.waitingForState
Default (development): 7200 (= 2 h)
Type: Integer
Description: The maximum time (in s) the staging process waits for a new state during a staging process.
Note: This property is also required in the source system!

staging.timeout.initialReplication (only ESL 6.x)
Default (development): 10800
Type: Integer
Description: The maximum time the initial staging process waits for the initial database replication.

Live system configuration section (only ESL 6.x):

These properties need to be set in the target system.

staging.dblink.name
Default (development): inactive, ISEDITING
Type: String
Description: Defines the name of the database link from the target (live) database to the source (editing) database. Use either this property OR staging.editing.schema.name.

staging.editing.schema.name
Default (development): inactive, empty
Type: String
Description: Defines the name of the editing schema. If this property is set, the staging process accesses the editing schema directly instead of using the database link. This property must be set in the live system only. The editing and the live schema have to be located in the same database instance.
Important note: If this property is set, the live user must be granted object privileges on certain objects of the editing system. Staging fails if this is not done properly; if you are unsure, simply leave the property unset. To grant the privileges, log in to the database as the editing user and execute the stored procedure staging.grant_live_user_privs('NAME_OF_LIVE_USER').
Example: exec staging.grant_live_user_privs('INTERSHOP_LIVE0')

staging.live.servergroup
Default (development): BOS
Type: String
Description: Defines which server group is used for staging.

Staging processor configuration section:

This section contains the configuration of the staging processors. These settings assign each staging processor name used in the staging processor-to-staging group assignment (StagingGroupInformation.properties), where it is only an alias, to an implementing staging processor Java class together with the Java classes of the assigned staging processor decorators.
Since these settings are subject to release-specific changes, only the syntax of the assignment is shown here, together with an example taken from the IS 7.0 release.
For information on the standard staging processor and decorator classes, please see the Mass Data Replication Concept.

Warning: If these properties contain invalid entries, staging can result in data corruption! Please make sure you have understood the documentation before changing these settings!

Syntax:

staging.processor.<StagingProcessorName>.className = <implementingClassInclusiveJavaPackage>
staging.processor.<StagingProcessorName>.decorator.<consecutiveNumbered> = <implementingDecoratorClassInclusiveJavaPackage>

Example: FullDomainSpecificStagingProcessor

Configuration of the database staging processor that transfers domain-specific data (products, discounts, etc.):

staging.processor.FullDomainSpecificStagingProcessor.className = com.intershop.beehive.core.capi.staging.process.FullDomainSpecificStagingProcessor
staging.processor.FullDomainSpecificStagingProcessor.decorator.0 = com.intershop.beehive.core.capi.staging.process.AnalyzeTablesDecorator
staging.processor.FullDomainSpecificStagingProcessor.decorator.1 = com.intershop.beehive.core.capi.staging.process.DisableConstraintsDecorator
staging.processor.FullDomainSpecificStagingProcessor.decorator.2 = com.intershop.component.mvc.capi.staging.RemoveCatalogDecorator
staging.processor.FullDomainSpecificStagingProcessor.decorator.3 = com.intershop.beehive.core.capi.staging.process.ExecuteQueryDecorator
Staging index/constraint performance section (only IS 7.x):

These settings should be set in the target system. They work with the FullDomainSpecificStagingProcessor or derived classes and add the following features around the existing "insert /*+ append */ ..." statement:

  • indexes unusable:

      insert ...
      [alter session force parallel ddl parallel <nr>]
      indexes rebuild nologging

  • constraints disable:

      insert ...
      [alter session force parallel ddl parallel <nr>]
      constraints enable validate

staging.process.unusableIndex.rowCountLimit[.TableName]
Default (development): 0
Type: Integer
Description: Sets the global or table-specific limit (table row count) for enabling the unusable index processing. The global value can be overridden per table.
Note: Write table names in UPPERCASE letters.
Hint: Avoid enabling the feature globally, i.e., every time. Small tables (low row count) need more processing time when this feature is enabled than when it is disabled.
RowCountLimit values:
  • 0 = disabled (default)
  • 1 = enabled every time
  • >1 = enabled if the table row count is greater than or equal to the value
Examples:
  Default value:
    staging.process.unusableIndex.rowCountLimit = 0
  Enabled for tables with a row count of at least 100000:
    staging.process.unusableIndex.rowCountLimit = 100000
  Enabled for table PRODUCT if its row count is at least 200000:
    staging.process.unusableIndex.rowCountLimit.PRODUCT = 200000

staging.process.unusableIndex.rebuildParallelism
Default (development): 1
Type: Integer
Description: Sets the number of parallel threads within the database executing the SQL statement that rebuilds the unusable indexes.

staging.process.disableConstraint.rowCountLimit[.TableName]
Default (development): ${staging.process.unusableIndex.rowCountLimit} (see examples)
Type: Integer or String
Description: Sets the global or table-specific limit (table row count) for enabling the disable constraint processing. The global value can be overridden per table.
Note: Write table names in UPPERCASE letters.
Hint: Avoid enabling the feature globally, i.e., every time. Small tables (low row count) need more processing time when this feature is enabled than when it is disabled.
RowCountLimit values:
  • 0 = disabled
  • 1 = enabled every time
  • >1 = enabled if the table row count is greater than or equal to the value
  • ${staging.process.unusableIndex.rowCountLimit} = use the same row count limits as the unusable index processing (this is the default setting)
Examples:
  Default value:
    staging.process.disableConstraint.rowCountLimit = ${staging.process.unusableIndex.rowCountLimit}
  Same limit for table PRODUCT as configured in staging.process.unusableIndex.rowCountLimit.PRODUCT (200000 in the example above):
    staging.process.disableConstraint.rowCountLimit.PRODUCT = ${staging.process.unusableIndex.rowCountLimit.PRODUCT}

staging.process.disableConstraint.enableParallelism
Default (development): ${staging.process.unusableIndex.rebuildParallelism} (see description)
Type: Integer or String
Description: Sets the number of parallel threads within the database executing the SQL statements of the disable constraint processing.
Use ${staging.process.unusableIndex.rebuildParallelism} to use identical thread limits (this is the default setting).

staging.contextIndexCreationMode
Default (development): sync
Type: String
Range: sync, async, disabled
Description: Defines how the staging process behaves with respect to the creation of context indexes. The global value can be overridden per table.
Note: Write table names in UPPERCASE letters.
If there is no valid value for a staging table, the general setting is used. If the general setting is missing or invalid, the default 'sync' is used.
Valid contextIndexCreationMode values (upper or lower case):
  • sync: synchronous, staging ends after the indexes have been created (default)
  • async: asynchronous, staging ends without waiting for the indexes
  • disabled: no context index is created
Examples:
  Default value:
    staging.contextIndexCreationMode=sync
  Asynchronous for table PRODUCT:
    staging.contextIndexCreationMode.PRODUCT=async

Staging processor configuration section, older version:

This section contains the configuration of the staging processors as it was valid in ESL 6.5. These settings may no longer be up to date. Since these settings are subject to release-specific changes, this old information will probably be removed soon.

staging.processor.FullDomainSpecificStagingProcessor.className
Default (development): c.i.b.c.c.s.p.FullDomainSpecificStagingProcessor
Type: String
Description: Configuration of the database staging processor that transfers domain-specific data (products, discounts, etc.). The processor replicates only the content of the selected domains during a batch process. This processor class is used to stage tables containing domain-specific content.

staging.processor.FullFastDomainSpecificStagingProcessor.decorator.0
Default (development): c.i.b.c.c.s.p.AnalyzeTablesDecorator
Type: String
Description: See previous description. This decorator analyzes the tables of the editing and live systems during the staging process. In the editing system, tables are analyzed on the preparation hook; in the live system, on the replication hook.

staging.processor.FullFastDomainSpecificStagingProcessor.decorator.1
Default (development): c.i.b.c.c.s.p.DisableConstraintsDecorator
Type: String
Description: See previous description. This decorator disables all constraints on the shadow tables of the live system before the synchronization starts. After the replication, the constraints are enabled again.

staging.processor.FullFastDomainSpecificStagingProcessor.decorator.2
Default (development): c.i.c.m.c.s.RemoveCatalogDecorator
Type: String
Description: See previous description. This decorator marks the catalog domains that are removed by the replication process as deleted.

staging.processor.FullFastDomainSpecificStagingProcessor.decorator.3
Default (development): c.i.b.c.c.s.p.ExecuteQueryDecorator
Type: String
Description: This staging processor is based on the full staging processor switching the $1 and $2 tables in the publication phase. Further, it calls query files on each staging hook to perform the replication.

staging.processor.FullStagingProcessor.className
Default (development): c.i.b.c.c.s.p.FullStagingProcessor
Type: String
Description: Configuration of the database staging processor transferring system content like regional settings, permissions, roles, etc. This processor performs staging processes for tables containing system-wide content.

staging.processor.FullStagingProcessor.decorator.0
Default (development): c.i.b.c.c.s.p.AnalyzeTablesDecorator
Type: String
Description: This decorator analyzes the tables of the editing and live systems during the staging process. In the editing system, tables are analyzed on the preparation hook; in the live system, on the replication hook.

staging.processor.FullStagingProcessor.decorator.1
Default (development): c.i.b.c.c.s.p.DisableConstraintsDecorator
Type: String
Description: This decorator disables all constraints on the shadow tables of the live system before the synchronization starts. After the replication, the constraints are enabled again.

staging.processor.DeltaDomainSpecificStagingProcessor.className
Default (development): c.i.b.c.c.s.p.MergeDomainSpecificStagingProcessor
Type: String
Description: Configuration of the database staging processor transferring domain-specific content that may be written in the storefront of the live system (like users). This staging processor replicates database content residing in tables that are changed in the source as well as in the target system. Therefore, the replication runs in one large transaction. It uses the SQL 'MERGE' statement to transfer new and updated content, and it uses deletion tracking with a deletion trigger to detect rows removed in the editing system. The 'MERGE' statement has a restriction: it does not work on tables having a column with a context index. So only tables with normal indexes are supported.

staging.processor.DeltaDomainSpecificStagingProcessor.decorator.0
Default (development): c.i.b.c.c.s.p.DisableConstraintsDecorator
Type: String
Description: This decorator disables all constraints on the shadow tables of the live system before the synchronization starts. After the replication, the constraints are enabled again.

staging.processor.AppendDomainSpecificStagingProcessor.className
Default (development): c.i.b.c.c.s.p.AppendDomainSpecificStagingProcessor
Type: String
Description: Configuration of the database staging processor transferring domain-specific content that is only appended to the live system content. Existing content is neither replicated, deleted nor changed.

staging.processor.MergeDomainSpecificStagingProcessor.className
Default (development): c.i.b.c.c.s.p.MergeDomainSpecificStagingProcessor
Type: String
Description: Configuration of the database staging processor transferring domain-specific content that may be written in the storefront of the live system (like users) and that has a lot of rows in the live system. The undo process is not supported.

staging.processor.FileSystemStagingProcessor.className
Default (development): c.i.b.c.c.s.p.SimpleFileSystemStagingProcessor
Type: String
Description: Configuration of the file system staging processor transferring simple files (GIFs, ...).

staging.processor.LocalizationStagingProcessor.className
Default (development): c.i.b.c.c.s.p.SimpleFileSystemStagingProcessor
Type: String
Description: Configuration of the file system staging processor transferring localization files. It is also based on the file system staging processor.

staging.processor.LocalizationStagingProcessor.decorator.0
Default (development): c.i.b.c.c.s.p.RefreshLocalizationsDecorator
Type: String
Description: After the localization files have been replicated, the decorator reloads them in the live system.

staging.processor.SearchIndexesStagingProcessor.className
Default (development): c.i.b.c.c.s.p.SimpleFileSystemStagingProcessor
Type: String
Description: Configuration of the file system staging processor transferring search indexes. It is also based on the file system staging processor.

staging.processor.SearchIndexesStagingProcessor.decorator.0
Default (development): c.i.c.f.c.r.RefreshSearchIndexesDecorator
Type: String
Description: The decorator refreshes the search indexes on each application server in the live system.

staging.processor.MViewStagingProcessor.className
Default (development): c.i.b.c.c.s.p.MViewStagingProcessor
Type: String
Description: Configuration of the MView staging processor, which refreshes materialized views referencing affected tables.

staging.processor.RulesStagingProcessor.className
Default (development): c.i.b.c.c.s.p.FullFastDomainSpecificStagingProcessor
Type: String
Description: Configuration of the database staging processor transferring rules. This processor uses direct-path SQL statements, which improves performance when replicating huge amounts of data. Further, the indexes are not maintained during the replication; after the replication has finished, all indexes affected by the replication are rebuilt. Furthermore, replicated rules are reloaded in the target system. This staging processor operates in the same way as the FullStagingProcessor, but uses special SQL statements that disable redo logging in the Oracle database. NOTE: In case of a database crash, the data inserted by this staging processor is not recoverable, since only direct-load DML is used.

staging.processor.RulesStagingProcessor.decorator.0
Default (development): c.i.b.c.c.s.p.AnalyzeTablesDecorator
Type: String
Description: This decorator analyzes the tables of the editing and live systems during the staging process. In the editing system, tables are analyzed on the preparation hook; in the live system, on the replication hook.

staging.processor.RulesStagingProcessor.decorator.1
Default (development): c.i.b.c.c.s.p.DisableConstraintsDecorator
Type: String
Description: This decorator disables all constraints on the shadow tables of the live system before the synchronization starts. After the replication, the constraints are enabled again.

staging.processor.RulesStagingProcessor.decorator.2
Default (development): c.i.b.c.c.s.p.ExecuteQueryDecorator
Type: String
Description: This staging processor is based on the full staging processor switching the $1 and $2 tables in the publication phase. Further, it calls query files on each staging hook to perform the replication.

staging.processor.RulesStagingProcessor.decorator.3
Default (development): c.i.c.s.c.s.ShippingRuleEngineStagingProcessorDecorator
Type: String
Description: This decorator reloads the shipping rules after the rules of the cartridge bc_ruleengine have been transferred.

staging.objects.chunksize
Default (development): inactive, 15
Type: Integer
Description: Business Object Replication: If a user replicates a lot of objects (e.g., 10000 products), these objects are sent in several loops of 15 objects each, and the cache refresh is started after all objects have been sent and merged. Remember that Business Object Replication is only meant for emergency updates of a few objects.
If you want to replicate a lot of data, use the Mass Data Tasks menu.
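The table-specific overrides, the ${...} references and the fallback rules described above for the index/constraint and context index settings can be summarized in a small Python sketch. It illustrates the documented rules and is not the INTERSHOP implementation: the function names are hypothetical, the .properties parsing is deliberately simplified, and two details are assumptions: an undefined ${...} reference expands to an empty string, and an invalid table-specific row count limit falls back to the global value (this fallback is documented only for contextIndexCreationMode).

```python
import re
from typing import Dict, Optional

_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")
_CONTEXT_INDEX_MODES = {"sync", "async", "disabled"}


def load_properties(text: str) -> Dict[str, str]:
    """Tiny subset of the .properties format: 'key = value' per line and
    '#' comments; no line continuations or escapes."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def resolve(props: Dict[str, str], key: str, depth: int = 0) -> Optional[str]:
    """Value of key with ${other.key} references expanded recursively.
    Undefined references expand to '' (assumption of this sketch)."""
    if depth > 10:
        raise ValueError(f"placeholder cycle involving {key}")
    value = props.get(key)
    if value is None:
        return None
    return _PLACEHOLDER.sub(lambda m: resolve(props, m.group(1), depth + 1) or "", value)


def row_count_limit(props: Dict[str, str], base_key: str, table: str) -> int:
    """Table-specific value (UPPERCASE table name) first, then the global one, else 0."""
    for key in (f"{base_key}.{table.upper()}", base_key):
        raw = resolve(props, key)
        if raw is not None and raw.strip().isdigit():
            return int(raw)
    return 0


def feature_enabled(limit: int, row_count: int) -> bool:
    """0 = disabled, 1 = enabled every time, >1 = enabled if row_count >= limit."""
    if limit <= 0:
        return False
    return limit == 1 or row_count >= limit


def context_index_mode(props: Dict[str, str], table: str) -> str:
    """Per-table mode, else the general mode, else 'sync'; case-insensitive."""
    base_key = "staging.contextIndexCreationMode"
    for key in (f"{base_key}.{table.upper()}", base_key):
        raw = resolve(props, key)
        if raw is not None and raw.strip().lower() in _CONTEXT_INDEX_MODES:
            return raw.strip().lower()
    return "sync"


props = load_properties("""
staging.process.unusableIndex.rowCountLimit = 100000
staging.process.unusableIndex.rowCountLimit.PRODUCT = 200000
staging.process.disableConstraint.rowCountLimit = ${staging.process.unusableIndex.rowCountLimit}
staging.contextIndexCreationMode=sync
staging.contextIndexCreationMode.PRODUCT=async
""")
UNUSABLE = "staging.process.unusableIndex.rowCountLimit"
DISABLE = "staging.process.disableConstraint.rowCountLimit"
print(row_count_limit(props, UNUSABLE, "product"))  # 200000
print(row_count_limit(props, DISABLE, "PRODUCT"))   # 100000
print(context_index_mode(props, "PRODUCT"))         # async
```

Note that the disable constraint limit for PRODUCT resolves to the global unusable index limit here, because only the global disableConstraint key references the unusable index setting.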

6.3.2 replication-clusters.xml

The replication-clusters.xml file specifies the communication parameters used for replication. These settings define the communication infrastructure for both mass data replication and business object replication (Fast Publishing, e.g., of products).

replication-clusters.xml resides in the editing (source) system(s).

6.3.2.1 TargetSystem and TargetCluster

INTERSHOP 7 supports system setups that can be spatially distributed over multiple data centers, each data center keeping its own database, web servers and application servers with (among others) their own database users and web URLs. From a physical and IT technical point of view, the INTERSHOP 7 systems in all data centers are different systems, but from a business point of view, they may form a logical unit.

For data replication, this means that the target of one replication process might not be a single INTERSHOP 7 system, but several INTERSHOP 7 systems residing in multiple data centers. Therefore, the concept of data replication with one target system as the recipient of replication data was extended to replication target clusters.

A replication target cluster represents the recipient of replication data from a business point of view. While it is logically one recipient, it technically consists of one or more replication target systems, whereby a target system represents one (technical) INTERSHOP 7 cluster with its own web URL and database user.

A data replication manager selects one replication target cluster as the target of a data replication process. Under the surface, the replication mechanism then has to transfer the data to every target system belonging to the selected target cluster.

Accordingly, the data replication configuration needs to provide information about the replication target clusters that may be updated by the respective source system. Moreover, it has to specify which target systems belong to each target cluster and how these target systems can be reached.
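The relationship between target clusters and target systems can be modelled with a few Python data classes. This is a conceptual sketch only: the class and function names are hypothetical, the single url field stands in for the full set of connection parameters, the URLs use the reserved example.com domain, and the active flag mirrors the 'active' attribute of a target system described in the configuration section of replication-clusters.xml. Selecting one cluster fans out to all of its active systems.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TargetSystem:
    id: str
    url: str            # simplification of the target system's connection parameters
    active: bool = True


@dataclass
class TargetCluster:
    id: str
    systems: List[TargetSystem] = field(default_factory=list)


def systems_to_update(clusters: Dict[str, TargetCluster], cluster_id: str) -> List[str]:
    """IDs of the target systems that receive the data when a replication
    process targets the given cluster; inactive systems are skipped."""
    return [s.id for s in clusters[cluster_id].systems if s.active]


clusters = {
    "Cluster1": TargetCluster("Cluster1", [
        TargetSystem("DataCenterA", "https://live-a.example.com"),
        TargetSystem("DataCenterB", "https://live-b.example.com"),
        TargetSystem("DataCenterC", "https://live-c.example.com", active=False),
    ]),
}
print(systems_to_update(clusters, "Cluster1"))  # ['DataCenterA', 'DataCenterB']
```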

6.3.2.2 Configuration

This file defines the communication parameters for both Mass Data Replication and Business Object Replication. It is required in the source (editing) system of a Data Replication environment.

The XML file structure is defined in replication.xsd.

Some example configurations are shown in the Cookbook - Mass Data Replication - Administration.

6.3.2.2.1 Basic content
  • The file contains the replication-configuration element, which defines the XSD schema location, and one target clusters list.
basic replication config with target clusters list
<?xml version="1.0" encoding="UTF-8" ?>
<replication-configuration
    xsi:schemaLocation="http://www.intershop.com/xml/ns/enfinity/6.5.0/core/replication replication.xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://www.intershop.com/xml/ns/enfinity/6.5.0/core/replication">

    <target-clusters>
        ....
    </target-clusters>

</replication-configuration>

  • The target clusters list contains one or more target cluster definitions, each identified by an id attribute.
target clusters list with target cluster definitions
    <target-clusters>
        <target-cluster id="Cluster1">
            ..
        </target-cluster>
        ..
        ..
        <target-cluster id="ClusterN">
            ..
        </target-cluster>
    </target-clusters>

  • Each target cluster configuration must set an id and contains one target systems list.
target cluster definition with target systems list
        <target-cluster id="Cluster42">
            <target-systems>
                ...
            </target-systems>
        </target-cluster>

  • The target systems list contains one or more target system definitions, each identified by an 'id' attribute and carrying an 'active' attribute.
target systems list with target system definitions
            <target-systems>
                <target-system id="TargetSystem1" active="true">
                    ..
                </target-system>
                ..
                ..
                <target-system id="TargetSystemN" active="false">
                    ..
                </target-system>
            </target-systems>

  • Each target system configuration has an 'id' attribute, an 'active' attribute, and a set of connection parameters.
    The active attribute can be "true" or "false". It defines whether the target system configuration is used for data replication.
    The connection parameters comprise:
    1. The web server URL of the target system.
    2. If required, the target system's server group definition (see the explanation below).
    3. The source system's server group definition.
    4. The database access configuration.
    • If the target system uses URLMapping, the complete URL of the SOAP servlet (including the server group to be used in the target system) must be given, matching the settings of intershop.urlmapping.urlPrefix and intershop.urlmapping.servlet.webadapter in appserver.properties of the target server.
      NOTE: In this case, you must not set an explicit target server group!

target system definition; Web server URL when using URLMapping
                <target-system id="TargetSystem_with_URLMapping" active="true">
                    <webserver-url>http://ts3.mydomain.com:80/INTERSHOP/servlett/BOS/SOAP</webserver-url>
                    ..
                </target-system>

  • If no URLMapping is configured in appserver.properties of the target system, INTERSHOP 7 uses default settings for its servlet paths. In this case, provide only the target web server URL, consisting of protocol, host name, and port.
    Additionally, you have to provide a target server group.

target system definition; Web server URL and target server group when not using URLMapping
                <target-system id="TargetSystem_without_URLMapping" active="true">
                    <webserver-url>http://ts2.mydomain.com:80</webserver-url>
                    <target-server-group>STG</target-server-group>
                    ..
                </target-system>
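The two URL cases above amount to a simple consistency rule. The following Python sketch checks it for one target system. `check_url_settings` is a hypothetical helper, not part of INTERSHOP 7, and it assumes that any <webserver-url> with a path component is a URLMapping URL.

```python
from urllib.parse import urlsplit


def check_url_settings(webserver_url, target_server_group=None):
    """Return a list of problems for one target system's URL settings
    (hypothetical helper; an empty list means the combination looks consistent).

    Assumption: a <webserver-url> with a path component is a URLMapping URL.
    """
    parts = urlsplit(webserver_url)
    problems = []
    if parts.scheme not in ("http", "https") or not parts.hostname:
        problems.append("<webserver-url> needs a protocol and a host name")
    uses_url_mapping = parts.path not in ("", "/")
    if uses_url_mapping and target_server_group:
        problems.append("URLMapping URL must not be combined with <target-server-group>")
    if not uses_url_mapping and not target_server_group:
        problems.append("<target-server-group> is required without URLMapping")
    return problems


print(check_url_settings("http://ts2.mydomain.com:80", "STG"))  # []
print(check_url_settings("http://ts2.mydomain.com:80"))
# ['<target-server-group> is required without URLMapping']
```

Returning a list of problems instead of raising lets a script report every misconfigured target system in one pass.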

  • The server group to be used in the source system.
target system definition; Source system server group
                <target-system id="TargetSystem" active="true">
                    ..
                    <source-server-group>BOS</source-server-group>
                    ..
                </target-system>

  • The database connection to be used by the target system.
    A target system can connect to the source system in two ways: via a database link or via direct schema access.
  • If a database link is used, provide the name of the database link that was created in the target system to access the editing (source) database.
target system definition; Database access via database link
                <target-system id="TargetSystem_using_DBLink" active="true">
                    ..
                    <source-database-link>ISEDITING.world</source-database-link>
                </target-system>

  • Alternatively, if the source and target database schemas reside in the same database instance, you may prefer direct schema access from the target schema to the source schema. In this case, provide the name of the target database schema (user). INTERSHOP 7 grants the target database user the required access rights on the source database schema.
target system definition; Database access via direct schema access
                <target-system id="TargetSystem_using_DirectAccess" active="true">
                    ..
                    <target-database-user>INTERSHOP_LIVE</target-database-user>
                </target-system>

Note

Within one target cluster, one target system may access the source database via a database link while another target system uses direct schema access.
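The choice between a database link and direct schema access can be read off each <target-system> element. The following Python sketch uses the standard-library ElementTree parser. `db_access_mode` is a hypothetical helper, and it assumes that each target system specifies exactly one of <source-database-link> and <target-database-user>.

```python
import xml.etree.ElementTree as ET

# Namespace URI as declared in replication-clusters.xml.
NS = {"r": "http://www.intershop.com/xml/ns/enfinity/6.5.0/core/replication"}


def db_access_mode(target_system):
    """Return 'database link' or 'direct schema access' for a <target-system>
    element (hypothetical helper). Assumes exactly one of the two settings is present."""
    link = target_system.findtext("r:source-database-link", None, NS)
    user = target_system.findtext("r:target-database-user", None, NS)
    if bool(link) == bool(user):
        raise ValueError(
            f"{target_system.get('id')}: set exactly one of "
            "<source-database-link> or <target-database-user>"
        )
    return "database link" if link else "direct schema access"


snippet = (
    '<target-system xmlns="http://www.intershop.com/xml/ns/enfinity/6.5.0/core/replication"'
    ' id="TargetSystem2" active="false">'
    "<target-database-user>INTERSHOP_LIVE</target-database-user>"
    "</target-system>"
)
print(db_access_mode(ET.fromstring(snippet)))  # direct schema access
```

Because the mode is decided per element, a cluster that mixes both access types is handled naturally.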

6.3.2.2.2 Complete Example Configuration

The following example shows a basic configuration of replication-clusters.xml.

basic replication-clusters.xml
<?xml version="1.0" encoding="UTF-8" ?>
<replication-configuration
    xsi:schemaLocation="http://www.intershop.com/xml/ns/enfinity/6.5.0/core/replication replication.xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://www.intershop.com/xml/ns/enfinity/6.5.0/core/replication">

    <target-clusters>
        <target-cluster id="Cluster42">
            <target-systems>
                <target-system id="TargetSystem1" active="true">
                    <webserver-url>http://ts1.mydomain.com:80</webserver-url>
                    <source-server-group>BOS</source-server-group>
                    <target-server-group>BOS</target-server-group>
                    <source-database-link>ISEDITING.world</source-database-link>
                </target-system>
                <target-system id="TargetSystem2" active="false">
                    <webserver-url>http://ts2.mydomain.com:80</webserver-url>
                    <source-server-group>BOS</source-server-group>
                    <target-server-group>STG</target-server-group>
                    <target-database-user>INTERSHOP_LIVE</target-database-user>
                </target-system>
                <target-system id="TargetSystem_with_URLMapping" active="true">
                    <webserver-url>http://ts3.mydomain.com:80/INTERSHOP/servlett/BOS/SOAP</webserver-url>
                    <source-server-group>WFS</source-server-group>
                    <source-database-link>ISEDITING.world</source-database-link>
                </target-system>
            </target-systems>
        </target-cluster>
    </target-clusters>

</replication-configuration>

Explanations:

"Cluster42":
The file contains one cluster definition for the cluster named "Cluster42", which comprises three target systems: "TargetSystem1", "TargetSystem2", and "TargetSystem_with_URLMapping".

"TargetSystem1":

  • is active, i.e., will be used as a replication target.
  • uses default URL mapping in target system
  • uses server group BOS in both the source and the target system
  • uses database link ISEDITING.world, which has to be defined in the target database schema to point to the source schema

"TargetSystem2":

  • is inactive, i.e., will not be used as a replication target.
  • uses default URL mapping in target system
  • uses server group BOS in source and server group STG in target system
  • uses direct database access to the source schema. The target database schema is named INTERSHOP_LIVE; the system will grant access in the source schema to INTERSHOP_LIVE

"TargetSystem_with_URLMapping":

  • is active, i.e., will be used as a replication target.
  • uses a customized URL mapping in the target system, matching the settings of 'intershop.urlmapping.urlPrefix' and 'intershop.urlmapping.servlet.webadapter' in appserver.properties of the target server, where the URL prefix '/INTERSHOP' and the servlet mapping '/servlett' (sic!) are used.
  • uses server group BOS in the target system.
    Note that the target server group is defined only as part of <webserver-url> and is not given as <target-server-group>!
  • uses server group WFS in source system
  • uses database link ISEDITING.world, which has to be defined in the target database schema to point to the editing schema
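To see which target systems a replication into a cluster would actually reach, the configuration can be parsed with Python's standard ElementTree module. The following sketch works on an abbreviated copy of the example (IDs, active flags, and URLs only). The helper `active_systems` is hypothetical, and it treats only the literal value "true" as active.

```python
import xml.etree.ElementTree as ET

# Namespace URI as declared in replication-clusters.xml.
NS = {"r": "http://www.intershop.com/xml/ns/enfinity/6.5.0/core/replication"}

# Abbreviated copy of the example (server groups and DB settings omitted).
CONFIG = """<replication-configuration
    xmlns="http://www.intershop.com/xml/ns/enfinity/6.5.0/core/replication">
  <target-clusters>
    <target-cluster id="Cluster42">
      <target-systems>
        <target-system id="TargetSystem1" active="true">
          <webserver-url>http://ts1.mydomain.com:80</webserver-url>
        </target-system>
        <target-system id="TargetSystem2" active="false">
          <webserver-url>http://ts2.mydomain.com:80</webserver-url>
        </target-system>
        <target-system id="TargetSystem_with_URLMapping" active="true">
          <webserver-url>http://ts3.mydomain.com:80/INTERSHOP/servlett/BOS/SOAP</webserver-url>
        </target-system>
      </target-systems>
    </target-cluster>
  </target-clusters>
</replication-configuration>"""


def active_systems(config_xml):
    """Map each target cluster ID to the IDs of its active target systems
    (hypothetical helper; only active="true" counts as active here)."""
    root = ET.fromstring(config_xml)
    result = {}
    for cluster in root.iterfind("r:target-clusters/r:target-cluster", NS):
        result[cluster.get("id")] = [
            ts.get("id")
            for ts in cluster.iterfind("r:target-systems/r:target-system", NS)
            if ts.get("active") == "true"
        ]
    return result


print(active_systems(CONFIG))
# {'Cluster42': ['TargetSystem1', 'TargetSystem_with_URLMapping']}
```

The result matches the explanations: TargetSystem2 is configured but skipped because it is inactive.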

6.3.3 replication.xml

Together with replication-clusters.xml, replication.xml holds the configuration of the INTERSHOP 7 data replication functionality. While replication-clusters.xml defines the communication channels for both mass data replication and business object replication (fast publishing of, e.g., products), replication.xml is used only by mass data replication.

replication.xml defines the replication groups that can be used in the back office, together with their descriptions. Additionally, mass data replication processes can be predefined in this file.

The XML file structure is defined in replication.xsd.

replication.xml is required in editing (source) system(s).

6.3.3.1 Basic Structure

The file replication.xml consists of three parts:

  • the definition of the Data Replication Groups for the Data Replication Manager's back office,
  • the definition of mass data replication processes, which can be executed by jobs (once or recurring),
  • and the definition of the replication tasks, which are referenced by the replication processes.

While the groups section is mandatory, the processes and tasks sections are optional.

The following schema shows the basic structure of replication.xml.

replication.xml, Basic structure
<?xml version="1.0" encoding="UTF-8" ?>
<replication-configuration
    xsi:schemaLocation="http://www.intershop.com/xml/ns/enfinity/6.5.0/core/replication replication.xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://www.intershop.com/xml/ns/enfinity/6.5.0/core/replication">

    <!--
    This (mandatory) section defines all replication groups that are shown in the Data Replication Manager's back office.
    -->

    <groups>
    ...
    </groups>

    <!--
    This (optional) section specifies all replication processes that can be executed by the job 'Regular Replication Process' in SLDSystem
    (manually, i.e., once, or on a recurring basis).
    -->

    <processes>
    ...
    </processes>

    <!--
    This (optional) section contains all replication tasks that can be reused by several replication processes. Each referenced
    replication task is created at the beginning of the replication process in the corresponding enterprise or channel.
    -->

    <tasks>
    ...
    </tasks>

</replication-configuration>

6.3.3.1.1 Replication Group Configuration

The following schema shows an excerpt of replication.xml dealing with the replication group definition.

A replication group definition consists of:

  • a unique replication group ID. This ID is used in the ProcessReplicationGroupAssignment pipelines to determine the required assignment sub pipeline.
  • a list of business unit types. This list defines in which kinds of business units (organization, channel) the replication group is shown to the Data Replication Manager. Currently, the types 20 (organization), 30 (partner channel), and 52 (B2C channel) are supported by default.
  • a sequence of locale-specific name/description definitions. Each locale definition consists of:
    • a locale ID, which defines for which locale the replication group name and description are presented,
    • the replication group name as shown in the back office, and
    • a textual description of the replication group as shown in the back office.

The example below depicts the replication group "Search Indexes" with configurations for locales "en_US" and "de_DE".

replication.xml, Replication Group structure
<?xml version="1.0" encoding="UTF-8" ?>
<replication-configuration
    ...>
    ...

    <!--
    This (mandatory) section defines all replication groups that are shown in the Data Replication Manager's back office.
    -->

    <groups>

      <group id="SEARCH_INDEXES" >
        <business-unit-types>20 30 52</business-unit-types>
        <locale id="en_US">
          <name>Search Indexes</name>
          <description>Search indexes and their configuration, search query definitions (predefined product filters), and search redirects. 
                       Note: The objects group that is indexed, e.g. PRODUCTS and PAGELETS, must be added to avoid inconsistencies.
          </description>
        </locale>
        <locale id="de_DE">
          <name>Suchindizes</name>
          <description>Suchindizes und Indexkonfiguration, vordefinierte Suchanfragen und Such-Redirects. 
                       Achtung: Die Replikationsgruppe, die indizierte Objekte enthält (z.b. PRODUCTS, CATALOG oder PAGELETS), muss ebenfalls
                       repliziert werden.
          </description>
        </locale>
      </group>

      <group id="...">
      ...
      </group>

    </groups>

    ...
</replication-configuration>
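The business unit types and locale entries of a replication group determine where, and under which name, it appears. The following Python sketch evaluates an abbreviated copy of the SEARCH_INDEXES example with ElementTree. `groups_for_unit_type` is a hypothetical helper, and falling back to the group ID when a locale is missing is an assumption of the sketch, not documented back office behavior.

```python
import xml.etree.ElementTree as ET

NS = {"r": "http://www.intershop.com/xml/ns/enfinity/6.5.0/core/replication"}

# Abbreviated copy of the group example (descriptions omitted).
GROUPS = """<replication-configuration
    xmlns="http://www.intershop.com/xml/ns/enfinity/6.5.0/core/replication">
  <groups>
    <group id="SEARCH_INDEXES">
      <business-unit-types>20 30 52</business-unit-types>
      <locale id="en_US"><name>Search Indexes</name></locale>
      <locale id="de_DE"><name>Suchindizes</name></locale>
    </group>
  </groups>
</replication-configuration>"""


def groups_for_unit_type(config_xml, unit_type, locale="en_US"):
    """Return (group ID, display name) pairs for all groups shown in business
    units of the given type (hypothetical helper)."""
    root = ET.fromstring(config_xml)
    shown = []
    for group in root.iterfind("r:groups/r:group", NS):
        # <business-unit-types> is a whitespace-separated list, e.g. "20 30 52".
        types = group.findtext("r:business-unit-types", "", NS).split()
        if str(unit_type) not in types:
            continue
        name = group.findtext(f"r:locale[@id='{locale}']/r:name", None, NS)
        # Assumption of this sketch: fall back to the group ID if the locale is missing.
        shown.append((group.get("id"), name or group.get("id")))
    return shown


print(groups_for_unit_type(GROUPS, 52, "de_DE"))  # [('SEARCH_INDEXES', 'Suchindizes')]
```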

6.3.3.1.2 Predefined Mass Data Replication Processes

The following schema shows an excerpt of replication.xml dealing with the (mass data) replication process definition. These process definitions can be read by the job Regular Replication Process in domain SLDSystem to create automated replication processes.

Note

If no predefined replication processes are needed, remove the "processes" section from replication.xml or comment it out.

A replication process definition consists of:

  • a replication process ID. This ID is used in the job configuration of Replication Process Scheduler as attribute 'ReplicationProcessID'.
    • Note: Since the job Regular Replication Process can hold only one process ID in its attribute 'ReplicationProcessID', you need to create (copy) a separate job for each replication process that is to be executed by a job.
  • a replication process type. Valid types for predefined replication processes are:
    • Replication,
    • Publication, and
    • ReplicationPublication.
  • a process description
  • the ID of the target cluster for the replication process. The ID is case-sensitive and refers to the ID of a target cluster as defined in replication-clusters.xml.
  • a sequence of replication task references. These references are the IDs of replication task definitions, which also have to be defined in replication.xml (for details, see below).

The example below depicts the definition of a replication process "nightly" of type "ReplicationPublication" with attached replication tasks "PrimeTechProducts" and "PrimeTechSpecialsProducts".

replication.xml, Replication process definition
<?xml version="1.0" encoding="UTF-8" ?>
<replication-configuration
    ...>
    ...

    <!--
    This (optional) section specifies all replication processes that can be executed by the job 'Regular Replication Process' in SLDSystem
    (manually, i.e., once, or on a recurring basis).
    -->

    <processes>
      ...
      <process id="nightly">
        <type>ReplicationPublication</type>
        <description>This process is started every night.</description>
        <target-cluster-id>Cluster42</target-cluster-id>
        <task ref="PrimeTechProducts"/>
        <task ref="PrimeTechSpecialsProducts"/>
      </process>
      ...
    </processes>

    ...
</replication-configuration>
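Two settings of a predefined process can be checked cheaply before a job uses it: the process type and the case-sensitive target cluster ID. The following Python sketch models a process definition as a plain dictionary; `check_process`, `VALID_PROCESS_TYPES`, and the dictionary keys are hypothetical names chosen for illustration.

```python
VALID_PROCESS_TYPES = {"Replication", "Publication", "ReplicationPublication"}


def check_process(process, known_cluster_ids):
    """Return a list of problems for one predefined process (hypothetical).

    process           -- dict with the keys 'id', 'type' and 'target_cluster_id'
    known_cluster_ids -- target cluster IDs from replication-clusters.xml
    """
    problems = []
    if process["type"] not in VALID_PROCESS_TYPES:
        problems.append(f"{process['id']}: invalid process type {process['type']!r}")
    # Plain set membership is case-sensitive, like the cluster ID itself.
    if process["target_cluster_id"] not in known_cluster_ids:
        problems.append(
            f"{process['id']}: unknown target cluster {process['target_cluster_id']!r}"
        )
    return problems


nightly = {"id": "nightly", "type": "ReplicationPublication", "target_cluster_id": "Cluster42"}
print(check_process(nightly, {"Cluster42"}))  # []
print(check_process(dict(nightly, target_cluster_id="cluster42"), {"Cluster42"}))
# ["nightly: unknown target cluster 'cluster42'"]
```

The second call shows why case matters: "cluster42" does not match the configured "Cluster42".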

6.3.3.1.3 Predefined Mass Data Replication Tasks

The following schema shows an excerpt of replication.xml dealing with the (mass data) replication task definition. These task definitions are referenced by job Regular Replication Process in domain SLDSystem when creating automated replication processes.

Note

If no predefined replication processes / tasks are needed, remove the "processes" and "tasks" sections from replication.xml or comment them out.

A replication task definition consists of:

  • a replication task ID. The replication process definitions in replication.xml use this ID as a replication task reference.
    • Note: Task IDs in replication task definitions and task references in replication process definitions are case-sensitive.
  • the name of the organization whose data, or whose channel's data, is to be replicated (mandatory)
    • Note: The organization name is case-sensitive and must be written exactly as in table DOMAININFORMATION.
  • the name of the channel whose data is to be replicated (optional; only needed if channel data is to be replicated)
    • Note: The channel name is case-sensitive and must be written exactly as in table DOMAININFORMATION.
  • a task description
  • a sequence of replication group references (the groups to be included in the replication task). These group references are the IDs of replication group definitions, which also have to be defined in replication.xml (for details, see above).

The example below depicts the definition of

  • a replication task "PrimeTechProducts", transferring data of replication group PRODUCTS of the organization "PrimeTech", and
  • a replication task "PrimeTechSpecialsProducts", transferring data of replication groups CATALOGS and PRODUCTS of the channel "PrimeTechSpecials" of organization "PrimeTech"
replication.xml, Replication task definition
<?xml version="1.0" encoding="UTF-8" ?>
<replication-configuration
    ...>
    ...

    <!--
    This (optional) section contains all replication tasks that can be reused by several replication processes. Each referenced
    replication task is created at the beginning of the replication process in the corresponding enterprise or channel.
    -->

    <tasks>
      ...
      <task id="PrimeTechSpecialsProducts">
        <organization>PrimeTech</organization>
        <channel>PrimeTechSpecials</channel>
        <description>Replicates all products of channel PrimeTechSpecials</description>
        <group ref="CATALOGS"/>
        <group ref="PRODUCTS"/>
      </task>
      <task id="PrimeTechProducts">
        <organization>PrimeTech</organization>
        <description>Replicates all products of organization PrimeTech</description>
        <group ref="PRODUCTS"/>
      </task>
      ...
    </tasks>

    ...
</replication-configuration>
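Resolving a process through its task references shows which organization, channel, and replication groups it will transfer. The following Python sketch does this with ElementTree on an abbreviated copy of the process and task examples. `resolve_process` is hypothetical; its lookups are case-sensitive, matching the notes on task IDs.

```python
import xml.etree.ElementTree as ET

NS = {"r": "http://www.intershop.com/xml/ns/enfinity/6.5.0/core/replication"}

# Abbreviated copy of the process and task examples (descriptions omitted).
CONFIG = """<replication-configuration
    xmlns="http://www.intershop.com/xml/ns/enfinity/6.5.0/core/replication">
  <processes>
    <process id="nightly">
      <type>ReplicationPublication</type>
      <target-cluster-id>Cluster42</target-cluster-id>
      <task ref="PrimeTechProducts"/>
      <task ref="PrimeTechSpecialsProducts"/>
    </process>
  </processes>
  <tasks>
    <task id="PrimeTechSpecialsProducts">
      <organization>PrimeTech</organization>
      <channel>PrimeTechSpecials</channel>
      <group ref="CATALOGS"/>
      <group ref="PRODUCTS"/>
    </task>
    <task id="PrimeTechProducts">
      <organization>PrimeTech</organization>
      <group ref="PRODUCTS"/>
    </task>
  </tasks>
</replication-configuration>"""


def resolve_process(config_xml, process_id):
    """Expand a predefined process into (organization, channel, groups)
    tuples, one per referenced task, in reference order (hypothetical helper)."""
    root = ET.fromstring(config_xml)
    tasks = {t.get("id"): t for t in root.iterfind("r:tasks/r:task", NS)}
    process = root.find(f"r:processes/r:process[@id='{process_id}']", NS)
    if process is None:
        raise KeyError(f"unknown process {process_id!r}")
    resolved = []
    for ref in process.iterfind("r:task", NS):
        task = tasks.get(ref.get("ref"))  # case-sensitive lookup
        if task is None:
            raise KeyError(f"unknown task reference {ref.get('ref')!r}")
        resolved.append((
            task.findtext("r:organization", None, NS),
            task.findtext("r:channel", None, NS),  # None for organization-level tasks
            [g.get("ref") for g in task.iterfind("r:group", NS)],
        ))
    return resolved


print(resolve_process(CONFIG, "nightly"))
```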

For customization aspects, see the corresponding information in the Cookbook - Mass Data Replication - Customization and Adaption.

6.3.4 Replication Chains

A data replication system can be configured to serve as both source and target system. Hence, it is possible to set up data replication chains in which content is transferred consecutively across multiple systems (e.g., system A replicates to system B, and then system B replicates to system C).

As a business case example, setting up a data replication chain may be required for test or acceptance systems, where data or design changes are tested or approved before they go live.

Note

Replication rings are not supported, i.e., the last target system in a replication chain must not serve as a source system that replicates data back to the original source system (e.g., system A to system B, then system B to system C, and then system C to system A).

The following figure depicts a data replication chain with three stages in simplified form. For easier understanding, it shows only target systems instead of target clusters, since only one system in a target cluster can act as the editing system for the next stage in a replication chain.

Figure: Mass Data Replication: Simplified schema of a data replication chain.

When setting up data replication chains, take care of the following topics:

  1. All systems serving as source systems must set the property staging.system.type to the value editing (e.g., system A and system B in a three-tier chain).
  2. Only the final target system (or all target systems of the final target cluster) must set the property staging.system.type to the value live (e.g., system C).
  3. A database connection must be set up from each target system to its preceding source system in the chain (e.g., from system C to system B, and from system B to system A). It does not matter whether a database link, direct schema access, or both are used in the chain.
  4. DBInit has to be executed in the first source system (A). Then a database dump has to be exported from the first source system and imported into every target system (B and C in the example).
  5. The file systems of all target systems need to be up to date (e.g., the sites directory).
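Rules 1 and 2 and the ban on replication rings can be checked mechanically. The following Python sketch models a chain as (source, target) pairs plus one staging.system.type value per system. `check_chain` is hypothetical and, like the simplified figure, ignores target clusters.

```python
def check_chain(links, system_type):
    """Hypothetical consistency check for a replication chain.

    links       -- list of (source, target) pairs, e.g. [("A", "B"), ("B", "C")]
    system_type -- dict: system -> value of its staging.system.type property
    Returns a list of problems (empty if the chain looks consistent).
    """
    problems = []
    successors = {}
    for src, dst in links:
        successors.setdefault(src, []).append(dst)
    systems = {s for link in links for s in link}

    # Every system that replicates onward is a source ("editing");
    # systems that only receive data are final targets ("live").
    for system in sorted(systems):
        expected = "editing" if system in successors else "live"
        if system_type.get(system) != expected:
            problems.append(f"{system}: staging.system.type should be {expected!r}")

    # Replication rings are not supported: detect any cycle with a DFS.
    def has_cycle(node, on_path):
        if node in on_path:
            return True
        return any(has_cycle(n, on_path | {node}) for n in successors.get(node, []))

    if any(has_cycle(s, frozenset()) for s in successors):
        problems.append("replication ring detected")
    return problems


print(check_chain([("A", "B"), ("B", "C")], {"A": "editing", "B": "editing", "C": "live"}))  # []
```

In a ring every system is also a source, so the type check alone cannot catch it; the separate cycle search is what flags the unsupported setup.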

7 Error Detection, Handling and Recovery

7.1 Basics

Replication processes are intended to be atomic operations, i.e., they count as successfully finished only if they

  • have finished successfully in each target system of the target cluster,
  • have finished successfully for each phase of the process (depending on the process type: preparation, synchronization, replication, publication, cache refresh), and
  • have finished successfully for each table, file, or directory that is part of any replication group / staging group involved in the process.

Therefore, whenever and wherever an error occurs during a replication process, the whole replication / staging process is aborted and marked as failed.
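On the source side, this atomicity rule means that the states of all target sub-processes are combined per phase: one error state fails the whole process, while a non-error state counts as reached only once every target system reports it (see the description of the System field in 7.2.1.2). The following Python sketch is a hypothetical illustration of that aggregation. It relies on the naming convention visible in the state overview, where all error states begin with "Error" or "FatalError"; its return values "FAILED" and "RUNNING" are labels chosen for the sketch.

```python
def aggregate_state(target_states, expected_state):
    """Combine the latest sub-process state of every target system (sketch).

    target_states  -- dict: target system ID -> latest state of its sub-process
    expected_state -- non-error state each target has to reach,
                      e.g. "ReplicationSuccessfullyFinished"
    Returns "FAILED", expected_state, or "RUNNING" (labels chosen for this sketch).
    """
    # A single error state in any target system fails the whole process.
    if any(s.startswith(("Error", "FatalError")) for s in target_states.values()):
        return "FAILED"
    # A non-error state counts as reached only when every target reports it.
    if target_states and all(s == expected_state for s in target_states.values()):
        return expected_state
    return "RUNNING"


print(aggregate_state(
    {"TargetSystem1": "ReplicationSuccessfullyFinished",
     "TargetSystem_with_URLMapping": "ErrorReplication"},
    "ReplicationSuccessfullyFinished",
))  # FAILED
```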

7.2 Error Tracking / Error Detection

By default, the status and errors of replication and staging processes are written to the PROCESS table. The status of a replication process is displayed in the back office (SLDSystem -> Data Replication -> Replication Processes -> process detail page).

Additionally, errors in replication and staging processes are tracked in error*.log files in share/system/log. Status information and errors within the staging framework are tracked in staging*.log files in the same directory.

Out of the box, no additional error notification is implemented for data replication. However, a standard mechanism calls a custom pipeline at specific stages of a staging process; it can be used to implement and call a custom notification pipeline at the end of a staging process (StagingProcessCustomization -> OnPreCompletition).

7.2.1 Error Logging for Replication Processes Started via Back Office

7.2.1.1 Log Files

All information related to a staging process (pipelines TriggerStagingProcess, TriggerPublicationProcess, and TriggerUndoProcess in the editing system, and StagingProcess in the target system) is written to staging*.log files in share/system/log, not only errors.

Errors occurring in the replication process (pipeline TriggerReplicationProcess) are tracked in error*.log files.

7.2.1.2 Database Stored Process States

Replication processes, staging processes, and their staging sub-processes store their process states in the PROCESS table. In case of a failed or blocked replication, it may be helpful to check the respective process states in the database. Since PROCESS rows contain a LASTMODIFIED column, the most recent replication or staging process can easily be determined by ordering the rows by LASTMODIFIED.

The following overview lists the process states that can occur. To check the state of the most recent process, execute the following SQL command (replace <ProcessName> with the respective process name from the overview below, e.g., 'StagingProcess'); the first row returned belongs to the most recent process:

select state from PROCESS where name='<ProcessName>' order by LASTMODIFIED desc;
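The query can be tried out against a throwaway table. The following Python sketch uses an in-memory SQLite database as a stand-in for the real PROCESS table: only the NAME, STATE, and LASTMODIFIED columns are modeled, the sample rows are invented, and `latest_state` is a hypothetical helper. Binding the process name as a parameter avoids the quoting mistakes that are easy to make when pasting names into the SQL text.

```python
import sqlite3

# In-memory stand-in for the PROCESS table; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PROCESS (NAME TEXT, STATE TEXT, LASTMODIFIED TEXT)")
conn.executemany(
    "INSERT INTO PROCESS VALUES (?, ?, ?)",
    [
        ("StagingProcess", "StagingProcessCompleted", "2020-09-23 22:10:00"),
        ("StagingProcess", "ErrorAcquiringLiveLocks", "2020-09-24 22:10:00"),
        ("ReplicationProcess", "FAILED", "2020-09-24 22:11:00"),
    ],
)


def latest_state(conn, process_name):
    """Return the state of the most recently modified row with the given
    process name, or None if there is none (hypothetical helper)."""
    row = conn.execute(
        "SELECT STATE FROM PROCESS WHERE NAME = ? ORDER BY LASTMODIFIED DESC",
        (process_name,),
    ).fetchone()
    return row[0] if row else None


print(latest_state(conn, "StagingProcess"))  # ErrorAcquiringLiveLocks
```

The ISO-formatted timestamps sort correctly as text here; against the real database, LASTMODIFIED is ordered by its native type.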

Meaning of table columns:

Column

Meaning

Process type

Process type, where a process state does occur (replication, staging, or staging sub process).

Process name

Name of the process type as it occurs in the PROCESS table.

Process state

Name of the process state as it occurs in the PROCESS table.

State type

Defines, if the described state is set in the middle of a running staging process (type process),
or if state is set at the end of a staging process (type final).

System

Shows, where the described state occurs (source, or target system). \\\
States marked with source* are set in source system, but observed by the target systems. They are used to synchronize the target systems (publication, cache refresh).
States marked with target* are set in target system, but in the corresponding sub-process in the source system, too. The source system observes all sub-processes for synchronization purposes. For non-error states, the source system counts the aggregated target state as set, if each of the target systems has reached the according state. For error states, the source system counts the staging process as failed, if at least one target system has set an error state; in such case all other target systems will end their staging process, too, setting it as failed.

Process type

Process name

Process state

State type

System

Description.
Possible error cause.

Replication process

'ReplicationProcess'

WAITING

process

source

The replication process is prepared, but the execution time is not yet reached.

 

 

CANCELED

final

source

The replication process was canceled in Back Office.

 

 

RUNNING

process

source

The replication process is underway.

 

 

COMPLETED

final

source

The replication process has successfully finished.

 

 

FAILED

final

source

The replication process has finished due to errors.

 

 

 

 

 

 

Staging process

'StagingProcess'

ErrorInternal

final

source

Any severe failure when calling the source system's staging pipeline.
See staging.log, too.
Maybe missing or wrong settings in pipeline directory or other parameters. Should normally not occur during system operation.

 

 

ErrorExecutingEditingStagingPipeline

final

source

Staging pipeline in editing system cannot be executed.
Possibly app server error in source system.

 

 

ErrorNonStagedDomains

final

source

Some replication content references at least one domain,
that is not part of the current replication process nor exists at least in one target system.
Check the replication tasks / groups of current replication process. See also staging.log.

 

 

ErrorNonStagedParentSites

final

source

Some replication content belongs to at least one unit,
whose parent site is not part of the current replication process nor exists at least in one target system.
Check the replication tasks / groups of current replication process.

 

 

ErrorConnectToEditingDB

final

source

The source system cannot create the staging identification token.
Check DB connectivity.

 

 

ErrorConnectLiveSystem

final

source

The source system's staging web service cannot connect to a target system.
Check web reachability of the target system(s).

 

 

ErrorCreatingLiveStagingProcess

final

source

Failure when copying the staging process to (at least) one target system.
See also staging.log
Check web reachability of target systems, check DB in target systems.

 

 

ErrorAcquiringLiveLocks

final

source

Failure when acquiring the locks for staging resources in (at least) one target system.
See also staging.log
Check DB / locks in target systems.

 

 

ErrorAcquiringEditingLocks

final

source

Failure / timeout when acquiring the locks for staging resources in source system.
See also staging.log
Check DB / locks in source system.

 

 

ErrorInitializingStagingProcessors

final

source

Failure when checking the assignments of staging processors for all staging groups.
Check staging processor assignments in staging groups and staging.properties settings.

 

 

ErrorStagingProcessModeNotSupported

final

source

(At least) one staging processor does not support the current replication process type, i.e., the staging process mode.
See staging.log, too.

 

 

StartingPreparation

process

source

The preparation phase is starting.

 

 

PreparationSuccessfullyFinished

process

source

The preparation phase finished successfully.

 

 

ErrorPreparation

final

source

The preparation phase finished with an error.
See log files for more information.

 

 

FatalErrorPreparation

final

source

Fatal error during error handling in preparation phase.
See log files for more information. Check DB accessibility.

 

 

ErrorCallingLivePipeline

final

source

An error occured while the source system called the staging pipeline in a target system.
Check the log files, and check the web reachability of the according target system.

 

 

StartingSynchronization

process

target

The synchronization phase is starting.

 

 

SynchronizationSuccessfullyFinished

process

target

The synchronization phase finished successfully.

 

 

ErrorSynchronization

final

target *

The synchronization phase finished with an error.
See log files for more information.

 

 

FatalErrorSynchronization

final

target *

Fatal error during error handling in synchronization phase.
See log files for more information. Check DB accessibility.

 

 

StartingReplication

process

target

The replication phase is starting.

 

 

ReplicationSuccessfullyFinished

process

target

The replication phase finished successfully.

 

 

ErrorReplication

final

target *

The replication phase finished with an error.
See log files for more information.

 

 

FatalErrorReplication

final

target *

Fatal error during error handling in replication phase.
See log files for more information. Check DB accessibility.

 

 

ReplicationProcessCompleted

final

target *

A staging process of type Replication has successfully finished.

 

 

StartPublication

process

source *

This is a state used to get the target systems in sync before the publication phase can start.

 

 

StartingPublication

process

target

The publication phase is starting.

 

 

PublicationSuccessfullyFinished

process

target

The publication phase finished successfully.

 

 

ErrorPublication

final

target *

The publication phase finished with an error.
See log files for more information.

 

 

FatalErrorPublication

final

target *

Fatal error during error handling in publication phase.
See log files for more information. Check DB accessibility.

 

 

StartRefreshCache

process

source *

This is a state used to get the target systems in sync before the refresh_cache phase can start.

 

 

StartingRefreshCache

process

target

The refresh_cache phase is starting.

 

 

RefreshCacheSuccessfullyFinished

process

target

The refresh_cache phase finished successfully.

 

 

ErrorRefreshCache

final

target *

The refresh_cache phase finished with an error.
See log files for more information.

 

 

FatalErrorRefreshCache

final

target *

Fatal error during error handling in refresh_cache phase.
See log files for more information. Check DB accessibility.

 

 

StagingProcessCompleted

final

target *

A staging process of type ReplicationPublication or of type Publication has successfully finished.

 

 

ErrorDeterminingUndoContent

final

source

An error occurred while determining the undo content.
See staging.log file for more information.

 

 

StartingSaveNoneUndoContent

process

target

The sub-step SaveNoneUndoContent of undo phase is starting.

 

 

SaveNoneUndoContentSuccessfullyFinished

process

target

The sub-step SaveNoneUndoContent of undo phase finished successfully.

 

 

ErrorSaveNoneUndoContent

final

target *

The sub-step SaveNoneUndoContent of undo phase finished with an error.
See log files for more information.

 

 

FatalErrorSaveNoneUndoContent

final

target *

Fatal error during error handling in sub-step SaveNoneUndoContent of undo phase.
See log files for more information. Check DB accessibility.

 

 

StartingRestoreUndoContent

process

target

The sub-step RestoreUndoContent of the undo phase is starting.

 

 

RestoreUndoContentSuccessfullyFinished

process

target

The sub-step RestoreUndoContent of the undo phase finished successfully.

 

 

ErrorRestoreUndoContent

final

target *

The sub-step RestoreUndoContent of the undo phase finished with an error.
See log files for more information.

 

 

FatalErrorRestoreUndoContent

final

target *

Fatal error during error handling in the sub-step RestoreUndoContent of the undo phase.
See log files for more information. Check DB accessibility.

 

 

StagingUndoCompleted

final

target *

A staging process of type UnDo has successfully finished.

 

 

ErrorUndoStaging

final

target *

A staging process of type UnDo has finished with error(s).
See log files for more information.

 

 

ErrorInternalInLiveSystem

final

target *

A severe failure occurred when calling the target system's staging pipeline.
See also staging.log.
Possible causes are missing or wrong settings in the pipeline directory or in other parameters. This should not occur during normal system operation.

 

 

ErrorEditingStagingProcessKilled

final

source

At start-up, INTERSHOP 7 checks the PROCESS table for a staging process in any non-final state (such a process was interrupted by a shutdown or crash of the application servers). If one is found, this process is set to ErrorEditingStagingProcessKilled in the source system.

 

 

ErrorLiveStagingProcessKilled

final

target *

At start-up, INTERSHOP 7 checks the PROCESS table for a staging process in any non-final state (such a process was interrupted by a shutdown or crash of the application servers). If one is found, this process is set to ErrorLiveStagingProcessKilled in the target system.
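The start-up check behind ErrorEditingStagingProcessKilled and ErrorLiveStagingProcessKilled can be modeled as a simple state transition. The sketch below is a simplified model and not INTERSHOP 7 code. The state names and their type (process or final) come from the table above. The function `state_after_restart` and the in-memory stand-in for PROCESS table rows are hypothetical.

```python
from enum import Enum


class StateType(Enum):
    PROCESS = "process"
    FINAL = "final"


# Subset of the staging states from the table above, keyed by name,
# with the value of their type column.
STATE_TYPES = {
    "StartingPublication": StateType.PROCESS,
    "PublicationSuccessfullyFinished": StateType.PROCESS,
    "ErrorPublication": StateType.FINAL,
    "StartingRefreshCache": StateType.PROCESS,
    "RefreshCacheSuccessfullyFinished": StateType.PROCESS,
    "StagingProcessCompleted": StateType.FINAL,
    "StagingUndoCompleted": StateType.FINAL,
    "ErrorEditingStagingProcessKilled": StateType.FINAL,
    "ErrorLiveStagingProcessKilled": StateType.FINAL,
}


def state_after_restart(state: str, is_source_system: bool) -> str:
    """Hypothetical model of the start-up check: a staging process left in a
    non-final state was interrupted and is marked as killed; processes in a
    final state keep their state."""
    if STATE_TYPES[state] is StateType.FINAL:
        return state
    if is_source_system:
        return "ErrorEditingStagingProcessKilled"
    return "ErrorLiveStagingProcessKilled"


# Hypothetical PROCESS table rows in a target system: (process id, state)
rows = [("p1", "StartingPublication"), ("p2", "StagingProcessCompleted")]
print([(pid, state_after_restart(s, is_source_system=False)) for pid, s in rows])
# -> [('p1', 'ErrorLiveStagingProcessKilled'), ('p2', 'StagingProcessCompleted')]
```

Killed processes therefore always end in a final state. They do not resume after the restart.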

7.2.2 Error Logging for Replication Processes Started via Jobs

The job*.log files in share/system/log normally only report whether a pipeline was executed successfully (i.e., without technical failure) or not. For replication processes started by jobs, more diagnostic information can be found in error*.log (replication level) and staging*.log (staging level) and, if enabled, in the debug*.log files.
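When you analyze a failed replication started by a job, you can gather the files mentioned above with a small helper that groups them by pattern. The helper below is hypothetical. It relies only on the file name patterns and the share/system/log location given in this section.

```python
from pathlib import Path

# Log file patterns named above, from least to most detailed.
LOG_PATTERNS = {
    "job": "job*.log",          # only: pipeline executed successfully or not
    "error": "error*.log",      # replication-level errors
    "staging": "staging*.log",  # staging-level details
    "debug": "debug*.log",      # only written if debug logging is enabled
}


def find_replication_logs(log_dir: Path) -> dict[str, list[Path]]:
    """Group the log files in log_dir (e.g. <IS_HOME>/share/system/log) by kind."""
    return {kind: sorted(log_dir.glob(pattern)) for kind, pattern in LOG_PATTERNS.items()}
```

For example, `find_replication_logs(Path("share/system/log"))["staging"]` returns the staging logs in name order. An empty `debug` list usually just means that debug logging is not enabled.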

7.3 Error Handling / Recovery

If an error occurs during a replication process, both the editing and the live system(s) keep the data that was active before the failed replication process started. In this sense, no data recovery is needed after a replication process fails.

However, there is one situation in which manual intervention might be needed: if the INTERSHOP 7 application server that executes the replication process in a target system crashes exactly while it is performing the synonym switches, the synonyms may already point to the newly filled tables while this information has not yet been written to the database table STAGINGTABLE, which staging uses as its administration table.

If such a situation occurs, open an SQL prompt as the target system's database user and execute the procedure staging.restore_synonyms:

exec staging.restore_synonyms
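A toy model helps illustrate the crash window described above. It assumes, without confirmation from this article, that staging.restore_synonyms re-aligns the synonyms with the state recorded in STAGINGTABLE. That assumption matches the statement above that the previously active data is kept after a failed replication. The class and the table names are hypothetical and do not reflect the real schema layout.

```python
from dataclasses import dataclass, field


@dataclass
class TargetSchema:
    """Toy model of a target schema. `synonyms` maps each synonym to the
    physical table it currently points to; `staging_table` holds what the
    STAGINGTABLE administration table records as active (hypothetical names)."""
    synonyms: dict[str, str] = field(default_factory=dict)
    staging_table: dict[str, str] = field(default_factory=dict)

    def inconsistent(self) -> list[str]:
        """Synonyms that point to a different table than recorded in STAGINGTABLE."""
        return [name for name, table in self.synonyms.items()
                if name in self.staging_table and self.staging_table[name] != table]

    def restore_synonyms(self) -> None:
        """Assumed effect of staging.restore_synonyms: re-point each
        inconsistent synonym to the table recorded in STAGINGTABLE."""
        for name in self.inconsistent():
            self.synonyms[name] = self.staging_table[name]


# Crash after the PRODUCT synonym was switched, before STAGINGTABLE was updated.
schema = TargetSchema(
    synonyms={"PRODUCT": "PRODUCT_NEW", "CATALOG": "CATALOG_OLD"},
    staging_table={"PRODUCT": "PRODUCT_OLD", "CATALOG": "CATALOG_OLD"},
)
print(schema.inconsistent())  # -> ['PRODUCT']
schema.restore_synonyms()
print(schema.inconsistent())  # -> []
```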

7.4 Possible Causes of Errors

  • Malfunctioning web connection between source and target system
    The source system uses the web connection that is configured in replication-clusters.xml to inform the target systems about new replication processes.
    Check, as the corresponding operating system user in the source system (e.g., isas1), whether you can access the target system's web address as configured. You may use "telnet <webserver> <port>" or a similar tool.
  • Malfunctioning web connection between target and source system
    The target system needs the web connection to download file content. When starting a replication process, the source system transmits its own web address as defined in appserver.properties or, if defined, in staging.properties.
    Check, as the corresponding operating system user in the target system (e.g., isas1), whether you can access the source system's web address as configured. You may use "telnet <webserver> <port>" or a similar tool.
  • Database connection fault (wrong DBLink configuration or broken connection), or database access from the target system to the editing database schema not permitted
    The target system's database user needs access to the source system's database schema to transfer database content.
    Check, as the corresponding operating system user in the target system (e.g., isas1), whether you can connect to the target system's database schema using SQL*Plus and the credentials defined in orm.properties. Then check whether you can access source system data, e.g., with one of the following queries:

    select using DBLink
    select count(1) from product@<source_dblink_name>;
    


    or

    select using direct schema access
    select count(1) from <source_schema_name>.product;
    
  • Errors due to database issues (e.g., ORA errors such as constraint violations, non-existent tables, or a changed table whose $s view was not updated)
    Avoid copying data manually! Replication can do that for you.
    Beware of manually creating the same organizations, channels, or catalogs in both source and target systems! Replication can do that for you.
    Use the DBMigrate preparers, as described before or in the Cookbook section, to change existing tables or to create new ones. Make sure to predefine UUIDs and to execute the DBMigrate preparers in both the source and the target system(s) with identical configuration files.
  • Errors due to file system issues (access rights, disk space, quotas)
    Check whether the target system(s) provide enough disk space for index files and file downloads (path share/dist). Check the access rights in the file system. Check whether any active quotas might limit file transfer (file system, web download, etc.).
  • Database crash during publication (synonym switch)
    See section 7.3 above for how to restore the synonyms.
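You can script the two web connection checks above instead of running telnet manually. The function below is a hypothetical helper built on the Python standard library. Like the telnet check, it only proves that something accepts TCP connections on the given host and port. It does not prove that the replication URL itself responds correctly.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within
    `timeout` seconds, roughly like the "telnet <webserver> <port>" check."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS errors
        return False
```

Run it as the corresponding operating system user. On the source system, pass the target web server host and port from replication-clusters.xml. On the target system, pass the source system's web address.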

8 Appendix

8.1 Multi Data Center Support

The Data Replication functionality supports multiple data centers.

The basic concepts of multiple data center support in INTERSHOP 7 are described in detail in separate articles. In short, they assume the following conditions:

  • In general, each data center hosts a complete installation of an INTERSHOP 7 cluster.
  • Each INTERSHOP 7 cluster consists of the following components:
    1. An Oracle 11gR2 RAC database installation in each data center.
    2. A number of INTERSHOP 7 application servers.
    3. A number of Apache / INTERSHOP Web-Adapter installations.
  • Oracle Streams is used to synchronize the databases between the data centers. Oracle Streams is set up to synchronize certain tables in INTERSHOP 7 (transactional data).
  • INTERSHOP 7 clusters in distributed data centers work in an active-active scenario.
    1. The INTERSHOP 7 clusters are synchronized with each other to support session fail-over between the data centers.
    2. Each INTERSHOP 7 cluster may host several sites, whereby a site is active only in one data center.
  • Each INTERSHOP 7 cluster is able to work standalone if other data center(s) are inaccessible.

Regarding data replication environments, multi data center support means additionally:

  • In each data center a complete data replication environment consisting of source and target system is installed.
  • Target systems hosting the same data (i.e., the same sites) form a target cluster. Although they have different web addresses and database schemata, they share an identical cluster ID.
  • Source systems hosting the same source data (i.e., the same sites) form a source cluster. Although they have different web addresses and database schemata, they share an identical cluster ID (which differs from the target cluster's cluster ID).
  • Live systems (hosting the public storefront):
    1. All live systems are active in all data centers.
    2. Their transactional data are synchronized by Oracle Streams.
  • Editing systems:
    1. Only one editing system is active, all other editing systems are inactive (i.e., the appservers are shut down).
    2. All their database data are synchronized by Oracle Streams; file system data is synchronized by rsync or equivalent mechanisms.

As described before, INTERSHOP 7 introduced the concept of target clusters, which makes it possible to update multiple target systems (possibly in different data centers) quasi in parallel with a single data replication process (for both mass data and business object replication). Thus, all target systems of a target cluster are updated with the same database and file system data.
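The target cluster idea, in which one replication process updates all target systems that share a cluster ID, can be pictured as a concurrent fan-out. The sketch below is only an illustration. The `ReplicationSystem` type, the cluster IDs, and the `update` callback are hypothetical. The sketch does not describe how INTERSHOP 7 actually coordinates or transfers data to the individual target systems.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ReplicationSystem:
    name: str        # e.g. the live system in one data center
    cluster_id: str  # systems hosting the same data share one cluster ID


def update_target_cluster(systems: list[ReplicationSystem], cluster_id: str,
                          update: Callable[[ReplicationSystem], bool]) -> dict[str, bool]:
    """Apply `update` to every system of the given cluster concurrently and
    return a success flag per system name. `update` is a hypothetical
    callback standing in for the data transfer to one target system."""
    targets = [s for s in systems if s.cluster_id == cluster_id]
    if not targets:
        return {}
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        results = list(pool.map(update, targets))
    return {t.name: ok for t, ok in zip(targets, results)}
```

Grouping by cluster ID rather than by web address is the key point. Each target system keeps its own address and schema, but a single replication process addresses all of them.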

For Data Replication, the only configuration file needed to support multi data center usage is replication-clusters.xml in <IS_HOME>/share/system/config/cluster. Its content and syntax were described before.

To ease the setup and deployment of distributed data replication configurations, the data center name as defined by IS_DATA_CENTER in intershop.properties can be added as a prefix to the file name replication-clusters.xml. This way, the data-center-specific replication-clusters.xml files can be distributed to all source systems independently.

When looking up replication-clusters.xml, INTERSHOP 7 first checks whether IS_DATA_CENTER is set and, if so, whether a data-center-specific <IS_DATA_CENTER>_replication-clusters.xml exists. If present, the system uses it. Otherwise, the system looks for the default name replication-clusters.xml.

Example: If the name of a data center is "DC_THX1138", the variable IS_DATA_CENTER in intershop.properties would be set to

Excerpt from intershop.properties
IS_DATA_CENTER=DC_THX1138

A data center specific replication-clusters.xml would then be looked for as DC_THX1138_replication-clusters.xml.
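The lookup order described above can be written as a small function, for example to check in a deployment script which file a given system would pick up. This sketch mirrors the documented behavior but is not INTERSHOP 7 code. The function name is hypothetical, and reading IS_DATA_CENTER from intershop.properties is left out.

```python
from pathlib import Path
from typing import Optional

DEFAULT_NAME = "replication-clusters.xml"


def resolve_replication_clusters(config_dir: Path, data_center: Optional[str]) -> Path:
    """Pick the replication-clusters.xml file to use.

    config_dir corresponds to <IS_HOME>/share/system/config/cluster and
    data_center to the IS_DATA_CENTER value (None if it is not set).
    """
    if data_center:
        # Data-center-specific file, e.g. DC_THX1138_replication-clusters.xml
        specific = config_dir / f"{data_center}_{DEFAULT_NAME}"
        if specific.is_file():
            return specific
    # Fall back to the default name
    return config_dir / DEFAULT_NAME
```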

8.2 Dependencies Between Replication Groups

The data replication mechanism requires the source and target systems to have the same database structure (tables, indexes, ...) and the same base content (system domain, root site, ...).

8.2.1 Organizations and Channels

Mass Data Replication supports transferring new organizations and channels that were created in the editing system to the target system. However, the content (catalogs, products, ...) of an organization or channel can only be transferred if the respective structure (the domain hierarchy, i.e., the organization or channel itself) has been replicated before or is included in the same replication process. Therefore, after creating a new organization or channel in the editing system, it is recommended to first create a replication task in the organization / enterprise (e.g., PrimeTech) with the replication groups Organization and Channels/MasterRepository and to replicate it to the target system. Afterwards, replicate the other data, such as catalogs and products.

For example, assume a partner organization Miller working in the partner channel Reseller Channel of the sales organization "PrimeTech". If Miller wants to replicate master repository data, then Miller, Reseller Channel, and PrimeTech must also exist in the target system.

Note

It is not possible to replicate content into the repository of a different organization, or an organization working in a different channel.

Note

Do not manually create an organization or channel in the target system that already exists in the editing system if you want to use data replication to update the target organization / channel with data from the editing system. Use data replication instead to transfer the organization / channel from the editing to the target system! Even though an organization / channel is displayed with the same name in the editing and target systems, its DOMAINID differs if it was not transferred by mass data replication. Data replication then treats the two organizations or channels as different ones and will not work between them, since it depends on the DOMAINID.
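The structural dependency above can be expressed as a simple check. Walk up the domain hierarchy of the organization or channel whose content is to be replicated, and list every domain that the target system does not yet contain, starting with the topmost. The function and the parent map are hypothetical. The names in the usage example stand in for DOMAINIDs, since data replication matches domains by DOMAINID and not by display name.

```python
from typing import Optional


def missing_structure(domain: str, parent_of: dict[str, str],
                      target_domain_ids: set[str]) -> list[str]:
    """Return the domains in the hierarchy of `domain` (including itself)
    that do not exist in the target system yet, topmost first. These must be
    replicated before, or together with, the content of `domain`."""
    chain = []
    current: Optional[str] = domain
    while current is not None:
        chain.append(current)
        current = parent_of.get(current)  # None once the top is reached
    chain.reverse()  # topmost organization first
    return [d for d in chain if d not in target_domain_ids]


# Example from above: Miller works in Reseller Channel of PrimeTech.
parents = {"Miller": "Reseller Channel", "Reseller Channel": "PrimeTech"}
print(missing_structure("Miller", parents, {"PrimeTech"}))
# -> ['Reseller Channel', 'Miller']
```

A manually created "Reseller Channel" in the target system would have a different DOMAINID. It would therefore still count as missing here.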

8.2.2 Catalogs

If a channel uses catalogs that are shared from a superior organization (enterprise or partner organization), changes of catalog data in the superior organization must be replicated before or together with the (derived) data of the channel catalog to become available in the target system (replication group Catalogs, categories and product types in both the superior organization and the channel). If only the channel catalog is replicated, changes in the organization's catalog are not replicated automatically and will be missing in the target system.

8.2.3 Products

If a channel uses products that are shared from a superior organization (i.e., the master repository in the enterprise or partner organization), changes of product data in the master repository must be replicated before or together with the (derived) data of the channel products to become available in the target system (replication group Products in both the superior organization and the channel). If only the channel products are replicated, changes in the master repository are not replicated automatically with mass data replication and will be missing in the target system.

8.2.4 Product Prices

In both the organization (i.e., the master repository) and the channel, product prices are replicated implicitly with the replication group Products, but they can also be replicated explicitly with the replication group Product Prices.
As with Products, if prices were changed in the organization, replicating the Product Prices of a channel that shares them from a superior organization also requires replicating the Product Prices of that organization.

8.2.5 Image Definitions

Image definitions (types, views, sets, and the relations between them) exist only at organization level and are referenced from there by both the organization and its channels. Therefore, changes of image definitions must be replicated in the organization (replication group Image Definitions).

8.2.6 Image References

Image references use image definitions, which are only maintained in the organization. So, when image definitions were changed, they have to be replicated at organization level.
Image references themselves can be considered references to product pictures. Like products, they can exist at organization and channel level.
Image references for products in an organization (i.e., in the master repository) have to be replicated in the organization (replication group Image References), image references of channel products have to be replicated within the channels.

8.2.7 Localization Data

Localization data can be maintained at organization and at channel level. The data is stored in a separate localization repository for each level. The localization functionality of INTERSHOP 7 uses a lookup mechanism that first searches the current channel's localization repository and then the localization repositories of the superior organization(s). So, if localization data is modified at both organization and channel level, it must be replicated in both the organization and the channel (replication group Localization Data).
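The fallback lookup described above explains why both levels must be replicated. A value that the channel does not override is read from the organization's repository, so a changed organization value reaches the target system only if the organization's localization data is replicated as well. The sketch below models the repositories as plain dictionaries. The function name, keys, and values are hypothetical.

```python
from typing import Optional


def lookup_localization(key: str, repositories: list[dict[str, str]]) -> Optional[str]:
    """Return the first value found for `key`, searching the repositories in
    order: the current channel's repository first, then those of the
    superior organization(s). Returns None if no repository defines it."""
    for repo in repositories:
        if key in repo:
            return repo[key]
    return None


organization = {"greeting": "Hello", "footer": "PrimeTech"}
channel = {"greeting": "Hi"}  # overrides only "greeting"
print(lookup_localization("greeting", [channel, organization]))  # -> Hi
print(lookup_localization("footer", [channel, organization]))    # -> PrimeTech
```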

Disclaimer

The information provided in the Knowledge Base may not be applicable to all systems and situations. Intershop Communications will not be liable to any party for any direct or indirect damages resulting from the use of the Customer Support section of the Intershop Corporate Web site, including, without limitation, any lost profits, business interruption, loss of programs or other data on your information handling system.
