Concept - Intershop Commerce Platform Synchronization Process

1 Introduction

This concept describes how synchronization between different environments works. A nearly identical data set across environments makes it possible, for example, to run tests under near-production conditions in a lower-tier test environment. The sync process can also help to restore a database or to sync it from another environment.

After the sync, a pseudonymization process ensures that no sensitive personal data is stored in the target environment.

1.1 References

1.2 Glossary

Term  Description
DB    Database
DEV   Development Team
INT   Integration environment
PRD   Production environment
SFS   Shared file system
UAT   User acceptance test environment

2 Synchronization Process

In contrast to the replication process, which takes place between the live and edit ICM clusters of one environment, the synchronization process takes place between the edit or live clusters of two environments, for example from the PRD edit cluster to the UAT or INT edit cluster.

The sync always occurs from a "higher" to a "lower" environment. 

The sync process is triggered manually, because each run has to be agreed on by all parties. Automatic execution (e.g. once per week) is also conceivable, but must be coordinated individually for each project.


The synchronization consists of two processes:

  • Shared File System Synchronization
    This synchronizes the shared file systems of the environments. During this process, shared file system data such as images and general assets is copied, for example from the PRD file system to the UAT file system. This step is considered optional because, for example, missing images do not break general functionality or affect the semantic correctness of implemented or newly introduced features.
    However, it may still be useful for testing, because the duration of data replication from EDIT to LIVE on the synchronized environment can indicate how replication will behave and how long it will take on the PRD environment.
  • Database Synchronization
    This backs up the database and restores it in the desired environment. The goal is to have the same customer data, products, service configurations, etc. on the UAT/INT environment as on the PRD environment.

The customer can trigger the synchronization for INT and UAT environments, but not for PRD.

Duration

The time required for the synchronization can vary greatly depending on the data volume. A large number of images in particular increases the time required for SFS synchronization, and a large number of products increases the time required for database synchronization. Also note that the initial synchronization takes longer than subsequent ones. A typical run takes between 20 minutes and 1 hour.

Synchronization after deployment

Automatic synchronization after a deployment on production is conceivable. However, as the target environment would be unavailable after each deployment, it is not advisable.

Search indexes

Solr search indexes are not part of the synchronization.

3 Configuration

Both synchronization processes are configured in Jenkins.

3.1 Shared File System Synchronization

The shared file system synchronization is run in Jenkins via ICM Shared Filesystem Sync.

Under Build with Parameters, the following parameters are available:

  • Sync direction (SYNC_DIRECTION)
    For example, from PRD to INT.
  • Sync directories (SYNC_DIRS)
    Defines the scope of the sync (sites only, or also system/config/domains).
  • Do Sync (DO_SYNC)
    Defines whether an actual sync or just a dry run is performed.
  • Delete Files From Target (DO_DELETE)
    Defines whether files on the target that are not present on the source are deleted.
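As an illustration only, the following sketch shows how a parameterized run of this job could be addressed through the Jenkins remote access API (buildWithParameters). The base URL and all parameter values are hypothetical, since the accepted value formats are project-specific, and the job's URL name may differ from its display name. The sketch only builds the URL; an actual trigger would additionally need an authenticated POST request.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL; the real Jenkins address is project-specific.
JENKINS_URL = "https://jenkins.example.com"


def build_trigger_url(job_name: str, params: dict[str, str]) -> str:
    """Return the Jenkins remote-API URL that starts a parameterized build."""
    query = urlencode(params)
    return f"{JENKINS_URL}/job/{quote(job_name)}/buildWithParameters?{query}"


dry_run_url = build_trigger_url(
    "ICM Shared Filesystem Sync",
    {
        "SYNC_DIRECTION": "PRD_TO_INT",  # assumed value format
        "SYNC_DIRS": "sites",            # assumed value format
        "DO_SYNC": "false",              # assumed: false means dry run only
        "DO_DELETE": "false",            # assumed: keep extra files on the target
    },
)
print(dry_run_url)
```

Starting with a dry run and DO_DELETE disabled is the cautious choice here, because it shows what would change without touching the target.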

Application server properties, insofar as they are located in share/system/config/domains, can be synchronized between the environments. This option is disabled by default because it requires a good understanding of the application properties.

The shared file system synchronization job relies on rsync.
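How the job maps its parameters to rsync options is not documented here. The following is a minimal sketch of one plausible mapping, assuming DO_SYNC and DO_DELETE correspond to the standard rsync options --dry-run and --delete. The function name and the paths are hypothetical.

```python
def build_rsync_command(source: str, target: str, do_sync: bool, do_delete: bool) -> list[str]:
    """Translate the sync parameters into an rsync argument list (assumed mapping)."""
    command = ["rsync", "--archive"]
    if not do_sync:
        command.append("--dry-run")  # report what would change, copy nothing
    if do_delete:
        command.append("--delete")   # remove target files that are missing on the source
    # A trailing slash on the source makes rsync copy the directory contents,
    # not the directory itself.
    command += [source.rstrip("/") + "/", target]
    return command


# Hypothetical mount points for the PRD and INT shared file systems.
print(build_rsync_command("/mnt/prd/share/sites", "/mnt/int/share/sites",
                          do_sync=False, do_delete=True))
```

Combining --dry-run with --delete is useful for reviewing which files would be removed before running the real sync.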

3.2 Database Synchronization

The database synchronization can be divided into two main tasks: creating a database backup and restoring it.

3.2.1 Database Backup

The database backup mechanism depends on the type of database used. 

For self-managed Oracle and MS SQL databases, a dump is exported from the source environment. The database backup is scheduled to run automatically and regularly at database level, for example every night or during the period of lowest traffic. It can also be triggered manually in Jenkins.

To do so, switch to the section ICM DB MSSQL Backup and click Build with Parameters.

For an MS SQL managed instance, no dump is used because point-in-time recovery is available.

3.2.2 Database Restore

The database backup can be restored on the target environment using the Jenkins job ICM DB MSSQL Restore for self-managed Oracle and MS SQL databases, or ICM DB MSSQL PointInTime Restore for an MS SQL managed instance.

For example, a backup from the UAT edit database can be restored to the INT edit database.

The restoration is done in 5 steps:

  1. Monitoring downtime
    The monitoring system is shut down so that the subsequent shutdown of the application server nodes does not raise alerts.
  2. Stop application server nodes
  3. GIT project checkout
    A Git configuration project containing the scripts used for the restore is checked out.
  4. MSSQL database restore
    The actual restore happens at this point. When restoring from a PRD database, pseudonymization scripts are run afterwards to disguise all sensitive data. The pseudonymization process is described in the next section.
  5. Startup application server nodes
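The ordering of the steps above can be sketched as follows. This is not the Jenkins job definition; the step labels and the function name are hypothetical. The point of the sketch is the ordering constraint: when the source is PRD, pseudonymization runs as part of the restore, before the application server nodes are started again.

```python
RESTORE_STEPS = [
    "monitoring downtime",
    "stop application server nodes",
    "git project checkout",
    "database restore",
    "start application server nodes",
]


def plan_restore(source_environment: str) -> list[str]:
    """Return the ordered steps for a restore from the given source environment."""
    plan = list(RESTORE_STEPS)  # copy, so the template list stays unchanged
    if source_environment == "PRD":
        # Pseudonymization must finish before the nodes come back up.
        plan.insert(plan.index("database restore") + 1, "pseudonymization scripts")
    return plan


for number, step in enumerate(plan_restore("PRD"), start=1):
    print(number, step)
```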

UUID

The database synchronization preserves the UUIDs.

Replication

Index creation can be triggered immediately after database synchronization.

The staging framework requires the tables to be replicated to have an identical structure in both clusters. Replication can therefore be performed after database synchronization only if the edit and live clusters of an environment still have an identical structure, which is not generally the case. Performing the database synchronization on both the edit and the live cluster ensures that this condition is met.

3.2.3 Pseudonymization Process

3.2.3.1 Motivation

Pseudonymization is required by data protection law. E-mail addresses, logins, etc. of real customers must not be available on UAT or INT.

3.2.3.2 Basis

The pseudonymization is based on an SQL script. Before the script can do its work, several preparations are required. In addition to declaring and initializing variables, the script checks whether it is being executed in the correct environment. If it is not, it aborts the pseudonymization with an error message.

Preparations include the creation of temporary tables that record which columns of the selected tables are to be pseudonymized. To rule out anomalies, the existing constraints and foreign key relationships are disabled and the corresponding tables are emptied. Once the tables used for temporary storage have been emptied, the constraints and foreign key relationships are restored.

The table assignments define which tables are pseudonymized and which of their columns are affected. A filter can be defined explicitly for each table to restrict the rows that are pseudonymized. This makes it possible to leave certain rows untouched, such as test users that are to be kept for test purposes.

The data to be protected is replaced by generated random values. The procedure is applied iteratively until all selected data has been replaced.
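The actual script is SQL and is not reproduced here. The following Python sketch only mirrors the logic described above on in-memory rows: an environment check that aborts with an error, a per-table column list, an optional row filter, and replacement with random values. All names, the allowed environments, and the sample data are assumptions for illustration.

```python
import secrets

# Assumption: the procedure may only run on lower environments, never on PRD.
ALLOWED_ENVIRONMENTS = {"INT", "UAT"}


def pseudonymize(rows, columns, environment, keep=lambda row: False):
    """Replace the given columns with random values in every row not matched by `keep`."""
    if environment not in ALLOWED_ENVIRONMENTS:
        raise RuntimeError(f"Refusing to pseudonymize on environment {environment!r}")
    for row in rows:
        if keep(row):
            continue  # excluded by the filter, e.g. an agreed test account
        for column in columns:
            row[column] = secrets.token_hex(8)  # 16 random hexadecimal characters
    return rows


customers = [
    {"login": "jane.doe", "email": "jane.doe@example.com"},
    {"login": "qa-tester", "email": "qa@example.com"},
]
pseudonymize(customers, ["login", "email"], "UAT",
             keep=lambda row: row["login"] == "qa-tester")
print(customers)
```

Checking the environment before touching any row means a misdirected run fails without modifying data.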

3.2.3.3 Security

As the pseudonymization is part of the import process, no unpseudonymized data is present on the target system once the import has finished. Hence, there is no risk of developers or users on the partner or customer side accessing unpseudonymized data.

3.2.3.4 Difference Between Anonymization and Pseudonymization

Anonymization is the alteration of personal data in such a way that the data can no longer be assigned to a person. In pseudonymization, the name or another identifying feature is replaced by a pseudonym (usually a multi-character combination of letters or digits, also known as a code) in order to rule out identification of the person concerned or make it considerably more difficult (see section 3 (6a) BDSG or corresponding national law).

In contrast to anonymization, pseudonymization preserves references to different data records that have been pseudonymized in the same way.

Pseudonymization thus makes it possible to assign data to a person with the aid of a key. Without this key, the assignment is impossible or difficult, because data and identifying features are kept separate. The decisive factor is therefore that person and data can still be brought together. Conversely, establishing identity is not made significantly more difficult if only initials and date of birth are used as identifiers.
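The general idea of a key-based pseudonym that preserves references between records can be illustrated with a keyed hash. This is a sketch of the concept only and not the method of the Intershop script, which according to section 3.2.3.2 uses generated random values. The key and the sample values are hypothetical.

```python
import hashlib
import hmac


def pseudonym(value: str, key: bytes) -> str:
    """Derive a stable pseudonym: the same value and key always give the same code."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability


key = b"hypothetical-secret-key"
order_owner = pseudonym("jane.doe@example.com", key)
basket_owner = pseudonym("jane.doe@example.com", key)

# Both records still point to the same (pseudonymous) person.
print(order_owner == basket_owner)
```

Because the mapping is deterministic for a given key, records pseudonymized in the same way remain linkable, which is exactly the property that distinguishes pseudonymization from anonymization.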

The more meaningful the collection of data is (e.g. income, medical history, place of residence, height), the greater the theoretical possibility of assigning it to a specific person and identifying him or her even without a code. To maintain anonymity, these data may need to be separated or falsified to make it more difficult to establish identity.

3.3 Anonymized Fields by Default

For a list of fields that OPS considers for anonymization or deletion, refer to the following PDF:

anonymized-fields-by-default.pdf [49 KB]

3.3.1 Exclusion from the Pseudonymization

User accounts, for instance for Intershop Commerce Management, can be excluded from the process. Accounts that should not be pseudonymized (e.g. for test/QA), together with their related data, must be agreed on with the operations team. This allows a set of specific "whitelisted" accounts to keep working on UAT after synchronization from PRD.

Note

Azure Active Directory users are not stored in the database, so the synchronization does not affect this type of account.

The fields included in the pseudonymization can be freely defined.

However:

  • Tables cannot be excluded. The database synchronization process copies the database as a whole. Theoretically, it is possible to save one table separately and restore it afterwards. Details should be clarified in this case.
  • Configuration cannot be excluded. For a synchronization from PRD to UAT, UAT receives the configuration of PRD first.

3.3.2 Configuration

As the configuration cannot be excluded from the process, the original configuration of the target environment should be restored afterwards. 

Typical items of the configuration that differ between PRD and UAT or INT:

  • Job configuration
  • Backend services, including payment services
  • SMTP services

If the configuration is not restored, UAT could communicate with PRD backends and possibly perform actions that are not intended for UAT.

To restore the configuration:

  1. The development team provides a (declarative) configuration set that is injected automatically when the application server nodes start up. It is recommended to maintain the configuration for the testing environments as resources in the source code repository.
  2. This configuration set must depend on the environment. The set to apply is determined by the combination of the environment and the staging system type (e.g. preproduction_live).
  3. The configuration on server startup is handled by the ICM configuration framework. An example implementation, called Configuration Solution Kit, which extends the configuration framework with the capabilities required for this use case, is available for project use. See Cookbook - 7.10 Test System Configuration Solution Kit (please take into account the disclaimer in the corresponding article).
  4. The result is a system whose service configurations have been reset to the values intended for the target environment.
  5. The configuration set for testing systems must be reviewed continuously and updated whenever new channels and/or configurations are introduced, so that it matches the intended configuration of each environment.
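The selection described in step 2 can be sketched as a lookup keyed by environment and staging system type. This is not the API of the ICM configuration framework or of the Configuration Solution Kit. The property names, values, and function name are hypothetical and only illustrate the selection logic.

```python
# Hypothetical configuration sets, keyed by "<environment>_<staging system type>".
CONFIG_SETS = {
    "preproduction_edit": {"smtp.host": "smtp.test.example.com", "payment.mode": "sandbox"},
    "preproduction_live": {"smtp.host": "smtp.test.example.com", "payment.mode": "sandbox"},
}


def select_config_set(environment: str, staging_type: str) -> dict[str, str]:
    """Return the configuration set to inject on startup for this system."""
    key = f"{environment}_{staging_type}"
    if key not in CONFIG_SETS:
        # Failing loudly avoids silently keeping the configuration copied from PRD.
        raise LookupError(f"No configuration set defined for {key}")
    return CONFIG_SETS[key]


print(select_config_set("preproduction", "live"))
```

Raising an error for an unknown combination is a deliberate choice in this sketch: a newly added environment without a configuration set is noticed at startup instead of talking to PRD backends.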

3.3.3 Responsibilities

The operations team is responsible for setting up the synchronization process, including the customer data pseudonymization, and adapts it at the request of DEV.

The development team is responsible for defining which data is pseudonymized and for communicating changes, adaptations, and extensions related to the pseudonymization process. The development team is also responsible for the correct configuration of the target environment, especially:

  • Ensuring that PRD configurations, e.g. regarding PayPal account data, are not identical to those on the target environment.
  • Ensuring that the job configuration on the target environment is correct.

Finally, the development team is responsible for triggering the synchronization process.

4 How to Change the Synchronization Process?

All changes to the process must be performed by the Intershop operations team. The development team can request them by opening a service desk ticket.

5 Frequency

The customer decides how often the synchronization is performed. Regular synchronization is advisable. Intershop recommends at least one synchronization after each PRD deployment.

Disclaimer

The information provided in the Knowledge Base may not be applicable to all systems and situations. Intershop Communications will not be liable to any party for any direct or indirect damages resulting from the use of the Customer Support section of the Intershop Corporate Web site, including, without limitation, any lost profits, business interruption, loss of programs or other data on your information handling system.
