This guide is valid for ICM version 7.10.32 and higher. For previous versions see Guide - Solr Replication (valid to 7.10.31).
This document is about mass data replication (staging) of Solr search indexes for ICM staging environments. Solr indexes generally reside in the file system of the Solr server and reflect data that is indexed from the ICM database. There are no search index-related artifacts in the database. When replicating database content using ICM mass data replication, it is also necessary to transfer the file system content of the index from the source system (edit system) to the target system (live system).
| Term | Description |
|---|---|
| ICM | Intershop Commerce Management |
| core | A Lucene index on the Apache Solr side |
| collection | A Lucene index on the Apache Solr side (cloud mode) |
For general information about data replication, see Concept - Mass Data Replication.
To use Solr Cloud replication, please mind the configuration keys described in the Configuration of ICM section of Guide - Deployment Solr Cloud Server.
For ICM 11+ configuration, environment variables can be used, see Concept - Configuration.
The search index framework provides the following common, search engine-neutral artifacts to help replicate search indexes.
- **Staging Group:** `FND_SearchIndexes`, a staging group that uses a standard file system staging processor named `SearchIndexesStagingProcessor` to transfer file system content located in the `indexes` folder of the shared file system. Typically, this folder contains the common generic index configuration stored in an *ISH-Config.xml* file.
- **Staging Decorator:** `RefreshSearchIndexesDecorator`, the standard staging decorator included in the `bc_search` cartridge. It processes indexes that are replicated in the file system and triggers a reload after switching the active directories.
- **Data Replication Group:** `SEARCH_INDEXES`, the data replication group for the Commerce Management application that maps directly to the search indexes staging group.
- **Properties:** `staging.properties`, a generic file in `/share/system/config/cluster` that contains properties specific to the ICM data replication feature.
The index data itself resides in the Solr server, completely independent of the ICM instance. As a result, the Solr search indexes must be transferred from one Solr (edit) instance to another Solr (live) instance. This is done using the Solr APIs for backing up and restoring collections. Even if the ICM source (edit) and target (live) systems share a single Solr cluster, indexes must be backed up and restored during replication. This process is handled by a dedicated staging decorator included in the `ac_solr_cloud` cartridge. This `BackupAndRestoreSolrCloudIndexesDecorator` is added to the search indexes staging processor during application server startup via a Guice binding.
The decorator executes the Solr-specific logic needed to transfer the indexes in the replication phase hook methods. Before the preparation phase (`onPrePreparationHook`), the decorator determines all indexes contained in the domains processed by the current replication process. For each determined Solr index, the number of documents in the collection is determined and a backup is triggered via the Solr Collections API. The backups are stored in a specific backup location, which is determined by the `solr.cloudBackupLocation` configuration key, see Configuration of ICM.
These index backups are stored at the backup location in directories named according to the following scheme: `<staging-process-id>-<domain>-<indexID>`.
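The backup step above can be sketched as follows. This is an illustrative sketch, not the actual decorator code: the helper names and the example values are hypothetical, while the directory naming scheme follows the one documented above and the query parameters are those of the Solr Collections API `BACKUP` action.

```python
# Hypothetical helpers sketching the backup step of the replication process.

def backup_name(staging_process_id: str, domain: str, index_id: str) -> str:
    """Backup directory name per the documented scheme:
    <staging-process-id>-<domain>-<indexID>."""
    return f"{staging_process_id}-{domain}-{index_id}"

def backup_request_params(name: str, collection: str, location: str,
                          async_id: str) -> dict:
    """Query parameters for GET /solr/admin/collections?action=BACKUP."""
    return {
        "action": "BACKUP",
        "name": name,            # backup directory created below `location`
        "collection": collection,
        "location": location,    # value of solr.cloudBackupLocation
        "async": async_id,       # submit asynchronously; poll REQUESTSTATUS later
    }

# Example values (hypothetical staging process ID, domain, and index ID):
name = backup_name("4711", "inSPIRED-inTRONICS", "products")
params = backup_request_params(name, "edit-inSPIRED-inTRONICS-products",
                               "/var/solr/backups", "backup-4711")
print(name)  # 4711-inSPIRED-inTRONICS-products
```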
In the target (live) system, the currently available indexes are determined in the pre-publication phase (`onPrePublicationHook`). After the publication and replication phase (`onPostReplicationHook`), the indexes are restored from the backups into separate collections named `<solr.clusterIndexPrefix><domain-name>-<indexID><solr.collectionSuffix1>` or `<solr.clusterIndexPrefix><domain-name>-<indexID><solr.collectionSuffix2>`, respectively. To be able to refer to the currently used index, an alias points to the current live collection. The backup is restored into the collection with the suffix that is not currently referenced by the alias. The Solr API for restoring collections requires that an existing collection with that suffix be deleted before the restore. After deleting and restoring the collection and config set, the collection is reloaded to ensure that any configuration changes in the restored config set take effect. A query is then executed to verify that the collection contains the same number of documents as the corresponding edit collection. Finally, the refresh phase (`onRefreshCacheHook`) switches the alias from the currently active collection to the newly restored collection.
Backup and restore operations are submitted to the Solr server as asynchronous requests. Waiting for these operations to complete is governed by a separate timeout setting: `solr.backupRestoreTimeout`.
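Waiting for an asynchronous Solr request can be sketched as a polling loop bounded by that timeout. The terminal states (`completed`, `failed`) follow the Solr Collections API `REQUESTSTATUS` action; `fetch_status` is a stand-in for the actual HTTP call, and the timeout handling here is an assumption about the general pattern, not ICM's exact implementation.

```python
import time

def wait_for_async(fetch_status, timeout_s: float, poll_s: float = 0.01) -> str:
    """Poll REQUESTSTATUS until the request completes, fails, or the
    timeout (cf. solr.backupRestoreTimeout) expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = fetch_status()  # e.g. GET ...?action=REQUESTSTATUS&requestid=...
        if state in ("completed", "failed"):
            return state
        time.sleep(poll_s)
    raise TimeoutError("backup/restore did not finish within the timeout")

# Example with a fake status source that completes on the third poll:
states = iter(["submitted", "running", "completed"])
print(wait_for_async(lambda: next(states), timeout_s=1.0))  # completed
```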
Undoing a successfully completed replication is supported by switching the aliases back to the previous collections. Note that indexes created on the live ICM system are deleted during index replication. In an ICM replication system, Intershop strongly recommends performing all index operations, such as creating indexes, changing their configuration, and deleting them, only in the edit system.
Above is a simple (development) setup with a single Solr server used by both the live and edit instances. The `solr.cloudSolrServerURLs` property points the ICM application servers to the same Solr server instance, the `solr.clusterIndexPrefix` distinguishes the collections of the edit and live instances, and the `solr.cloudBackupLocation` points to an existing directory on the same Solr server.
A single Solr server or cluster can even be shared between multiple ICM instances by setting the `solr.clusterIndexPrefix` configuration to a name that distinguishes them from each other. It defaults to the simple host name followed by the instance ID of the ICM installation.
The next image shows a setup that uses a single Solr server for the edit system and a Solr Cloud cluster for the live system. The backup directory is shared via a network file system between the edit server and all live servers.
All Solr servers in the Solr cluster(s) of the source (edit) and target (live) systems need a common shared file system where the backups are stored: it is written by any Solr server in the source (edit) cluster and read by any Solr server in the target (live) cluster. This shared file system has nothing to do with the IS7 shared file system; it is used by the Solr servers only.
The `solr.replicationFactor` on the live system is set to 2 (the maximum is the number of currently running Solr server nodes; it is not recommended to set it to that number, so that rolling restarts remain possible). This property is used when restoring the collection to the live system to distribute replicas of the index, ensuring load balancing and availability even if one of the Solr server nodes goes down. The ICM application servers communicate with the common service endpoint listed in `solr.cloudSolrServerURLs`, which distributes the Solr API requests to the available Solr server nodes.
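The restore side can be sketched in the same style as the backup request. The `replicationFactor` and other parameter names are those of the Solr Collections API `RESTORE` action; the helper and the example values are hypothetical.

```python
# Hypothetical helper sketching the RESTORE request for the live cluster.

def restore_request_params(backup_name: str, target_collection: str,
                           location: str, replication_factor: int,
                           async_id: str) -> dict:
    """Query parameters for GET /solr/admin/collections?action=RESTORE."""
    return {
        "action": "RESTORE",
        "name": backup_name,                    # <staging-process-id>-<domain>-<indexID>
        "collection": target_collection,        # collection with the inactive suffix
        "location": location,                   # shared backup file system
        "replicationFactor": replication_factor,  # solr.replicationFactor on live
        "async": async_id,
    }

params = restore_request_params("4711-inSPIRED-inTRONICS-products",
                                "live-inSPIRED-inTRONICS-products-2",
                                "/var/solr/backups", 2, "restore-4711")
```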
Be careful not to run out of space in the backup directory. In Kubernetes deployments, a cron job deletes backups of replication processes that are older than 7 days (default).
If the Solr backup directory is also shared with the application servers and accessible at the same location (`solr.cloudBackupLocation`), the data replication process performs additional checks to verify the successful execution of the backup.
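The kind of extra check an application server could run when the backup location is mounted locally might look like the following. This is a guess at the general idea (the concrete checks ICM performs are not documented here); the helper name is hypothetical, and the directory name follows the documented backup naming scheme.

```python
import os

def backup_looks_complete(backup_location: str, staging_process_id: str,
                          domain: str, index_id: str) -> bool:
    """Verify the backup directory for this staging process exists and
    is non-empty (hypothetical sanity check, not ICM's actual logic)."""
    path = os.path.join(backup_location,
                        f"{staging_process_id}-{domain}-{index_id}")
    return os.path.isdir(path) and bool(os.listdir(path))
```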