This document is about mass data replication (staging) of Solr search indexes for ICM staging environments. Solr indexes generally reside somewhere in a file system and reflect data that is indexed from the ICM database. There are no search index related artifacts in the database. When replicating database content with ICM mass data replication, it is therefore necessary to also transfer the index's file system content from the source system (Edit system) to the target system (Live system).
This document covers Solr standalone as well as Solr cloud.
Term | Description
---|---
ICM | Intershop Commerce Management
core | A Lucene index on the Apache Solr (standalone) side
collection | A Lucene index on the Apache Solr (cloud mode) side. It may consist of multiple shards and replicas, which in turn are cores themselves.
For general information about data replication, see Concept - Mass Data Replication.
To use Solr Cloud replication, please mind the configuration keys described in the Configuration of IS7 section of Guide - Deployment Solr Cloud Server.
The search index framework provides the following common search engine neutral artifacts to help with the replication of search indexes.
- FND_SearchIndexes, a staging group that uses the file system staging processor to transfer file system content located in the searchindexes folder of the shared file system.
- RefreshSearchIndexesDecorator, the standard staging decorator included in the bc_search cartridge. It processes indexes that are replicated in the file system and triggers a reload after switching active directories.
- SEARCH_INDEXES, a data replication group for the Commerce Management application that maps directly to the search indexes staging group.

This chapter describes the replication with the component set f_solrcloud.
When using Solr cloud mode, the index data itself is contained in a Solr cluster, completely independent of the ICM instance. In this respect, it is no different from any relational database system. As a result, we need to transfer the Solr search indexes from one Solr (edit) instance to another Solr (live) instance. This is done using the Solr 6 backup and restore feature. Even if the ICM source (edit) and target (live) systems share a single Solr cluster, indexes must be backed up and restored during replication. This is done using a dedicated staging decorator included in the ac_solr_cloud cartridge.
Deploying the f_solrcloud component set changes the staging.properties to use the BackupAndRestoreSolrCloudIndexesDecorator as the decorator for the search index staging processor:
staging.processor.SearchIndexesStagingProcessor.decorator.0=com.intershop.adapter.search_solr.server.staging.BackupAndRestoreSolrCloudIndexesDecorator
Before the preparation phase (onPrePreparationHook), the decorator determines all the indexes contained in the domains that are being processed by the current staging process. For each determined Solr index, a backup is triggered via the Solr Collections Admin API. The backups are stored in a specific backup location, which is determined by the solr.cloudBackupLocation configuration key, see Configuration of IS7.
These index backups are stored at the backup location in directories named according to the following naming scheme: <staging-process-id>-<domain>-<indexID>
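The backup step boils down to one Collections API call per index. The following Python sketch is illustrative only: the helper name and exact parameter set used by the decorator are assumptions, though `action=BACKUP`, `name`, `collection`, `location`, and `async` are standard Solr Collections API parameters.

```python
from urllib.parse import urlencode

def backup_request(solr_base_url, collection, staging_process_id, domain, index_id, backup_location):
    """Build a Solr Collections API BACKUP request for one index.

    The backup name follows the naming scheme described above:
    <staging-process-id>-<domain>-<indexID>.
    """
    backup_name = f"{staging_process_id}-{domain}-{index_id}"
    params = urlencode({
        "action": "BACKUP",
        "name": backup_name,           # directory created below the backup location
        "collection": collection,
        "location": backup_location,   # solr.cloudBackupLocation
        "async": backup_name,          # request id for asynchronous execution
    })
    return f"{solr_base_url}/admin/collections?{params}"

url = backup_request("http://solr:8983/solr", "edit-site-42", "4711", "site", "42", "/mnt/solr-backups")
```

The `async` parameter makes Solr execute the backup in the background; the caller then polls for completion (see below).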
In the target (live) system, at the refresh cache phase of replication (onPreRefreshCacheHook), the indexes from the backups are restored to separate collections named <solr.clusterIndexPrefix><domain-name>-<indexID><solr.collectionSuffix1> or <solr.clusterIndexPrefix><domain-name>-<indexID><solr.collectionSuffix2>, respectively. An alias points to the current live collection. The backup is restored to the collection whose suffix is not currently referenced by the alias.
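The selection of the restore target can be sketched in a few lines. Function name and the suffix defaults are hypothetical; the real suffix values come from solr.collectionSuffix1 and solr.collectionSuffix2.

```python
def restore_target(prefix, domain, index_id, current_alias_target, suffix1="-1", suffix2="-2"):
    """Pick the collection the backup is restored into: the one whose suffix is
    NOT currently referenced by the live alias (blue/green style switching)."""
    base = f"{prefix}{domain}-{index_id}"
    candidate1, candidate2 = base + suffix1, base + suffix2
    return candidate2 if current_alias_target == candidate1 else candidate1
```

After a successful restore, the alias is switched to the freshly restored collection, so search traffic moves over atomically.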
Restoring the backed-up index also restores the config set, but it is named like the config set of the source (edit) index. This is expected, as the backup contains a reference to the config set along with the index data. Since the search index framework requires a specific naming scheme for config sets and indexes, the restored config set must be changed: it is moved to a new config set with the same name as the target (live) index, and the target index is modified to use this config set.
Backup and restore operations are performed by submitting the backup or restore operation to the Solr server via asynchronous requests. Waiting for these operations to complete has its own timeout setting: solr.backupRestoreTimeout.
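The waiting logic amounts to polling the Collections API REQUESTSTATUS action until the submitted request finishes or the timeout expires. The sketch below is illustrative, not the decorator's actual implementation; function names and the poll interval are assumptions.

```python
import time

def wait_for_async(check_status, request_id, timeout_seconds, poll_interval=2):
    """Poll an asynchronous Collections API request until it completes or the
    timeout (solr.backupRestoreTimeout) is exceeded. `check_status` is a
    caller-supplied function returning one of the REQUESTSTATUS states:
    'submitted', 'running', 'completed', or 'failed'."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        state = check_status(request_id)
        if state == "completed":
            return True
        if state == "failed":
            raise RuntimeError(f"async request {request_id} failed")
        time.sleep(poll_interval)
    raise TimeoutError(f"async request {request_id} did not finish within {timeout_seconds}s")
```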
The figure above shows a simple (development) setup with a single Solr server used by both the live and edit instances. The solr.zooKeeperHostList points the ICM application servers to the same Solr server instance. The solr.clusterIndexPrefix distinguishes between the collections of the edit and live instances, and the solr.cloudBackupLocation points to an existing directory on the same Solr server.
Note
A single Solr server or cluster may even be shared between multiple ICM instances by setting the solr.clusterIndexPrefix configuration to a name that distinguishes them from each other. It defaults to the simple host name followed by the instance ID of the ICM installation.
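That default can be illustrated as follows. The exact separator between host name and instance ID is an assumption here; only the two components (simple host name, instance ID) are stated by the documentation.

```python
import socket

def default_cluster_index_prefix(instance_id, hostname=None):
    """Illustrative reconstruction of the solr.clusterIndexPrefix default:
    the simple (unqualified) host name followed by the ICM instance ID.
    The '-' separators are an assumption for readability."""
    host = (hostname or socket.gethostname()).split(".")[0]  # simple host name
    return f"{host}-{instance_id}-"
```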
The next image shows a setup that uses a single Solr server for the edit system and a Solr cloud cluster for the live system. The backup directory is shared by a network file system between the edit server and all the live servers.
Note
All Solr servers in the Solr clusters of the source (edit) and target (live) systems need a common shared file system where the backups are stored. Backups are written by an arbitrary Solr server of the source (edit) cluster and read by an arbitrary Solr server of the target (live) cluster. This shared file system has nothing to do with the IS7 shared file system; it is used by the Solr servers only.
The solr.replicationFactor on the live system is set to the number of available live Solr server nodes (the default is 1, the maximum is the number of running Solr server nodes). This property is used when restoring the collection to the live system to distribute replicas of the index, ensuring load balancing and availability even if one of the Solr servers goes down. The ICM application servers connect to the ZooKeeper ensemble given by solr.zooKeeperHostList, which lists the ZooKeepers managing the available Solr servers.
Note
Be careful not to run out of space in the backup directory. The backup directory is currently not cleaned up automatically. A system administrator must manually delete backups from old replication processes.
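A housekeeping script along the following lines could automate that cleanup. It is entirely hypothetical and not shipped with ICM; it only relies on the <staging-process-id>-<domain>-<indexID> naming scheme described above.

```python
import re
import shutil
import time
from pathlib import Path

def purge_old_backups(backup_location, max_age_days, active_process_ids=()):
    """Hypothetical housekeeping helper: delete backup directories that match
    the <staging-process-id>-<domain>-<indexID> naming scheme, are older than
    max_age_days, and do not belong to a still-active staging process."""
    pattern = re.compile(r"^(?P<process>[^-]+)-.+-.+$")
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for entry in Path(backup_location).iterdir():
        m = pattern.match(entry.name)
        if not entry.is_dir() or not m:
            continue  # skip files and directories that do not follow the scheme
        if m.group("process") in active_process_ids:
            continue  # never touch backups of a running replication
        if entry.stat().st_mtime < cutoff:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```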
If the Solr backup directory is also shared with the application servers and accessible at the same location (solr.cloudBackupLocation), the data replication process performs additional checks to verify the successful execution of the backup (ISSEA-123 - Verify Backup of SolrCloud Indexes Directory Structure during Replication, SolrCloud Adapter version 2.1.0 and higher).
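Such a verification can be as simple as checking that the backup directory for an index exists and is non-empty. This is an illustrative sketch; the actual checks performed per ISSEA-123 may go further (e.g., inspecting the backup's internal structure).

```python
from pathlib import Path

def backup_looks_complete(backup_location, staging_process_id, domain, index_id):
    """Illustrative sanity check: the backup directory for this index, named
    <staging-process-id>-<domain>-<indexID>, exists and contains files."""
    backup_dir = Path(backup_location) / f"{staging_process_id}-{domain}-{index_id}"
    return backup_dir.is_dir() and any(backup_dir.iterdir())
```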
Known limitations:

- Config sets created during replication are not removed from ZooKeeper automatically. Obsolete config sets have to be deleted manually (e.g., solr/bin/solr zk rm -r /configs/<collection name>).

This section describes the replication with the cartridge ac_search_solr, component set f_search_solr.
All the data and configuration of a Solr search index is entirely contained in the shared file system of the ICM. Multiple Solr instances use the same index data files from the shared file system.
The search index replication uses the generic search index file system staging process to transfer these files from the edit to the live system.
After switching the active directory at the end of the replication, the Solr search indexes need to be refreshed (reloaded). This is done using the RefreshSearchIndexes decorator. This decorator determines all indexes of the currently replicated domains and sends a SearchIndexReload event to each application server in the ICM cluster. The event processing then calls the reload method of each index. The one application server that sends out the event processes it slightly differently: it passes a cluster=true parameter to the reload method. This cluster=true reload creates a new core (<core>-offline) with the new (inactive) data directory and swaps the current online core with this newly created core. After swapping, the old core is deleted. These commands are sent to each of the Solr server nodes listed in the solr.SolrClusterNodeURLs property. This loads the new index data into each separate Solr server.
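The per-node sequence corresponds to three Solr CoreAdmin API calls, sketched below. The helper names are hypothetical and the CREATE call is simplified (a real CREATE also requires a valid instance and config setup); the CREATE/SWAP/UNLOAD actions themselves are standard CoreAdmin API actions.

```python
from urllib.parse import urlencode

def core_swap_requests(node_url, core, new_data_dir):
    """Sketch of the cluster=true reload sequence sent to every node in
    solr.SolrClusterNodeURLs, expressed as Solr CoreAdmin API calls:
    1. CREATE a <core>-offline core on the new (inactive) data directory,
    2. SWAP it with the current online core,
    3. UNLOAD the old core (which now carries the -offline name)."""
    offline = f"{core}-offline"
    def admin(params):
        return f"{node_url}/admin/cores?{urlencode(params)}"
    return [
        admin({"action": "CREATE", "name": offline, "dataDir": new_data_dir}),
        admin({"action": "SWAP", "core": core, "other": offline}),
        admin({"action": "UNLOAD", "core": offline}),
    ]
```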
The image above shows the properties and location of data in this variant of Solr index replication. If separate Solr servers are used, the solr.SolrServerURL directs the search requests to an additional load balancer that distributes them across the Solr servers. During indexing, the first Solr server in the list of solr.SolrClusterNodeURLs receives the indexing requests.