This document is about mass data replication (staging) of Solr search indexes for ICM staging environments. Solr indexes generally reside somewhere in a file system and reflect data that is indexed from the ICM database. There are no search index related artifacts in the database. When replicating database content with ICM mass data replication, it is therefore necessary to also transfer the index's file system content from the source system (Edit system) to the target system (Live system).
This document covers Solr standalone as well as Solr cloud.
Term | Description
---|---
ICM | Intershop Commerce Management
core | A Lucene index on the Apache Solr (standalone) side
collection | A Lucene index on the Apache Solr (cloud mode) side. It may consist of multiple shards and replicas, which in turn are cores themselves.
For general information about data replication, see Concept - Mass Data Replication.
To use Solr Cloud replication, please mind the configuration keys described in the Configuration of IS7 section of Guide - Deployment Solr Cloud Server.
The search index framework provides the following common search engine neutral artifacts to help with the replication of search indexes.
- FND_SearchIndexes, a staging group that uses the file system staging processor to transfer file system content located in the searchindexes folder of the shared file system.
- RefreshSearchIndexesDecorator, the standard staging decorator included in the bc_search cartridge. It processes indexes that are replicated in the file system and triggers a reload after switching active directories.
- SEARCH_INDEXES, a data replication group for the Commerce Management application that maps directly to the search indexes staging group.

This chapter describes the replication with the component set f_solrcloud.
When using Solr cloud mode, the index data itself is contained in a Solr cluster, completely independent of the ICM instance. In this respect, it is no different from any relational database system. As a result, we need to transfer the Solr search indexes from one Solr (edit) instance to another Solr (live) instance. This is done using the Solr 6 backup and restore feature. Even if the ICM source (edit) and target (live) systems share a single Solr cluster, indexes must be backed up and restored during replication. This is done using a dedicated staging decorator included in the ac_solr_cloud cartridge.
Deploying the f_solrcloud component set changes the staging.properties to use the BackupAndRestoreSolrCloudIndexesDecorator as the decorator for the search index staging processor:
staging.processor.SearchIndexesStagingProcessor.decorator.0=com.intershop.adapter.search_solr.server.staging.BackupAndRestoreSolrCloudIndexesDecorator
Before the preparation phase (onPrePreparationHook), the decorator determines all the indexes contained in the domains that are being processed by the current staging process. For each determined Solr index, a backup is triggered via the Solr Collections Admin API. The backups are stored in a specific backup location, which is determined by the solr.cloudBackupLocation configuration key, see Configuration of IS7.
These index backups are stored at the backup location in directories named according to the following naming scheme: <staging-process-id>-<domain>-<indexID>
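The backup step boils down to one Collections API call per index. The following Python sketch is illustrative only: the helper name and exact parameter set used by the decorator are assumptions, though `action=BACKUP`, `name`, `collection`, `location`, and `async` are standard Solr Collections API parameters.

```python
from urllib.parse import urlencode

def backup_request(solr_base_url, collection, staging_process_id, domain, index_id, backup_location):
    """Build a Solr Collections API BACKUP request for one index.

    The backup name follows the naming scheme described above:
    <staging-process-id>-<domain>-<indexID>.
    """
    backup_name = f"{staging_process_id}-{domain}-{index_id}"
    params = urlencode({
        "action": "BACKUP",
        "name": backup_name,           # directory created below the backup location
        "collection": collection,
        "location": backup_location,   # solr.cloudBackupLocation
        "async": backup_name,          # request id for asynchronous execution
    })
    return f"{solr_base_url}/admin/collections?{params}"

url = backup_request("http://solr:8983/solr", "edit-site-42", "4711", "site", "42", "/mnt/solr-backups")
```

The `async` parameter makes Solr execute the backup in the background; the caller then polls for completion (see below).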
In the target (live) system, at the refresh cache phase of replication (onPreRefreshCacheHook), the indexes from the backups are restored to separate collections named <solr.clusterIndexPrefix><domain-name>-<indexID><solr.collectionSuffix1> or <solr.clusterIndexPrefix><domain-name>-<indexID><solr.collectionSuffix2>, respectively. An alias points to the current live collection. The backup is restored to the collection whose suffix is not currently referenced by the alias.
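The selection of the restore target can be sketched in a few lines. Function name and the suffix defaults are hypothetical; the real suffix values come from solr.collectionSuffix1 and solr.collectionSuffix2.

```python
def restore_target(prefix, domain, index_id, current_alias_target, suffix1="-1", suffix2="-2"):
    """Pick the collection the backup is restored into: the one whose suffix is
    NOT currently referenced by the live alias (blue/green style switching)."""
    base = f"{prefix}{domain}-{index_id}"
    candidate1, candidate2 = base + suffix1, base + suffix2
    return candidate2 if current_alias_target == candidate1 else candidate1
```

After a successful restore, the alias is switched to the freshly restored collection, so search traffic moves over atomically.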
Restoring the backed-up index also restores the config set, but it is named like the config set of the source (edit) index. This is expected, as the backup contains a reference to the config set along with the index data. Since the search index framework requires a specific naming scheme for config sets and indexes, the restored config set must be changed: it is moved to a new config set with the same name as the target (live) index, and the target index is modified to use this config set.
Backup and restore operations are performed by submitting the backup or restore operation to the Solr server via asynchronous requests. Waiting for these operations to complete has its own timeout setting: solr.backupRestoreTimeout.
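The waiting logic amounts to polling the Collections API REQUESTSTATUS action until the submitted request finishes or the timeout expires. The sketch below is illustrative, not the decorator's actual implementation; function names and the poll interval are assumptions.

```python
import time

def wait_for_async(check_status, request_id, timeout_seconds, poll_interval=2):
    """Poll an asynchronous Collections API request until it completes or the
    timeout (solr.backupRestoreTimeout) is exceeded. `check_status` is a
    caller-supplied function returning one of the REQUESTSTATUS states:
    'submitted', 'running', 'completed', or 'failed'."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        state = check_status(request_id)
        if state == "completed":
            return True
        if state == "failed":
            raise RuntimeError(f"async request {request_id} failed")
        time.sleep(poll_interval)
    raise TimeoutError(f"async request {request_id} did not finish within {timeout_seconds}s")
```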
The figure above shows a simple (development) setup with a single Solr server used by both the live and edit instances. The solr.zooKeeperHostList points the ICM application servers to the same Solr server instance. The solr.clusterIndexPrefix distinguishes between the collections of the edit and live instances, and the solr.cloudBackupLocation points to an existing directory on the same Solr server.
Note
A single Solr server or cluster may even be shared between multiple ICM instances by setting the solr.clusterIndexPrefix configuration to a name that distinguishes them from each other. It defaults to the simple host name followed by the instance ID of the ICM installation.
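That default can be illustrated as follows. The exact separator between host name and instance ID is an assumption here; only the two components (simple host name, instance ID) are stated by the documentation.

```python
import socket

def default_cluster_index_prefix(instance_id, hostname=None):
    """Illustrative reconstruction of the solr.clusterIndexPrefix default:
    the simple (unqualified) host name followed by the ICM instance ID.
    The '-' separators are an assumption for readability."""
    host = (hostname or socket.gethostname()).split(".")[0]  # simple host name
    return f"{host}-{instance_id}-"
```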
The next image shows a setup that uses a single Solr server for the edit system and a Solr cloud cluster for the live system. The backup directory is shared by a network file system between the edit server and all the live servers.
Note
All Solr servers in the Solr clusters of the source (edit) and target (live) systems need a common shared file system where the backups are stored. Backups are written by an arbitrary Solr server of the source (edit) cluster and read by an arbitrary Solr server of the target (live) cluster. This shared file system has nothing to do with the IS7 shared file system; it is used by the Solr servers only.
The solr.replicationFactor on the live system is set to the number of available live Solr server nodes (the default is 1, the maximum is the number of running Solr server nodes). This property is used when restoring the collection to the live system to distribute replicas of the index, ensuring load balancing and availability even if one of the Solr servers goes down. The ICM application servers connect to the ZooKeeper ensemble given by solr.zooKeeperHostList, which lists the ZooKeepers managing the available Solr servers.
Note
Be careful not to run out of space in the backup directory. The backup directory is currently not cleaned up automatically. A system administrator must manually delete backups from old replication processes.
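A housekeeping script along the following lines could automate that cleanup. It is entirely hypothetical and not shipped with ICM; it only relies on the <staging-process-id>-<domain>-<indexID> naming scheme described above.

```python
import re
import shutil
import time
from pathlib import Path

def purge_old_backups(backup_location, max_age_days, active_process_ids=()):
    """Hypothetical housekeeping helper: delete backup directories that match
    the <staging-process-id>-<domain>-<indexID> naming scheme, are older than
    max_age_days, and do not belong to a still-active staging process."""
    pattern = re.compile(r"^(?P<process>[^-]+)-.+-.+$")
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for entry in Path(backup_location).iterdir():
        m = pattern.match(entry.name)
        if not entry.is_dir() or not m:
            continue  # skip files and directories that do not follow the scheme
        if m.group("process") in active_process_ids:
            continue  # never touch backups of a running replication
        if entry.stat().st_mtime < cutoff:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```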
If the Solr backup directory is also shared with the application servers and accessible at the same location (solr.cloudBackupLocation), the data replication process performs additional checks to verify the successful execution of the backup (ISSEA-123 - Verify Backup of SolrCloud Indexes Directory Structure during Replication, SolrCloud Adapter version 2.1.0 and higher).
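Such a verification can be as simple as checking that the backup directory for an index exists and is non-empty. This is an illustrative sketch; the actual checks performed per ISSEA-123 may go further (e.g., inspecting the backup's internal structure).

```python
from pathlib import Path

def backup_looks_complete(backup_location, staging_process_id, domain, index_id):
    """Illustrative sanity check: the backup directory for this index, named
    <staging-process-id>-<domain>-<indexID>, exists and contains files."""
    backup_dir = Path(backup_location) / f"{staging_process_id}-{domain}-{index_id}"
    return backup_dir.is_dir() and any(backup_dir.iterdir())
```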
Known limitations:

- Config sets created during replication are not removed from ZooKeeper automatically. Obsolete config sets have to be deleted manually (e.g., solr/bin/solr zk rm -r /configs/<collection name>).

This section describes the replication with the cartridge ac_search_solr, component set f_search_solr.
All the data and configuration of a Solr search index is entirely contained in the shared file system of the ICM. Multiple Solr instances use the same index data files from the shared file system.
The search index replication uses the generic search index file system staging process to transfer these files from the edit to the live system.
After switching the active directory at the end of the replication, the Solr search indexes need to be refreshed (reloaded). This is done using the RefreshSearchIndexes decorator. This decorator determines all indexes of the currently replicated domains and sends a SearchIndexReload event to each application server in the ICM cluster. The event processing then calls the reload method of each index. The one application server that sends out the event processes it slightly differently: it passes a cluster=true parameter to the reload method. This cluster=true reload creates a new core (<core>-offline) with the new (inactive) data directory and swaps the current online core with this newly created core. After swapping, the old core is deleted. These commands are sent to each of the Solr server nodes listed in the solr.SolrClusterNodeURLs property. This loads the new index data into each separate Solr server.
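The per-node sequence corresponds to three Solr CoreAdmin API calls, sketched below. The helper names are hypothetical and the CREATE call is simplified (a real CREATE also requires a valid instance and config setup); the CREATE/SWAP/UNLOAD actions themselves are standard CoreAdmin API actions.

```python
from urllib.parse import urlencode

def core_swap_requests(node_url, core, new_data_dir):
    """Sketch of the cluster=true reload sequence sent to every node in
    solr.SolrClusterNodeURLs, expressed as Solr CoreAdmin API calls:
    1. CREATE a <core>-offline core on the new (inactive) data directory,
    2. SWAP it with the current online core,
    3. UNLOAD the old core (which now carries the -offline name)."""
    offline = f"{core}-offline"
    def admin(params):
        return f"{node_url}/admin/cores?{urlencode(params)}"
    return [
        admin({"action": "CREATE", "name": offline, "dataDir": new_data_dir}),
        admin({"action": "SWAP", "core": core, "other": offline}),
        admin({"action": "UNLOAD", "core": offline}),
    ]
```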
The image above shows the properties and location of data in this variant of Solr index replication. If separate Solr servers are used, the solr.SolrServerURL directs the search requests to an additional load balancer that distributes them across the Solr servers. During indexing, the first Solr server in the list of solr.SolrClusterNodeURLs receives the indexing requests.