Document Properties
Kbid
23D962
Last Modified
11-Mar-2024
Added to KB
28-Feb-2013
Public Access
Everyone
Status
Online
Doc Type
Concepts
Product
  • ICM 7.10
  • ICM 11
Concept - XML Sitemaps

Introduction

This document describes the concept behind XML sitemap generation in ICM 7.6+ and ICM 11+. The XML sitemap feature makes it possible to include links to categories, images, and certain static pages.

Sitemap Protocol

Overview

Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a sitemap is an XML file that lists URLs for a site along with additional metadata about each URL. This allows search engines to crawl the site in a more intelligent way.

Web crawlers usually discover pages from links within the site and from links from other sites. Sitemaps supplement this data so that crawlers that support sitemaps can capture all URLs in the sitemap and learn about those URLs from the associated metadata. Using the sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling the site.

Each URL listed in these sitemap files can be crawled and is available together with some additional meta information. This additional information includes:

  • The date of last modification of the file

  • How frequently the page is likely to change

  • The priority of this URL relative to other URLs on the site

When multiple sitemap files are provided, each one of them should be listed in a sitemap index file. More than one sitemap index file may be provided but in the context of Intershop it is assumed that a single index file would meet all practical needs in the future. The format of a sitemap index file is very similar to the format of a sitemap file.

Informing the Search Engines

Once a sitemap file has been created and placed on the web server, the search engines should be informed about it. This can be done by:

  • Submitting the sitemap file to the search engine via the search engine's submission interface

  • Specifying the location in the site's robots.txt file

  • Sending an HTTP request with the sitemap URL

The search engines can then retrieve the sitemap files and make the URLs they contain available to their crawlers.

Protocol Limitations

A website can provide multiple sitemap files, but a sitemap file must not contain more than 50,000 URLs and must not be larger than 10 MB (10,485,760 bytes). The files can be compressed using gzip. However, the unzipped sitemap file must not be larger than 10 MB. If more than 50,000 URLs are listed, multiple sitemap files must be created.
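As a rough illustration of the 50,000-URL limit, the following Python sketch splits a list of URLs into chunks that each fit into one sitemap file. The function name and the example URLs are illustrative and not part of ICM. The file size limit is not checked here.

```python
from typing import Iterator, List

MAX_URLS_PER_SITEMAP = 50_000  # URL limit per sitemap file as described above


def split_into_sitemaps(urls: List[str], limit: int = MAX_URLS_PER_SITEMAP) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each with at most `limit` entries."""
    for start in range(0, len(urls), limit):
        yield urls[start:start + limit]


# 120,000 hypothetical product URLs are distributed over several sitemap files.
chunks = list(split_into_sitemaps([f"http://example.com/p{i}.html" for i in range(120_000)]))
print([len(chunk) for chunk in chunks])
```

Each resulting chunk would then be written to its own sitemap file and listed in the sitemap index file.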

The location of a sitemap file determines the set of URLs that can be included. A sitemap file located at http://example.com/catalog/sitemap.xml can include any URLs starting with http://example.com/catalog but cannot include URLs starting with http://example.com/images/.

Image Sitemaps

Image sitemaps are a Google-specific extension for standard sitemaps. They use a second XML namespace, image, within the sitemap and are used to power the Google image search.

Examples

Simple Sitemap File with One URL

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://example.com/ebook-reader.html</loc>
      <lastmod>2009-05-06</lastmod>
      <changefreq>weekly</changefreq>
      <priority>0.1</priority>
   </url>
</urlset>
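A file like the one above can be produced with a few lines of code. The following Python sketch uses the standard library to build a urlset document. It is only meant to make the structure of the format tangible and does not reflect how ICM generates its sitemaps. The function name build_sitemap is hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a <urlset> document from (loc, lastmod, changefreq, priority) tuples."""
    ET.register_namespace("", SITEMAP_NS)  # serialize without a namespace prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod, changefreq, priority in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
        ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = changefreq
        ET.SubElement(url, f"{{{SITEMAP_NS}}}priority").text = priority
    # Returns bytes including the XML declaration.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


sitemap_xml = build_sitemap(
    [("http://example.com/ebook-reader.html", "2009-05-06", "weekly", "0.1")]
)
print(sitemap_xml.decode("UTF-8"))
```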

Sitemap Index File With Two Sitemap References

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>http://example.com/sitemap1.xml</loc>
      <lastmod>2004-10-01T18:23:17+00:00</lastmod>
   </sitemap>
   <sitemap>
      <loc>http://example.com/sitemap2.xml</loc>
      <lastmod>2005-01-01</lastmod>
   </sitemap>
</sitemapindex>

Sitemap File with Google-Specific Image Extension

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
 <url>
   <loc>http://example.com/sample.html</loc>
   <image:image>
     <image:loc>http://example.com/image.jpg</image:loc>
   </image:image>
   <image:image>
     <image:loc>http://example.com/photo.jpg</image:loc>
   </image:image>
 </url>
</urlset>

XML Sitemaps and Intershop

Overview

XML sitemaps are implemented as a product data feed. Product data feeds are an established concept. Besides XML sitemaps, a few other product data feeds are implemented (e.g., Feed Dynamix feeds and RSS feeds). Intershop Commerce Management (ICM) makes it possible to include product URLs, categories, product images, and static pages in the XML sitemap feeds.

Syndications

To create a product data feed, a "syndication" must be created first. A syndication is a persistent object that has a few properties and maintains a relation to a job configuration. Running a syndication (or running a product data feed) results in running the job linked with the syndication. Thus, each product data feed has its own job. The job pipeline that generates the XML sitemap is named ProcessProductSiteMap.

The syndication persistent object plays another role: any configuration for its job is saved as a custom attribute of the syndication. The job can access the syndication and read out the configuration properties. ICM includes a small framework and a back office wizard for managing syndications.

Each syndication is managed with a uniform, tabbed UI. Any syndication-specific settings (e.g., those for the XML sitemap) are shown in the tab named Target. Each product data feed provides a custom view pipeline for rendering this tab. The pipeline that renders the XML sitemap Target tab is named ViewChannelOutboundSyndicationSitemapConfiguration.

Commerce Management Application UI

The Intershop Commerce Management (ICM) application allows the user to create and configure XML sitemap data feeds and other data feeds.

Target Type

One option of the drop-down menu must be chosen to select the type of data feed to be generated. For ICM 11+ only Sitemap XML for PWA or Sitemap XML for PWA3 are to be used. The others only work for the Responsive Starter Store in ICM 7.10.

[Screenshot: Target tab of a product data feed showing the selectable target types]

Configure Target

This form holds the configuration of the product data feed selected before. In this example the form for the Sitemap XML (HTTPS) is shown.

The Target tab provides the following means to configure:

  • File name for the generated sitemap

  • The locale for the URLs

  • The currency for the URLs

  • The gzip compression level (or 0 if uncompressed)

  • An option to ping search engines after the sitemap generation is over

  • Whether products should be included

  • Which product images should be included together with the products if selected

  • Whether categories should be included

  • Which static pages should be included

  • The change frequency, priority, and last modified date settings for the sitemap (see the sitemap protocol for more details)

Configuration for gzip Compression Level

In ICM 7.10.38, the option None (XML) was introduced in the compression level drop-down.

  • A sitemap delivered as plain XML is assumed to have a better SEO ranking than a smaller gzip-compressed file (.xml.gz) containing the same XML.

  • The compression settings from 0 to 9 work as before and produce files like: SitemapXML-product-0.xml.gz

  • The compression setting None (XML) produces content files like: SitemapXML-staticpage-0.xml
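The relation between the compression setting and the resulting file name can be sketched as follows. This Python example is a simplified model, not ICM code: the function name is hypothetical, and None stands in for the None (XML) option.

```python
import gzip
from typing import Optional, Tuple


def sitemap_file(name: str, xml_bytes: bytes, compression_level: Optional[int]) -> Tuple[str, bytes]:
    """Return (file name, file content) for a given compression setting.

    compression_level=None models the 'None (XML)' option,
    0 to 9 are passed on as gzip compression levels.
    """
    if compression_level is None:
        return f"{name}.xml", xml_bytes
    return f"{name}.xml.gz", gzip.compress(xml_bytes, compresslevel=compression_level)


plain_name, _ = sitemap_file("SitemapXML-staticpage-0", b"<urlset/>", None)
gz_name, gz_content = sitemap_file("SitemapXML-product-0", b"<urlset/>", 6)
print(plain_name, gz_name)
```

Note that level 0 still produces a gzip container (.xml.gz), only without actual compression, which is why a separate setting is needed for plain XML output.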

Scheduling Configuration

To automatically run a product data feed, the Scheduling tab provides a number of settings. The default scheduling is set to Manually, whereby the feeds can be triggered on the product data feed listing page. Run Once and Recurring Interval allow feeds to be run automatically at a given time.

Configuration Files

Low-level configurations for the sitemap generation feature are available in the syndication-targets.properties (cluster configuration directory). All relevant settings are prefixed with intershop.syndication.target.Sitemaps. They are described below by using their suffixes.

Suffix | Description | Example
processPipeline | The job pipeline that creates the sitemap. | ProcessProductSiteMap
configPipeline | The view pipeline that renders the Target tab. | ViewChannelOutboundSyndicationSitemapConfiguration
displayName | The name under which the sitemaps are shown in the back office. | Sitemap XML
displayName.de_DE | Localized names for the given sitemap. | Sitemap XML
marshaller | The class that marshals the sitemap index file and manages the creation of the sitemap files. | com.intershop.component.marketing.capi.syndication.SitemapXMLMarshaller
protocol | Defines which protocol is used for the rendered links. | either HTTP or HTTPS
sitemapPipeline | The pipeline used to download sitemaps. | ViewSiteMapXML-Start
objecttypes | Comma-separated names of all object types that will be exported. For each of these, there are certain properties prefixed with objecttype. | Product,CatalogCategory,StaticPage
objecttype.<objectName>.class | The class that represents the object to be exported. | com.intershop.beehive.xcs.capi.product.Product
objecttype.<objectName>.xmlCompositionAdapter | A class that is used to compose the URL for the object to be exported. Subclass of SitemapObjectXMLCompositionAdapter. | com.intershop.component.mvc.internal.sitemap.SitemapProductXMLCompositionAdapter
objecttype.<objectName>.viewingPipeline | The view pipeline used to show the exported object. | ViewProduct-Start
objecttype.<objectName>.filePattern | The pattern for the sitemap file name. This file name will be included in the sitemap index file. | product
searchEngines | A comma-separated list of all search engines that should be pinged after the sitemap creation. | Google,Bing,Yahoo,Ask
searchEngine.<searchenginename>.URL | The URL of the search engine to be pinged. The URL of the newly generated sitemap is inserted at the placeholder in curly brackets. | http://www.google.com/webmasters/sitemaps/ping?sitemap={0}
searchEngine.<searchenginename>.RetryCount | The number of ping retries if the first ping was unsuccessful. | 3
exportDirectory | The directory of the created sitemap file. | ${SYNDICATION_DIR}/sitemaps/${SYNDICATION_ID}
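The {0} placeholder in searchEngine.<searchenginename>.URL can be pictured as a simple template substitution. The following Python sketch is illustrative only: the function name is hypothetical, and whether ICM URL-encodes the sitemap URL before inserting it is an assumption made here.

```python
from urllib.parse import quote


def build_ping_url(template: str, sitemap_url: str) -> str:
    """Insert the (percent-encoded) sitemap URL at the {0} placeholder."""
    # Assumption: the sitemap URL is encoded so that it is a valid query parameter value.
    return template.replace("{0}", quote(sitemap_url, safe=""))


ping_url = build_ping_url(
    "http://www.google.com/webmasters/sitemaps/ping?sitemap={0}",
    "http://example.com/sitemap.xml",
)
print(ping_url)
```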

XML Sitemaps and Intershop PWA

Starting with ICM 7.10.26 it is possible to generate a sitemap containing Intershop Progressive Web App (PWA) compatible links.

The implementation is mainly done with PWA-specific URL rewrite rules and some changes in the sitemap code. These rewrite rules create PWA URLs that match the standard PWA routes for product, category, and content pages. Project-specific adaptions of the PWA routes must be reflected in the rewrite rules as well. For more information regarding the PWA related rewrite rules, please refer to Concept - URL Rewriting.

Configuration - Syndication Target

A syndication target configuration for PWA called Sitemaps-PWA has been created in the syndication-targets.properties. All keys in this file start with intershop.syndication.target.Sitemaps-PWA.

Select either Sitemap XML for PWA or Sitemap XML for PWA3 as Type on the Target tab when creating a new product data feed for the PWA(3) in the channel back office:

[Screenshot: Target tab of a product data feed showing the selectable target types]

Configuration - Application Configuration

The Application drop-down allows you to select an application. The list includes applications of the application type intershop.REST, which is currently used for the PWA.
If no application is selected, the default application for the PWA with the URL identifier rest is used as a fallback. As this fallback might not work for every setup, we recommend selecting the application that matches your needs.

Configuration - URL Host Name

The intershop.WebServerSecureURL property is used as protocol, host name, and port configuration for the URLs created by the sitemap process.

For details on how to modify the WebServerSecureURL property value, refer to Cookbook - XML Sitemaps | Recipe: Configure Domain-Specific Host Names for XML Sitemap URLs.

Configuration - Configuration File Changes

This is an overview of the configurations specifically used for the PWA sitemap:

  • The syndication target is Sitemaps-PWA, see Configuration - Syndication Target.

  • Low-level configurations for the sitemap generating feature are available in the syndication-targets.properties (cluster configuration directory).

  • All relevant settings are prefixed with intershop.syndication.target.Sitemaps-PWA.

Suffix | Description | Example
isSitemapPWA | If this optional parameter is 'true', the configuration is treated as a PWA sitemap. It triggers necessary UI options, like the Application select box. Default value is 'false'. | true
allowedAppID | This parameter filters/limits the list of applications selectable in the UI (see Configuration - Application Configuration). If not defined, all applications of the channel are listed. | intershop.REST

Note

The sitemapPipeline and viewingPipeline configurations are: ViewSiteMapXMLforPWA, ViewProductPWA, ViewStandardCatalogPWA, and ViewContentPWA. None of the pipelines actually exist.

This pipeline name configuration is only used as unique identifier for the URL rewrite rules. Since these links are specifically created for the PWA, they do not work in the Responsive Starter Demo Store (e.g., inSPIRED-inTRONICS-Site).

Configuration - Domain Splitting

A domain splitting configuration in the domainsplittings.xml file is required to shorten the first part of the URLs used for the sitemap files. Otherwise, the URLs are not correctly compacted and will not work for the PWA.

Example:

  • wrong: https://icm-rss-host-name.net/WFS/inTRONICS/en_US/rest/USD/

  • correct: https://icm-rss-host-name.net/

For ICM 7.10, the schema of the domain splitting XML can be found in bc_urlrewrite/staticfiles/definition/domainsplittings.xsd. The actual domain splittings are defined in the file domainsplittings.xml, which is deployed to your application server under /share/system/config/cluster/domainsplittings.xml. The file can be overwritten in projects.

For ICM 11+, the schema of the domain splitting XML can be found in bc_urlrewrite\src\main\resources\resources\bc_urlrewrite\domainsplittings.xsd. The actual domain splittings can be defined in multiple files named domainsplittings.xml located in the config folder of any cartridge, e.g., your_cartridge\src\main\resources\resources\your_cartridge\config.

Since all entries are collected from multiple files, it might be necessary to ensure a certain order because some entries should overrule others. For that purpose, every domain splitting entry can have an optional priority as an integer value. If no priority is explicitly defined, the default priority of “20” is used. The standard domain splitting entry from icm-as has a priority of “10” and will, therefore, be overruled by another entry with the same parameters without a defined priority. Before applying the domain splittings, the entries are ordered by priority starting with the highest. More details are documented in Guide - 11.8.0 - Changes of Domain Splitting and URL Rewrite Rules Configuration (PWA Sitemap Generation).
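The priority handling described for ICM 11+ can be modeled in a few lines. The following Python sketch is not the ICM implementation. The class and function names are hypothetical, and only the ordering logic (default priority 20, highest priority first) is taken from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_PRIORITY = 20  # used when an entry defines no priority


@dataclass
class DomainSplitting:
    name: str
    priority: Optional[int] = None  # optional integer priority

    @property
    def effective_priority(self) -> int:
        return DEFAULT_PRIORITY if self.priority is None else self.priority


def order_entries(entries: List[DomainSplitting]) -> List[DomainSplitting]:
    """Order entries by priority, starting with the highest."""
    return sorted(entries, key=lambda entry: entry.effective_priority, reverse=True)


entries = [
    DomainSplitting("standard entry from icm-as", priority=10),
    DomainSplitting("project entry without explicit priority"),
]
ordered = order_entries(entries)
print([entry.name for entry in ordered])
```

In this model, the project entry comes first because its effective priority of 20 is higher than the priority of 10 of the standard entry.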

Configuration Explained

  1. host (e.g.: icm-rss-web-store.com) - this must exactly match the host name configured in the appserver.properties file for parameter “intershop.WebServerSecureURL” (case sensitive)

    • A configuration for the PWA host name in this section is not necessary (see last step).

  2. site (e.g.: inSPIRED-inTRONICS-Site) - is an optional parameter

  3. shortpathpattern (${path} or /de${path}) - defines what the short version of the URL looks like

    • ${path} removes “/WFS/inTRONICS/en_US/rest/USD/”, writes protocol, host name (and port), and puts the path data at the end.

    • /de${path} removes “/WFS/inTRONICS/en_US/rest/USD/”, places “/de” after the URL (protocol, host name (and port)), and puts the path data at the end.

    • Example: https://icm-rss-host-name.net/WFS/inTRONICS/en_US/rest/USD/ shortened to https://icm-rss-host-name.net/ + path data

    • Path data example is ViewProductPWA-Start?productSKU=123456789 - this gets converted into a valid PWA path by the rewrite rule SitemapProductPWA.

  4. server-group (e.g.: WFS) - is an optional parameter

    • It usually is “WFS” for Web Front Service

  5. currency (e.g.: USD) - is an optional parameter to make the configuration currency specific

  6. appurlid (e.g.: rest) - is an optional parameter to make the configuration work only for application with the defined app URL ID

  7. locale (e.g.: en_US) - is an optional parameter to make the configuration locale specific

  8. The last step in generating the sitemap URLs for PWA(3) is to replace the protocol, host name, and port with the values configured in the External Base URL of the selected application, see Configuration - URL Host Name.

    • Example: https://icm-rss-host-name.net:443 gets converted to http://intershop-pwa-host-name.com, so that it is identical to the External Base URL.

    • Currently anything after the port configured in the External Base URL will be ignored!

      • Example: http://intershop-pwa-host-name.com:444/somePath/anotherPath - the /somePath/anotherPath part will be ignored

      • If needed, you can use the shortpathpattern /somePath/anotherPath${path} to get the required URL string between the port and the ${path}
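The steps above can be sketched as a small function. This Python example is a simplified model under several assumptions: the function name and its parameters are hypothetical, ${path} is assumed to expand to a leading slash followed by the path data, and the subsequent conversion of the path data by the rewrite rules (e.g., SitemapProductPWA) is not part of the sketch.

```python
from urllib.parse import urlsplit


def shorten_for_pwa(icm_url: str, long_prefix: str, short_path_pattern: str, external_base_url: str) -> str:
    """Apply a shortpathpattern and then swap in protocol, host, and port of the External Base URL."""
    parts = urlsplit(icm_url)
    if not parts.path.startswith(long_prefix):
        raise ValueError("URL does not start with the expected prefix")
    # Path data is everything behind the removed prefix, e.g. ViewProductPWA-Start
    path_data = "/" + parts.path[len(long_prefix):]
    short_path = short_path_pattern.replace("${path}", path_data)
    if parts.query:
        short_path += "?" + parts.query
    # Only protocol, host name, and port of the External Base URL are used; any path is ignored.
    base = urlsplit(external_base_url)
    return f"{base.scheme}://{base.netloc}{short_path}"


result = shorten_for_pwa(
    "https://icm-rss-host-name.net:443/WFS/inTRONICS/en_US/rest/USD/ViewProductPWA-Start?productSKU=123456789",
    "/WFS/inTRONICS/en_US/rest/USD/",
    "${path}",
    "http://intershop-pwa-host-name.com:444/somePath/anotherPath",
)
print(result)
```

Passing /de${path} as the pattern would place /de between the host part and the path data in the same way.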

Example Configuration

domainsplittings.xml - Example of PWA Host Name

<domainsplitting name="main host for rest-app - PWA - for en_US and USD">
    <hosts>
        <host>icm-rss-web-store.com</host>
        <host>host.docker.internal</host>
    </hosts>
    <site>inSPIRED-inTRONICS-Site</site>
    <shortpathpattern>${path}</shortpathpattern>
    <server-group>WFS</server-group>
    <currency>USD</currency>
    <appurlid>rest</appurlid>
    <locale>en_US</locale>
</domainsplitting>

domainsplittings.xml - Example of PWA Host Name and de_DE Language

<domainsplitting name="main host for de_DE and EUR - for PWA">
  <hosts>
    <host>icm-rss-web-store.com</host>
    <host>host.docker.internal</host>
  </hosts>
  <site>inSPIRED-inTRONICS-Site</site>
  <shortpathpattern>/de${path}</shortpathpattern>
  <server-group>WFS</server-group>
  <currency>EUR</currency>
  <appurlid>rest</appurlid>
  <locale>de_DE</locale>
</domainsplitting>

domainsplittings.xml - Example of Minimal Configuration for PWA

In case there is just one channel with just one application, this minimal configuration is practicable.

<domainsplitting name="PWA Sitemap - defines a shortpathpattern for all hosts">
    <hosts>
        <host>icm-rss-web-store.com</host>
        <host>host.docker.internal</host>
    </hosts>
    <shortpathpattern>${path}</shortpathpattern>
</domainsplitting>

docker-compose.yml NGINX Configuration

Configure the multi-channel:

nginx:
  ... 
    environment: 
      ...
      MULTI_CHANNEL: |
        .+:
          - baseHref: /us
            channel: inSPIRED-inTRONICS-Site 
            lang: en_US 

Configuration - Content Pages to be Included in the Sitemap

Only publicly available content pages can be selected to be included in a sitemap. All publicly available pages must be listed in a pageletACL.properties file, located in a cartridge that is added to the application-specific cartridge list (separate for responsive and headless). Ideally, each cartridge that contains .pagelet2 files should provide these files separately.

For ICM 7.10:
app_sf_base_cm/staticfiles/cartridge/config/pageletACL.properties

For ICM 11+:
app_sf_base_cm/src/main/resources/resources/app_sf_base_cm/config/pageletACL.properties

The file contains all public pages such as:

PageletACL.app_sf_base_cm\:page.helpdesk.pagelet2-Page.public=true
PageletACL.app_sf_base_cm\:page.privacyPolicy.pagelet2-Page.public=true
PageletACL.app_sf_base_cm\:page.termsAndConditions.pagelet2-Page.public=true

Please keep in mind that the cartridge name (PageletACL.XXX) must match your cartridge shown in the overlay (see screenshot).

image-20240311-100007.png
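To check which pages a pageletACL.properties file marks as public, the entries can be read with a few lines of code. The following Python sketch assumes the key format shown above (PageletACL.<cartridge>\:<pagelet>.public=true). The function name is hypothetical, and the parsing is deliberately simpler than a full Java properties parser.

```python
from typing import List


def public_pagelets(properties_text: str) -> List[str]:
    """Return '<cartridge>:<pagelet>' for every entry marked as public."""
    result = []
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and lines without a value
        key, value = line.split("=", 1)
        key = key.strip().replace("\\:", ":")  # undo the escaped colon
        if key.startswith("PageletACL.") and key.endswith(".public") and value.strip() == "true":
            result.append(key[len("PageletACL."):-len(".public")])
    return result


sample = r"""
PageletACL.app_sf_base_cm\:page.helpdesk.pagelet2-Page.public=true
PageletACL.app_sf_base_cm\:page.privacyPolicy.pagelet2-Page.public=true
PageletACL.app_sf_base_cm\:page.termsAndConditions.pagelet2-Page.public=true
"""
for pagelet in public_pagelets(sample):
    print(pagelet)
```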

Fix for Accessing Generated Sitemap Files

This fix is available from ICM versions: 7.10.26-LTS, 7.10.32-LTS, 7.10.37

The PWA cannot access generated sitemap files.

Example sitemap location: share/sites/inSPIRED-inTRONICS-Site/units/inSPIRED-inTRONICS/syndication/sitemaps/sitemap_pwa/

To solve this problem, the generated sitemap files must be copied to a location that the PWA can access:

  • Location (generic): share/sites/{channel}/1/static/{language}/sitemaps/pwa/

  • Example sitemap file location: share/sites/inSPIRED-inTRONICS-Site/1/static/en_US/sitemaps/pwa/sitemap_pwa.xml
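The target location follows a fixed pattern, so it can be derived from the channel and the language. The following Python sketch builds that path and copies the generated files. It is an illustration only: both function names are hypothetical, and how the copy step is actually triggered in ICM is not covered here.

```python
import shutil
from pathlib import Path
from typing import List


def pwa_sitemap_dir(share: Path, channel: str, language: str) -> Path:
    """Build share/sites/{channel}/1/static/{language}/sitemaps/pwa."""
    return share / "sites" / channel / "1" / "static" / language / "sitemaps" / "pwa"


def copy_sitemaps(source_dir: Path, target_dir: Path) -> List[str]:
    """Copy all .xml and .xml.gz files and return the copied file names."""
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for source_file in sorted(source_dir.glob("*.xml*")):
        shutil.copy2(source_file, target_dir / source_file.name)
        copied.append(source_file.name)
    return copied


target = pwa_sitemap_dir(Path("share"), "inSPIRED-inTRONICS-Site", "en_US")
print(target.as_posix())
```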

Rewrite Rules Configuration for Progressive Web App URLs

This setup is required to generate the second part of the URL.

For example, it converts /ViewProduct-Start?SKU=1234567&CatalogID=Cameras&CategoryName=575 to something like /Digital-Cameras/Pentax-Optio-RZ10-sku5920586-catCameras-Camcorders.575

Rewrite Rules for PWA

This concept is valid from Intershop 7.10.26.

The following rewrite rules create sitemap URLs for up to PWA 2.x:

  • SitemapRangePWA

  • SitemapProductPWA

  • SitemapCategoryPWA

  • SitemapContentPagePWA

These created URLs only work for the Intershop Progressive Web App (PWA), not for the inSPIRED demo store. Thus, only the compact code is implemented, not the code for expanding URLs.

In case the URLs from a customized PWA differ from the current PWA implementation, new rewrite rules might be required. See Cookbook - URL Rewriting | Recipe: Create a New Rewrite Rule for further information on how to write a customized rewrite rule.

There are Optional PWA Rule Configurations which can compensate for some PWA route modifications. These rules are closely linked to Concept - XML Sitemaps | XML Sitemaps and Intershop PWA and its configurations in syndication-targets.properties (cluster configuration directory).

The pipeline names in syndication-targets.properties configured for sitemapPipeline and viewingPipeline: ViewSiteMapXMLforPWA-Start, ViewProductPWA-Start, ViewStandardCatalogPWA-Browse, and ViewContentPWA-Start do not exist. They are only used as a unique identifier for the rewrite rules described in this chapter.

For all PWA rewrite rules up to Intershop 7.10.40.7, the protocol is fixed to https as configured in syndication-targets.properties: intershop.syndication.target.Sitemaps-PWA.protocol=https

From Intershop 7.10.40.7, the protocol handling has changed: if the External Base URL of the REST application has no protocol defined, the default protocol https is used for PWA URLs. The protocol configuration defined in the property key intershop.syndication.target.Sitemaps-PWA.protocol is no longer used.
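The protocol fallback that applies from 7.10.40.7 can be pictured like this. The Python sketch below is a minimal model, not ICM code. The function name is hypothetical, and the check for "://" is a simplification chosen for this example.

```python
def with_default_protocol(external_base_url: str, default_protocol: str = "https") -> str:
    """Prefix the External Base URL with the default protocol if it defines none."""
    if "://" in external_base_url:
        return external_base_url  # protocol is defined explicitly and is kept
    return f"{default_protocol}://{external_base_url}"


print(with_default_protocol("intershop-pwa-host-name.com"))
print(with_default_protocol("http://intershop-pwa-host-name.com"))
```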

SitemapRangePWA

Description

This rule generates the links for the initial sitemap_pwa.xml file.

This file contains URLs to the location of the gzip-compressed XML file(s) located in the ICM Shared File System (SFS), which contain the actual sitemap content for products, categories, and pages.

The file name and its extension meet the following requirements:

  • syndication-id=sitemap_pwa

  • objectType=product, catalogcategory, staticpage

  • extension=.xml.gz

This configuration results in <loc> URLs shown in the following example (e.g., sitemap_pwa-product-0.xml.gz).

Example URLs

sitemap_pwa

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
	<sitemap>
		<loc>https://intershoppwa.azurewebsites.net/sitemap_pwa-product-0.xml.gz</loc>
		<lastmod>2020-10-16T18:20:15+02:00</lastmod>
	</sitemap>
	<sitemap>
		<loc>https://intershoppwa.azurewebsites.net/sitemap_pwa-catalogcategory-0.xml.gz</loc>
		<lastmod>2020-10-16T18:20:41+02:00</lastmod>
	</sitemap>
	<sitemap>
		<loc>https://intershoppwa.azurewebsites.net/sitemap_pwa-staticpage-0.xml.gz</loc>
		<lastmod>2020-10-16T18:20:41+02:00</lastmod>
	</sitemap>
</sitemapindex>

Default Rewrite Rule Configuration

This urlrewriterules.xml configuration gets the default URLs for the XML sitemap for PWA.

urlrewriterules.xml - default SitemapRangePWA rule section

<rule type="SitemapRangePWA" priority="100" name="sitemap range pwa links"></rule>

Optional Rewrite Rule Configurations

These optional configurations replace the default configurations:

urlrewriterules.xml - custom SitemapRangePWA rule section

<rule type="SitemapRangePWA" priority="100" name="sitemap range pwa links">
  <configurations>
    <!-- below are optional parameters to customize the default behavior of the SitemapRangePWA rewrite rule -->
    <configuration id="shortPath">/syndication-</configuration>
    <configuration id="syndicationID">sitemap_for_pwa</configuration>
    <configuration id="pwaHost">www.intershop.com</configuration> <!-- optional host name configuration; if not set, the 'External Base URL' of the REST application is used (the usual setup) -->
  </configurations>
</rule>

This configuration would result in URLs like: <loc>https://www.intershop.com/syndication-sitemap_for_pwa-staticpage-0.xml.gz</loc>
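How the optional parameters end up in the resulting URL can be modeled with simple string composition. The following Python sketch is derived only from the example URLs in this section. The function name is hypothetical, and the default value "/" for the short path is an assumption inferred from the default URLs.

```python
def sitemap_loc(
    host: str,
    object_type: str,
    index: int = 0,
    short_path: str = "/",              # assumed default, inferred from the default URLs
    syndication_id: str = "sitemap_pwa",
    extension: str = ".xml.gz",
    protocol: str = "https",
) -> str:
    """Compose the <loc> value of one sitemap file listed in the sitemap index."""
    return f"{protocol}://{host}{short_path}{syndication_id}-{object_type}-{index}{extension}"


default_loc = sitemap_loc("intershoppwa.azurewebsites.net", "product")
custom_loc = sitemap_loc(
    "www.intershop.com",
    "staticpage",
    short_path="/syndication-",
    syndication_id="sitemap_for_pwa",
)
print(default_loc)
print(custom_loc)
```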

Removal of optional rewrite rule configuration parameter sitemapFileExtension - since ICM 7.10.38

From ICM 7.10.38, the sitemap generation UI can be configured to create plain XML files instead of gzip-compressed XML files.

The optional configuration parameter sitemapFileExtension and the value noFileExtension have been removed, because they were never used.
Since the file extension now depends on the sitemap compression configuration in the back office, it is not useful to change the extension in the rewrite rule configuration.

SitemapProductPWA

Description

This rule is based on the Category rewrite rule and inherits some of its configuration options (see Category rule details), because the biggest part of a sitemap product URL for the PWA is the category to which the product is assigned.

It is used when the pipeline name is: ViewProductPWA-Start, which is configured in the syndication-targets.properties file section Sitemaps-PWA.

Default Rewrite Rule Configuration

This urlrewriterules.xml configuration gets the default product URLs for the XML sitemap for PWA.

Rule configuration parameters:

Name | Value | Description
slugifyPwaDefault | true | The slugify method usually handles any characters in a string that are problematic for URLs. This may apply to any localized texts from categories and products used for the URL. By default, the slugify method changes all characters to lower case, converts special characters such as German umlauts, and removes accents (diacritics) from characters used, for example, in French and Czech. The PWA has its own URL handling, so there is no need to adapt URLs. The default for the PWA is therefore to keep these characters unchanged in the resulting URLs. If needed, the slugify method's behavior can be changed with a configuration.
excludedCharacters | ( ) | Removes the characters '(' and ')' from the URLs so that they do not cause any problems. The excludedCharacters configuration takes a space-separated list of characters to be removed.
excludedCharacters (since 7.10.40) | ( ) &amp; | A fix that also removes the ampersand '&' character from the URL path, because it belongs to the parameters part of a URL, not to the path part.

Example configuration for SitemapProductPWA:

urlrewriterules.xml - default SitemapProductPWA rule section

<rule type="SitemapProductPWA" priority="100" name="sitemap product pwa links">
	<configurations>
		<configuration id="slugifyPwaDefault">true</configuration>
		<configuration id="excludedCharacters">( ) &amp;</configuration>
	<!--	<configuration id="excludedCharacters">( )</configuration> before 7.10.40 release -->
	</configurations>
</rule>
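The difference between the classic slugify behavior and the handling of excludedCharacters can be sketched as follows. This Python example is a rough approximation for illustration and not the ICM implementation: both function names are hypothetical, and the exact character mappings used by ICM are assumptions.

```python
import unicodedata

# Assumed umlaut mapping for the classic (non-PWA) slugify behavior
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def slugify_classic(text: str) -> str:
    """Lower-case the text, convert umlauts, and strip accents."""
    text = text.lower()
    for char, replacement in UMLAUTS.items():
        text = text.replace(char, replacement)
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def remove_excluded(text: str, excluded_characters: str = "( ) &") -> str:
    """Remove every character of the space-separated excludedCharacters list."""
    for char in excluded_characters.split():
        text = text.replace(char, "")
    return text


print(slugify_classic("Größe Café"))
print(remove_excluded("Bags & Cases (new)"))
```

In the XML configuration, the ampersand has to be written as &amp;amp;, which corresponds to the plain & used in the sketch.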

Optional Rewrite Rule Configurations

See Optional PWA Rule Configurations.

Example URLs

sitemap_product

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
	<loc>https://intershoppwa.azurewebsites.net/Digital-Cameras/Pentax-Optio-RZ10-sku5920586-catCameras-Camcorders.575</loc>
	<lastmod>2020-09-30T19:57:14+02:00</lastmod>
	<changefreq>weekly</changefreq>
	<priority>0.8</priority>
	<image:image>
		<image:loc>https://intershoppwa.azurewebsites.net:443/INTERSHOP/static/WFS/inSPIRED-inTRONICS-Site/rest/inSPIRED/en_US/L/5920586-7387.jpg</image:loc>
		<image:title>Pentax Optio RZ10</image:title>
		<image:caption>Pentax Optio RZ10</image:caption>
	</image:image>
	<image:image>
		<image:loc>https://intershoppwa.azurewebsites.net:443/INTERSHOP/static/WFS/inSPIRED-inTRONICS-Site/rest/inSPIRED/en_US/S/5920586-7387.jpg</image:loc>
		<image:title>Pentax Optio RZ10</image:title>
		<image:caption>Pentax Optio RZ10</image:caption>
	</image:image>
</url>
...
</urlset>

SitemapCategoryPWA

Description

This rule is based on the Category rewrite rule and inherits some of its configuration options (see Category rule details).
It is used when the pipeline name is: ViewStandardCatalogPWA-Browse, which is configured in the syndication-targets.properties file section Sitemaps-PWA.

Default Rewrite Rule Configuration

This urlrewriterules.xml configuration gets the default category URLs for the XML sitemap for PWA.

See the explanations of the rule configuration parameters slugifyPwaDefault and excludedCharacters in SitemapProductPWA | Default Rewrite Rule Configuration.

Example configuration for SitemapCategoryPWA:

urlrewriterules.xml - default SitemapCategoryPWA rule section

<rule type="SitemapCategoryPWA" priority="100" name="sitemap category pwa links">
	<configurations>
		<configuration id="slugifyPwaDefault">true</configuration>
		<configuration id="excludedCharacters">( ) &amp;</configuration>
	<!--	<configuration id="excludedCharacters">( )</configuration> before 7.10.40 release -->
	</configurations>
</rule>

Optional Rewrite Rule Configurations

See Optional PWA Rule Configurations.

Example URLs

sitemap_category

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
	<loc>https://intershoppwa.azurewebsites.net/Backpacks,-Notebook-Bags-&amp;-Cases-catComputers.1835.3003</loc>
	<lastmod>2020-09-30T19:55:03+02:00</lastmod>
	<changefreq>monthly</changefreq>
	<priority>0.4</priority>
</url>
<url>
	<loc>https://intershoppwa.azurewebsites.net/Remote-Controls-catHome-Entertainment.1058.857</loc>
	<lastmod>2020-09-30T19:55:09+02:00</lastmod>
	<changefreq>monthly</changefreq>
	<priority>0.4</priority>
</url>
<url>
	<loc>https://intershoppwa.azurewebsites.net/Firewire-Cables-catComputers.106.830.1306</loc>
	<lastmod>2020-09-30T19:55:03+02:00</lastmod>
	<changefreq>monthly</changefreq>
	<priority>0.4</priority>
</url>
...
</urlset>

SitemapContentPagePWA

If the sitemap should also include static pages, please be aware that currently only publicly available content pages can be selected in the sitemap content selection. See section Configuration - Content Pages to be Included in the Sitemap for further details.

Description

This rule is based on the Page rewrite rule and inherits some of its configuration options, see Page rule details.
It is used when the pipeline name is: ViewContentPWA-Start, which is configured in the syndication-targets.properties file section Sitemaps-PWA.

Default Rewrite Rule Configuration

This urlrewriterules.xml configuration gets the static pages' URLs for the XML sitemap for PWA.

Besides the default behavior, the SitemapContentPagePWA rule can be configured individually:

  • By default, the rule places the pageletId behind the default shortPath '/page', e.g.: /page/page.helpdesk.faq

  • For an individual configuration, set pageletId together with a unique shortPath that replaces the default path for this page. The host name can be set as an optional configuration.

    • The rule works in individual mode if the pageletId configuration is set for a SitemapContentPagePWA rule, otherwise in default mode.

    • The specific rule (privacy-policy in the example below) must have a higher priority (105) than the common rule (100).

  • The optional parameter pwaHost can be used for local setups or debugging purposes.

    • It forces the host name to be the one configured in the rule instead of the one in the application configuration for 'rest' applications.

urlrewriterules.xml - SitemapContentPagePWA rule section

<!-- individual configuration -->
<rule type="SitemapContentPagePWA" priority="105" name="sitemap content page pwa privacy-policy">
  <configurations>
	<configuration id="pageletId">systempage.privacyPolicy.pagelet2-Page</configuration>
	<configuration id="shortPath">/en/privacy-policy</configuration>
	<!-- <configuration id="pwaHost">www.intershop.com</configuration> -->
  </configurations>
</rule>
<!-- default -->
<rule type="SitemapContentPagePWA" priority="100" name="sitemap content page pwa links"></rule>
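The interplay of the two rules above can be read as a simple lookup: if a shortPath is configured for a pageletId, that path is used, otherwise the pageletId is appended to the default '/page'. The following Java sketch only illustrates this decision. The class ContentPagePathSketch and the method resolvePath are hypothetical names and not part of the ICM API.

```java
import java.util.Map;

/** Illustrative only: mimics the documented choice between individual and default short paths. */
final class ContentPagePathSketch {

    // Default short path documented for the SitemapContentPagePWA rule.
    static final String DEFAULT_SHORT_PATH = "/page";

    /**
     * Returns the configured shortPath for a pageletId if one exists,
     * otherwise the default path followed by the pageletId.
     */
    static String resolvePath(String pageletId, Map<String, String> individualShortPaths) {
        String configured = individualShortPaths.get(pageletId);
        return configured != null ? configured : DEFAULT_SHORT_PATH + "/" + pageletId;
    }

    public static void main(String[] args) {
        // Mirrors the privacy-policy rule from the configuration above.
        Map<String, String> individual =
                Map.of("systempage.privacyPolicy.pagelet2-Page", "/en/privacy-policy");
        // prints /en/privacy-policy
        System.out.println(resolvePath("systempage.privacyPolicy.pagelet2-Page", individual));
        // prints /page/page.helpdesk.faq
        System.out.println(resolvePath("page.helpdesk.faq", individual));
    }
}
```

In the real configuration, the rule priority (105 before 100) plays the role of the map lookup: the specific rule is evaluated first, and the common rule handles everything else.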

Example URLs

sitemap_static-page

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
	<loc>https://intershoppwa.azurewebsites.net/page/page.helpdesk.faq</loc>
	<lastmod>2020-09-30T20:00:09+02:00</lastmod>
	<changefreq>yearly</changefreq>
	<priority>0.3</priority>
</url>
<url>
	<loc>https://www.intershop.com/en/privacy-policy</loc>
	<lastmod>2020-10-05T12:34:00+02:00</lastmod>
	<changefreq>yearly</changefreq>
	<priority>0.3</priority>
</url>
<url>
	<loc>https://intershoppwa.azurewebsites.net/page/systempage.termsAndConditions.pagelet2-Page</loc>
	<lastmod>2020-10-05T12:34:00+02:00</lastmod>
	<changefreq>yearly</changefreq>
	<priority>0.3</priority>
</url>
...
</urlset>

Optional PWA Rule Configurations

Description

There are some configurations which work for all PWA rewrite rules. These optional configurations are described here.

  • The default pipeline name which has to match for the rule to be applied can be changed with the startNode configuration.

  • For all PWA rewrite rules the host name is either extracted from the External Base URL configured in the assigned application, from the rewrite rule configuration pwaHost, or from intershop.WebServerSecureURL.

    • With the configuration pwaHost each rule can set its own host name for its generated sitemap URLs.

    • Except for image URLs, the URLs always depend on intershop.WebServerSecureURL.

  • The behavior of the slugify method can be controlled individually. This method prevents URL issues, see SitemapProductPWA | Default Rewrite Rule Configuration.

    • For the PWA, the default behavior is set with slugifyPwaDefault=true, which results in URLs as described in section SitemapProductPWA | Default Rewrite Rule Configuration.

    • If a character usually handled by the slugify method causes trouble as part of the URL, a more selective configuration is possible:

      • slugifyPreventToLowerCase=true prevents the slugify method from converting all characters to lower case.

      • slugifyPreventReplaceUmlauts=true prevents the slugify method from converting German umlauts (ä to ae, ö to oe, and so on).

      • slugifyPreventStripAccents=true prevents the slugify method from removing accents from characters used in, e.g., French or Czech.

      • slugifyPwaDefault=true is equivalent to setting all of the above to true, so the URLs keep upper case characters, German umlauts, and accented characters.

  • The configuration charactersToEncode encodes special characters in case they cause problems.

    • It defines the characters which need to be encoded (java.net.URLEncoder.encode(...)). For example, '-6GB-(2GB-x-3)-sku' becomes '-6GB-%282GB-x-3%29-sku'.

    • The charactersToEncode configuration takes a list of characters to be encoded, without any separation character. See the example below.

  • The configuration categoryPathPrefix replaces the default -cat with a customized version. It is only valid for the SitemapProductPWA and SitemapCategoryPWA rules.

    • from: https://intershoppwa.azurewebsites.net/Remote-Controls-catHome-Entertainment.1058.857

    • to: https://intershoppwa.azurewebsites.net/Remote-Controls-category-Home-Entertainment.1058.857
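The charactersToEncode behavior described in the list above can be sketched in a few lines of Java. This is a minimal illustration, assuming that each listed character is passed individually through java.net.URLEncoder.encode with UTF-8 and that all other characters stay untouched. The class CharactersToEncodeSketch and the method encodeSelected are hypothetical names, not the ICM implementation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Illustrative only: percent-encodes a configured set of characters. */
final class CharactersToEncodeSketch {

    /**
     * Encodes only those characters of value that occur in charactersToEncode.
     * The second argument is a plain character list without separators.
     */
    static String encodeSelected(String value, String charactersToEncode) {
        StringBuilder result = new StringBuilder();
        value.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            result.append(charactersToEncode.contains(ch)
                    ? URLEncoder.encode(ch, StandardCharsets.UTF_8)
                    : ch);
        });
        return result.toString();
    }

    public static void main(String[] args) {
        // prints -6GB-%282GB-x-3%29-sku
        System.out.println(encodeSelected("-6GB-(2GB-x-3)-sku", "()&"));
    }
}
```

Because the configuration value is read as a list of single characters, a blank inside the list would itself count as a character to encode in this sketch.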

Example - urlrewriterules.xml

urlrewriterules.xml - configuration

<rule type="SitemapProductPWA" priority="105" name="sitemap product pwa links customized">
  ...
  <configurations>
    <configuration id="startNode">ViewProductPWACustom-Start</configuration>
    <configuration id="pwaHost">www.customized.intershop.pwa.azurewebsites.net:449</configuration>

    <!-- 'slugifyPwaDefault' and the 'slugifyPrevent...' options do not make sense at the same time -->
    <!-- <configuration id="slugifyPwaDefault">true</configuration> -->
    <configuration id="slugifyPreventToLowerCase">true</configuration>
    <configuration id="slugifyPreventReplaceUmlauts">true</configuration>
    <configuration id="slugifyPreventStripAccents">true</configuration>

    <!-- list of characters without separation character; '&' must be written as an XML entity -->
    <configuration id="charactersToEncode">()&amp;</configuration>
    <configuration id="categoryPathPrefix">-category-</configuration>
    ...
  </configurations>
</rule>
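To make the three slugifyPrevent... options in the rule above more tangible, the following Java sketch applies the three documented transformations and lets each one be switched off. It is an approximation under assumptions: the umlaut mapping beyond ä and ö, the order of the steps, and the use of java.text.Normalizer for accent removal are guesses, and the real slugify method may do more (for example, handle blanks). The class SlugifyOptionsSketch is a hypothetical name.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Illustrative only: three switchable text transformations, not the ICM slugify method. */
final class SlugifyOptionsSketch {

    static String slugify(String text, boolean preventToLowerCase,
                          boolean preventReplaceUmlauts, boolean preventStripAccents) {
        String result = text;
        if (!preventReplaceUmlauts) {
            // German umlauts are replaced by their two-letter transcription.
            result = result.replace("\u00e4", "ae").replace("\u00f6", "oe").replace("\u00fc", "ue")
                    .replace("\u00c4", "Ae").replace("\u00d6", "Oe").replace("\u00dc", "Ue");
        }
        if (!preventStripAccents) {
            // Decompose accented characters, then drop the combining marks.
            result = Normalizer.normalize(result, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        }
        if (!preventToLowerCase) {
            result = result.toLowerCase(Locale.ROOT);
        }
        return result;
    }

    public static void main(String[] args) {
        String name = "Geh\u00e4use Caf\u00e9";
        // All transformations active: prints gehaeuse cafe
        System.out.println(slugify(name, false, false, false));
        // All transformations prevented: the text is printed unchanged
        System.out.println(slugify(name, true, true, true));
    }
}
```

Setting all three flags to true corresponds to what the text describes for slugifyPwaDefault=true: upper case characters, umlauts, and accented characters are kept.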

Rewrite Rules for PWA3

This concept is valid from Intershop 7.10.40.

Intershop 7.10.40 introduces further rewrite rules which create sitemap URLs. These URLs only work for Intershop Progressive Web App (PWA) version 3, not for PWA versions before version 3, nor for the inSPIRED demo store.

Since the release of PWA3, its URLs have changed, which makes the previous PWA sitemap links incompatible. The rules described in this chapter are designed to create PWA3-compatible URLs for the sitemap.

If the URLs from a customized PWA3 differ from the current PWA3 implementation, new rewrite rules might be required. See Cookbook - URL Rewriting | Recipe: Create a New Rewrite Rule for information on how to write a customized rewrite rule.

There are a few optional configurations which can compensate for some PWA route modifications.

These rules are closely linked to Concept - XML Sitemaps | XML Sitemaps and Intershop PWA and its configurations in syndication-targets.properties (cluster configuration directory).

The pipeline names in syndication-targets.properties configured for sitemapPipeline and viewingPipeline: ViewSiteMapXMLforPWA-Start, ViewProductPWA3-Start, ViewStandardCatalogPWA3-Browse and ViewContentPWA3-Start do not exist. They are only used as a unique identifier for the rewrite rules described in this chapter.

For all PWA3 rewrite rules up to Intershop 7.10.40.7, the protocol is fixed to https as configured in syndication-targets.properties: intershop.syndication.target.Sitemaps-PWA3.protocol=https

From Intershop 7.10.40.7, the protocol handling has changed: If the REST applications - External Base URL has no protocol defined, the default protocol https is used for PWA URLs. The protocol configuration defined in property key intershop.syndication.target.Sitemaps-PWA3.protocol is no longer used.
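The protocol fallback described above can be pictured with a short Java sketch. It assumes that "no protocol defined" means the External Base URL value contains no "://" separator. How ICM actually detects this is not stated here, and the class ProtocolDefaultSketch and the method withDefaultProtocol are hypothetical names.

```java
/** Illustrative only: applies https when a base URL carries no protocol. */
final class ProtocolDefaultSketch {

    static final String DEFAULT_PROTOCOL = "https";

    /** Returns the base URL unchanged if it has a protocol, otherwise prefixes https. */
    static String withDefaultProtocol(String externalBaseUrl) {
        // Assumption: a protocol is present if the value contains a scheme separator.
        return externalBaseUrl.contains("://")
                ? externalBaseUrl
                : DEFAULT_PROTOCOL + "://" + externalBaseUrl;
    }

    public static void main(String[] args) {
        // prints https://intershoppwa.azurewebsites.net
        System.out.println(withDefaultProtocol("intershoppwa.azurewebsites.net"));
        // prints http://localhost:4200
        System.out.println(withDefaultProtocol("http://localhost:4200"));
    }
}
```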

SitemapRangePWA3

Description

This rule generates the links for the initial sitemap_pwa.xml file.

For this rule, no new PWA3 version has been developed, because nothing has changed here from the previous PWA version.

For details, see SitemapRangePWA.

SitemapProductPWA3

Description

This rule is based on the Category rewrite rule and uses some of its configuration options (see Category rule details), because the first and the last part of the sitemap product URLs for PWA3 are derived from the category to which the product is assigned.

It is used when the pipeline name is: ViewProductPWA3-Start, which is configured in the syndication-targets.properties file section Sitemaps-PWA3.

Default Rewrite Rule Configuration

This urlrewriterules.xml configuration gets the default product URLs for the XML sitemap for PWA3.

Rule configuration parameters:

  • slugifyPwa3Default

    • Value: true

    • Description: The slugify method usually handles any characters in a string that are problematic for URLs. This may apply to any localized texts from categories and products used as part of the URL. By default, the slugify method changes all characters to lower case, converts, e.g., German umlauts, and removes accents from characters used in, for example, French and Czech. PWA3 has its own URL handling, so its URLs differ slightly: the default for PWA3 is to keep these characters unchanged in the resulting URLs and to apply only the conversion to lower case. This configuration can be used to change the slugify method's behavior if needed.

  • excludedCharactersRule

    • Value: [ &\(\)=]

    • Description: Removes the characters <space>, '&', '(', ')', and '=' from the URLs so that they do not cause any problems. The excludedCharactersRule is a regular expression which defines the characters to be removed. It can contain a list of black-listed characters or a list of white-listed characters.

    • Alternative value: [^a-zA-Z0-9äöüÄÖÜé]

    • Description: White-listed characters such as [^a-zA-Z0-9äöüÄÖÜé] are currently not used for this rule. Since neither a black list nor a white list can be assumed to cover all possibilities, the expression can be switched to the other style and modified to your needs.

  • urlCharactersNotToEncode

    • Value: /,'

    • Description: For better SEO ranking, PWA(3) URLs contain localized texts. These texts may contain characters that are problematic in a URL and are therefore encoded, for example, / becomes %2F. To match the URLs used in PWA3, the sitemap generator must leave a few of those characters unencoded. This configuration ensures that the slash (/), the comma (,), and the apostrophe (') are not URL encoded.

Example configuration for SitemapProductPWA3:

urlrewriterules.xml - default SitemapProductPWA3 rule section

<rule type="SitemapProductPWA3" priority="100" name="sitemap product pwa links for PWA3">
	<configurations>
		<configuration id="slugifyPwa3Default">true</configuration>
		<!-- '&' must be written as an XML entity inside the file -->
		<configuration id="excludedCharactersRule">[ &amp;\(\)=]</configuration>
		<!-- <configuration id="excludedCharactersRule">[^a-zA-Z0-9äöüÄÖÜé]</configuration> white-listed characters example -->
		<configuration id="urlCharactersNotToEncode">/,'</configuration>
	</configurations>
</rule>
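The difference between the black-list and the white-list style of excludedCharactersRule can be tried out with plain Java regular expressions. The sketch below only demonstrates the regex semantics of the two expressions. At which point of URL generation ICM applies the expression is not covered, and the class ExcludedCharactersSketch is a hypothetical name.

```java
/** Illustrative only: removes every character matched by an excludedCharactersRule expression. */
final class ExcludedCharactersSketch {

    // Black list from the default rule: blank, ampersand, parentheses, and equals sign.
    static final String BLACK_LIST = "[ &\\(\\)=]";

    // White-list style: everything except letters, digits, German umlauts, and e-acute is removed.
    static final String WHITE_LIST = "[^a-zA-Z0-9\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00e9]";

    static String removeExcluded(String text, String excludedCharactersRule) {
        return text.replaceAll(excludedCharactersRule, "");
    }

    public static void main(String[] args) {
        // prints abcd
        System.out.println(removeExcluded("a=b (c&d)", BLACK_LIST));
        // Hyphen and blank are not white-listed, so both are removed.
        System.out.println(removeExcluded("K\u00fchl-Box 2", WHITE_LIST));
    }
}
```

A black list leaves unknown characters in place, while a white list drops them. This is why neither style covers every case and the expression may need to be adapted.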

Optional Rewrite Rule Configurations

See Optional PWA Rule Configurations.

Example URLs

sitemap_product

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
	<loc>https://intershoppwa.azurewebsites.net/computer/datenspeicher/festk%C3%B6rperdrives/a-data-s511-240gb-prd9013198-ctgComputers.206.1563</loc>
	<lastmod>2020-09-30T19:57:14+02:00</lastmod>
	<changefreq>weekly</changefreq>
	<priority>0.8</priority>
	<image:image>
		<image:loc>https://intershoppwa.azurewebsites.net:443/INTERSHOP/static/WFS/inSPIRED-inTRONICS-Site/rest/inSPIRED/en_US/L/9013198-7387.jpg</image:loc>
		<image:title>A Data S511</image:title>
		<image:caption>A Data S511</image:caption>
	</image:image>
	<image:image>
		<image:loc>https://intershoppwa.azurewebsites.net:443/INTERSHOP/static/WFS/inSPIRED-inTRONICS-Site/rest/inSPIRED/en_US/S/9013198-7387.jpg</image:loc>
		<image:title>A Data S511</image:title>
		<image:caption>A Data S511</image:caption>
	</image:image>
</url>
...
</urlset>
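The product URL above shows the effect of urlCharactersNotToEncode: the umlaut is percent-encoded, while the slashes between the path segments stay readable. A minimal Java sketch of this idea follows, assuming that every character except the configured ones is passed through java.net.URLEncoder.encode with UTF-8. The class NotToEncodeSketch and the method encodeExcept are hypothetical names, not the ICM implementation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Illustrative only: URL-encodes a path but keeps a configured set of characters as is. */
final class NotToEncodeSketch {

    static String encodeExcept(String path, String urlCharactersNotToEncode) {
        StringBuilder result = new StringBuilder();
        path.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            result.append(urlCharactersNotToEncode.contains(ch)
                    ? ch
                    : URLEncoder.encode(ch, StandardCharsets.UTF_8));
        });
        return result.toString();
    }

    public static void main(String[] args) {
        // prints computer/datenspeicher/festk%C3%B6rperdrives
        System.out.println(encodeExcept("computer/datenspeicher/festk\u00f6rperdrives", "/,'"));
        // Without the exception list, the slash is encoded: prints a%2Fb
        System.out.println(encodeExcept("a/b", ""));
    }
}
```

Note that java.net.URLEncoder performs form encoding and would turn a blank into '+'. The sketch therefore assumes that blanks have already been handled before this step.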

SitemapCategoryPWA3

Description

This rule is based on the Category rewrite rule and inherits some of its configuration options. See Category rule details.

It is used when the pipeline name is: ViewStandardCatalogPWA3-Browse, which is configured in the syndication-targets.properties file section Sitemaps-PWA3.

Default Rewrite Rule Configuration

This urlrewriterules.xml configuration gets the default category URLs for the XML sitemap for PWA3.

See the explanations of the rule configuration parameters slugifyPwa3Default, excludedCharactersRule, and urlCharactersNotToEncode in SitemapProductPWA3 | Default Rewrite Rule Configuration.

Example configuration for SitemapCategoryPWA3:

urlrewriterules.xml - default SitemapCategoryPWA3 rule section

<rule type="SitemapCategoryPWA3" priority="100" name="sitemap category pwa links for pwa3">
	<configurations>
		<configuration id="slugifyPwa3Default">true</configuration>
		<configuration id="excludedCharactersRule">[ &amp;\(\)=]</configuration>
		<configuration id="urlCharactersNotToEncode">/,'</configuration>
	</configurations>
</rule>

Optional Rewrite Rule Configurations

See Optional PWA Rule Configurations.

Example URLs

sitemap_category

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
	<loc>https://intershoppwa.azurewebsites.net/computer/hardware-komponenten/geh%C3%A4use-komponenten/pc-k%C3%BChlventilatoren-ctgComputers.106.236.921</loc>
	<lastmod>2020-09-30T19:55:03+02:00</lastmod>
	<changefreq>monthly</changefreq>
	<priority>0.4</priority>
</url>
<url>
	<loc>https://intershoppwa.azurewebsites.net/computer/notebooks-und-pcs/backpacks,-notebook-bags-cases-ctgComputers.1835.3003</loc>
	<lastmod>2020-09-30T19:55:09+02:00</lastmod>
	<changefreq>monthly</changefreq>
	<priority>0.4</priority>
</url>
<url>
	<loc>https://intershoppwa.azurewebsites.net/konferenzausstattung/beamer-ctgpresentation-conferencing.data-projectors</loc>
	<lastmod>2020-09-30T19:55:03+02:00</lastmod>
	<changefreq>monthly</changefreq>
	<priority>0.4</priority>
</url>
...
</urlset>

SitemapContentPagePWA3

Description

This rule is based on the Page rewrite rule and inherits some of its configuration options, see Page rule details.

It is used when the pipeline name is: ViewContentPWA3-Start, which is configured in the syndication-targets.properties file section Sitemaps-PWA3.

It is a copy of the SitemapContentPagePWA rule implementation, because the URLs have not changed from the previous PWA versions to PWA3.

If the sitemap should also include static pages, be aware that currently only publicly available content pages can be selected in the sitemap content selection. See section Configuration - Content Pages to be Included in Sitemap for further details.
