This document describes the concept behind the XML sitemap generation in Intershop 7. The XML sitemap feature can include links to products, categories, product images and certain static pages. An XML sitemap providing HTTPS links is also available.
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a sitemap is an XML file that lists URLs for a site along with additional metadata about each URL. This allows search engines to crawl the site in a more intelligent way.
Web crawlers usually discover pages from links within the site and from links from other sites. Sitemaps supplement this data so that crawlers that support sitemaps can capture all URLs in the sitemap and learn about those URLs from the associated metadata. Using the sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling the site.
Each URL listed in a sitemap file can be crawled and can carry additional metadata. This additional information includes:
The date of last modification of the file (lastmod)
How frequently the page is likely to change (changefreq)
The priority of this URL relative to other URLs on the site (priority)
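The three metadata fields map to the optional lastmod, changefreq and priority elements of a url entry. The following minimal sketch builds such an entry with Python's standard xml.etree.ElementTree module and checks the value ranges defined by the sitemap protocol (seven changefreq keywords, a priority between 0.0 and 1.0). It is an illustration, not Intershop code; the function name build_url_entry is hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)  # serialize without an "ns0:" prefix

# Keywords allowed for <changefreq> by the sitemap protocol.
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly",
                     "monthly", "yearly", "never"}


def build_url_entry(loc: str,
                    lastmod: Optional[date] = None,
                    changefreq: Optional[str] = None,
                    priority: Optional[float] = None) -> ET.Element:
    """Return a <url> element; every metadata element is optional."""
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        raise ValueError(f"invalid changefreq: {changefreq!r}")
    if priority is not None and not 0.0 <= priority <= 1.0:
        raise ValueError(f"priority must be between 0.0 and 1.0: {priority}")

    url = ET.Element(f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
    if lastmod is not None:
        # W3C Datetime format, date-only variant (YYYY-MM-DD)
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod.isoformat()
    if changefreq is not None:
        ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = changefreq
    if priority is not None:
        # Rounded to one decimal place, as in the usual 0.0 to 1.0 notation
        ET.SubElement(url, f"{{{SITEMAP_NS}}}priority").text = f"{priority:.1f}"
    return url
```

For example, ET.tostring(build_url_entry("http://example.com/ebook-reader.html", date(2009, 5, 6), "weekly", 0.1)) serializes a single url element carrying all three metadata fields.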
When multiple sitemap files are provided, each of them should be listed in a sitemap index file. More than one sitemap index file may be provided, but in the context of Intershop a single index file is assumed to meet all practical needs. The format of a sitemap index file is very similar to that of a sitemap file.
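Because the two formats are so similar, a sitemap index can be produced with the same tooling as a sitemap. The following is a minimal sketch using Python's xml.etree.ElementTree, assuming the caller already knows the URL and last modification date of each sitemap file; the function name is hypothetical and this is not Intershop's SitemapXMLMarshaller.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)  # default namespace, no prefix


def build_sitemap_index(entries: List[Tuple[str, str]]) -> bytes:
    """entries: one (loc, lastmod) pair per sitemap file."""
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc, lastmod in entries:
        # Each sitemap file gets its own <sitemap> element.
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    # Serialize as UTF-8 bytes including the XML declaration.
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)
```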
Once a sitemap file has been created and placed on the web server, the search engines should be informed about it. This can be done by:
Submitting the sitemap file via the search engine's submission interface
Specifying the sitemap location in the site's robots.txt file
Sending an HTTP request (a "ping") containing the sitemap URL to the search engine
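For the robots.txt option, the sitemap protocol defines a Sitemap directive that takes the absolute URL of the sitemap (or sitemap index) file; it is independent of any User-agent line and may appear more than once. A minimal Python sketch, assuming the robots.txt content is handled as a plain string; the helper name is hypothetical:

```python
from typing import List


def add_sitemap_directives(robots_txt: str, sitemap_urls: List[str]) -> str:
    """Append a "Sitemap: <url>" line for every URL not already listed."""
    lines = robots_txt.splitlines()
    # Collect URLs from existing Sitemap directives (case-insensitive keyword).
    present = {line.split(":", 1)[1].strip()
               for line in lines if line.lower().startswith("sitemap:")}
    for url in sitemap_urls:
        if url not in present:
            lines.append(f"Sitemap: {url}")
            present.add(url)
    return "\n".join(lines) + "\n"
```

Calling the function twice with the same URL leaves the file unchanged, so it can safely run after every sitemap generation.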
The search engines can then retrieve the sitemap files and make the URLs they contain available to their crawlers.
A website can provide multiple sitemap files, but each sitemap file must not contain more than 50,000 URLs and must not be larger than 10 MB (10,485,760 bytes); later revisions of the sitemap protocol raise this size limit to 50 MB (52,428,800 bytes). If more than 50,000 URLs are listed, multiple sitemap files must be created. Sitemap files can be compressed using gzip; however, the size limit applies to the uncompressed file.
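Both limits can be enforced while writing, as in the following minimal Python sketch. It assumes bare url/loc entries without metadata and takes the 10 MB limit stated above as a parameter; the names are hypothetical and this is not Intershop's SitemapXMLMarshaller.

```python
import gzip
from typing import Iterable, Iterator, List
from xml.sax.saxutils import escape

MAX_URLS = 50_000                 # protocol limit per sitemap file
MAX_BYTES = 10 * 1024 * 1024      # 10,485,760 bytes, measured uncompressed

HEADER = ('<?xml version="1.0" encoding="UTF-8"?>'
          '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">')
FOOTER = "</urlset>"


def url_entry(loc: str) -> str:
    """Minimal <url> entry; special XML characters in the URL are escaped."""
    return f"<url><loc>{escape(loc)}</loc></url>"


def split_sitemaps(urls: Iterable[str], max_urls: int = MAX_URLS,
                   max_bytes: int = MAX_BYTES) -> Iterator[bytes]:
    """Yield uncompressed sitemap documents that respect both limits."""
    overhead = len((HEADER + FOOTER).encode("utf-8"))
    chunk: List[str] = []
    size = overhead
    for loc in urls:
        entry = url_entry(loc)
        entry_size = len(entry.encode("utf-8"))
        # Start a new file when the current one is full by count or by size.
        if chunk and (len(chunk) >= max_urls or size + entry_size > max_bytes):
            yield (HEADER + "".join(chunk) + FOOTER).encode("utf-8")
            chunk, size = [], overhead
        chunk.append(entry)
        size += entry_size
    if chunk:
        yield (HEADER + "".join(chunk) + FOOTER).encode("utf-8")


def gzip_sitemap(document: bytes, level: int = 9) -> bytes:
    """Compress a document that already satisfies the uncompressed size limit."""
    return gzip.compress(document, compresslevel=level)
```

The size check runs on the uncompressed bytes before gzip_sitemap is applied, matching the rule that the limit refers to the unzipped file. A single entry that alone exceeds max_bytes is not handled by this sketch.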
The location of a sitemap file determines the set of URLs that can be included. A sitemap file located at http://example.com/catalog/sitemap.xml can include any URLs starting with http://example.com/catalog/ but cannot include URLs starting with http://example.com/images/.
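The location rule can be expressed as a small Python check. This sketch combines the rule described above with the protocol's requirement that all listed URLs use the same scheme and host as the sitemap file; the function name is hypothetical.

```python
from urllib.parse import urlsplit


def allowed_in_sitemap(sitemap_url: str, page_url: str) -> bool:
    """Check the location rule: same scheme and host, and a path at or below
    the directory that contains the sitemap file."""
    sitemap = urlsplit(sitemap_url)
    page = urlsplit(page_url)
    # "/catalog/sitemap.xml" -> "/catalog/", "/sitemap.xml" -> "/"
    base_dir = sitemap.path.rsplit("/", 1)[0] + "/"
    same_origin = ((page.scheme.lower(), page.netloc.lower()) ==
                   (sitemap.scheme.lower(), sitemap.netloc.lower()))
    return same_origin and page.path.startswith(base_dir)
```

A sitemap placed in the web root therefore covers the whole host, which is one reason a single root-level sitemap index is a common choice.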
Image sitemaps are a Google-specific extension of standard sitemaps. They use a second XML namespace, image, within the sitemap and provide the data for Google Image Search.
Example of a sitemap file:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/ebook-reader.html</loc>
    <lastmod>2009-05-06</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.1</priority>
  </url>
</urlset>
```
Example of a sitemap index file:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://example.com/sitemap1.xml</loc>
    <lastmod>2004-10-01T18:23:17+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://example.com/sitemap2.xml</loc>
    <lastmod>2005-01-01</lastmod>
  </sitemap>
</sitemapindex>
```
Example of an image sitemap:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://example.com/sample.html</loc>
    <image:image>
      <image:loc>http://example.com/image.jpg</image:loc>
    </image:image>
    <image:image>
      <image:loc>http://example.com/photo.jpg</image:loc>
    </image:image>
  </url>
</urlset>
```
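The image sitemap above combines two namespaces in one document. A minimal Python sketch that produces the same structure from a mapping of page URLs to image URLs; the input shape and function name are assumptions for illustration:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
ET.register_namespace("", SITEMAP_NS)     # standard elements without prefix
ET.register_namespace("image", IMAGE_NS)  # Google extension as image:*


def build_image_sitemap(pages: Dict[str, List[str]]) -> bytes:
    """pages maps a page URL to the URLs of the images shown on that page."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page, images in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page
        for image_url in images:
            # One <image:image> block per image on the page
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)
```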
Official site of the protocol: http://www.sitemaps.org
Licensing information: Attribution-ShareAlike Creative Commons License
Image sitemaps: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=178636
XML sitemaps are implemented as a product data feed. Product data feeds are a long-established concept in Intershop; examples include Feed Dynamix feeds, Sitemap XML feeds and RSS feeds. Intershop 7 can include product URLs, categories, product images and static pages in the XML sitemap feeds.
In order to create a product data feed, a "syndication" must be created first. A syndication is a persistent object that has a few properties and maintains a relation to a job configuration. Running a syndication (or running a product data feed) runs the job linked with the syndication. Thus, each product data feed has its own job. The job pipeline that generates the XML sitemap is named ProcessProductSiteMap.
The syndication persistent object plays another role as well: any configuration for its job is saved as a custom attribute of the syndication. The job can access the syndication and read the configuration properties. The Intershop suite includes a small framework and a back office wizard for managing syndications.
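The relationship between a syndication, its job and its configuration can be modeled as a toy Python sketch. It is not Intershop's API: the classes, the attribute keys FileName and Locale, and the dispatch table are hypothetical and only illustrate that the configuration lives on the syndication while the linked job does the work.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class JobConfiguration:
    pipeline: str  # name of the job pipeline, e.g. "ProcessProductSiteMap"


@dataclass
class Syndication:
    name: str
    job: JobConfiguration
    # Job-specific settings are stored as custom attributes of the syndication.
    custom_attributes: Dict[str, str] = field(default_factory=dict)


JobPipeline = Callable[[Syndication], str]


def run_syndication(syndication: Syndication,
                    pipelines: Dict[str, JobPipeline]) -> str:
    """Running a syndication means running the job linked with it."""
    return pipelines[syndication.job.pipeline](syndication)


def process_product_sitemap(syndication: Syndication) -> str:
    """Stand-in for the job pipeline: it reads its configuration back
    from the syndication's custom attributes (keys are hypothetical)."""
    attrs = syndication.custom_attributes
    return f"generating {attrs['FileName']} for locale {attrs['Locale']}"
```

Keeping the settings on the syndication rather than on the job lets one generic job pipeline serve many feeds with different configurations.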
Each syndication is managed with a uniform, tabbed UI. Any syndication-specific settings (e.g., those for the XML sitemap) are shown in the tab named Target. Each product data feed provides a custom view pipeline for rendering this tab. The pipeline that renders the XML sitemap Target tab is named ViewChannelOutboundSyndicationSitemapConfiguration.
The Commerce Management application allows the end user to create and configure XML sitemap data feeds.
Use the select box in the form to set the type of the product data feed to Sitemap XML (HTTP) or Sitemap XML (HTTPS). The resulting sitemap contains HTTP or HTTPS links, respectively.
The Target tab provides the following settings:
File name for the generated sitemap
The locale for the URLs
The currency for the URLs
The gzip compression level (0 to 9, where 0 means no compression)
An option to ping search engines after the sitemap has been generated
Whether products should be included
Which product images should be included with the products (if products are included)
Whether categories should be included
Which static pages should be included
The change frequency, priority and last modified date settings for the sitemap (see the sitemap protocol for more details)
ICM 7.10.38 introduced a new option in the compression level drop-down list: None (XML).
A sitemap delivered as plain XML is assumed to achieve a better SEO ranking than a smaller gzip archive (.xml.gz) containing the same XML file.
The compression settings 0 to 9 work as before and produce files such as SitemapXML-product-0.xml.gz.
The compression setting None (XML) produces uncompressed files such as SitemapXML-staticpage-0.xml.
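The two example file names suggest the naming scheme file name, object type and index, followed by .xml, with .gz appended for the gzip levels. The Python sketch below encodes that inferred scheme; it is an assumption derived from the two examples, not taken from Intershop's code, and None stands for the None (XML) option.

```python
import gzip
from typing import Optional


def sitemap_file_name(base: str, object_type: str, index: int,
                      compression_level: Optional[int]) -> str:
    """None models the None (XML) option; 0 to 9 are gzip levels."""
    if compression_level is not None and not 0 <= compression_level <= 9:
        raise ValueError("gzip compression level must be between 0 and 9")
    name = f"{base}-{object_type}-{index}.xml"
    # Every gzip level, including 0, produces a gzip container (.gz).
    return name if compression_level is None else name + ".gz"


def encode_sitemap(document: bytes, compression_level: Optional[int]) -> bytes:
    """Plain XML for None (XML), otherwise a gzip stream at the given level."""
    if compression_level is None:
        return document
    return gzip.compress(document, compresslevel=compression_level)
```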
To run a product data feed automatically, use the settings on the Scheduling tab. The default scheduling is Manually, which means that feeds are triggered from the product data feed listing page. Run Once and Recurring Interval run feeds automatically at a given time.
Low-level configurations for the sitemap generation feature are available in syndication-targets.properties (in the cluster configuration directory). All relevant settings are prefixed with intershop.syndication.target.Sitemaps. They are briefly described below by their suffixes.
| Suffix | Description | Example |
|---|---|---|
|  | The job pipeline that creates the sitemap. | ProcessProductSiteMap |
|  | The view pipeline that renders the Target tab. | ViewChannelOutboundSyndicationSitemapConfiguration |
|  | The name under which the sitemaps are shown in the back office. | Sitemap XML |
|  | Localized names for the given sitemap. | Sitemap XML |
|  | The class that marshals the sitemap index file and manages the creation of the sitemap files. | com.intershop.component.marketing.capi.syndication.SitemapXMLMarshaller |
|  | Defines which protocol is used for the rendered links. | Either HTTP or HTTPS |
|  | The pipeline used to download sitemaps. | ViewSiteMapXML-Start |
|  | Comma-separated names of all object types that will be exported. For any of these there are certain properties prefixed with | Product,CatalogCategory,StaticPage |
|  | The class that represents the object to be exported. | com.intershop.beehive.xcs.capi.product.Product |
|  | A class that is used to compose the URL for the object to be exported. Subclass of | com.intershop.component.mvc.internal.sitemap.SitemapProductXMLCompositionAdapter |
|  | The view pipeline used to show the exported object. | ViewProduct-Start |
|  | The pattern for the sitemap file name. This file name will be included in the sitemap index file. | product |
|  | A comma-separated list of all search engines that should be pinged after the sitemap has been generated. | Google,Bing,Yahoo,Ask |
|  | The URL of the search engine to be pinged. The URL of the newly generated sitemap is inserted at the placeholder in curly brackets ({0}). | http://www.google.com/webmasters/sitemaps/ping?sitemap={0} |
|  | The number of ping retries if the first ping was unsuccessful. | 3 |
|  | The directory of the created sitemap file. | ${SYNDICATION_DIR}/sitemaps/${SYNDICATION_ID} |
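The ping URL and retry settings in the table above can be illustrated with a small Python sketch. It assumes that {0} is a simple positional placeholder (as in java.text.MessageFormat) and that the sitemap URL is URL-encoded, as the sitemap protocol requires for pings; whether Intershop encodes it is not stated here. The HTTP call is passed in as a function so that the retry logic can be tested without network access; all names are hypothetical.

```python
import urllib.request
from typing import Callable
from urllib.parse import quote


def http_get_ok(url: str, timeout: float = 10.0) -> bool:
    """Send a GET request and report whether the search engine answered 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError, HTTPError and timeouts derive from OSError
        return False


def ping_search_engine(template: str, sitemap_url: str, retries: int,
                       send: Callable[[str], bool] = http_get_ok) -> int:
    """Ping once plus up to `retries` retries; return the successful attempt."""
    # Substitute {0} with the URL-encoded sitemap URL.
    ping_url = template.replace("{0}", quote(sitemap_url, safe=""))
    for attempt in range(1, retries + 2):
        if send(ping_url):
            return attempt
    raise RuntimeError(f"ping failed after {retries + 1} attempts: {ping_url}")
```

With a retry setting of 3, at most four requests are sent per search engine.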
Starting with ICM 7.10.26, it is possible to generate a sitemap containing links that are compatible with the Intershop Progressive Web App (PWA).
The implementation mainly consists of four PWA-specific URL rewrite rules and some changes in the sitemap code. These rewrite rules create PWA URLs that match the standard PWA routes for product, category and content pages. Project-specific adaptations of the PWA routes must be reflected in the rewrite rules as well. For more information regarding the PWA-related rewrite rules, refer to Concept - URL Rewriting.
A new syndication target configuration for the PWA, called Sitemaps-PWA, has been added to syndication-targets.properties. All keys of this configuration start with intershop.syndication.target.Sitemaps-PWA.
It can be selected as the Type on the Target tab when creating a new product data feed in the channel back office.
The Application drop-down list allows you to select an application. The list includes applications of the application type intershop.REST, which is currently used for the PWA.
If no application is selected, the default application for the PWA, which has the URL identifier rest, is used as a fallback.
If an application is available, the host name of the URLs generated for the PWA sitemap is based on its configuration. For each PWA rewrite rule, an optional pwaHost configuration overrides this value.
If neither is available, the intershop.WebServerSecureURL property is used. For details on how to modify the WebServerSecureURL property value, refer to Cookbook - Sitemap | 8 Recipe: Configure Domain-Specific Host Names for XML Sitemap URLs.
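Read literally, the paragraphs above describe a precedence order for the host name of PWA sitemap URLs: the pwaHost setting of a rewrite rule, then the host configured for the selected (or fallback rest) application, then intershop.WebServerSecureURL. A minimal Python sketch of that reading, with hypothetical function and parameter names, returning only the host name without the port:

```python
from typing import Optional
from urllib.parse import urlsplit


def resolve_pwa_host(pwa_host: Optional[str],
                     application_host: Optional[str],
                     web_server_secure_url: str) -> str:
    """Pick the host name for PWA sitemap URLs (assumed precedence order)."""
    # 1. pwaHost configured on the PWA rewrite rule overrides everything.
    if pwa_host:
        return pwa_host
    # 2. Otherwise the host from the selected (or fallback "rest") application.
    if application_host:
        return application_host
    # 3. Last resort: the host part of intershop.WebServerSecureURL.
    return urlsplit(web_server_secure_url).hostname or ""
```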
As described in Cookbook - Sitemap | 8 Recipe: Configure Domain-Specific Host Names for XML Sitemap URLs, an application-specific WebServerSecureURL might be required.
To configure it:
Create a file named url.properties in the following folder of the deployed server: server\share\system\config\apps\intershop.REST
Add configuration entries similar to the following to url.properties for the REST application:
```properties
intershop.WebServerURL=http://intershoppwa.azurewebsites.net:80
intershop.WebServerSecureURL=https://intershoppwa.azurewebsites.net:443
```
The following overview covers the new configurations used specifically for the PWA sitemap. As mentioned before, the syndication target is Sitemaps-PWA. The configurations that are not specific to the PWA are described above.
Low-level configurations for the sitemap generation feature are available in syndication-targets.properties (in the cluster configuration directory). All relevant settings are prefixed with intershop.syndication.target.Sitemaps-PWA.
| Suffix | Description | Example |
|---|---|---|
|  | If this optional parameter is 'true', the configuration is treated as a PWA sitemap and enables the PWA-specific UI options, such as the Application select box. The default value is 'false'. | true |
|  | This parameter filters the list of applications selectable in the UI (see New Application configuration). If it is not defined, all applications of the channel are listed. | intershop.REST |
The sitemapPipeline and viewingPipeline configurations of Sitemaps-PWA are set to ViewSiteMapXMLforPWA, ViewProductPWA, ViewStandardCatalogPWA and ViewContentPWA. None of these pipelines actually exists.
The pipeline names are only used as unique identifiers for the URL rewrite rules. Since the resulting links are created specifically for the PWA, they do not work in the inSPIRED demo store.
A domain splitting configuration has to exist in the domainsplitting.xml file to shorten the first part of the URLs used in the sitemap files; otherwise, the URLs might not work for the PWA.
Example: https://intershoppwa.azurewebsites.net/WFS/inTRONICS/en_US/rest/USD/ is shortened to https://intershoppwa.azurewebsites.net/
```xml
<domainsplitting name="main host for rest-app - PWA">
  <hosts>
    <host>intershoppwa.azurewebsites.net</host>
  </hosts>
  <site>inSPIRED-inTRONICS-Site</site>
  <shortpathpattern>${path}</shortpathpattern>
  <server-group>WFS</server-group>
  <!-- <currency>USD</currency> -->
  <appurlid>rest</appurlid>
  <!-- <locale>en_US</locale> -->
</domainsplitting>
```
```xml
<domainsplitting name="main host for en_US and USD">
  <hosts>
    <host>intershoppwa.azurewebsites.net</host>
  </hosts>
  <site>inSPIRED-inTRONICS-Site</site>
  <shortpathpattern>/${locale:(de|us)}${path}</shortpathpattern>
  <server-group>WFS</server-group>
  <currency>USD</currency>
  <appurlid>rest</appurlid>
  <replacements>
    <replacement type="locale">
      <compact>de</compact>
      <expand>de_DE</expand>
    </replacement>
    <replacement type="locale">
      <compact>us</compact>
      <expand>en_US</expand>
    </replacement>
  </replacements>
</domainsplitting>
```
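The replacements in the second configuration map the compact locale tokens de and us to de_DE and en_US. The following Python sketch illustrates only that compact/expand mapping and the /${locale:(de|us)}${path} pattern; Intershop's actual domain splitting resolution does more, and the function names, the https scheme and the sample paths are assumptions.

```python
import re
from typing import Dict, Tuple

# Replacements from the configuration above: compact token -> expanded locale.
EXPAND: Dict[str, str] = {"de": "de_DE", "us": "en_US"}
COMPACT: Dict[str, str] = {v: k for k, v in EXPAND.items()}

# Mirrors the short path pattern /${locale:(de|us)}${path}
SHORT_PATH = re.compile(r"^/(de|us)(/.*)?$")


def shorten(host: str, locale: str, path: str) -> str:
    """Build the short URL for a locale such as en_US and a storefront path."""
    return f"https://{host}/{COMPACT[locale]}{path}"


def expand(short_path: str) -> Tuple[str, str]:
    """Return (locale, path) for a short path such as /us/product/123."""
    match = SHORT_PATH.match(short_path)
    if not match:
        raise ValueError(f"path does not match the short path pattern: {short_path}")
    return EXPAND[match.group(1)], match.group(2) or "/"
```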
Configure the corresponding multi-channel setting (MULTI_CHANNEL) in the nginx environment of the PWA:
```yaml
nginx:
  ...
  environment:
    ...
    MULTI_CHANNEL: |
      .+:
        - baseHref: /us
          channel: inSPIRED-inTRONICS-Site
          lang: en_US
```
The following fix is available as of ICM versions 7.10.26-LTS, 7.10.32-LTS and 7.10.37.
By default, the PWA cannot access the generated sitemap files.
Example sitemap location: share/sites/inSPIRED-inTRONICS-Site/units/inSPIRED-inTRONICS/syndication/sitemaps/sitemap_pwa/
To solve this problem, the generated sitemap files must be copied to a location that the PWA can access:
Location (generic): share/sites/{channel}/1/static/{language}/sitemaps/pwa/
Example sitemap file location: share/sites/inSPIRED-inTRONICS-Site/1/static/en_US/sitemaps/pwa/sitemap_pwa.xml
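The copy step described above can be illustrated with a short Python sketch that maps the syndication directory to the generic target location. It is not the code shipped with the fix: the function names, the restriction to .xml and .xml.gz files and the use of shutil.copy2 are assumptions.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def pwa_sitemap_dir(share: Path, channel: str, language: str) -> Path:
    """share/sites/{channel}/1/static/{language}/sitemaps/pwa/"""
    return share / "sites" / channel / "1" / "static" / language / "sitemaps" / "pwa"


def publish_pwa_sitemaps(source_dir: Path, share: Path,
                         channel: str, language: str) -> List[Path]:
    """Copy generated sitemap files to the location readable by the PWA."""
    target = pwa_sitemap_dir(share, channel, language)
    target.mkdir(parents=True, exist_ok=True)
    copied: List[Path] = []
    for file in sorted(source_dir.iterdir()):
        if file.is_file() and file.name.endswith((".xml", ".xml.gz")):
            copied.append(Path(shutil.copy2(file, target / file.name)))
    return copied


def demo() -> str:
    """Run the copy in a temporary 'share' directory and return the new path."""
    with tempfile.TemporaryDirectory() as tmp:
        share = Path(tmp) / "share"
        source = (share / "sites" / "inSPIRED-inTRONICS-Site" / "units" /
                  "inSPIRED-inTRONICS" / "syndication" / "sitemaps" / "sitemap_pwa")
        source.mkdir(parents=True)
        (source / "sitemap_pwa.xml").write_text("<sitemapindex/>")
        copied = publish_pwa_sitemaps(source, share, "inSPIRED-inTRONICS-Site", "en_US")
        return copied[0].relative_to(share).as_posix()
```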
The information provided in the Knowledge Base may not be applicable to all systems and situations. Intershop Communications will not be liable to any party for any direct or indirect damages resulting from the use of the Customer Support section of the Intershop Corporate Web site, including, without limitation, any lost profits, business interruption, loss of programs or other data on your information handling system.