Document Properties
Kbid: 23D962
Last Modified: 04-Feb-2020
Added to KB: 28-Feb-2013
Public Access: Everyone
Status: Online
Doc Type: Guidelines, Concepts & Cookbooks
Product:
  • ICM 7.6
  • ICM 7.7
  • ICM 7.8
  • ICM 7.9
  • ICM 7.10

Concept - XML Sitemaps

1 Introduction

This document describes the concept behind XML sitemap generation in Intershop 7. The XML sitemap feature already existed in earlier versions of Intershop/Enfinity. In Intershop 7, it has been extended to include links to categories, images, and certain static pages. An XML sitemap providing HTTPS links is also available.

2 Sitemap Protocol

2.1 Overview

Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a sitemap is an XML file that lists URLs for a site along with additional metadata about each URL. This way, search engines can more intelligently crawl the site.

Web crawlers usually discover pages from links within the site and from links from other sites. Sitemaps supplement these data to allow crawlers that support sitemaps to pick up all URLs in the sitemap and learn about those URLs using the associated metadata. Using the sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling the site.

Each URL listed in these sitemap files can be crawled and is available together with some additional meta information. This additional information includes:

  • The date of last modification of the file
  • How frequently the page is likely to change
  • The priority of this URL relative to other URLs on the site
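As an illustration of the metadata listed above, the following is a minimal sketch that writes one URL entry with its last modification date, change frequency, and priority. It uses only Python's standard library; the function name `build_urlset` and the sample values are assumptions made for this example and are not part of Intershop.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Build a <urlset> element from (loc, lastmod, changefreq, priority) tuples."""
    # Register the sitemap namespace as the default one so no prefix is written.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod, changefreq, priority in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod.isoformat()
        ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = changefreq
        ET.SubElement(url, f"{{{SITEMAP_NS}}}priority").text = f"{priority:.1f}"
    return urlset


# Hypothetical sample entry.
urlset = build_urlset(
    [("http://example.com/ebook-reader.html", date(2009, 5, 6), "weekly", 0.1)]
)
xml_text = ET.tostring(urlset, encoding="unicode")
print(xml_text)
```

Only `loc` is mandatory in the sitemap protocol; the other three elements are optional hints for the crawler.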

When multiple sitemap files are provided, each of them should be listed in a sitemap index file. More than one sitemap index file may be provided, but in the context of Intershop it is assumed that a single index file meets all practical needs. The format of a sitemap index file is very similar to the format of a sitemap file.
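To make the relation between sitemap files and the index file concrete, here is a small sketch that lists several sitemap files in one index. The function name `build_sitemap_index` and the sample URLs are assumptions for illustration; the Intershop marshaller may produce its index differently.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemaps):
    """Build a <sitemapindex> element from (sitemap_url, lastmod_string) pairs."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc, lastmod in sitemaps:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return index


# Hypothetical sitemap files that the index should reference.
index = build_sitemap_index(
    [
        ("http://example.com/sitemap1.xml", "2004-10-01T18:23:17+00:00"),
        ("http://example.com/sitemap2.xml", "2005-01-01"),
    ]
)
index_xml = ET.tostring(index, encoding="unicode")
print(index_xml)
```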

2.2 Informing the Search Engines

Once a sitemap file has been created and placed on the webserver, the search engines should be informed about it. This can be done by:

  • Submitting the sitemap file to the search engine via the search engine's submission interface,
  • Specifying the location in the site's robots.txt file,
  • Sending an HTTP request with the sitemap URL.

The search engines can then retrieve the sitemap files and make the URLs within available to their crawlers.
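Two of the options above can be sketched in a few lines: the `Sitemap:` entry for robots.txt and the construction of a ping URL. The ping template is the example value given for searchEngine.<searchenginename>.URL in section 3.4. Percent-encoding the sitemap URL is an assumption of this sketch, and the helper names are hypothetical. The code only builds strings and does not send a request.

```python
from urllib.parse import quote

# Example template taken from section 3.4; {0} stands for the sitemap URL.
PING_TEMPLATE = "http://www.google.com/webmasters/sitemaps/ping?sitemap={0}"


def robots_txt_line(sitemap_url):
    """Return the robots.txt directive that announces a sitemap location."""
    return f"Sitemap: {sitemap_url}"


def ping_url(template, sitemap_url):
    """Insert the sitemap URL into the ping template.

    Assumption: the sitemap URL is percent-encoded so that it is passed
    as a single query parameter value.
    """
    return template.format(quote(sitemap_url, safe=""))


print(robots_txt_line("http://example.com/sitemap.xml"))
print(ping_url(PING_TEMPLATE, "http://example.com/sitemap.xml"))
```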

2.3 Protocol Limitations

A site may provide multiple sitemap files, but each sitemap file must contain no more than 50,000 URLs and must be no larger than 10 MB (10,485,760 bytes). The files may be compressed using gzip; however, the uncompressed sitemap file must still be no larger than 10 MB. If more than 50,000 URLs are to be listed, multiple sitemap files must be created.
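The limits described above translate into a simple chunking step. The following sketch splits a URL list into groups of at most 50,000 entries and checks a serialized file against the 10 MB limit. The function names are assumptions for this example.

```python
MAX_URLS_PER_SITEMAP = 50_000
MAX_SITEMAP_BYTES = 10 * 1024 * 1024  # 10,485,760 bytes, measured uncompressed


def split_into_sitemaps(urls, max_urls=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into chunks that each fit into one sitemap file."""
    return [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]


def fits_size_limit(xml_bytes):
    """Check the uncompressed size of a serialized sitemap file."""
    return len(xml_bytes) <= MAX_SITEMAP_BYTES


# Hypothetical catalog with 120,000 product URLs.
urls = [f"http://example.com/product/{i}" for i in range(120_000)]
chunks = split_into_sitemaps(urls)
print([len(chunk) for chunk in chunks])
```

A file can stay under the URL limit and still exceed the size limit, for example when URLs are long or image entries are added, so both checks are needed.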

The location of a sitemap file determines the set of URLs that can be included. A sitemap file located at http://example.com/catalog/sitemap.xml can include any URLs starting with http://example.com/catalog but cannot include URLs starting with http://example.com/images/.
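The location rule can be expressed as a prefix check. This sketch assumes the stricter reading in which a page URL must share the scheme and host of the sitemap and must lie below the sitemap's directory (with a trailing slash). The function name is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def allowed_by_location(sitemap_url, page_url):
    """Return True if page_url may be listed in the sitemap at sitemap_url."""
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    # Directory of the sitemap file, e.g. "/catalog/" for "/catalog/sitemap.xml".
    base_dir = posixpath.dirname(sitemap.path)
    if not base_dir.endswith("/"):
        base_dir += "/"
    same_origin = (sitemap.scheme, sitemap.netloc) == (page.scheme, page.netloc)
    return same_origin and page.path.startswith(base_dir)


sitemap = "http://example.com/catalog/sitemap.xml"
print(allowed_by_location(sitemap, "http://example.com/catalog/show?item=1"))
print(allowed_by_location(sitemap, "http://example.com/images/logo.png"))
```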

2.4 Image Sitemaps

Image sitemaps are a Google-specific extension of standard sitemaps. The image elements are declared under a second XML namespace, image, within the sitemap, and Google uses them for its image search.
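To show how the second namespace is handled by a consumer, the following sketch parses a small image sitemap and collects the image URLs per page. The sample document and the function name `images_by_page` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Prefixes used only for lookups in this script; the URIs are what matters.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://example.com/sample.html</loc>
    <image:image><image:loc>http://example.com/image.jpg</image:loc></image:image>
    <image:image><image:loc>http://example.com/photo.jpg</image:loc></image:image>
  </url>
</urlset>
"""


def images_by_page(xml_bytes):
    """Map each page URL to the list of image URLs declared for it."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for url in root.findall("sm:url", NS):
        page = url.findtext("sm:loc", namespaces=NS)
        result[page] = [
            img.findtext("image:loc", namespaces=NS)
            for img in url.findall("image:image", NS)
        ]
    return result


print(images_by_page(SAMPLE.encode("utf-8")))
```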

2.5 Examples

2.5.1 Simple Sitemap File with One URL

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://example.com/ebook-reader.html</loc>
      <lastmod>2009-05-06</lastmod>
      <changefreq>weekly</changefreq>
      <priority>0.1</priority>
   </url>
</urlset>

2.5.2 Sitemap Index File With Two Sitemap References

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>http://example.com/sitemap1.xml</loc>
      <lastmod>2004-10-01T18:23:17+00:00</lastmod>
   </sitemap>
   <sitemap>
      <loc>http://example.com/sitemap2.xml</loc>
      <lastmod>2005-01-01</lastmod>
   </sitemap>
</sitemapindex>

2.5.3 Sitemap File with Google-Specific Image Extension

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
   <url>
      <loc>http://example.com/sample.html</loc>
      <image:image>
         <image:loc>http://example.com/image.jpg</image:loc>
      </image:image>
      <image:image>
         <image:loc>http://example.com/photo.jpg</image:loc>
      </image:image>
   </url>
</urlset>

3 XML Sitemaps and Intershop

3.1 Overview

XML sitemaps are implemented as a product data feed. Product data feeds are a long-established concept. Besides XML sitemaps, a few other product data feeds are implemented (e.g., Feed Dynamix feeds and RSS feeds). Up to Enfinity 6.4, only product URLs could be included in the XML sitemap feeds. In Intershop 7, categories, product images, and static pages are also available.

3.2 Syndications

To create a product data feed, a so-called "syndication" must be created first. A syndication is a persistent object that has a few properties and maintains a relation to a job configuration. Running a syndication (or running a product data feed) results in running the job linked with the syndication. Thus, each product data feed has its own job. The job pipeline that generates the XML sitemap is named ProcessProductSiteMap.

The syndication persistent object plays another role: any configuration for its job is saved as a custom attribute of the syndication. The job can access the syndication and read the configuration properties. The Intershop suite includes a small framework and a back office wizard for managing syndications.
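The relation between a syndication, its job, and the stored configuration can be pictured with a small conceptual model. This is purely illustrative Python and not the Intershop API: the class `Syndication`, the registry, and the attribute names `fileName` and `locale` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Syndication:
    """Illustrative stand-in for the persistent syndication object."""
    name: str
    job_pipeline: str  # name of the job linked with this syndication
    custom_attributes: dict = field(default_factory=dict)  # job configuration


def process_product_sitemap(config):
    """Hypothetical job body: only reports the configuration it would use."""
    return f"generating {config['fileName']} for locale {config['locale']}"


# Hypothetical lookup from job pipeline name to the code that runs it.
JOB_REGISTRY = {"ProcessProductSiteMap": process_product_sitemap}


def run_syndication(syndication, registry=JOB_REGISTRY):
    """Running a syndication runs its linked job, which reads the stored settings."""
    job = registry[syndication.job_pipeline]
    return job(syndication.custom_attributes)


feed = Syndication(
    name="Storefront sitemap",
    job_pipeline="ProcessProductSiteMap",
    custom_attributes={"fileName": "sitemap", "locale": "en_US"},
)
print(run_syndication(feed))
```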

Each syndication is managed with a uniform, tabbed UI. Any syndication-specific settings (e.g., those for the XML sitemap) are shown in the tab named Target. Each product data feed provides a custom view pipeline for rendering this tab. The pipeline that renders the XML sitemap Target tab is named ViewChannelOutboundSyndicationSitemapConfiguration.

3.3 Commerce Management Application UI

The Commerce Management application provides the end user with the ability to create and configure XML sitemap data feeds.

3.3.1 Configure Target

The wizard can be used to set the type of product data feed to Sitemap XML (HTTP) or Sitemap XML (HTTPS). Each type generates links that use the corresponding protocol only.

The Target tab provides the following means to configure:

  • File name for the generated sitemap
  • The locale for the URLs
  • The currency for the URLs
  • The gzip compression level (or 0 if uncompressed)
  • An option to ping search engines after the sitemap generation is over
  • Whether products should be included
  • Which product images should be included together with the products if selected
  • Whether categories should be included
  • Which static pages should be included
  • The change frequency, priority and last modified date settings for the sitemap (see the sitemap protocol for more details)

3.3.2 Scheduling Configuration

The Scheduling tab provides a number of settings for running a product data feed automatically. The default scheduling is Manually, which means the feed can be triggered on the product data feed listing page. Run Once and Recurring Interval allow feeds to be run automatically at a given time.

3.4 Configuration Files

Low-level configuration for the sitemap generation feature is available in the file syndication-targets.properties (in the cluster configuration directory). All relevant settings are prefixed with intershop.syndication.target.Sitemaps. They are briefly described below by their suffixes.

| Suffix | Description | Example |
| --- | --- | --- |
| processPipeline | The job pipeline that creates the sitemap. | ProcessProductSiteMap |
| configPipeline | The view pipeline that renders the Target tab. | ViewChannelOutboundSyndicationSitemapConfiguration |
| displayName | The name under which the sitemaps are shown in the back office. | Sitemap XML |
| displayName.de_DE | Localized names for the given sitemap. | Sitemap XML |
| marshaller | The class that marshals the sitemap index file and manages the creation of the sitemap files. | com.intershop.component.marketing.capi.syndication.SitemapXMLMarshaller |
| protocol | Defines which protocol is used for the rendered links. | either http or https |
| sitemapPipeline | The pipeline used to download sitemaps. | ViewSiteMapXML-Start |
| objecttypes | Comma-separated names of all object types that will be exported. For any of these there are certain properties prefixed with objecttype. | Product,CatalogCategory,StaticPage |
| objecttype.<objectName>.class | The class that represents the object to be exported. | com.intershop.beehive.xcs.capi.product.Product |
| objecttype.<objectName>.xmlCompositionAdapter | A class that is used to compose the URL for the object to be exported. Subclass of SitemapObjectXMLCompositionAdapter. | com.intershop.component.mvc.internal.sitemap.SitemapProductXMLCompositionAdapter |
| objecttype.<objectName>.viewingPipeline | The view pipeline used to show the exported object. | ViewProduct-Start |
| objecttype.<objectName>.filePattern | The pattern for the sitemap file name. This file name will be included in the sitemap index file. | product |
| searchEngines | A comma-separated list with all search engines that should be pinged after the sitemap creation. | Google,Bing,Yahoo,Ask |
| searchEngine.<searchenginename>.URL | The URL of the search engine to be pinged. The URL of the newly generated sitemap is included in curly brackets. | http://www.google.com/webmasters/sitemaps/ping?sitemap={0} |
| searchEngine.<searchenginename>.RetryCount | The number of ping retries if the first ping was unsuccessful. | 3 |
| exportDirectory | The directory of the created sitemap file. | ${SYNDICATION_DIR}/sitemaps/${SYNDICATION_ID} |
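As a rough illustration of how these settings fit together, the sketch below reads a few entries in key=value form, strips the common prefix, and resolves the search engines with their ping URL and retry count. The exact layout of syndication-targets.properties is an assumption here (full prefix followed by a dot and the suffix), and the parser is a simplified stand-in, not the Intershop configuration loader.

```python
PREFIX = "intershop.syndication.target.Sitemaps."

# Assumed file excerpt; the values are the examples from the settings described above.
SAMPLE_PROPERTIES = """\
# sitemap feed settings
intershop.syndication.target.Sitemaps.processPipeline=ProcessProductSiteMap
intershop.syndication.target.Sitemaps.objecttypes=Product,CatalogCategory,StaticPage
intershop.syndication.target.Sitemaps.searchEngines=Google
intershop.syndication.target.Sitemaps.searchEngine.Google.URL=http://www.google.com/webmasters/sitemaps/ping?sitemap={0}
intershop.syndication.target.Sitemaps.searchEngine.Google.RetryCount=3
"""


def load_sitemap_settings(text, prefix=PREFIX):
    """Return a dict of suffix -> value for all keys that start with the prefix."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split at the first "=" only, because values such as URLs contain "=" too.
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith(prefix):
            settings[key[len(prefix):]] = value.strip()
    return settings


settings = load_sitemap_settings(SAMPLE_PROPERTIES)
for engine in settings["searchEngines"].split(","):
    url_template = settings[f"searchEngine.{engine}.URL"]
    retries = int(settings[f"searchEngine.{engine}.RetryCount"])
    print(engine, url_template, retries)
```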

