
Concept - Page Cache

Introduction

A web store usually contains many pages, e.g., a home page, category pages, product pages, and various others. Generating such a page for every request can become very expensive. So why not cache the generated result and serve the response from a cache instead of having the application server generate it again? This is where the page cache comes into play. The following sections give an overview of how the page cache works and how to use it.

References

Cookbook - Page Cache

Infrastructure

Client

The client is most likely a browser that requests data from the web server. We can differentiate between static and dynamic requests: images, CSS files, and JavaScript files are static content, whereas requests to pipelines produce dynamic content.

Web Server / Web Adapter / Page Cache

The web server is responsible for handling client requests. Part of the web server is the web adapter, which connects the web server to the application server and provides the caching functionality. Each entry in the page cache is stored in a separate file. In addition to the output generated by the application server, these files contain metadata.

Content of a page cache file

X-IS-CACHEURI: live#1#0#127.0.0.1:81#383#/WFS/PrimeTech-PrimeTechSpecials-Site/en_US/-/USD/Render-StartExternal;pgid=9YpOS7SHDyhSRpgoyNJlB2Ab0000?PageletUUID=Vr8KAM6PSN8AAAE5XUE4kG3R&CategoryBO=rO0ABXNrRzNP&ProductBO=rO0ABXNyAEVjb20
X-IS-KEYWORDS: com.intershop.component.product.internal.ORMProductBOImpl;EjMKAM6P35kAAAE5zy44kG3R;2958189;com.intershop.beehive.xcs.capi.product.Product;
X-IS-SIDLENGTH: 40
X-IS-SPGIDLENGTH: 36
X-IS-LASTMODIFICATION: 1353400621
X-IS-WAINCLUDES: 0
X-IS-PERSONALIZED: 1
Content-Type: text/html;charset=utf-8
Content-Length: 3870

The HTML output or JSON data follows here. It can also be XML, but in that case the Content-Type header should be changed accordingly. Note that the Content-Length value in this example does not match the actual content.
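The sample above suggests a simple layout: `Name: value` header lines, one empty line, then the cached body. The following minimal Java sketch parses a file under exactly that assumption; the real WebAdapter file format may contain more than the sample shows, and the class name `PageCacheFile` is purely illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory view of a page cache file: metadata headers, then the body. */
final class PageCacheFile {
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    private PageCacheFile(String body) {
        this.body = body;
    }

    /** Parses "Name: value" lines up to the first empty line; everything after it is the body. */
    static PageCacheFile parse(String text) {
        String normalized = text.replace("\r\n", "\n");
        int separator = normalized.indexOf("\n\n");
        String head = separator < 0 ? normalized : normalized.substring(0, separator);
        PageCacheFile file = new PageCacheFile(separator < 0 ? "" : normalized.substring(separator + 2));
        for (String line : head.split("\n")) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                // Split at the first colon only: the X-IS-CACHEURI value itself contains colons.
                file.headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
            }
        }
        return file;
    }

    boolean isPersonalized() {
        return "1".equals(headers.get("X-IS-PERSONALIZED"));
    }

    /** Keywords are ';'-separated in the sample; split() drops the trailing empty entry. */
    String[] keywords() {
        String value = headers.getOrDefault("X-IS-KEYWORDS", "");
        return value.isEmpty() ? new String[0] : value.split(";");
    }
}
```

Applied to the sample, `keywords()` would return the four entries of the X-IS-KEYWORDS line and `isPersonalized()` would return `true`.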

Application Server

The application server handles requests from the web adapter and delivers the requested content.

Cacheable Items

We can cache two different kinds of content:

static content - images, JavaScript, CSS
There is only one way to control the expiration behavior of static content: for each domain, you can define how long static content files are cacheable. Different static files cannot be given different expiration times. Invalidation is only possible by invalidating the whole page cache.
dynamic content - the output of pipelines
The expiration behavior of dynamic content is configurable per item. Invalidation can be done in multiple ways.

How Does It Work?

The basics are pretty simple. Let's look at the web adapter and ignore the web server. If a new request arrives, the web adapter tries to get the content for the response out of the page cache. If there is no entry or only an outdated entry in the cache, the web adapter requests the necessary data from the application server and delivers it to the client. In addition, it adds the new content to the page cache so it is available for the next request. If the same URL is requested again, the web adapter is able to retrieve the content directly from the cache without querying the application server.
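The lookup flow just described can be sketched in a few lines of Java. This is a simplified, single-threaded model under stated assumptions: the application server is represented by a plain function, entries carry an explicit expiration time, and the names `CacheEntry` and `SimpleWebAdapter` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache entry: the rendered response plus the time at which it expires. */
final class CacheEntry {
    final String content;
    final long expiresAtMillis;

    CacheEntry(String content, long expiresAtMillis) {
        this.content = content;
        this.expiresAtMillis = expiresAtMillis;
    }

    boolean isExpired(long nowMillis) {
        return nowMillis >= expiresAtMillis;
    }
}

/** Simplified web adapter: serve from the cache, otherwise ask the application server and store the result. */
final class SimpleWebAdapter {
    private final Map<String, CacheEntry> cache = new HashMap<>();
    private final Function<String, CacheEntry> appServer; // stands in for the application server
    int appServerCalls = 0;

    SimpleWebAdapter(Function<String, CacheEntry> appServer) {
        this.appServer = appServer;
    }

    String handle(String url, long nowMillis) {
        CacheEntry entry = cache.get(url);
        if (entry != null && !entry.isExpired(nowMillis)) {
            return entry.content;                // cache hit: the application server is not involved
        }
        appServerCalls++;
        CacheEntry fresh = appServer.apply(url); // no entry or an outdated one
        cache.put(url, fresh);                   // available for the next request
        return fresh.content;
    }
}
```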

If a page is assembled using isinclude, the flow looks a bit different. isinclude can be used for template includes, which are handled by the application server. If you use URL includes, the web adapter is responsible for resolving them by issuing new requests. The following flow shows an example: assume a page that contains exactly one URL include, and that this include is not cacheable.
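To make the URL include case concrete, the following Java sketch resolves include markers in a cached page by issuing one extra request per include. The marker syntax `<!--wainclude:URL-->` is purely hypothetical (the real placeholder format is not described here), and `requestUrl` stands in for whatever the web adapter does for each include: a cache lookup for cacheable includes, a fresh application server request for non-cacheable ones.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of URL include resolution; the marker syntax is an assumption for illustration. */
final class IncludeResolver {
    private static final Pattern INCLUDE = Pattern.compile("<!--wainclude:([^>]*?)-->");

    /** Replaces every include marker with the response for its URL. */
    static String resolve(String page, Function<String, String> requestUrl) {
        Matcher m = INCLUDE.matcher(page);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String fragment = requestUrl.apply(m.group(1)); // one additional request per include
            m.appendReplacement(out, Matcher.quoteReplacement(fragment));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this split, the outer page can stay in the page cache with its marker, while the non-cacheable include is fetched anew for every request.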

A page intended to be cached can have keywords associated with it. The WebAdapterAgent stores these keywords in a special page cache index, which the WebAdapter can search to find the cached pages associated with a keyword. This makes it possible to invalidate the page cache partially, based on keywords.
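Conceptually, such a keyword index is an inverted index from keyword to cache entries. The Java sketch below shows that idea only; the class name, the use of file paths as entry identifiers, and keywords like `product:42` are illustrative assumptions (the real keywords look like the X-IS-KEYWORDS line in the sample file above).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical keyword index: maps each keyword to the cache entries tagged with it. */
final class KeywordIndex {
    private final Map<String, Set<String>> entriesByKeyword = new HashMap<>();

    /** Called when a page is stored together with its keywords. */
    void index(String entryPath, Iterable<String> keywords) {
        for (String keyword : keywords) {
            entriesByKeyword.computeIfAbsent(keyword, k -> new HashSet<>()).add(entryPath);
        }
    }

    /** Returns the cache entries affected when the given keyword is invalidated. */
    Set<String> lookup(String keyword) {
        return Collections.unmodifiableSet(entriesByKeyword.getOrDefault(keyword, Set.of()));
    }
}
```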

Note

Note that the WebAdapterAgent indexes the complete content of the page. This indexing makes it possible to invalidate the page cache not only by explicit keywords but also by any text contained in the pages. However, when the page cache is invalidated by processes like import, staging, or object replication, only explicit keywords are used.

Session Handling

We use session IDs (sid) to track a session across subsequent requests. If session IDs were cached as part of a page, every client served from that entry would receive the same session ID, and different sessions could no longer be told apart. Instead, session IDs are replaced by placeholders in the page cache. When the web adapter retrieves a page from the page cache, it replaces each placeholder with the session ID of the current request. This way, caching and session tracking are combined.
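A minimal Java sketch of the placeholder idea follows. The token `@@SID@@` and the class name are hypothetical; the web adapter's real placeholder format is not documented here.

```java
/** Sketch of session ID placeholders; the placeholder token is an assumption. */
final class SessionPlaceholders {
    static final String SID_PLACEHOLDER = "@@SID@@";

    /** Before storing: replace the session ID of the request that generated the page. */
    static String toCacheable(String page, String generatingSid) {
        return page.replace(generatingSid, SID_PLACEHOLDER);
    }

    /** On delivery: insert the session ID of the request currently being served. */
    static String forRequest(String cachedPage, String currentSid) {
        return cachedPage.replace(SID_PLACEHOLDER, currentSid);
    }
}
```

Replacing every occurrence of the session ID string is a simplification made for this sketch; how the real web adapter locates session IDs in a page is not described here.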

User Group-Specific Caching

There are two ways to cache a page: personalized and non-personalized. Personalized caching means that, in addition to the plain URL, the pgid (personalization group ID) is used to store pages in and retrieve them from the cache. The pgid is generated by the application server and is stored on the client side, either as a URL parameter or as a cookie.
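The effect on the cache key can be sketched as follows. The concatenation format `#pgid=` and the class name are assumptions for illustration only; the sample X-IS-CACHEURI above shows that the real key has a different, richer structure.

```java
/** Hypothetical cache key: the URL alone, or the URL plus the pgid for personalized pages. */
final class CacheKeys {
    static String keyFor(String url, boolean personalized, String pgid) {
        if (!personalized) {
            return url;                  // all users share one entry
        }
        return url + "#pgid=" + pgid;    // one entry per personalization group
    }
}
```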

Invalidation

By default, an entry in the page cache is invalid once its time stamp is older than the current time. If a new request comes in and the web adapter detects an expired entry, it sends a new request to the application server. If multiple requests arrive for the same expired entry, only one request is sent to the application server; all other requests receive the expired entry until that single application server request returns a new entry.
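The "one refresh, everyone else gets the expired entry" behavior can be sketched in Java with a concurrent set of URLs currently being refreshed. This is an illustration of the technique under assumptions (the application server is a plain function, times are passed in explicitly), not the WebAdapter's implementation.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Sketch: serve an expired entry to concurrent requests while one request refreshes it. */
final class StaleWhileRefreshCache {
    static final class Entry {
        final String content;
        final long expiresAt;

        Entry(String content, long expiresAt) {
            this.content = content;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentMap<String, Entry> entries = new ConcurrentHashMap<>();
    private final Set<String> refreshing = ConcurrentHashMap.newKeySet();
    private final Function<String, Entry> appServer; // stands in for the application server
    final AtomicInteger appServerCalls = new AtomicInteger();

    StaleWhileRefreshCache(Function<String, Entry> appServer) {
        this.appServer = appServer;
    }

    String get(String url, long now) {
        Entry entry = entries.get(url);
        if (entry != null && now < entry.expiresAt) {
            return entry.content;              // fresh hit
        }
        boolean claimed = refreshing.add(url); // at most one request holds the refresh for a URL at a time
        if (!claimed && entry != null) {
            return entry.content;              // refresh in progress: serve the expired entry
        }
        try {
            // Reached by the refreshing request, or by any request when nothing stale exists yet.
            appServerCalls.incrementAndGet();
            Entry fresh = appServer.apply(url);
            entries.put(url, fresh);
            return fresh.content;
        } finally {
            if (claimed) {
                refreshing.remove(url);
            }
        }
    }
}
```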

There are two ways to invalidate the page cache:
1) The first is to invalidate the complete page cache. Instead of deleting or touching all entries in the page cache, a new page cache ID is generated, and entries are read from and stored in a new physical directory. Since this directory starts out empty, there are no expired entries that could be served while new content is fetched, so the expiration behavior described above does not apply. Therefore, this type of invalidation has a huge performance impact. However, the page cache can be prefetched, as described below.

2) The second approach is to use keywords to invalidate only specific pages. Invalidating the whole page cache always decreases performance until the invalidated pages are cached again, so selective invalidation is usually less harmful. Selective invalidation sets the time stamp of the affected page cache entries to a point in the past. In contrast to the expiration behavior described above, an asynchronous thread is then started that removes these pages from the file system and their entries from the index within a few seconds.
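Selective invalidation as described in 2) can be sketched in two steps: backdate the affected entries immediately, then remove them asynchronously. In this Java sketch a map of expiration time stamps stands in for both the cache files and the index, and the removal delay is a parameter; both are assumptions, not WebAdapterAgent internals.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Sketch of selective invalidation: backdate now, remove asynchronously later. */
final class SelectiveInvalidator {
    /** Cache entry path -> expiration time stamp in epoch milliseconds. */
    final Map<String, Long> expiresAt = new ConcurrentHashMap<>();

    private final ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "page-cache-cleaner");
        t.setDaemon(true); // cleanup must not keep the JVM alive
        return t;
    });

    ScheduledFuture<?> invalidate(Set<String> paths, long removalDelayMillis) {
        for (String path : paths) {
            expiresAt.computeIfPresent(path, (p, old) -> 0L); // a time stamp in the past: expired at once
        }
        // Asynchronous removal of the entries (files and index entries in the real system).
        return cleaner.schedule(() -> paths.forEach(expiresAt::remove),
                removalDelayMillis, TimeUnit.MILLISECONDS);
    }
}
```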

Note that after mass data replication, the first approach (complete invalidation using a fresh page cache ID and directory) is used. This makes it possible to roll back to the old page cache ID and directory if a replication process is undone.
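The ID switch and the rollback can be modeled as a small generation history. In this Java sketch the directory layout (one subdirectory per page cache ID) and the use of random UUIDs as IDs are assumptions made for illustration.

```java
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.UUID;

/** Sketch of full invalidation by switching to a fresh page cache ID, with rollback. */
final class PageCacheGenerations {
    private final Path root;
    private final Deque<String> history = new ArrayDeque<>();
    private String currentId;

    PageCacheGenerations(Path root, String initialId) {
        this.root = root;
        this.currentId = initialId;
    }

    /** Directory from which entries are read and into which new entries are stored. */
    Path currentDirectory() {
        return root.resolve(currentId);
    }

    /** Full invalidation: nothing is deleted; lookups simply start in a new, empty directory. */
    String invalidateAll() {
        history.push(currentId);
        currentId = UUID.randomUUID().toString();
        return currentId;
    }

    /** Undo: reactivate the directory that was current before the last full invalidation. */
    void rollback() {
        if (history.isEmpty()) {
            throw new IllegalStateException("no previous page cache ID");
        }
        currentId = history.pop();
    }
}
```

Keeping the old directory instead of deleting it is what makes the rollback cheap: the previous entries are still on disk.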

In addition, invalidation can be triggered from Java code; see Cookbook - Cache Management.

Invalidation Delay

If the page cache is invalidated on the Application Server, the invalidation on the WebAdapter / WebAdapterAgent does not happen immediately.
The WebAdapterAgent obtains the information needed for page cache invalidation, such as keywords, from the Application Server by calling the Page Cache Servlet. It is important to know that this communication only takes place at a fixed interval, which can be changed in the webadapter.properties file via the property webadapterAgent.pageCache.pageClearInterval. It is usually set to 60 seconds. This is one reason for the delay that can be observed after an invalidation is initiated. Additionally, the WebAdapterAgent updates the page cache index before it performs the invalidation, which can also take some time.
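Why a fixed polling interval causes a delay can be seen from a minimal polling loop. In this Java sketch, the `Supplier` and `Consumer` are hypothetical stand-ins for the Page Cache Servlet call and for the index update plus invalidation; the real agent's protocol is not described here.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Sketch of interval-based invalidation polling on the agent side. */
final class InvalidationPoller {
    private final Supplier<List<String>> fetchPendingKeywords; // stand-in for the Page Cache Servlet call
    private final Consumer<String> invalidateKeyword;          // stand-in for index update + invalidation

    InvalidationPoller(Supplier<List<String>> fetchPendingKeywords, Consumer<String> invalidateKeyword) {
        this.fetchPendingKeywords = fetchPendingKeywords;
        this.invalidateKeyword = invalidateKeyword;
    }

    /** One polling round; returns the number of keywords processed. */
    int pollOnce() {
        List<String> keywords = fetchPendingKeywords.get();
        keywords.forEach(invalidateKeyword);
        return keywords.size();
    }

    /** Runs pollOnce() at a fixed rate, e.g. every 60 seconds. */
    ScheduledExecutorService start(long intervalSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::pollOnce, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

An invalidation issued right after a polling round waits almost one full interval (close to 60 seconds with the usual setting) before the agent even learns about it; the index update then adds to that.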

Page Cache Prefetching

If the page cache is empty, e.g., after a mass data replication process, the load on the application servers increases. This risk can be reduced by using the page cache prefetch mechanism provided by the WebAdapterAgent. The prefetch process is implemented as a crawler in the WebAdapterAgent and should be executed in a timeframe in which little traffic is expected, e.g., after a mass data replication that runs during the night.

1.1 The WebAdapterAgent calls the ConfigurationServlet on the Application Server to determine the crawler configuration, e.g. a start URL for crawling, the link depth, the timeframe crawling is allowed, etc.
1.2 The Application Server delivers the config as response via the ConfigurationServlet to the WebAdapterAgent.
1.3 The WebAdapterAgent issues a request for the start URL to the Web Server; it basically acts like a client browser without cookies. All crawling is done anonymously.
1.4 The Web Adapter forwards the request to the Application Server, provided that there is no page cache entry for the request yet.
1.5 The Application Server processes the request and delivers a response to the Web Adapter.
1.6 If the response is cacheable, the Web Adapter creates a page cache entry for it.
1.7 The WebAdapterAgent (which, remember, acts like a client browser) parses the response for links to other pages, which become the next URLs to request.
1.x The WebAdapterAgent issues a request for the next URL, and the process continues with 1.4.

Crawling can be configured in terms of

threads,
maximum duration,
link depth,
timeframes in which crawling is allowed,
maximum number of requests per minute.

In addition, crawling can be disabled completely for a particular WebAdapterAgent instance.
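Steps 1.3 to 1.x amount to a breadth-first crawl limited by link depth. The Java sketch below models that under several assumptions: fetching is a plain function (each call stands for an anonymous request that passes through the web adapter and may create a cache entry), links are extracted with a simple regex, and `maxRequests` is a hypothetical total request budget, not the per-minute limit of the real configuration. Threads and timeframes are not modeled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal prefetch crawler sketch: breadth-first, limited by link depth and a request budget. */
final class PrefetchCrawler {
    // Simplification: only site-relative links in double-quoted href attributes are followed.
    private static final Pattern HREF = Pattern.compile("href=\"(/[^\"]*)\"");

    private record Pending(String url, int depth) {}

    static List<String> crawl(String startUrl, int maxDepth, int maxRequests,
                              Function<String, String> fetch) {
        List<String> requested = new ArrayList<>();
        Set<String> seen = new HashSet<>(List.of(startUrl));
        Deque<Pending> queue = new ArrayDeque<>(List.of(new Pending(startUrl, 0)));
        while (!queue.isEmpty() && requested.size() < maxRequests) {
            Pending next = queue.poll();
            String html = fetch.apply(next.url());  // steps 1.3-1.6: the web adapter caches cacheable responses
            requested.add(next.url());
            if (next.depth() >= maxDepth) {
                continue;                           // link depth reached: do not follow further links
            }
            Matcher m = HREF.matcher(html);         // step 1.7: look for further links
            while (m.find()) {
                if (seen.add(m.group(1))) {
                    queue.add(new Pending(m.group(1), next.depth() + 1)); // step 1.x
                }
            }
        }
        return requested;
    }
}
```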

Configuration

SMC

For each channel, there are configuration options for page caching. You can define the caching time for static content, and page caching for dynamic content can be enabled or disabled. To be able to invalidate the page cache selectively, "Full text indexing" must be enabled. If the page cache is to be invalidated based on keywords by processes like import, staging, or object replication, "Explicit keyword processing" must be enabled as well. Note that enabling "Explicit keyword processing" while disabling "Full text indexing" disables selective page cache invalidation.
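The dependency between the two options can be summarized as a small truth table. The Java class below encodes exactly the rules stated in the paragraph above; its name and method names are illustrative, not SMC identifiers.

```java
/** Encodes how the two SMC options combine, as described above (names are illustrative). */
final class PageCacheOptions {
    final boolean fullTextIndexing;
    final boolean explicitKeywordProcessing;

    PageCacheOptions(boolean fullTextIndexing, boolean explicitKeywordProcessing) {
        this.fullTextIndexing = fullTextIndexing;
        this.explicitKeywordProcessing = explicitKeywordProcessing;
    }

    /** Selective invalidation requires full text indexing. */
    boolean selectiveInvalidationAvailable() {
        return fullTextIndexing;
    }

    /** Keyword-based invalidation by import, staging or object replication requires both options. */
    boolean processKeywordInvalidation() {
        return fullTextIndexing && explicitKeywordProcessing;
    }
}
```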

Enfinity Suite 6 System Management Console | Site Management | Page Cache:

Intershop 7.4 System Management | Site Management | Page Cache:

Intershop Commerce Management B2C 7.6 System Management | Site Management | Page Cache:

The page cache prefetching process for a site can be started and stopped by using the Prefetch Cache and Stop Prefetching buttons.

Property Files

webadapter/webadapter.properties
share/system/config/cluster/webadapter.properties
share/system/config/cluster/crawler_defaults.properties - contains the general configuration for page cache prefetching which applies to all sites
share/system/config/cluster/crawler_<site_name>.properties - contains the site-specific configuration for page cache prefetching which applies to <site_name>.

ISML Tags

`ISCACHE` Tag

If you want your ISML template to be cached, you have to use the ISCACHE tag. There are three ways to declare the caching behavior of a template.

<ISCACHE type="daily" hour="23" minute="30">

With type daily, the cached result expires every day at the specified time (23:30 in this example).

<ISCACHE type="relative" hour="1" minute="30">

Using the relative type will lead to invalidation of the cached result after the specified time. In this example the result is cached for 1.5 hours.
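The expiration times implied by the two types can be computed with `java.time`. This is a sketch of the semantics described above, not the application server's code; it ignores time zones and daylight saving changes.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

/** Sketch of the expiration implied by ISCACHE type="daily" and type="relative". */
final class IsCacheExpiration {
    /** type="daily": expires at the next occurrence of hour:minute. */
    static LocalDateTime daily(LocalDateTime now, int hour, int minute) {
        LocalDateTime today = now.toLocalDate().atTime(LocalTime.of(hour, minute));
        return today.isAfter(now) ? today : today.plusDays(1);
    }

    /** type="relative": expires hour:minute after the page was generated. */
    static LocalDateTime relative(LocalDateTime now, int hour, int minute) {
        return now.plus(Duration.ofHours(hour).plusMinutes(minute));
    }
}
```

For a page generated at 22:00, both example declarations above yield 23:30; for a page generated at 23:45, the daily declaration yields 23:30 on the following day.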

<ISCACHE type="forbidden">

ISML templates that print content which must not be cached can be marked with the forbidden cache type. This ensures that, even with complex ISML include structures, no non-cacheable content is stored in the page cache.

The application server also logs warnings for inconsistent caching declarations, i.e., when a regular caching declaration is used before or after an ISCACHE of type "forbidden". These warnings are logged even if caching is disabled for the site (which is often the case during development).


Additionally, if multiple ISCACHE tags of type daily or relative exist, the one with the shortest caching period wins, rather than the last one as in former versions. In such cases, debug log messages are written.
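The combination rules above (forbidden prevents caching, otherwise the shortest caching period wins) can be sketched as follows. Assumptions: all declarations of one response are evaluated at the same render time, so the earliest expiration corresponds to the shortest period; a response without any ISCACHE declaration is not cached; warning and debug logging are not modeled. `CacheDeclaration` and `IsCacheResolution` are hypothetical names.

```java
import java.time.LocalDateTime;
import java.util.List;
import java.util.Optional;

/** One ISCACHE declaration met while rendering a response (illustrative type). */
record CacheDeclaration(boolean isForbidden, LocalDateTime expiresAt) {
    static CacheDeclaration forbidden() {
        return new CacheDeclaration(true, null);
    }

    static CacheDeclaration until(LocalDateTime expiresAt) {
        return new CacheDeclaration(false, expiresAt);
    }
}

final class IsCacheResolution {
    /** Returns the effective expiration, or empty if the response must not be cached. */
    static Optional<LocalDateTime> effective(List<CacheDeclaration> declarations) {
        Optional<LocalDateTime> shortest = Optional.empty(); // no declaration at all: not cached
        for (CacheDeclaration d : declarations) {
            if (d.isForbidden()) {
                return Optional.empty();                     // forbidden anywhere: do not cache
            }
            if (shortest.isEmpty() || d.expiresAt().isBefore(shortest.get())) {
                shortest = Optional.of(d.expiresAt());       // shortest caching period wins
            }
        }
        return shortest;
    }
}
```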

`ISCACHEKEY` Tag

Cache keys are used for selective page cache deletion. Use the ISCACHEKEY tag to provide keywords or objects that can be used to invalidate page cache entries. The difference between the object and keyword parameters is that an object is translated into keywords by a provider that implements com.intershop.beehive.core.capi.pagecache.PageCacheKeywordsProvider. This way, the necessary keywords for an object can be created without adding multiple ISCACHEKEY tags to every ISML template. In addition, the keywords for an object are guaranteed to be consistent, since there is exactly one place responsible for creating them.

<ISCACHEKEY object="#ProductBO#">

<ISCACHEKEY keyword="12345">
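The "one place creates the keywords" idea can be illustrated with a provider interface. Note that `KeywordsProvider` and `Product` below are hypothetical stand-ins: this is not the signature of Intershop's PageCacheKeywordsProvider. The keyword shape (a type name, a UUID, an ID) only loosely mimics the sample X-IS-KEYWORDS header, without assuming what those values mean there.

```java
import java.util.List;

/** Hypothetical stand-in for a keywords provider; NOT Intershop's PageCacheKeywordsProvider signature. */
interface KeywordsProvider<T> {
    List<String> keywordsFor(T object);
}

/** Hypothetical business object with two identifiers. */
record Product(String uuid, String id) {}

final class ProductKeywordsProvider implements KeywordsProvider<Product> {
    @Override
    public List<String> keywordsFor(Product product) {
        // Every template that tags a product gets exactly these keywords, so invalidating
        // by any of them reaches all pages showing the product.
        return List.of("Product", product.uuid(), product.id());
    }
}
```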

Another way to support selective page cache deletion is full text indexing, which does not require any additional markup.

`<ISCONTENT personalized="true">` Tag

The response of the application server is usually cached with the request's URL as key. To generate different but cacheable output for the same URL based on some other information, that information, which is not part of the URL, has to be encoded into the PGID (personalization group ID). The user group assignments of the current user are an example of such data. This PGID has to be part of the key used to retrieve an entry from the page cache.

To enable personalized caching, you have to mark the content of your ISML template as personalized so that the web adapter can take this information into account when caching. This can be done using the iscontent tag.

<iscontent personalized="true" type="text/html" charset="UTF-8" compact="true">

By default, the parameter personalized is false. If you use multiple levels of template includes and mark just one template as personalized, the whole response will be cached as personalized.
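The propagation rule (one personalized template makes the whole response personalized) is a simple "any" over the template include tree. The Java sketch below models only the personalized flag of iscontent; `TemplateNode` is a hypothetical type.

```java
import java.util.List;

/** Hypothetical node of a template include tree; only the personalized flag is modeled. */
record TemplateNode(String name, boolean personalized, List<TemplateNode> includes) {

    /** The response is personalized if this template or any template it includes is personalized. */
    boolean responsePersonalized() {
        if (personalized) {
            return true;
        }
        for (TemplateNode include : includes) {
            if (include.responsePersonalized()) {
                return true;
            }
        }
        return false;
    }
}
```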

