This guide outlines configuration and administration options with respect to the Intershop Commerce Management Web Adapter. This document is addressed to system administrators or DevOps who configure and maintain Intershop Commerce Management instances.
Info
Prior to Intershop version 7.7, the information provided in this document was part of the Administration and Configuration Guide, which can be found in the Knowledge Base.
The Intershop Commerce Management Web Adapter (WA) is a Web application that distributes client requests from public Web servers to the Intershop Commerce Management application servers, and implements a specialized HTTP proxy cache for application server responses. The Web Adapter is implemented as a plug-in for the Apache HTTP Server.
Note
All relevant setup options are to be configured in advance via dedicated deployment script files, before actually executing the deployment. So be aware that if you modify the Intershop Commerce Management configuration after it is deployed, all changes will be overridden with the settings specified for your deployment.
Concept | Description |
---|---|
Web Adapter | The Web Adapter is a plug-in to the Apache HTTP Server, which works as a reverse proxy and is responsible for:
|
You should be familiar with the main concepts of the Intershop Commerce Management infrastructure. Refer to Overview - Infrastructure, Scaling and Performance.
The Web Adapter is configured using two different configuration files:
The location of the local webadapter.properties file can be customized. To specify a custom location, modify the WebAdapterProperties
directive within the httpd.conf file (Windows: <IS.INSTANCE.LOCAL>/httpd/conf/, Linux: /etc/opt/intershop/eserver#/httpd/conf/) of the Web Server. Changes to the property files take effect immediately, without restarting the application server or the Web server.
Central WA configuration data, such as timeout settings, servlet names and dynamic registration information about available server groups (see Server Group Definition), sites and application servers, is provided by configuration servlets. Every application server runs a configuration servlet. In order to query the required information, the Web Adapter periodically performs HTTP calls to an available configuration servlet.
The local webadapter.properties file in <IS.INSTANCE.LOCAL>/webadapter/ must list all configuration servlets in the cluster. The Web Adapter cannot process requests without at least one responsive configuration servlet and a valid property file. Contact with one configuration servlet is sufficient because every configuration servlet is capable of providing the necessary configuration data about the entire cluster. Fail-over functionality, however, is only available if the local webadapter.properties file lists all the configuration servlets available in the cluster. If one configuration servlet fails, the Web Adapter tries all other configuration servlets to find a responsive configuration servlet.
The cs.url.<n>
properties in the webadapter.properties file specify the access URLs of each application server's configuration servlet, for example:
cs.url.0=http://<host1>:10054/servlet/ConfigurationServlet
cs.url.1=http://<host2>:10054/servlet/ConfigurationServlet
or
cs.url.0=http://<host1>:10054/servlet/ConfigurationServlet
cs.url.1=http://<host1>:10064/servlet/ConfigurationServlet
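The fail-over behavior described above can be illustrated with a minimal Python sketch. It is not the Web Adapter's implementation: the function names `parse_cs_urls` and `first_responsive` are hypothetical, and the `probe` callable stands in for whatever HTTP call is used to contact a configuration servlet.

```python
import re


def parse_cs_urls(properties_text):
    """Return configuration servlet URLs ordered by their cs.url.<n> index."""
    urls = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blank lines and commented-out properties.
        if not line or line.startswith("#"):
            continue
        match = re.match(r"cs\.url\.(\d+)\s*=\s*(\S+)", line)
        if match:
            urls[int(match.group(1))] = match.group(2)
    return [urls[index] for index in sorted(urls)]


def first_responsive(urls, probe):
    """Try each URL in order and return the first one that probe() accepts."""
    for url in urls:
        try:
            if probe(url):
                return url
        except OSError:
            # Treat connection problems as "not responsive" and move on.
            continue
    return None
```

With only one `cs.url.<n>` entry, `first_responsive` has nothing to fall back to, which mirrors why all configuration servlets should be listed.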
In the local webadapter.properties file you can set the path and specify the file name of the Web Adapter's request log and error log. Logging behavior can be controlled using the central or the local webadapter.properties file.
The request log contains information
The content of the request log can be processed by other tools. The log directory can be configured via the local webadapter.properties file. The installation default is
#requestlog.dir=/<IS.INSTANCE.LOCAL>/webadapter/log
If you want to supply a different path, change the setting as needed and remove the #
character to enable this property.
For efficiency, the log entries are not written separately with each request. Instead, they are buffered in memory until a configurable size or interval is exceeded. The default settings are:
requestlog.enabled = true
requestlog.buffersize = 500000
requestlog.flushinterval = 10
requestlog.switchinterval = 86400
where buffersize
defines the size (in bytes) of the memory where log entries are buffered before they are written to disk, and flushinterval
specifies the time (in seconds) between flushes of the buffer to disk. The property switchinterval
specifies the interval (in seconds) when a new log file is created.
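To illustrate the buffering idea, the following Python sketch collects entries in memory and flushes them once the buffer size or the flush interval is exceeded. The class name `BufferedRequestLog`, the `sink` callable and the injectable `clock` are assumptions made for this example; the real Web Adapter may, for instance, flush from a timer rather than on the next write.

```python
import time


class BufferedRequestLog:
    """Buffers log entries and flushes on size or interval (illustrative only)."""

    def __init__(self, sink, buffersize=500000, flushinterval=10,
                 clock=time.monotonic):
        self.sink = sink                    # callable that receives the bytes
        self.buffersize = buffersize        # bytes, like requestlog.buffersize
        self.flushinterval = flushinterval  # seconds, like requestlog.flushinterval
        self.clock = clock
        self._chunks = []
        self._size = 0
        self._last_flush = clock()

    def write(self, entry):
        data = entry.encode("utf-8")
        self._chunks.append(data)
        self._size += len(data)
        too_big = self._size > self.buffersize
        too_old = self.clock() - self._last_flush > self.flushinterval
        if too_big or too_old:
            self.flush()

    def flush(self):
        # Hand all buffered entries to the sink in one write.
        if self._chunks:
            self.sink(b"".join(self._chunks))
        self._chunks = []
        self._size = 0
        self._last_flush = self.clock()
```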
In addition, the following properties are available to control the request log behavior, in particular, to reduce the log file size:
Property | Default | Description |
---|---|---|
requestlog.includes | true | Controls the logging of <wainclude> requests. |
requestlog.sparse | true | Specifies whether redundant log entries for <wainclude> requests are skipped. |
requestlog.binary | true | Specifies whether responses with X-IS-BINARY:1 header are recorded. |
The statistical information contained in request log files is arranged in columns, separated by the character '|'. Each request entry terminates with the character '\n'. The column separator character '|' is replaced by '%7C' if it occurs in the logged data.
All request log lines produced by a single client request (nested <wainclude> requests, the client request itself and business events) are written in an uninterrupted sequence of lines. They cannot be mixed line-by-line with other requests or split by log rotation.
The following table describes the columns in detail:
Column | Description |
---|---|
1 | Request start time in milliseconds after 1970-01-01 |
2 | Remote IP number (content of the CGI variable REMOTE_ADDR) |
3 | Remote user (content of the CGI variable REMOTE_USER) |
4 | Server name (content of the CGI variable SERVER_NAME) |
5 | Server port (content of the CGI variable SERVER_PORT) |
6 | Runtime of the request in milliseconds, measured from the incoming request until the response is sent to the Web server API |
7 | Request path information followed by the (optional) query-string (<PATH_INFO>[?<QUERY_STRING>]) |
8 | User agent (content of the CGI variable USER_AGENT) |
9 | Cookie(s) that were sent along with the request (content of the HTTP header COOKIE) |
10 | Referrer of the requested page (content of the HTTP header REFERRER) |
11 | Session ID used for the request (may differ from the SID in PATH_INFO or Cookie if the Web Adapter has assigned a new one) |
12 | Request ID, built as <key>-<level>-<index>, where: <key> is a unique identifier for the request and all its <wainclude>s, created from host ID, process ID, time and counter; <level> is the current <wainclude> recursion depth, starting with 0 for a client request, 1 or more decimal digits; <index> is the serial counter to distinguish subsequent <wainclude>s at the same recursion level, starting with 0, 2 or more decimal digits |
13 | Source of the response, either: <host>:<port> – indicates that an application server response was received and delivered from this server; pc – indicates that a page cache response was delivered; <host>:<port>:pc – indicates that an application server response was received and delivered from this server and that it was written into the page cache for later reuse |
14 | Response status. Values < 100 indicate a "200 OK" application server response with the "X-Error" header value set. Values >= 100 and < 1000 indicate the HTTP status code; if an "X-IS-HTTPResponseStatus" response header is present, its value takes precedence over the actual HTTP status code. Values >= 1000 indicate a handled processing error, in which case an error message is also written to webadapter.log: 1013 (RECV_ERROR) means that no response was received, for example due to a socket timeout; 1015 (DEPTH_ERROR) means that a <wainclude> tag in the response cannot be resolved because "include.maxdepth" has already been reached |
15 | Time to receive the request from the client (in milliseconds). Includes the time required to receive POST content and to parse the URL and HTTP headers. |
16 | Waiting time in the request's queue to get an application server thread or to find a valid cache file (in milliseconds) |
17 | Time (in milliseconds) to send a request to the application server and receive a valid response back. If the response is already present within the page cache, the time for opening and reading the cache file is logged. |
18 | Time to parse the response and resolve all contained includes (in milliseconds) |
19 | Time (in milliseconds) required for transferring the response to the Web server. Determined by the length of the response, the Web server buffer size to cache it, and the connection speed between the client browser and the Web server. |
20 | Request method, such as GET or POST |
21 | The PGID that was internally used during request processing. The column is empty if no personalization is involved in request and response. Note: This is not necessarily the PGID that is present in the client request URL. If the WA has thrown away an invalid SPGID or the AS has assigned a new PGID to the session, the new value wins. |
22 | PGID state, a single-letter code to keep track of personalization state changes: n (new): no valid PGID in request, a new one is used in response URLs; u (unchanged): request PGID used unchanged in response URLs; c (changed): valid PGID in request, but a new one is used in response URLs; r (removed): valid PGID in request, none is used in response URLs; empty: no personalization is involved in request and response. Note: c and r imply that an AS request is executed and that the response is not cached. n in conjunction with pc in column 13 indicates a "PGID distribution page" case. |
23 | personalized flag, 0|1 according to <iscontent personalized="false"|"true" >, and empty if no personalization is involved in request and response. |
24 | Single letter to tag the request type: p: pipeline request s: .servlet request |
25 | Response size as indicated in the HTTP headers. For includes the content-length of the received response is logged. For top-level requests, the content length of the aggregated, outgoing client response is logged, followed by a colon (":") and the content length of the internally received response. For <wainclude>s, the content length of the internally received response is logged. -1 if no response size is available with streamed responses or in error conditions. |
26 | For top-level requests, the number of bytes transmitted to the hosting web server API. -1 indicates an error. For <wainclude>s, always 0. |
27 | Secure flag, 0|1 - this flag is set to 1 if the request was detected as secure request, otherwise it is set to 0. |
28 | Robot flag, 0|1 - this flag is set to 1 if the request was detected as originating from a robot, otherwise it is set to 0. |
29 | Binary flag, 0|1 - this flag is set to 1 if the response has set the header X-IS-BINARY, otherwise it is set to 0. This facilitates page impression tracking, for example, when images are delivered via pipelines. |
30 | For top-level requests, the HTTP status code sent to the client with the outgoing response. This status code may be passed as is from the response, set via a "X-IS-HTTPResponseStatus" response header or set by the WA with its own error pages. For <wainclude>s, always empty. |
31 | Page name, the value of the "X-IS-PageName" HTTP response header (or empty). |
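Because the request log is meant to be processed by other tools, a small parser may help to make the column layout concrete. The following Python sketch is one possible reading of the format described above; the helper names are hypothetical, and the reversal of the '%7C' escape assumes that a literal '%7C' in the logged data always stands for the separator character.

```python
COLUMN_SEPARATOR = "|"
ESCAPED_SEPARATOR = "%7C"


def parse_request_log_line(line):
    """Split one request log line into its columns and undo the %7C escape."""
    raw_fields = line.rstrip("\n").split(COLUMN_SEPARATOR)
    return [field.replace(ESCAPED_SEPARATOR, COLUMN_SEPARATOR)
            for field in raw_fields]


def column(fields, number):
    """Return a column by its 1-based number as used in the table above."""
    return fields[number - 1] if number <= len(fields) else ""


def parse_request_id(request_id):
    """Split <key>-<level>-<index>; the key itself may contain hyphens."""
    key, level, index = request_id.rsplit("-", 2)
    return key, int(level), int(index)


def classify_status(status):
    """Map the value of column 14 to the three documented ranges."""
    if status < 100:
        return "x-error"            # "200 OK" response with X-Error set
    if status < 1000:
        return "http"               # plain HTTP status code
    return "processing-error"       # e.g. 1013 RECV_ERROR, 1015 DEPTH_ERROR
```

For example, `column(fields, 14)` yields the response status and `column(fields, 12)` the request ID of a parsed line.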
The request log upload behavior is controlled by the Web Adapter Agent. The corresponding settings are stored in the Web Adapter Agent section of the global webadapter.properties file.
The following properties are available:
Property | Description |
---|---|
webadapterAgent.log.level | Controls the logging verbosity (DEBUG, INFO, WARN, ERROR, FATAL). Default: INFO. |
webadapterAgent.requestLog.upload.pathInfo | Specifies the target pipeline for uploading the access log files. The default value should not be changed: /BOS/root/-/-/-/WriteFileFromRequest-Start. |
webadapterAgent.requestLog.serverTest.pathInfo | Specifies the target pipeline for clearing the upload. The default value should not be changed: /BOS/root/-/-/-/WriteFileFromRequest-CheckServerAvailability. |
webadapterAgent.requestLog.chunkSize | Specifies the chunk size (in bytes) for log file uploads. Default: 2000000. |
webadapterAgent.requestLog.pushInterval | Specifies the time interval (in seconds) between upload operations. Default: 10. |
The error log will be written by the Web Adapter to indicate error situations. The name of this file is configurable in the local webadapter.properties file. The default is
#errorlog.file=/<IS.INSTANCE.LOCAL>/webadapter/log/webadapter.log
If you want to supply a different path, change the setting as needed and remove the #
character to enable this property.
In addition, errorlog.file
can be configured with strftime() conversion specifiers for time-based rotation, like .../webadapter-%Y-%m-%d.log
.
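As a quick illustration of how such conversion specifiers expand, the following Python sketch applies `strftime` to a file name pattern. Python's `strftime` follows the C function for the specifiers used here; the function name and the directory in the usage note are hypothetical.

```python
from datetime import datetime


def resolve_errorlog_file(pattern, when):
    """Expand strftime() conversion specifiers in an errorlog.file pattern."""
    return when.strftime(pattern)
```

With the pattern `/var/log/webadapter-%Y-%m-%d.log` (a made-up path) and the date 2010-03-25, the result is `/var/log/webadapter-2010-03-25.log`, so a new file name is used each day.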
The overall log level is configured using the property errorlog.level. The value can be one of OFF
, FATAL
, ERROR
, WARN
, INFO
, VERBOSE
or DEBUG
. The default value is
#errorlog.level = INFO
Note
Avoid the VERBOSE or DEBUG log levels on production systems.
Besides overriding a log level configuration in the global webadapter.properties file with a configuration in a local webadapter.properties file, it is also possible to override the setting of errorlog.level with specific settings for individual log scopes. This makes it possible, for example, to obtain detailed log messages for certain functional domains of the Web Adapter without having to deal with the same level of detail for other domains that are of less interest for a given purpose.
For example, with settings as below, log messages of level ERROR are recorded for configuration service messages, and messages of level VERBOSE to trace external HTTP traffic. For all other domains, log messages of level INFO are included with the error log file.
errorlog.level=INFO
errorlog.level.config=ERROR
errorlog.level.httpexternal=VERBOSE
The following log scopes are available:
Log Scope | Description |
---|---|
errorlog.level.log | Messages of the log class. Only used to debug the logging mechanism itself |
errorlog.level.fileutils | Messages related to file handling operations, such as opening, closing or deleting files |
errorlog.level.utils | Messages related to parsing operations |
errorlog.level.socket | Messages related to socket I/O |
errorlog.level.response | Messages related to HTTP responses from the application server |
errorlog.level.webserver | Messages related to the Web Server API (e.g., queries for request header, post content, etc.) |
errorlog.level.httpexternal | Messages related to external responses as passed to the Web Server. This log scope can be used with log level VERBOSE |
errorlog.level.httpinternal | Traces communication with application server. This log scope can be used with log level VERBOSE |
errorlog.level.main | Messages related to loading the modules into the Web Server, including checks performed on startup |
errorlog.level.request | Messages related to general request processing tasks |
errorlog.level.properties | Messages related to reading global and local Web Adapter properties |
errorlog.level.session | Messages related to session ID and PGID handling |
errorlog.level.pagecache | Messages related to cache operations |
errorlog.level.postprocess | Messages related to postprocess operations |
errorlog.level.config | Traces communication with configuration service |
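The interplay between errorlog.level and the scope-specific overrides can be sketched as follows. This Python example assumes that the levels are ordered by increasing verbosity in the order in which they are listed above (OFF, FATAL, ERROR, WARN, INFO, VERBOSE, DEBUG); the function names are hypothetical.

```python
# Assumed ordering: each level includes all levels to its left.
LEVELS = ["OFF", "FATAL", "ERROR", "WARN", "INFO", "VERBOSE", "DEBUG"]


def effective_level(properties, scope):
    """Return the scope-specific level, else errorlog.level, else INFO."""
    overall = properties.get("errorlog.level", "INFO")
    return properties.get("errorlog.level." + scope, overall)


def is_logged(properties, scope, message_level):
    """Check whether a message of the given level would be recorded."""
    threshold = effective_level(properties, scope)
    return LEVELS.index(message_level) <= LEVELS.index(threshold)
```

With the example settings above, `is_logged(props, "httpexternal", "VERBOSE")` is true, while a WARN message in the `config` scope is suppressed.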
Each entry in the error log file consists of a single line with the following basic format:
<date> <time> (<hostname>) [<process-id>/<thread-id>/request-id]<log level> <text>\n
Note that the process-id
and thread-id
entries may differ according to the environment. These entries provide a unique identifier for a request handler within the machine. For the multi-processed Apache Web server, it is actually a combination of parent-process-id/process-id
; multi-threaded servers use process-id/thread-id
.
The request-id
is assigned to every incoming request. It is also used in the Web Adapter request log and in the application server error log.
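For log analysis, the basic line format can be matched with a regular expression. The following Python sketch is an assumption-based example: the exact date and time formats are not specified here, so they are matched as plain non-whitespace tokens, and the sample line in the usage note is invented.

```python
import re

# <date> <time> (<hostname>) [<process-id>/<thread-id>/request-id]<log level> <text>
ERROR_LOG_LINE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) \((?P<hostname>[^)]*)\) "
    r"\[(?P<process_id>[^/\]]*)/(?P<thread_id>[^/\]]*)/(?P<request_id>[^\]]*)\]"
    r"\s*(?P<level>\S+) (?P<text>.*)$"
)


def parse_error_log_line(line):
    """Return the fields of one error log line as a dict, or None."""
    match = ERROR_LOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None
```

A hypothetical line such as `2010-03-25 12:00:01 (web01) [1234/5678/abc-0-00]ERROR Connection refused` yields the level `ERROR` and the request ID `abc-0-00`, which can then be correlated with the request log.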
The Web Adapter process writes the pagecache-<date>.log file if the keyword or page indexing for the site is enabled. The Web Adapter agent process reads the entries (operation write_time fileLocation) and indexes the corresponding files. This index is used for cache clear operations (by keyword or content).
The Web Adapter stores several public pages which are returned in case of general errors, e.g., to indicate URL errors or system overload. By default, error pages are stored in <IS.INSTANCE.LOCAL>/webadapter/public. Using the property errorpage.dir
of the local webadapter.properties file, it is possible to store error pages in a different location.
Note
If you change errorpage.dir, make sure to also adjust the Web Server directive for the waroot alias in httpd.conf, which is used to resolve the path to images contained in Web Adapter error pages.
Intershop Commerce Management supports site-specific error pages. To this end, the Web Adapter looks up error pages in the following sequence until it finds a valid page:
(1) <errorpage.dir>/<site>/<page>
(2) <errorpage.dir>/<hostname>/<page>
(3) <errorpage.dir>/<ip>/<page>
(4) <errorpage.dir>/<page>
where
Intershop recommends creating error pages in <errorpage.dir>/<site> with absolute path references to this site in the ISML templates, e.g.,
<img src="/waroot/<site>/site_error_image.gif">
Symbolic links from <host>/<ip> to <site> can be used to provide the correct error pages in case the site context is unknown but a dedicated host name exists.
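The lookup sequence for error pages can be expressed as a short Python sketch. The function names, the page name and the directory used in the usage note are hypothetical; the `exists` parameter only makes the file check replaceable for testing.

```python
import os
import posixpath


def error_page_candidates(errorpage_dir, page, site=None, hostname=None, ip=None):
    """List candidate paths in the order site, host name, IP, generic."""
    candidates = []
    for part in (site, hostname, ip):
        if part:  # skip levels that are unknown for this request
            candidates.append(posixpath.join(errorpage_dir, part, page))
    candidates.append(posixpath.join(errorpage_dir, page))
    return candidates


def find_error_page(candidates, exists=os.path.isfile):
    """Return the first candidate that exists, or None."""
    for path in candidates:
        if exists(path):
            return path
    return None
```

If the site context is unknown, `site` is simply omitted and the lookup starts at the host name level, which is where the symbolic links mentioned above become useful.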
In order to facilitate monitoring of activity within an Intershop Commerce Management cluster, the Web Adapter tracks statistical data and sends these data to a configurable pipeline in regular intervals.
With the default implementation, the Web Adapter statistics are sent to the pipeline WebadapterStatistics-Push, which is part of the monitor cartridge. It is possible to configure a different pipeline, in order to implement custom monitoring solutions.
The default pipeline (WebadapterStatistics-Push) writes the statistical data to a log file in the Intershop Shared Files. The log file content is optimized for easy parsing.
If not otherwise specified, the Web Adapter statistics are written to Intershop Commerce Management's default log directory <IS.INSTANCE.SHARE>/system/log. The log file names correspond to the following pattern:
<IS.INSTANCE.SHARE>/system/log/wastatistics-<instance-ID>-<date>.log
for example,
/eserver1/share/system/log/wastatistics-10.0.20.1-2010-03-25.log
To configure the Web Adapter statistics log, adjust the following properties:
In the global webadapter.properties, specify
monitor.pathinfo
Configures the pipeline to which the statistics data will be sent. The default configuration is:
monitor.pathinfo = /BOS/SMC/-/-/-/WebadapterStatistics-Push
monitor.pushinterval
Configures the time (in seconds) between transmissions of statistics data from the Web Adapter to the application server. The default value 0 disables the transmission of statistics data.
monitor.pushinterval = 0
monitor.propertyInterval
Configures the time (in seconds) between transmissions of the full set of Web Adapter properties to the application server. Additional transmissions (pushes) within this interval do not contain these properties. The default value is 600. If set to 0, the properties are sent only in case the Web Adapter properties have changed.
monitor.propertyInterval = 600
In the local Web Adapter properties file <IS.INSTANCE.LOCAL>/webadapter/webadapter.properties, specify the Web Adapter instance (<IS.WA.INSTANCE.ID>), for example:
monitor.instance.id = WA0
Setting this property allows you to override the value of <IS.WA.INSTANCE.ID>
that is sent within the statistics data. This may be necessary, for example, to distinguish different Web servers that share the same IP address.
Specify the following properties for the StatisticsWriterPipelet in the pipeline WebadapterStatistics:
LogDirectory
Defines the directory where the log file will be stored. Make sure that the specified directory exists. Default:
LogDirectory = <IS.INSTANCE.SHARE>/system/log
DatePattern
Defines the file name and the rollover of the log file. DatePattern
must match a pattern as defined in java.text.SimpleDateFormat. Default:
DatePattern = yyyy-MM-dd
Info
This section replaces the outdated Knowledge Base article with the ID 4289A and the title wastatistics and wastatus: Monitoring Intershop Application.
Intershop 7's Web Adapter provides a means of monitoring system information via HTTP calls. This section explains the basics of the so-called wastatistics and wastatus monitors.
The motivation behind wastatistics is to provide realtime insights into an Intershop system from the Web Adapter point of view. Therefore, internal Web Adapter state information is structured and put into a form suitable for human readers. The monitor page consists of sections which are closely related to the main Web Adapter functionalities. The following types of information are provided:
Note
Of the different pieces of information wastatistics provides, some have immediate relevance for the administration and maintenance of an Intershop application system. Others are displayed but are not of immediate significance and require interpretation.
For example, besides more or less static information about configuration parameters, wastatistics shows the availability of application servers, which is immediately useful. Due to its design, wastatistics displays information only about those application servers that are currently registered at the master Control Server. By checking wastatistics, you can easily determine whether all application servers are online and take steps if one or more of them are missing.
Note
Regardless of which Intershop Application is running, the monitor can be called with the following URL syntax:
http://localhost/INTERSHOP/wastatistics[?]
The argument is optional. It specifies a refresh interval for the "wastatistics" monitor page in seconds.
Be aware, however, that you must have added the necessary Web Server mapping for the extension /wastatistics before, see Guide - Checklist Going Live.
Contrary to wastatistics, which provides system information from the Web Adapter point of view, wastatus simply returns the status of the Web Adapter itself. Depending on its status, the Web Adapter responds either with 200 (OK) and a single-line HTML page Up or with 500 (Internal Server Error) and a single-line HTML page Down.
Note
The Down state is meant to indicate that this Web Adapter is not able to process any request at the moment. Recognized Down state conditions are:
Note
Regardless of which Intershop Application is running, the monitor can be called with the following URL syntax:
http://localhost/INTERSHOP/wastatus
Be aware that you must have added the necessary Web Server mapping for the extension .wastatus before, see Guide - Checklist Going Live.
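A monitoring script could evaluate the wastatus response roughly as in the following Python sketch. It relies only on the behavior described above (200 with Up, 500 with Down); the function names, the timeout and the additional result values `unknown` and `unreachable` are assumptions for this example.

```python
import urllib.error
import urllib.request


def interpret_wastatus(status_code, body):
    """Map the wastatus HTTP status and page content to a simple state."""
    if status_code == 200 and "Up" in body:
        return "up"
    if status_code == 500:
        return "down"
    return "unknown"


def check_wastatus(url="http://localhost/INTERSHOP/wastatus", timeout=5):
    """Call the wastatus monitor and return up, down, unknown or unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", "replace")
            return interpret_wastatus(response.status, body)
    except urllib.error.HTTPError as error:
        # urllib raises for 5xx responses; the body is still readable.
        body = error.read().decode("utf-8", "replace")
        return interpret_wastatus(error.code, body)
    except OSError:
        # Covers URLError, refused connections and timeouts.
        return "unreachable"
```

Separating the interpretation from the HTTP call keeps the status logic easy to test without a running Web server.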
Intershop Commerce Management's Web Adapter features a control interface that allows you to trace and modify certain aspects of the Web Adapter's internal operation. It is realized as a set of optional, "magic" request parameters, which can be appended as path segment parameters to URLs that are handled by the Web Adapter.
To enable this tracing and control functionality, add the following line to either the global or the local webadapter.properties file:
ctl.commandPrefix = wa_
The command prefix wa_
can be modified to match your preferences. Intershop, however, recommends using a prefix that is simple, short and sufficiently unique. When enabled, the interface supports URLs like
http[s]://.../<url-path>[;<prefix><cmd>[=<args>][;...]][?<url-query>]
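To make the URL syntax more tangible, the following Python sketch appends control commands as path segment parameters in front of the query string. The function name and the example URL are hypothetical; the sketch does not validate command names.

```python
from urllib.parse import urlsplit, urlunsplit


def add_control_commands(url, commands, prefix="wa_"):
    """Append ;<prefix><cmd>[=<args>] segments to the URL path.

    commands is a list of (cmd, args) tuples; args may be None.
    """
    parts = urlsplit(url)
    suffix = ""
    for cmd, args in commands:
        suffix += ";" + prefix + cmd
        if args is not None:
            suffix += "=" + args
    return urlunsplit((parts.scheme, parts.netloc, parts.path + suffix,
                       parts.query, parts.fragment))
```

For instance, adding `[("trace", "plain"), ("flags", "no-pc")]` to `http://localhost/INTERSHOP/web/x?a=1` produces `http://localhost/INTERSHOP/web/x;wa_trace=plain;wa_flags=no-pc?a=1`.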
The following table lists the available commands.
Command | Description |
---|---|
<prefix>help[;<prefix><cmd>...] | Writes the general help or the first given command's help message, if any. |
<prefix>keep[;<prefix><cmd>...] | Stores the given commands in the requesting HTTP user-agent. These commands will apply to all subsequent requests without an URL command parameter. |
<prefix>source=<addr>:<port> | Overrides the automatic server selection or session/server affinity and uses the given application server. If applicable, generates a new session ID, which stores this server assignment. This command implies <prefix>flags=no-pc. Add <prefix>flags=pc to use the page cache in its normal way; add <prefix>flags=pc_write to update the cache from the given server's response. Responds with 503 Service Unavailable if the given server is not present at all or not assigned to the request's server group. |
<prefix>flags=<flag>[,<flag>...] | Skips or forces the selected features while processing the request.
|
<prefix>trace[=plain|comment|markup] | Writes a "request processing trace" instead of or into the outgoing response:
|
To restrict the control interface access for a defined IP range, for example, edit the Web server configuration accordingly. That is, in <IS.INSTANCE.LOCAL>/httpd/conf/extra/httpd-webadapter.conf, add something like
<LocationMatch ;wa_>
    Require ip 10.0.0.0/8
</LocationMatch>
Intershop recommends restricting access to the control interface for production systems.
Web-based applications use session IDs (SIDs) to overcome the limitations of HTTP. As a stateless protocol, HTTP needs a session ID to assign certain activities to a specific user.
Each user entering a site for the first time starts a new session and gets a unique session ID. From that point, this SID will be used in all subsequent requests. Depending on the lifetime value specified for a session, the IDs remain valid for a few minutes, hours, or even days.
A session's lifetime is configured by using two properties - session.ttl
in the configuration file $IS_SHARE/system/config/cluster/webadapter.properties, and intershop.session.TimeOut
in the configuration file $IS_SHARE/system/config/cluster/appserver.properties.
The property session.ttl
configures the number of seconds a session ID remains valid. The default value is "21600
", which means that a session ID becomes invalid after 6 hours and all session-related information is lost.
The intershop.session.TimeOut
property configures the number of minutes that the session object remains valid. The default value is "15
", which means that after 15 minutes without a request the session object is invalidated and all information contained in the session object (e.g. the basket) is lost, even if the session ID is still valid. Any new request will create a new session object. Please note that, contrary to the value specified for session.ttl
, which marks an absolute timespan, the value of intershop.session.TimeOut
is relative to the last request. So, if a new request with this session ID hits the application after 15 minutes of inactivity, the session object lifetime starts over and is 15 minutes again. If the intershop.session.TimeOut
property is set higher than session.ttl
, then session.ttl
must be increased accordingly.
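The difference between the absolute lifetime of the session ID and the inactivity-based lifetime of the session object can be traced with a small simulation. The following Python sketch is a simplified model based only on the description above; the function name and the state labels are hypothetical, and it evaluates all requests against a single session ID without modeling the issuing of a new one.

```python
def session_states(request_times, session_ttl=21600, session_timeout_minutes=15):
    """Classify requests given as seconds since the session ID was issued.

    request_times must be in ascending order.
    """
    timeout = session_timeout_minutes * 60
    states = []
    last_request = None
    for t in request_times:
        if t > session_ttl:
            # Absolute limit: the session ID itself is no longer valid.
            states.append("sid-expired")
        elif last_request is not None and t - last_request > timeout:
            # Relative limit: too much idle time, a new session object starts.
            states.append("new-session-object")
        else:
            states.append("session-object-valid")
        last_request = t
    return states
```

With the defaults, requests at 0, 600, 1800 and 21700 seconds are classified as valid, valid, new session object (20 minutes idle) and expired session ID.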
The following picture visualizes the relationship between intershop.session.TimeOut
and session.ttl
:
You can define how the session tracking cookies are set in the client. To this end, the webadapter.properties file includes the two settings
session.SIDCookie[.<site>] = <Set-Cookie>
session.PGIDCookie[.<site>] = <Set-Cookie>
where session.SIDCookie
defines the SID cookie generation, and session.PGIDCookie defines the PGID cookie generation. The default values are:
#session.SIDCookie = Set-Cookie: sid=%v; Path=/; Version=1
#session.PGIDCookie = Set-Cookie: pgid=%v; Path=/; Version=1
The placeholder %v
is to be used "as is" as the actual SID or PGID cookie value. Another placeholder, %s
, expands to the request's site name (or an empty string), which is useful in the PGID name definition.
For detailed information about the <Set-Cookie>
syntax, refer to RFC 2109 (http://www.ietf.org/rfc/rfc2109.txt).
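How the placeholders expand can be shown with a few lines of Python. This is a sketch of the substitution only, not of the Web Adapter's cookie handling; the function name and the `pgid-%s` cookie name in the usage note are made up for illustration.

```python
import re


def expand_cookie_template(template, value, site=""):
    """Replace %v with the SID or PGID value and %s with the site name."""
    replacements = {"%v": value, "%s": site}
    # A single pass avoids re-expanding placeholders inside inserted values.
    return re.sub(r"%[vs]", lambda match: replacements[match.group(0)], template)
```

Applied to the default SID setting with the value `abc123`, the result is `Set-Cookie: sid=abc123; Path=/; Version=1`. A site-specific PGID name such as `pgid-%s=%v` would expand to `pgid-shop=...` for a site named `shop`.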
The Web Adapter can be configured to render responses without session ID or personalization group ID occurrences if the client is recognized as a robot. This detection can be based either on the user agent request header or on a client IP address.
session.skipForUserAgent.0=googlebot
session.skipForUserAgent.1=spider
session.skipForRemoteAddr.0=10.0.*
session.skipForRemoteAddr.1=127.0.0.1
If the client address cannot be determined from the connection (because of proxies, load balancers or SSL boxes, for instance), it can be read from the HTTP request header using
request.remoteAddrHeader=X-Forwarded-For
If a client is recognized as a robot via session.skipForUserAgent
or session.skipForRemoteAddr
but sends requests with a session ID, it can be redirected via 301 Moved Permanently
to the session-less version of the request using
session.skipByRedirect = <true|false>
where the default setting is false
.
For easier maintenance, the list of user agents and addresses can also be kept in plain text files. The configuration service is configured to support the following files:
<IS.INSTANCE.SHARE>/system/config/cluster/robot-agents.txt
# comment
googlebot
spider
<IS.INSTANCE.SHARE>/system/config/cluster/robot-addresses.txt
# comment
10.0.*
127.0.0.1
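The following Python sketch shows one way such lists could be evaluated. The matching rules are assumptions, since they are not specified here: user agent entries are treated as case-insensitive substrings, and address entries as shell-style wildcard patterns. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def load_patterns(text):
    """Read one pattern per line, ignoring blank lines and # comments."""
    patterns = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_robot(user_agent, remote_addr, agent_patterns, addr_patterns):
    """Return True if the user agent or the client address matches a pattern."""
    agent = (user_agent or "").lower()
    # Assumed rule: substring match, ignoring case.
    if any(pattern.lower() in agent for pattern in agent_patterns):
        return True
    # Assumed rule: "*" matches any sequence of characters.
    address = remote_addr or ""
    return any(fnmatchcase(address, pattern) for pattern in addr_patterns)
```

When the client address comes from a header such as X-Forwarded-For, the caller would pass that value as `remote_addr` instead of the connection's peer address.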
The Web Adapter can be seen as a specialized HTTP caching proxy. Whenever possible, it stores application server responses in a cache in the local file system and tries to respond to subsequent requests for the same resource by reading the stored response from the cache instead of forwarding the request to an application server.
The page cache is not session-dependent, which means that all sessions use the same set of pages in the cache (unless the pages are personalized via a personalization group ID).
The page cache can be enabled or disabled individually for each sales channel storefront application. The page cache options are accessed via the channel management plug-ins for sales channels in Commerce Management. Commerce Management also provides the possibility to invalidate the entire page cache manually, and to submit pre-defined keywords in order to delete only pages from the page cache that are marked with this keyword. For details, see the user guide Managing Online Shops.
The general naming scheme for locally cached static responses is:
<pagecache.dir>/<site>/static/<version>/hh/hh/hhhhhhhhhhhhhhhhhhhhhhhhhh
The format for locally cached pipeline-generated, or dynamic, responses is:
<pagecache.dir>/<site>/pipeline/<version>/hh/hh/hhhhhhhhhhhhhhhhhhhhhhhhhh
<pagecache.dir>
is taken from the local webadapter.properties file and can be controlled by the user. The default entry is
#pagecache.dir=/<IS.INSTANCE.LOCAL>/webadapter/pagecache
If you want to supply a different path, change the setting as needed and remove the #
character to enable this property.
Note
<site>
is taken from the request URL, <version>
is a site attribute retrieved from the configuration servlet. It is a time stamp that increases whenever the page cache of a site is invalidated. After page cache invalidation, the Web Adapter will start to use the new directory and after a while there will be no more open handles in the old one, allowing the old directory to be removed completely.
The last two sub-directories and the filename itself are created from a hash value printed in hex format. The hash value is derived from a unique key string built from the following request data (some verbatim and some shortened to internal codes):
<siteStatus>#<method>#<SERVER_PORT_SECURE>#<Host>#<serviceType>#<PATH_INFO>[;pgid=<pgid>][?<QUERY_STRING>][#<postContent>]
Responses that depend on other request attributes, such as special headers or cookies, cannot be cached. The <postContent> data is only appended if it does not exceed a configurable maximum size defined in the Misc section of the global webadapter.properties file. The default value (in bytes) is
post.cachesize = 16000
Larger POST requests are not cacheable.
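The naming scheme above can be illustrated with a minimal Python sketch. It assembles the key string in the documented field order and maps it to a cache file path. The actual hash function and the internal codes used by the Web Adapter are not documented here, so SHA-1 and all sample values (site status, host, paths) are assumptions for illustration only.

```python
import hashlib

POST_CACHESIZE = 16000  # documented default of post.cachesize, in bytes


def cache_key(site_status, method, secure_port, host, service_type,
              path_info, pgid=None, query_string=None, post_content=None,
              post_cachesize=POST_CACHESIZE):
    """Build the key string in the documented field order.

    Returns None when the POST content exceeds post_cachesize, because
    such requests are not cacheable.
    """
    key = "#".join([site_status, method, secure_port, host,
                    service_type, path_info])
    if pgid:
        key += ";pgid=" + pgid
    if query_string:
        key += "?" + query_string
    if post_content is not None:
        if len(post_content.encode("utf-8")) > post_cachesize:
            return None
        key += "#" + post_content
    return key


def cache_path(pagecache_dir, site, kind, version, key):
    """Map a key to <pagecache.dir>/<site>/<kind>/<version>/hh/hh/hhh..."""
    # ASSUMPTION: SHA-1 stands in for the undocumented Web Adapter hash.
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return "/".join([pagecache_dir, site, kind, version,
                     digest[:2], digest[2:4], digest[4:]])


# Hypothetical request data; real values are partly internal codes.
key = cache_key("live", "GET", "0", "www.example.com", "pipeline",
                "/Shop/ViewHome-Start", query_string="lang=en")
print(key)
print(cache_path("/var/pagecache", "Shop", "pipeline", "1700000000", key))
```

Because <version> is part of the path, a page cache invalidation only needs to switch to a new version directory. No individual file has to be rewritten.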
The following table lists basic page cache control settings defined in the global webadapter.properties file (Page Cache Options section).
Property | Description |
---|---|
pagecache.static.enabled[.<site>] pagecache.pipeline.enabled[.<site>] | Enables (true) or disables (false) the static or, respectively, pipeline response cache. Default: true. |
pagecache.pageRegenerationMaxage pagecache.pageRegenerationPhase | Unix systems only: If a page is found in the cache, which has expired less than pagecache.pageRegenerationMaxage seconds ago, it is made valid for the next pagecache.pageRegenerationPhase seconds before the request is forwarded to the application server. |
pagecache.ignore.# | Defines a list of query attributes which are to be ignored in the page cache lookup, mainly intended to ignore unwanted x/y coordinates included with image button clicks. |
pagecache.allowAuthorization | Enables the Web Adapter to read/write its page cache with the Authorization header. For security reasons, false is the default value. |
pagecache.monitor | Enables the request/hit accounting in .wastatistics. |
shm.key | Unix systems only: Settings to control the shared memory setup for multi-processed Apache HTTP servers. Specify the unique identifier, the shared memory segment size, the access lock file and the shared memory re-initialization behavior. |
The page cache invalidation is triggered by the Web Adapter Agent. The corresponding settings are stored in the Web Adapter Agent section of the global webadapter.properties file.
The following properties are available:
Property | Description |
---|---|
webadapterAgent.pageCache.invalidationList.serverGroup | Specifies the server group intended to handle page cache invalidation requests. The default value should not be changed: BOS. |
webadapterAgent.pageCache.invalidationList.servlet | Specifies the servlet intended to handle page cache invalidation requests. The default value should not be changed: /Pagecache. |
webadapterAgent.pageCache.pageClearInterval | Specifies the time interval (in seconds) between requests to invalidate expired pages. Default: 60. |
webadapterAgent.pageCache.siteClearInterval | Specifies the time interval (in seconds) between requests to delete deprecated page cache directories. Default: 86400. |
webadapterAgent.pageCache.siteClearForbidden | Can specify a comma separated list of time intervals (hh:mm-hh:mm), in which the page cache clearing is forbidden. Commented by default (cache clearing possible). |
webadapterAgent.pageCache.index.processors | Specifies the page cache index processors to use globally. |
webadapterAgent.pageCache.index.<site>.processors | Can specify site-specific page cache index processors. |
webadapterAgent.pageCache.index.<site>.enabled | Can enable or disable (true|false) page cache indexing for a specific site. |
webadapterAgent.pageCache.expiredFiles.delete | Switches the deletion of outdated page cache files on (true, default) or off (false). |
webadapterAgent.pageCache.expiredFiles.deletionDelay | Specifies a time in seconds (default: 300) after which outdated page cache files can be deleted (to avoid Web Adapter access conflicts). |
webadapterAgent.pageCache.expiredFiles.deletionInterval | Specifies a time interval in seconds (default: 1800) after which outdated page cache files are searched and deleted. |
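As an illustration of the webadapterAgent.pageCache.siteClearForbidden format, the following Python sketch parses a comma-separated list of hh:mm-hh:mm intervals and checks whether a given time falls into one of them. Treating the interval bounds as inclusive and allowing intervals that wrap around midnight are assumptions. The source does not specify either behavior.

```python
from datetime import time


def parse_intervals(value):
    """Parse 'hh:mm-hh:mm, hh:mm-hh:mm' into a list of (start, end) times."""
    intervals = []
    for item in value.split(","):
        start, end = item.strip().split("-")
        start_h, start_m = map(int, start.split(":"))
        end_h, end_m = map(int, end.split(":"))
        intervals.append((time(start_h, start_m), time(end_h, end_m)))
    return intervals


def clearing_forbidden(value, now):
    """Return True if 'now' lies inside one of the forbidden intervals."""
    for start, end in parse_intervals(value):
        if start <= end:
            if start <= now <= end:
                return True
        # ASSUMPTION: an interval such as 23:00-01:00 wraps around midnight.
        elif now >= start or now <= end:
            return True
    return False


print(clearing_forbidden("08:00-12:00, 18:00-22:30", time(9, 30)))
print(clearing_forbidden("08:00-12:00, 18:00-22:30", time(13, 0)))
```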
When the Web Adapter receives client requests containing a Pragma or Cache-Control header, it will bypass its own cached pages, send the request to an application server, and respond and update its page cache with the new response that results from this request. This behavior can be disabled via the pagecache.ignoreGetCacheControlHeaders and pagecache.ignorePostCacheControlHeaders properties in the global webadapter.properties file. The default is
pagecache.ignoreGetCacheControlHeaders = true
pagecache.ignorePostCacheControlHeaders = true
The value true is recommended for both properties on production systems. This ensures that it is always the application itself which determines the caching behavior (via enabling or disabling the page cache at application level), and not the clients. In a development environment, it may be advisable to set the properties to false. This way, developers can verify changes immediately by simply using the browser's refresh function, without having to disable the page cache.
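The decision described above can be summarized in a short Python sketch. The function name and the handling of methods other than GET and POST are assumptions. Only the two properties and the two header names come from the description.

```python
def bypasses_page_cache(method, headers, ignore_get=True, ignore_post=True):
    """Return True if a request would bypass the page cache.

    ignore_get / ignore_post mirror pagecache.ignoreGetCacheControlHeaders
    and pagecache.ignorePostCacheControlHeaders (both default to true).
    """
    names = {name.lower() for name in headers}
    if "pragma" not in names and "cache-control" not in names:
        return False  # no client cache header, normal cache lookup
    if method.upper() == "GET":
        return not ignore_get
    if method.upper() == "POST":
        return not ignore_post
    # ASSUMPTION: other methods are not affected by these two properties.
    return False


# Production default: client headers are ignored, cached pages are served.
print(bypasses_page_cache("GET", {"Cache-Control": "no-cache"}))
# Development setting: a browser refresh reaches the application server.
print(bypasses_page_cache("GET", {"Cache-Control": "no-cache"}, ignore_get=False))
```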
The Web Adapter agent features a recursive web crawling functionality. It is intended to refill the Web Adapter's page cache after page cache deletions, typically after data replications. By default, the web crawler is started and stopped automatically based on defined schedules, or explicitly from System Management.
If a crawler run is started explicitly from System Management while a scheduled crawler is already running, the previous process is stopped immediately, and the new crawler request is processed.
In custom projects, any pipeline can trigger the web crawler using the PrefetchPageCache pipelet from the core cartridge.
By default, the web crawler is enabled and sends the page requests to the IP address of the Web Adapter. Using the local webadapter.properties file, however, the general control settings can be changed.
Property | Description |
---|---|
webadapterAgent.crawl.disabled | true disables the Web crawling functionality for the local host. The default value is false. |
webadapterAgent.crawl.localaddress | Defines the IP address of the Web Adapter. The default value is 127.0.0.1. |
The web crawler, as part of the Web Adapter agent, retrieves its configuration via the configuration servlet. The configuration settings are stored in a global crawler_defaults.properties file and one or more site-specific crawler_<site_name>.properties files, each of them located in <IS.INSTANCE.SHARE>/system/config/cluster.
The following table lists the possible web crawler configuration properties.
Property | Description |
---|---|
webadapterAgent.crawl.starturl | Intended to be used in the crawler_<site_name>.properties, as it defines the URL at which to start crawling the Intershop Commerce Management pages. Multiple start URLs are possible using the pattern webadapterAgent.crawl.starturl.1=http://... webadapterAgent.crawl.starturl.2=http://... |
webadapterAgent.crawl.maxduration | Specifies the maximum duration of a crawler run in milliseconds. When the specified time is reached, the crawler run is stopped. If the value is not set or set to -1, the maximum duration is not limited. The default value is -1. |
webadapterAgent.crawl.crawluntil | Specifies a time when running crawlers are stopped to give way to usual business traffic. The allowed format is HH:MM, e.g., 06:30. |
webadapterAgent.crawl.maxrate | Specifies the maximum number of requests per minute issued by the crawler. If the value is not set or set to -1, the requests are sent as fast as possible. The default value is -1. |
webadapterAgent.crawl.maxdepth | Specifies the number of links followed, starting from the start URL. With crawl depth 0, for example, only pages from the start URL will be fetched, with crawl depth 1, pages are fetched from the start URL and all links included in the start URL response, etc. If the value is not set or set to -1, the maximum crawling depth is not restricted. The default value is -1. |
webadapterAgent.crawl.threadcount | A crawling run can be distributed over multiple threads. Note that crawling is always anonymous, i.e., no state information is kept between single requests, and cookies are disabled. The default value is 5. |
webadapterAgent.crawl.denyurl | Specifies a URL pattern (regular expression) that prevents a link to be followed. Several patterns can be specified appending number suffixes to the property name, like webadapterAgent.crawl.denyurl.1= webadapterAgent.crawl.denyurl.2= |
webadapterAgent.crawl.allowurl | Specifies a URL pattern (regular expression) that must be matched for a link to be followed. Several patterns can be specified appending number suffixes to the property name, like webadapterAgent.crawl.allowurl.1= webadapterAgent.crawl.allowurl.2= |
webadapterAgent.crawl.denytext | Specifies a link text pattern (regular expression) that prevents a link to be followed. Several patterns can be specified appending number suffixes to the property name, like webadapterAgent.crawl.denytext.1= webadapterAgent.crawl.denytext.2= |
webadapterAgent.crawl.allowtext | Specifies a link text pattern (regular expression) that must be matched for a link to be followed. Several patterns can be specified appending number suffixes to the property name, like webadapterAgent.crawl.allowtext.1= webadapterAgent.crawl.allowtext.2= |
webadapterAgent.crawl.sockettimeout | Specifies the maximum time (in milliseconds) for waiting for socket reads. If the value is not set or set to -1, the timeout check is disabled. Note that in case of a timeout, only the concerned request is cancelled. The default value is 0. |
webadapterAgent.crawl.connectiontimeout | Specifies the maximum time (in milliseconds) for obtaining a connection. If the value is not set or set to -1, the timeout check is disabled. Note that in case of a timeout, only the concerned request is cancelled. The default value is 0. |
webadapterAgent.crawl.useragent | Specifies the user agent header to allow for the Web crawler to be detected as a Web robot. |
webadapterAgent.crawl.contenttypes | Specifies the types of the response content that will be parsed for further links. Note that content of the default types will be parsed in any case, even if no value is set. The default value is text/html application/xhtml+xml. |
webadapterAgent.crawl.linktagattributes | Specifies the tag attributes that are considered as links. Note that the default tag attributes are considered in any case, even if no value is set. The default value is A/href AREA/href LINK/href EMBED/src FRAME/src IFRAME/src INPUT/src IMG/src SCRIPT/src BODY/background. |
webadapterAgent.crawl.replacepattern | To be used in conjunction with .replacetemplate, can specify, for example, JavaScript link patterns (as regular expressions) that must be replaced to be followed. |
webadapterAgent.crawl.replacetemplate | To be used in conjunction with .replacepattern, specifies the URL template that replaces the given link pattern. For example, with webadapterAgent.crawl.replacepattern= [Jj]ava[Ss]cript:goSH\('(.+)'\) webadapterAgent.crawl.replacetemplate= http://www.example.com/shops/$1/, the link JavaScript:goSH('music') will be replaced with http://www.example.com/shops/music/. |
webadapterAgent.crawl.starttime | Can specify crawler start times. Four fields, separated by spaces, define minute, hour, day of week and day of month using a cron-like syntax (each field can contain multiple values or value ranges separated by commas; asterisks mean 'any'; the first day of the week is Sunday = 1). For example: 0 23 starts any day at 23:00; 0 23 * * starts any day at 23:00; 0 1 2-6 starts from Monday to Friday at 01:00; 0 6,22 * 1,5,15-17 starts on the 1st, 5th, 15th, 16th and 17th of each month at 06:00 and 22:00. |
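To make the webadapterAgent.crawl.starttime syntax more concrete, here is a minimal Python sketch that checks whether a point in time matches such a specification. It is not the crawler's actual scheduler. Two points are assumptions derived from the examples in the table: omitted trailing fields are treated as 'any', and all four fields must match at the same time.

```python
from datetime import datetime


def _field_matches(field, value):
    """Match one field: '*', single values, or ranges, separated by commas."""
    if field == "*":
        return True
    for part in field.split(","):
        if "-" in part:
            low, high = part.split("-")
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False


def starttime_matches(spec, moment):
    """Check a 'minute hour day-of-week day-of-month' spec against a datetime."""
    fields = spec.split()
    # ASSUMPTION: omitted trailing fields mean 'any', as in the example "0 23".
    fields += ["*"] * (4 - len(fields))
    minute, hour, day_of_week, day_of_month = fields[:4]
    # Convert ISO weekdays (Monday = 1) to the documented Sunday = 1 scheme.
    weekday = moment.isoweekday() % 7 + 1
    # ASSUMPTION: all four fields must match (logical AND).
    return (_field_matches(minute, moment.minute)
            and _field_matches(hour, moment.hour)
            and _field_matches(day_of_week, weekday)
            and _field_matches(day_of_month, moment.day))


monday_1am = datetime(2024, 1, 1, 1, 0)  # 1 January 2024 was a Monday
print(starttime_matches("0 1 2-6", monday_1am))
```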
In an Intershop Commerce Management deployment with multiple application servers, you can assign a single application server to certain server groups in a default installation. The default server groups are WFS for web front requests and BOS for back end requests. You can also add server group names for exclusive web front application use or data replication processes as required.
To add a server group to the server group list, open the global Web Adapter configuration file <IS.INSTANCE.SHARE>/system/config/cluster/webadapter.properties and add the intended value to server.groups. For example, to set up a server group for Data Replication processes:
Edit the global property file and add the Data Replication group (DataRep), making the configuration available to all Web Adapters.
server.groups = WFS, BOS, DataRep
The instance-specific Web Adapter configuration file <IS.INSTANCE.LOCAL>/webadapter/webadapter.properties can control the response status to be returned by the web server upon a wastatus request. To this end, the property wastatus.defaultResponse is used.
#####################################################################
## service control
##
## Configures the regular HTTP response status of the 'wastatus'
## handler. This could be useful to exclude a running WA instance
## from load-balancing (before planned HTTP server maintenance).
## Set:
##
## 200 - to report a 'up' state (default)
## 500 - to report a 'down' state (not distinguishable from real failure)
## 503 - to report a 'down' state (with a distinct 'maintenance' code)
##
## Requires an appropriate front-end load-balancer configuration
## to be recognized - and a reset to "200" to resume operation!
#wastatus.defaultResponse = 200
#wastatus.defaultResponse = 500
#wastatus.defaultResponse = 503
If you use an external hardware unit to encrypt SSL communication, the port for decrypted https and the original SSL port must be propagated to the Web Adapter. In this case, edit the following lines in the <IS.INSTANCE.LOCAL>/webadapter/webadapter.properties file according to your needs:
sslbox.webserver.port=81
sslbox.public.port=443
For more information on SSL box support, see Guide - Web Server Settings.
The Web Adapter features two load-balancing algorithms for distributing HTTP requests between multiple application servers, a response time-based algorithm and another one that is based on a configurable server process weighting. These algorithms try to find the best server for each request. They distinguish between session-bound .enfinity/.servlet requests and requests that do not rely on session affinity, such as requests resolved by the .static handler.
Since there is no server load information available to the Web Adapter, it measures the request response time of each application server and uses this data as the quality indicator of an application server. The load-balancing achieves the best performance of a cluster by allocating more load to machines whose response times are shorter. All requests made as part of one session are routed to the same application server regardless of response time. This is known as session affinity and always applies, except in fail-over situations. Session affinity improves performance because there is a better chance that session-related data will still be in the server's cache, ensuring faster response times than those possible on other servers, which would need to load this data from the database first.
The behavior of the load-balancing algorithm can be controlled by four properties, all stored in the global webadapter.properties file.
Note
Property | Description |
---|---|
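lb.filterperiod | The default setting for session-bound requests is: session.lb.filterperiod = 120 and for sessionless requests: request.lb.filterperiod = 30. As a rule of thumb, a small lb.filterperiod makes the Web Adapter react more quickly to server performance (load) changes but also makes it more sensitive to disturbance. |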
lb.qualityweight | The default setting for session-bound requests is: session.lb.qualityweight = 1 and for sessionless requests: request.lb.qualityweight = 1 |
lb.initialTimeFactor | Each newly registered application server starts with a processing time measure determined by the filtered response time of the application server that is currently fastest, multiplied with the value of the property lb.initialTimeFactor. A larger property value means that the request load for this application server increases slower after server startup to avoid server overload due to initially short response times. If the value is too large however, the application server might stay idle for a longer period of time. The default value is 5.0. |
lb.connectPenaltyFactor | If an application server request fails temporarily due to a connect error or timeout, the application server is charged with a penalty time to decrease its probability to get new sessions or requests. The lb.connectPenaltyFactor controls the penalty time. The default value that is suitable for most situations is 1.5. Larger values lead to a larger adjustment of the application server probability, so that multiple connect problems have a higher impact on the number of sessions and requests that will get assigned to this application server. They may be used in deployments with many application server threads where a connect error indicates serious or rare overload situations. However, in high-load situations larger values can also cause an unstable, uneven load distribution due to over-reaction to single connect problems. |
Change the default settings only if necessary, as they are sensitive. An indication for load balancing problems might be a very uneven and unstable workload distribution among the application servers over a longer period of time. Keep in mind that there might be a number of other issues that lead to uneven application server load, for example single sessions that introduce an exceptional high load on the system, or Commerce Management user activities like data updates as well as Intershop Commerce Management jobs running on certain application servers.
With this mechanism, each application server process is configured with a weight value. The value is supposed to indicate this server's theoretical request processing capacity in relation to the other servers. The server usage probabilities - as derived from the average response time measure - are corrected with the relative weight of the server within all available servers.
Assume one 4-core and one 8-core server with equal CPU clock frequencies and response times. Normally, they share the load 50:50, possibly leading to an 80% host utilization of the smaller server and a 40% utilization of the larger one. Without weighting, the only way to shift a larger share of the load to the larger server is to "overload" the smaller one: its response times increase, and its usage probability decreases. With the weight setting, this can be done without reaching the "beginning overload" state, as illustrated below:
Server | Average response time | Original probability | Weight | Weighted probability |
---|---|---|---|---|
4-core | 100 ms | 0.5000 | 40 | 0.3333 |
8-core | 100 ms | 0.5000 | 80 | 0.6667 |
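The numbers in the table are consistent with multiplying each server's original probability by its weight and normalizing the results so that they sum to 1. The following Python sketch shows this calculation. The exact formula used inside the Web Adapter is not documented here, so this is an assumption that merely reproduces the table values.

```python
def weighted_probabilities(probabilities, weights):
    """Scale each probability by its server weight and renormalize."""
    products = [p * w for p, w in zip(probabilities, weights)]
    total = sum(products)
    if total == 0:
        raise ValueError("at least one server needs a non-zero weighted share")
    return [product / total for product in products]


# 4-core server with weight 40, 8-core server with weight 80
result = weighted_probabilities([0.5, 0.5], [40, 80])
print([round(p, 4) for p in result])  # [0.3333, 0.6667]
```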
Each application server process (not host) - uniquely known to the Web Adapter by <IP>:<port> - is assigned exactly one weight value. The built-in default weight is 1. Custom values are specified using the lb.serverWeight* settings in the webadapter.properties file. These settings can be either global or specific per netmask, host or server process, using
lb.serverWeight = <weight>
lb.serverWeight.<CIDR> = <weight>
lb.serverWeight.<IP> = <weight>
lb.serverWeight.<IP>.<port> = <weight>
where <CIDR> is the netmask in CIDR notation, <IP> is the IPv4 network address in decimal dotted notation, <port> is the TCP port number, and <weight> is the integral weight value (0..99999). The last matching setting - the most specific one - wins.
Settings in the local <IS.INSTANCE.LOCAL>/webadapter/webadapter.properties file will win over <IS.INSTANCE.SHARE>/.../webadapter.properties only if they provide an equal or better match. A local lb.serverWeight = 10 will not override a shared lb.serverWeight.10.0.0.1 = 20.
When using this load-balancing algorithm, keep the following issues in mind:
.wastatistics dump. If the feature is in use, there is a new "weight" column in the per-server statistics. Also, the successfully parsed lb.serverWeight* settings are reported. The "( <n> matches)" output indicates how many servers got their actual weight setting from this particular rule. This may help in complex configurations like determining, for example, if a new rule exactly selects the 16 servers that were intended.
webadapter.properties settings:
lb.serverWeight = 40
lb.serverWeight.10.0.29.64/26 = 80
/is-bin/INTERSHOP.wastatistics output:
single:
server ... ms_f weight WFS BOS TEST
0 10.0.29.19:10099 1 40 0.1667 0.3333 -
1 10.0.29.20:10099 1 40 0.1667 - -
2 10.0.29.64:10099 1 80 0.3333 0.6667 1.0000
3 10.0.29.65:10099 1 80 0.3333 - -
...
properties:
...
lb.serverWeight = 40 ( 2 matches)
lb.serverWeight.10.0.29.64/26 = 80 ( 2 matches)
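The "most specific match wins" rule for the lb.serverWeight* settings can be sketched in Python as follows. The ranking scheme (global, then CIDR by prefix length, then IP, then IP and port) and the function names are assumptions chosen for illustration. Only the property syntax and the built-in default weight of 1 come from the description.

```python
import ipaddress

PREFIX = "lb.serverWeight"


def _rank(selector, ip, port):
    """Return a specificity rank if the selector matches, else None."""
    if selector == "":
        return 0  # global setting
    if "/" in selector:
        network = ipaddress.ip_network(selector, strict=False)
        if ipaddress.ip_address(ip) in network:
            return 1 + network.prefixlen  # longer prefix is more specific
        return None
    parts = selector.split(".")
    if len(parts) == 4:
        return 40 if selector == ip else None  # host-specific
    if len(parts) == 5:
        same_host = ".".join(parts[:4]) == ip
        return 50 if same_host and int(parts[4]) == port else None
    return None


def resolve_weight(settings, ip, port, default=1):
    """Pick the weight of the most specific matching lb.serverWeight* rule."""
    best_rank, weight = -1, default
    for key, value in settings.items():
        if not key.startswith(PREFIX):
            continue
        rank = _rank(key[len(PREFIX):].lstrip("."), ip, port)
        # On equal rank, the later setting wins.
        if rank is not None and rank >= best_rank:
            best_rank, weight = rank, int(value)
    return weight


settings = {"lb.serverWeight": 40, "lb.serverWeight.10.0.29.64/26": 80}
print(resolve_weight(settings, "10.0.29.19", 10099))  # 40
print(resolve_weight(settings, "10.0.29.64", 10099))  # 80
```

With the sample settings, the two servers inside 10.0.29.64/26 resolve to weight 80 and the other two fall back to the global weight 40, which corresponds to the "( 2 matches)" lines in the statistics output.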
This section lists and explains the most common performance-relevant properties of webadapter.properties as well as configuration options for shortening the response time of the Web Adapter.
Note
The values and recommendations given in this section refer to standardized systems and may differ in actual systems. Therefore, all settings and their effects must be tested extensively.
This property determines the maximum content-length of HTTP POST requests. It becomes important if one wants to restrict HTTP POST request size. Large POST requests can put a large amount of load on the application servers and can slow down their response time. Also, by using the parameter, one can avoid denial of service attacks through very large HTTP POST requests.
By default, the value of this property is set to 0, which means there is no limit for HTTP POST requests. When you set a limit and an HTTP POST request exceeds the limit, the response will be 413 (Request Entity Too Large). Please keep in mind, however, that the value for post.maxsize must be large enough to ensure smooth communication between the Web Adapter and the application servers.
Warning
Setting a low value can prevent processing of valid requests (large forms). Monitor the application carefully after adjusting the value.
Warning
When not set, any unfriendly client can request large amounts of memory up to a complete "denial of service".
This property determines the maximum content length of POST requests that are cacheable by the Web Adapter.
By default, the value of this property is set to 16000. See also Page Cache Settings.
If the property as.connect.keepalive is set to true, one persistent connection is used for all AS requests triggered by a single client request (the forwarded client request and all its <wainclude>s). Try both variants, true and false, in real conditions; false was sometimes observed to be faster.
By default, this property is set to false.
Enable this property to search and process business event tags (<cicevent .../>) in the response content. For performance reasons, do not enable it when not needed.
By default, this property is set to false.
When an Intershop application server is stopped, the Web Adapter requires a certain time until it becomes aware of the application server's status. Until that moment, the Web Adapter may redirect customer requests to an application server that is not running anymore. Customers whose request is sent to an already stopped application server receive an error message after some time, which is not desirable. The following properties can be adjusted to shorten the reaction time of the Web Adapter in case an application server stops.
The values of the properties as.connect.timeout and as.socket.timeout can be adjusted to set socket or communication timeouts. Use a short timeout for the connection attempt since there is a retry/failover policy. Use a long timeout for read/write operations to allow an AS to process expensive requests without running into timeouts.
By default, the value of as.connect.timeout is set to 10 and the value of as.socket.timeout is set to 300.
Two other relevant properties for shortening the reaction time of the Web Adapter are intershop.registration.registrationTime and intershop.registration.expirationTime.
Note
The application servers send a registration event every 10 seconds that informs the cluster that they are still alive. You can shorten the interval by changing the value for intershop.registration.registrationTime.
If appserver1 is shut down, the configuration service does not receive this event and declares appserver1 as "dead" after a certain time. You can change this timeout by adjusting the value of intershop.registration.expirationTime.
For more information see appserver.properties.
Info
If you want to adjust these properties to shorten the reaction time of the Web Adapter in case an application server stops, ensure that the following criteria are met:
This property sets the minimal time between subsequent configuration requests of the Web Adapter and Web Adapter agent. You may want to increase this value if there are many Web Adapter instances to reduce the configuration service load.
By default, the value of this property is set to 10.
Note
A small lb.filterperiod makes the Web Adapter react more quickly to server performance (load) changes but also makes it more sensitive to disturbance. By default, the value of this property is set to 120.
Note
Default settings for these parameters should not be changed without extensive testing.
See also lb.filterperiod.
Use this property to toggle the request/hit accounting in .wastatistics. This can improve throughput in special, extreme load conditions.
By default, this property is set to true.
All configuration services in the cluster must be listed here in form of an HTTP URL (the port number is required). In case one application server goes down, the Web Adapter can use another configuration service as fallback. Requests can be processed as long as at least one configuration service is available.