Concept - CaaS DevOps - Monitoring the Progressive Web App

1 Introduction

This document describes which aspects of a system are monitored by Intershop and which need to be taken care of by an implementation partner. It also describes the procedure to follow in case of unusual incidents and system failures.

1.1 Glossary

Term      Description
AKS       Azure Kubernetes Service
INT       Integration environment
non-PRD   Non-production environment (INT and UAT)
PRD       Production environment
PWA       Intershop Progressive Web App
SoW       Statement of work
UAT       User acceptance test environment

1.2 References

2 Performance Indicators

The following aspects are monitored:

  • Resource usage (memory, CPU)
  • Pod status (restarts, readiness, liveness)
  • Deployment status (success)
  • Kube events (system warnings related to applications, e.g. failed liveness probes)
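As an illustration of the pod status indicators listed above, the following minimal sketch summarizes restarts and readiness per pod. It assumes the input is the JSON document printed by `kubectl get pods -o json`; the function names and the sample data (`SAMPLE_PODS`, pod names) are hypothetical and not part of any Intershop tooling.

```python
from typing import Any, Dict, List


def summarize_pods(pod_list: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return name, phase, total restarts and readiness for each pod."""
    summary = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        summary.append({
            "name": pod["metadata"]["name"],
            "phase": status.get("phase", "Unknown"),
            # Restarts are counted per container; add them up per pod.
            "restarts": sum(c.get("restartCount", 0) for c in containers),
            # A pod without container statuses is treated as not ready.
            "ready": bool(containers) and all(c.get("ready", False) for c in containers),
        })
    return summary


def pods_needing_attention(pod_list: Dict[str, Any], max_restarts: int = 0) -> List[str]:
    """Names of pods that are not ready or restarted more than max_restarts times."""
    return [
        p["name"]
        for p in summarize_pods(pod_list)
        if not p["ready"] or p["restarts"] > max_restarts
    ]


# Hypothetical sample in the shape of `kubectl get pods -o json` output.
SAMPLE_PODS = {
    "items": [
        {"metadata": {"name": "pwa-main-1"},
         "status": {"phase": "Running",
                    "containerStatuses": [{"ready": True, "restartCount": 0}]}},
        {"metadata": {"name": "pwa-main-2"},
         "status": {"phase": "Running",
                    "containerStatuses": [{"ready": False, "restartCount": 4}]}},
    ]
}
```

Calling `pods_needing_attention(SAMPLE_PODS)` on the sample returns only the second pod, because it is not ready and has restarted several times.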

The availability of production systems is monitored with a central Nagios-based solution. If required, Azure Application Insights is additionally set up for PWA projects as a more advanced option.

3 Tools

Progressive Web App projects are hosted in Azure Kubernetes Service (AKS) instances. Typically, the standard monitoring tools and capabilities provided by Azure are used.

The customer usually does not have access to the monitoring tools. However, we provide read access to the AKS cluster and the corresponding customer resources. The customer can therefore use the standard K8s command line tool (kubectl) to retrieve the status of the deployments.
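A read-only status check with kubectl could be scripted roughly as follows. This is a sketch under assumptions: kubectl is installed and already configured for the cluster, and the namespace name is a placeholder. The helper names are hypothetical; only the `kubectl get deployments` call with `--namespace` and `--output json` is standard kubectl usage.

```python
import json
import subprocess
from typing import Any, Dict, List


def kubectl_get_deployments_args(namespace: str) -> List[str]:
    """Build the read-only kubectl command that lists deployments as JSON."""
    return ["kubectl", "get", "deployments",
            "--namespace", namespace, "--output", "json"]


def deployment_is_available(deployment: Dict[str, Any]) -> bool:
    """True if all desired replicas are updated and available."""
    desired = deployment.get("spec", {}).get("replicas", 1)
    status = deployment.get("status", {})
    return (status.get("updatedReplicas", 0) == desired
            and status.get("availableReplicas", 0) == desired)


def fetch_deployments(namespace: str) -> List[Dict[str, Any]]:
    """Run kubectl (requires cluster access) and return the deployment objects."""
    result = subprocess.run(kubectl_get_deployments_args(namespace),
                            check=True, capture_output=True, text=True)
    return json.loads(result.stdout)["items"]


# Hypothetical deployment object, reduced to the fields used above.
SAMPLE_DEPLOYMENT = {
    "metadata": {"name": "pwa-main"},
    "spec": {"replicas": 3},
    "status": {"updatedReplicas": 3, "availableReplicas": 2},
}
```

With cluster access, `[d["metadata"]["name"] for d in fetch_deployments("my-namespace") if not deployment_is_available(d)]` would list deployments that are not fully available; `my-namespace` stands in for the actual customer namespace.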

3.1 Azure Container Insights

Azure Container Insights is the Azure solution for monitoring the performance and health of the Kubernetes cluster itself and of all workloads deployed to it.

3.1.1 Overview Page

The overview page is the entry point into Container Insights. It shows metrics on resource utilization and provides tabs/links to further aspects.

3.1.2 Health State

These views show the health of a single application (e.g. a deployed PWA environment) inside the cluster. The screenshots show the status of the deployments and their related pods, as well as warning messages from Kubernetes related to the application namespace.
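The same kind of warning messages can be retrieved with read access via `kubectl get events -o json`. The sketch below filters such output for warnings; it assumes the standard event fields (`type`, `reason`, `message`, `involvedObject`, `count`), while the function name and the sample events are hypothetical.

```python
from typing import Any, Dict, List, Optional


def warning_events(event_list: Dict[str, Any],
                   reason: Optional[str] = None) -> List[Dict[str, Any]]:
    """Select events of type "Warning", optionally restricted to one reason."""
    selected = []
    for event in event_list.get("items", []):
        if event.get("type") != "Warning":
            continue
        if reason is not None and event.get("reason") != reason:
            continue
        obj = event.get("involvedObject", {})
        selected.append({
            "object": f"{obj.get('kind', '?')}/{obj.get('name', '?')}",
            "reason": event.get("reason", ""),
            "message": event.get("message", ""),
            "count": event.get("count", 1),
        })
    return selected


# Hypothetical sample in the shape of `kubectl get events -o json` output.
SAMPLE_EVENTS = {
    "items": [
        {"type": "Normal", "reason": "Scheduled", "message": "Successfully assigned pod",
         "involvedObject": {"kind": "Pod", "name": "pwa-main-1"}, "count": 1},
        {"type": "Warning", "reason": "Unhealthy",
         "message": "Liveness probe failed: HTTP probe failed with statuscode: 503",
         "involvedObject": {"kind": "Pod", "name": "pwa-main-2"}, "count": 3},
        {"type": "Warning", "reason": "BackOff",
         "message": "Back-off restarting failed container",
         "involvedObject": {"kind": "Pod", "name": "pwa-main-2"}, "count": 5},
    ]
}
```

Failed liveness probes are reported with the reason `Unhealthy`, so `warning_events(SAMPLE_EVENTS, reason="Unhealthy")` narrows the list to those events.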

3.2 Azure Application Insights

Azure Application Insights is used for availability tests of a PWA. It is an easy way to monitor the availability and response times of a PWA and to set up alerts based on them.

For further information, refer to Application Insights availability tests in the Microsoft documentation.
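Conceptually, an availability test requests a URL at intervals and records whether it responded and how long it took. The following sketch illustrates that idea with the Python standard library only. It is not the Application Insights API; the names `ProbeResult`, `probe` and `needs_alert` and the 5-second threshold are assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    available: bool
    status: Optional[int]   # HTTP status code, None if no response was received
    seconds: float          # elapsed time of the request


def probe(url: str, timeout: float = 10.0,
          fetch: Callable = urllib.request.urlopen) -> ProbeResult:
    """Request the URL once and record availability and response time."""
    start = time.monotonic()
    try:
        with fetch(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code            # server answered with an error status
    except OSError:                  # covers URLError, timeouts, refused connections
        return ProbeResult(False, None, time.monotonic() - start)
    return ProbeResult(200 <= status < 400, status, time.monotonic() - start)


def needs_alert(result: ProbeResult, max_seconds: float = 5.0) -> bool:
    """Alert if the site is unavailable or slower than the assumed threshold."""
    return (not result.available) or result.seconds > max_seconds
```

The `fetch` parameter exists only so the probe can be exercised without network access; in normal use, `needs_alert(probe("https://shop.example.com/"))` would be called periodically, with the URL replaced by the actual shop address.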

4 Monitoring of Different Environments

PRD environments are hosted on separate AKS instances from INT and UAT. However, the tooling setup is the same.

The only difference between PRD and non-PRD is the setup of alerts and the process of handling alerts. Specifically, this means:

  • The availability of PRD systems is monitored by Intershop Operations.
    In the event of a system failure, the defined processes are handled with higher priority for a PRD system. If manual tasks are required, the response time for PRD is shorter.
  • Ideally, manual restarts are avoided by suitable liveness probes and resource limits provided by the application.
    Kubernetes restarts a pod's containers automatically if the liveness probe fails or if a container exceeds its memory limit (exceeding a CPU limit only throttles the container). In both cases, corresponding kube events are triggered, so the problem can be analyzed afterwards.
  • Since K8s may restart or reschedule a pod at any time, the replica count for deployments on PRD is always greater than 1. On UAT, the replica count is set to 1 by default.
    That is, if the single PWA pod is unavailable, the UAT environment is unavailable until a new pod is operational. On PRD, usually at least 3 pods are started for PWA environments.
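The settings discussed in this list (replica count, liveness probe, resource limits) all live in the Kubernetes Deployment manifest. The sketch below builds such a manifest as a Python dictionary. The field names follow the `apps/v1` Deployment schema; the image name, port 4200, probe path `/`, probe timings and resource values are illustrative assumptions, not Intershop defaults. Only the replica counts (3 for PRD, 1 for UAT) are taken from the text above.

```python
import json
from typing import Any, Dict

# Replica counts as described above; other environments are not covered here.
DEFAULT_REPLICAS = {"PRD": 3, "UAT": 1}


def build_pwa_deployment(name: str, image: str, environment: str,
                         port: int = 4200) -> Dict[str, Any]:
    """Build a Deployment manifest with liveness probe and resource limits."""
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # A failing liveness probe causes Kubernetes to restart the container.
        "livenessProbe": {
            "httpGet": {"path": "/", "port": port},   # assumed probe endpoint
            "initialDelaySeconds": 30,                # assumed timings
            "periodSeconds": 10,
            "failureThreshold": 3,
        },
        # Assumed values; exceeding the memory limit leads to a restart.
        "resources": {
            "requests": {"cpu": "250m", "memory": "256Mi"},
            "limits": {"cpu": "1", "memory": "512Mi"},
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": DEFAULT_REPLICAS[environment],
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }
```

For example, `json.dumps(build_pwa_deployment("pwa-main", "registry.example.com/pwa:1.0", "PRD"), indent=2)` renders the manifest as JSON, a format Kubernetes accepts alongside YAML. The image reference is a placeholder.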

5 Handling of Incidents

In case of incidents, the implementation partner must be informed.
The implementation partner then either checks and fixes the problem or escalates to Intershop.
General information on incident management is documented in the customer's SoW, chapter 2.

6 Handling of Alerts

Currently, Intershop does not offer automatic notifications or alerts for live shop outages. In case of an outage, we inform the customer via a service desk ticket.

With the upcoming implementation of a monitoring system, we will provide self-service SLA dashboards for customers.

