Service Probes should be configured based on your individual business needs. This overview explains the basics of how Service Probes work so that you can make informed decisions about configuring them in your own environment.
For more information on using and configuring probes, see our Self-Healing documentation.
Overview
Service probes are Kubernetes Probes that perform the following functions:
- Ping your service from a private IP.
- Make a request directly to your instance.
- Determine whether your instance passes or fails based on the probe configuration.
General Configurations
- The following settings can be configured for each probe in each service's LCP.json:
  - Use httpGet with a path to ping
  - Or (less commonly), an exec with a script to execute
  - Set a threshold for:
    - delay
    - timeout
    - number of failed attempts
    - number of successful attempts
- See Configuration via LCP.json for examples.
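The way the failed-attempt and successful-attempt thresholds interact can be sketched as a small state machine. This is a minimal illustration that assumes Kubernetes-style semantics, where the status flips only after a number of consecutive failures or successes. The class and field names are hypothetical and are not LCP.json keys, and delay and timeout are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class ProbeState:
    """Tracks consecutive probe results (hypothetical helper, not a platform API)."""
    failure_threshold: int  # consecutive failures needed to mark the instance as failing
    success_threshold: int  # consecutive successes needed to mark it as passing again
    passing: bool = True
    _failures: int = 0
    _successes: int = 0

    def record(self, ok: bool) -> bool:
        """Record one probe attempt and return the resulting pass/fail status."""
        if ok:
            self._successes += 1
            self._failures = 0
            if not self.passing and self._successes >= self.success_threshold:
                self.passing = True
        else:
            self._failures += 1
            self._successes = 0
            if self.passing and self._failures >= self.failure_threshold:
                self.passing = False
        return self.passing


# Example values only: fail after 3 consecutive failures, recover after 2 successes.
state = ProbeState(failure_threshold=3, success_threshold=2)
attempts = [False, False, True, False, False, False, True, True]
history = [state.record(ok) for ok in attempts]
```

In this run, the two early failures do not change the status because a success resets the count; only the later run of three consecutive failures marks the instance as failing, and two consecutive successes bring it back.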
Liveness Probe
- The Liveness Probe pings the path configured in the LCP.json to determine whether the service should be restarted.
Default Configurations
- Uses httpGet. The default paths are as follows:
  - Backup: / (status OK if you can ping the root path)
  - CI: /login (status OK if you can get a 200 response from the login page)
  - Database: /instance/healthy
  - Liferay: /c/portal/layout
  - Search: /_cluster/health
  - Webserver: /nginx_status
Explanation of Defaults
- The Liveness Probe follows redirects from its default path.
  - Because of this, when the Liveness Probe pings /c/portal/layout in the Liferay service, it does not verify liveness from the 3xx redirect response. Instead, it follows the redirect from /c/portal/layout to the landing page (e.g. /web/guest/home) and tries to read a 200 response from that page.
  - With that in mind, be aware of the loading time of your landing page when configuring the probe. If your homepage takes longer to load than the Liveness Probe timeout, the probe will fail.
- In the Search service, /_cluster/health almost always gives a 200 response, regardless of cluster status (green, yellow, or red).
  - This can lead to the Search service passing the Liveness Probe when the server is down.
- The default paths for the Backup, Database, and Webserver services are custom endpoints.
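Because /_cluster/health reports the cluster status in its JSON response body rather than in the HTTP status code, a stricter check has to read the body. The following Python sketch shows that idea only. The function name and the choice of which statuses count as healthy are assumptions, and how such a check would be wired into a probe (for example through an exec script or a custom endpoint) is not shown.

```python
import json


def search_is_healthy(status_code: int, body: str, allowed=("green", "yellow")) -> bool:
    """Return True only if the response is a 200 AND the reported status is acceptable.

    `allowed` is an assumption; a stricter team might accept only "green".
    """
    if status_code != 200:
        return False
    try:
        status = json.loads(body).get("status")
    except (ValueError, AttributeError):
        # Body is not valid JSON, or is not a JSON object.
        return False
    return status in allowed
```

With this check, a 200 response whose body reports a red status is treated as a failure instead of a pass.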
Configuration considerations for your team
- Define what 'Liveness' means.
  - Configure the probe based on your team's minimum requirement for considering a service to be alive.
    - When configuring the Liveness Probe, consider what kinds of failures should result in a restart.
      - Some failures will not be resolved by a restart. For example, restarting during a startup failure may result in a startup loop, so the Liveness Probe should not be configured to trigger a restart during startup.
      - Other types of failures may be resolved by a restart, such as a third-party service going down or settings that only take effect after a restart.
      - Your project's tolerance levels should define when the service needs to be restarted.
- Your project's tolerance levels may differ between environments.
  - For example, more downtime may be tolerable in DEV than in PROD.
- You may need to develop custom endpoints that report whether your services meet your conditions for being alive.
  - Custom development helps ensure that the probes are consistent with your project's tolerance levels.
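As a rough sketch of what the logic behind a custom endpoint might look like, the following Python function runs a set of checks and maps the result to an HTTP status code that an httpGet probe can read (Kubernetes treats any status from 200 to 399 as success). All names here are hypothetical, including the individual checks and the per-environment lists, and the web framework that would serve the response is omitted.

```python
from typing import Callable, Dict, Iterable, Tuple


def liveness_response(
    checks: Dict[str, Callable[[], bool]],
    required: Iterable[str],
) -> Tuple[int, str]:
    """Run the required checks and map the result to an HTTP status for the probe."""
    failed = []
    for name in required:
        try:
            ok = checks[name]()
        except Exception:
            ok = False  # a missing check or a check that raises counts as a failure
        if not ok:
            failed.append(name)
    if failed:
        return 503, "failed: " + ", ".join(failed)
    return 200, "OK"


# Hypothetical checks; real ones would query the database, a third-party API, etc.
checks = {
    "database": lambda: True,
    "third_party_api": lambda: False,
}

# Hypothetical tolerance levels: DEV tolerates a third-party outage, PROD does not.
REQUIRED = {
    "dev": ["database"],
    "prod": ["database", "third_party_api"],
}

dev_result = liveness_response(checks, REQUIRED["dev"])
prod_result = liveness_response(checks, REQUIRED["prod"])
```

Keeping the list of required checks separate from the checks themselves lets each environment apply its own definition of alive without changing the endpoint code.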
Readiness Probe
- The Readiness Probe checks whether a node can receive network traffic.
Default Configurations
- Uses httpGet. The default paths are as follows:
  - Backup: /
  - CI: /login
  - Database: /instance/healthy
  - Liferay: /c/portal/layout
  - Search: /
  - Webserver: /nginx_status
- publishNotReadyAddressesForCluster
  - Liferay = false
  - Search = true
Explanation of Configurations
- When set to true, the publishNotReadyAddressesForCluster property adds the node's IP to the cluster route before the node is ready to receive traffic.
  - The default cluster endpoint for every service is [service]--cluster
  - This configuration is set to true for the Search service because search nodes must come up and receive traffic before they are ready.
  - It's set to false for the Liferay service, because Liferay must be fully ready before it can receive traffic.
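The effect of publishNotReadyAddressesForCluster can be pictured with a simplified model. This sketch is not how the platform implements routing; the function name and the IP addresses are made up, and it only illustrates which nodes would be reachable through the cluster endpoint for each setting.

```python
from typing import Dict, List


def cluster_addresses(nodes: Dict[str, bool], publish_not_ready: bool) -> List[str]:
    """Return the node IPs that would be reachable through the cluster endpoint.

    `nodes` maps an IP to whether that node currently passes its Readiness Probe.
    """
    return [ip for ip, ready in nodes.items() if ready or publish_not_ready]


# Made-up example: one node is ready, one is still starting up.
nodes = {"10.0.0.1": True, "10.0.0.2": False}

search_route = cluster_addresses(nodes, publish_not_ready=True)    # Search default
liferay_route = cluster_addresses(nodes, publish_not_ready=False)  # Liferay default
```

With the Search default, both nodes are included, so a node that is still starting can already talk to its peers. With the Liferay default, only the node that passes the Readiness Probe is included.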