Issue
- During an update we observed an error on startup and decided to roll back from 5.X to a 4.X version. However, when we tried to restart the services, the system did not recover and hung indefinitely. In addition, the backup list became inaccessible. The issue eventually resolved itself, but it was concerning, and we want to know how it can be prevented.
Environment
- 2024 Quarterly Releases
Resolution
- Each Cloud stack service defines its dependencies in its own LCP.json file. These dependencies ensure that the services start in the following sequence:
- database and search
- liferay
- backup and webserver
- Together with the health checks, these dependencies act as a self-healing mechanism, so this behavior should not be a cause for concern. For more details during a deployment, check the Kubernetes events in the service status logs: https://learn.liferay.com/w/liferay-cloud/support-and-troubleshooting/troubleshooting/reading-liferay-cloud-service-logs#log-types
- It is also not recommended to update services individually, because the service versions within a given release depend on each other. Instead, group all service version changes according to the corresponding service changelogs: https://help.liferay.com/hc/en-us/articles/29035668172429-08-01-2024-Service-Release-Updates
- In the case of a rollback, revert the entire build that contains that group of service version changes, rather than reverting individual services.
- Please also note that we do not recommend rolling back to a 4.x version as a long-term option; it should be used only to stabilize the current environment during an incident. Both the cloud platform and DXP updates should be tested in a non-production environment, with code and data as similar to production as possible.
- Lastly, if you ever need to roll back again, let the engineer on the ticket know that the search volumes may need to be deleted, because the environment can sometimes complain about files written by the newer version when the older Elasticsearch (ES) version starts.
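
To illustrate how the startup sequence described above follows from service dependencies, here is a minimal Python sketch. It is not Liferay Cloud's actual implementation: the `DEPENDENCIES` map and the `startup_waves` helper are hypothetical, and the map is only an assumption reconstructed from the sequence listed in this article. The real dependency entries in each service's LCP.json may differ.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: service -> services that must start first.
# This mirrors the sequence described in the article, not real LCP.json content.
DEPENDENCIES = {
    "database": [],
    "search": [],
    "liferay": ["database", "search"],
    "backup": ["liferay"],
    "webserver": ["liferay"],
}


def startup_waves(dependencies):
    """Group services into waves; each wave starts once the previous one is up."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        # Services whose dependencies have all been marked as done.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(startup_waves(DEPENDENCIES), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Under these assumptions, a service in a later wave simply waits until the services it depends on are up, which is why a restart after a rollback can appear to hang for a while before recovering on its own.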