Getting timeout exception during application restart in a cluster environment

Issue

The following error (TimeoutException) appears during application restart in a clustered environment:

Error: ERROR [localhost-startStop-1][ClusterSchedulerEngine:604] Unable to load memory 
clustered jobs from master in 10 seconds, you might need to increase value set to 
"clusterable.advice.call.master.timeout", will retry again
java.util.concurrent.TimeoutException

Environment

Liferay Portal 6.2

Resolution

The issue may be due to network problems or, most likely, very high CPU usage.
One of the servers may be overloaded and unable to respond in time to communication between the nodes.
It can also occur when a node cannot reach the master node while requesting the memory-clustered jobs.
In a clustered environment, each node needs to hold the scheduled jobs in memory beforehand so that it can run them if the master node goes down. That way, if the master node fails unexpectedly, a second node can become the new master and pick up where the previous one left off.

As a result, the error may indicate a connection issue between the nodes that prevents the request to retrieve the scheduled jobs from completing. If this is the case, contact the infrastructure or network team.
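
The error message itself suggests raising "clusterable.advice.call.master.timeout". The fragment below is a minimal sketch of that change, assuming the property is overridden in portal-ext.properties on each node and that the value is expressed in seconds (as the "10 seconds" in the log message implies). The value 30 is an arbitrary illustration, not a recommended setting.

```properties
# portal-ext.properties (assumed location for the override)
# Time a node waits for the master to return the memory-clustered jobs.
# The log message reports a wait of 10 seconds; 30 is only an example value.
clusterable.advice.call.master.timeout=30
```

A portal restart is normally needed for a portal-ext.properties change to take effect. If the underlying cause is a network fault or CPU exhaustion, a longer timeout may only delay the error rather than resolve it.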

Additional Information

How to diagnose and recover a Liferay cluster