Lodestar Run-book
This document outlines procedures and steps to address common alerts and issues.
individual_validator_losing_balance
Description: Validator balance is unexpectedly decreasing.
Action:
- Ensure the Execution Layer and Consensus Layer clients attached to the Validator Client are properly synced.
- Restart them if they are not synced.
- Monitor the alert; if it does not auto-resolve, consider replacing the Beacon Node (BN) machine.
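The sync check in the first step can be sketched as below. This is a minimal sketch, assuming the Beacon Node exposes the standard Beacon API endpoint GET /eth/v1/node/syncing and the Execution Layer client exposes the JSON-RPC method eth_syncing. The helper names and the URLs in the usage comment are hypothetical placeholders, not part of this run-book.

```python
import json
import urllib.request


def fetch_json(url, payload=None, timeout=5.0):
    """GET the URL, or POST the payload as JSON when one is given."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def cl_is_synced(syncing_response: dict) -> bool:
    """Interpret a parsed response from GET /eth/v1/node/syncing."""
    data = syncing_response["data"]
    # Older nodes may omit is_optimistic / el_offline, so default to False.
    return (
        not data["is_syncing"]
        and not data.get("is_optimistic", False)
        and not data.get("el_offline", False)
    )


def el_is_synced(rpc_response: dict) -> bool:
    """Interpret a parsed JSON-RPC response to eth_syncing."""
    # eth_syncing returns the literal false when the client is not syncing
    # and a progress object while it is still syncing.
    return rpc_response.get("result") is False


# Usage (placeholder URLs, adjust to your deployment):
#   cl = fetch_json("http://localhost:9596/eth/v1/node/syncing")
#   el = fetch_json(
#       "http://localhost:8545",
#       {"jsonrpc": "2.0", "id": 1, "method": "eth_syncing", "params": []},
#   )
#   print(cl_is_synced(cl), el_is_synced(el))
```

If either check reports that the client is not synced, the restart in the second step applies.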
LowExitMeassagesLeft
Description: The number of pre-signed exit messages in the current validator_ejector_exit_messages_subdirectory of the validator-ejector instance with label validator_ejector_node_size="small" is zero.
Action:
- Please increase the value of the validator_ejector_exit_messages_subdirectory variable by one.
- Run
make start-validator-ejector HOSTS=aws-lido-prod-ejector-small
to load the exit messages in the new directory.
NoExitMessagesLeft
Description: The number of pre-signed exit messages on the validator-ejector instance with label validator_ejector_node_size="large" is zero.
Action:
- Add new pre-signed exit messages to both validator-ejector instances (small and large) to ensure smooth operation.
- Run
make start-validator-ejector HOSTS=lido_prod_ejector
to load the new exit messages.
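For either of the two exit-message alerts above, it can help to confirm how many message files a directory holds before and after reloading. The following is a minimal sketch, assuming each pre-signed exit message is stored as one .json file directly inside the directory. The function name and the example path are hypothetical, and the validator-ejector may count messages differently.

```python
from pathlib import Path


def count_exit_messages(directory) -> int:
    """Count .json files directly inside the given directory."""
    root = Path(directory)
    if not root.is_dir():
        # A missing directory holds no messages.
        return 0
    return sum(1 for p in root.iterdir() if p.is_file() and p.suffix == ".json")


# Usage (hypothetical path):
#   print(count_exit_messages("/data/exit-messages/3"))
```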
missed_attestations_in_mass
Description: A significant number of attestations are being missed by validators attached to the Beacon node.
Action:
- This alert usually auto-resolves within about 10 minutes. If it does not, proceed to the next step.
- Redirect the Validator Client (VC) to a backup Beacon Node (BN).
- Investigate and resolve issues with the primary BN before reverting the VC.
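The wait-then-redirect decision above can be sketched as follows. This is a minimal sketch, assuming the backup BN serves the standard Beacon API endpoint GET /eth/v1/node/health, which returns HTTP 200 when the node is ready, 206 while it is syncing, and 503 when it is not initialized. The function names are hypothetical.

```python
from datetime import datetime, timedelta

# The run-book's auto-resolve window for this alert.
AUTO_RESOLVE_WINDOW = timedelta(minutes=10)


def should_fail_over(alert_started_at: datetime, now: datetime, still_firing: bool) -> bool:
    """True once the alert has kept firing for the full auto-resolve window."""
    return still_firing and (now - alert_started_at) >= AUTO_RESOLVE_WINDOW


def backup_is_ready(health_status_code: int) -> bool:
    """Interpret the HTTP status code of GET /eth/v1/node/health."""
    # 200 = ready; 206 = syncing; 503 = not initialized.
    return health_status_code == 200
```

Checking the backup BN before redirecting avoids moving the VC onto a node that is itself still syncing.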
StuckBeaconNode
Description: A Beacon Node (BN) is unresponsive or stuck syncing.
Action:
- Redirect Validator Clients (VCs) to a backup BN.
- Restart the primary BN's Consensus Layer (CL) container.
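One way to confirm that a BN is stuck, rather than briefly behind, is to poll its head slot twice and check that it advanced. The following is a minimal sketch, assuming two parsed responses from the standard Beacon API endpoint GET /eth/v1/node/syncing and a 12-second slot time (the mainnet value). The function name and the grace period are hypothetical choices.

```python
SECONDS_PER_SLOT = 12  # mainnet slot time; an assumption for other networks


def head_is_stuck(first: dict, second: dict, elapsed_seconds: float, grace_slots: int = 5) -> bool:
    """Compare head_slot from two polls taken elapsed_seconds apart.

    A few empty slots are normal on a healthy node, so the polls must be
    at least grace_slots apart before a verdict is given.
    """
    if elapsed_seconds < grace_slots * SECONDS_PER_SLOT:
        raise ValueError("polls are too close together to judge")
    # The Beacon API encodes head_slot as a decimal string.
    return int(second["data"]["head_slot"]) <= int(first["data"]["head_slot"])
```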
ValidatorMissedBlock
Description: A validator failed to propose a scheduled block.
Action:
- Verify all related services are operational.
- Collect and share logs from Execution Layer (EL), Consensus Layer (CL), and Validator Client (VC) with the development team for investigation.
BeaconNodeMemoryLeakDetected
Description: A memory leak has been detected in the Beacon Node process.
Action:
- Monitor the situation closely.
- Restart the Beacon Node process to mitigate immediate memory concerns.
- Inform the Lodestar development team of the issue for further investigation.
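While monitoring, a rough growth rate can help decide when a restart is due and gives the development team a concrete number. The following is a minimal sketch, assuming you already collect (timestamp in seconds, resident memory in bytes) samples from your own monitoring. It fits an ordinary least-squares line to the samples; the function name is hypothetical, and a positive slope alone does not prove a leak.

```python
def memory_growth_rate(samples):
    """Least-squares slope of memory over time, in bytes per second.

    samples: list of (seconds, rss_bytes) tuples.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t
```

A slope that stays positive over several hours of samples is worth including in the report to the Lodestar development team.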