Filecoin Project
Background
The Infrastructure Team is responsible for running several critical infrastructure components of Filecoin Mainnet and Calibnet for Protocol Labs.
These services are network-critical and require high availability and responsive monitoring to ensure seamless performance.
List of Services
- Filecoin Calibnet Faucet
- Mainnet and Calibnet Full Bootnodes
- Mainnet and Calibnet Snapshot Services
- Mainnet and Calibnet Archival Nodes
- Forest Snapshot Backfilling
Filecoin Infrastructure Inventory
Access the complete infrastructure inventory here.
Deployment & Upgrade Steps
We use Ansible to deploy services in a containerized environment. Whether it's an initial deployment or an upgrade, the process is the same.
Deployment Type: Recreate
- Get the Latest Image Tag
- Update the Image Tag
  Update the respective host configurations in:
  - filecoin-execution (primary repository)
  - infra-ansible (legacy repository - see note below)
  - fil-ansible-collection (legacy repository - see note below)
  - forest-team-execution

  Important Note on Repository Structure:
  The Filecoin project is currently deployed from multiple separate repositories due to historical infrastructure drift:
  - infrastructure-general/ansible/filecoin-execution - Primary, consolidated repository
  - infra-ansible - Legacy repository
  - fil-ansible-collection - Legacy repository

  Current Status: All repositories are actively used for Filecoin deployments. Both legacy repositories route alerts to the same PagerDuty integration (pd-fil-infra-incidents-high) as the primary repository, ensuring consistent alerting coverage.

  Future Plans: There are plans to reintegrate the legacy repositories back into the primary infrastructure-general repository to establish a single source of truth and eliminate infrastructure drift. Until that migration is complete, all repositories must be maintained.
- Dry Run the Ansible Command
  Use the --diff --check flags to preview the changes before applying them.
- Apply the Changes
  Re-run the command without the --check flag to deploy the changes.
- Verify Deployment
  Ensure the service is up and running by performing post-deployment checks.
- Raise a PR
  Submit a Pull Request and request team approval.
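The dry-run and apply steps above can be sketched as a small Python wrapper around ansible-playbook. This is a minimal illustration, not the team's actual tooling: the playbook name, inventory path, and host limit are hypothetical placeholders, and keeping --diff on the real run is an assumption rather than something the steps require.

```python
import shlex
import subprocess


def build_playbook_command(playbook, inventory, limit=None, dry_run=True):
    """Return the ansible-playbook argument list for a dry run or a real run."""
    cmd = ["ansible-playbook", "-i", inventory, playbook]
    if limit:
        # Restrict the run to a subset of hosts (hypothetical group name).
        cmd += ["--limit", limit]
    # --diff prints the changes; kept on the real run too (an assumption).
    cmd.append("--diff")
    if dry_run:
        # --check asks Ansible to report changes without applying them.
        cmd.append("--check")
    return cmd


def deploy(playbook, inventory, limit=None):
    """Dry-run first, then apply only after the operator confirms."""
    preview = build_playbook_command(playbook, inventory, limit, dry_run=True)
    print("Dry run:", shlex.join(preview))
    subprocess.run(preview, check=True)  # raises if the dry run fails

    if input("Apply these changes? [y/N] ").strip().lower() != "y":
        return False

    apply_cmd = build_playbook_command(playbook, inventory, limit, dry_run=False)
    print("Applying:", shlex.join(apply_cmd))
    subprocess.run(apply_cmd, check=True)
    return True
```

Calling deploy("site.yml", "inventory/hosts.ini", limit="calibnet") with those placeholder arguments would run the preview, wait for confirmation, and then re-run the same command without --check.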
Monitoring & Alerting
Alert Sources
Filecoin infrastructure is monitored through two alerting systems:
- Prometheus/Alertmanager Alerts (self-hosted)
  - Node health, sync status, peer connectivity
  - Host metrics (CPU, memory, disk)
  - Routes via project_name: "filecoin" label to PagerDuty
- Grafana Cloud Alerts (managed via Terraform)
  - Filecoin snapshot service monitoring
  - Configured in: infrastructure-general/terraform/grafana-cloud/filecoin.tf
  - Active alerts:
    - FilecoinSnapshotAgeOld - Snapshot older than 120 minutes
    - FilecoinOrphanArchiveFile - Orphan files in snapshot archive
    - FilecoinSnapshotNoUpload - No snapshots uploaded in 2 hours
  - Routes to same PagerDuty integration (pd-fil-infra-incidents-high)
Both alerting systems route to the same PagerDuty integration, ensuring consistent on-call coverage.
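To make the three snapshot alert conditions concrete, the sketch below restates them as plain Python checks. It only illustrates the thresholds listed above and is not the Grafana Cloud rule definitions: the function name, its inputs, and the way orphan files are passed in are all assumptions.

```python
from datetime import datetime, timedelta, timezone

SNAPSHOT_MAX_AGE = timedelta(minutes=120)  # FilecoinSnapshotAgeOld threshold
UPLOAD_WINDOW = timedelta(hours=2)         # FilecoinSnapshotNoUpload window


def firing_alerts(now, latest_snapshot_at, upload_times, orphan_files):
    """Return the names of the snapshot alerts that would fire (illustrative)."""
    alerts = []
    # The newest snapshot is older than 120 minutes.
    if now - latest_snapshot_at > SNAPSHOT_MAX_AGE:
        alerts.append("FilecoinSnapshotAgeOld")
    # Any orphan file in the snapshot archive triggers the alert.
    if orphan_files:
        alerts.append("FilecoinOrphanArchiveFile")
    # No upload timestamp falls inside the last 2 hours.
    if not any(now - t <= UPLOAD_WINDOW for t in upload_times):
        alerts.append("FilecoinSnapshotNoUpload")
    return alerts


# Example with made-up timestamps and a hypothetical orphan file name.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
stale = now - timedelta(hours=3)
print(firing_alerts(now, stale, [stale], ["orphan.car.zst"]))
```

With a newest snapshot and last upload both three hours old plus one orphan file, all three alerts fire; in the real setup each would page through pd-fil-infra-incidents-high.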
Runbook for Troubleshooting
We actively track actionable alerts, each accompanied by detailed steps for resolution.
Check out the Filecoin Runbook here.