ADR 016: Backup Strategy
Status: Accepted
Type: Feature
Created: 2024-07-17
Related-ADRs: 010, 024, 036
Context and Goals
Ensuring the availability and integrity of data is critical for the Hop3 platform. A robust backup strategy is essential to protect against data loss and corruption and to ensure quick recovery after a failure. The goal is to define a comprehensive backup strategy that covers the different types of data (configuration files, application data, and databases) and ensures that backups are performed regularly, stored securely, and can be restored efficiently.
This ADR defines the long-term vision for Hop3's backup capabilities. ADR 024 specifies the foundational backup and restore system on which the later phases build.
Decision
Hop3 implements a comprehensive backup strategy that includes regular backups of critical data, secure storage of backup files, and efficient restoration procedures. This strategy encompasses application data, configuration files, and databases.
The strategy is delivered in phases, so that a usable foundation exists before the more operationally demanding capabilities (scheduling, remote storage, encryption, incremental backups) are layered on:
| Feature | Phase | ADR |
|---|---|---|
| Manual full backups | Phase 1 | ADR 024 |
| Local storage | Phase 1 | ADR 024 |
| Checksum verification | Phase 1 | ADR 024 |
| Service-specific backups | Phase 1 | ADR 024 |
| Automated scheduled backups | Phase 2 | - |
| Retention policies | Phase 2 | - |
| Remote storage (S3, B2) | Phase 3 | - |
| Encryption | Phase 3 | - |
| Incremental backups | Phase 3 | - |
| Transaction log backups | Phase 3 | - |
Key Components
Backup Types and Frequency
Phase 1 (specified in ADR 024):
- Manual full backups on demand
- All application components in one backup
Phase 2+:
- Configuration Files:
  - Frequency: Daily backups of configuration files such as hop3.toml and other relevant configurations.
  - Retention: Retain daily backups for 30 days and monthly backups for 12 months.
- Application Data:
  - Frequency: Incremental backups daily and full backups weekly for application data.
  - Retention: Retain daily incremental backups for 30 days and weekly full backups for 6 months.
- Databases:
  - Frequency: Daily backups of databases with transaction log backups every hour.
  - Retention: Retain daily backups for 30 days and monthly backups for 12 months.
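The "daily for 30 days, monthly for 12 months" retention rule used for configuration files and databases can be sketched as a pure function over backup dates. This is a minimal illustration, not the Hop3 implementation: the function name `select_backups_to_keep` and its parameters are hypothetical, and it assumes that the "monthly backup" is the earliest backup taken in each calendar month, which this ADR does not specify.

```python
from datetime import date, timedelta


def select_backups_to_keep(backup_dates, today, daily_days=30, monthly_months=12):
    """Return the set of backup dates a retention pass would keep.

    Hypothetical helper: keeps every backup from the last `daily_days` days,
    plus the earliest backup of each of the last `monthly_months` calendar
    months (assumed definition of a "monthly" backup).
    """
    keep = set()

    # Daily window: everything strictly newer than the cutoff is kept.
    daily_cutoff = today - timedelta(days=daily_days)
    for d in backup_dates:
        if d > daily_cutoff:
            keep.add(d)

    # Monthly window: find the earliest backup in each calendar month.
    first_of_month = {}
    for d in sorted(backup_dates):
        first_of_month.setdefault((d.year, d.month), d)

    # Keep the monthly representative if its month is recent enough.
    current_index = today.year * 12 + today.month
    for (year, month), d in first_of_month.items():
        age_in_months = current_index - (year * 12 + month)
        if 0 <= age_in_months < monthly_months:
            keep.add(d)

    return keep
```

Anything not in the returned set is a candidate for deletion. Keeping the selection separate from the deletion makes the policy easy to test and to preview before files are removed.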
Backup Storage and Security
Phase 1 (specified in ADR 024):
- Local file-based storage only
- File permissions (600) for access control
- SHA256 checksums for integrity
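The Phase 1 storage properties (local file, mode 600, SHA256 checksum) can be illustrated with a short sketch. The names `store_backup` and `sha256_of_file` are hypothetical, and the `.sha256` sidecar file is an assumption made for illustration; ADR 024 defines where checksums are actually recorded.

```python
import hashlib
import os
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_backup(data: bytes, destination: Path) -> str:
    """Write a backup file with mode 600 and record its checksum.

    Hypothetical helper; the sidecar file layout is an assumption.
    """
    # Create the file with restrictive permissions from the start, so it is
    # never readable by other users, even briefly.
    fd = os.open(destination, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    # Enforce the mode in case the file already existed with looser permissions.
    os.chmod(destination, 0o600)

    checksum = sha256_of_file(destination)
    # Assumed convention: "<digest>  <filename>", as produced by sha256sum.
    sidecar = destination.with_name(destination.name + ".sha256")
    sidecar.write_text(f"{checksum}  {destination.name}\n")
    return checksum
```

Passing the mode to `os.open` rather than calling `chmod` only afterwards avoids a window in which the backup is readable under the process's default umask.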
Phase 2+:
- Storage Locations:
  - Local Storage: Store backups locally on a dedicated backup server or storage device.
  - Remote Storage: Use remote storage solutions such as cloud storage providers (e.g., AWS S3, Google Cloud Storage, Backblaze B2) for redundancy and disaster recovery.
- Security Measures:
  - Encryption: Encrypt all backup files at rest and in transit to ensure data confidentiality (using Age or GPG).
  - Access Control: Implement strict access control measures to restrict access to backup files to authorized personnel only.
Restoration Procedures
Phase 1 (specified in ADR 024):
- Manual restore via CLI (hop3 backup restore)
- Checksum verification before restore
- Service-specific restore (PostgreSQL via pg_restore, etc.)
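The "checksum verification before restore" step amounts to a guard that refuses to start a restore when the archive does not match its recorded digest. The sketch below shows that ordering only; `verify_checksum`, `restore_if_valid`, `ChecksumMismatch`, and the `restore_fn` callback are hypothetical names, not the API specified in ADR 024.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Raised when a backup file does not match its recorded SHA256 digest."""


def verify_checksum(path, expected_sha256):
    """Raise ChecksumMismatch unless the file's SHA256 equals the expected digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large archives are not loaded into memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    actual = digest.hexdigest()
    if actual != expected_sha256.lower():
        raise ChecksumMismatch(f"{path}: expected {expected_sha256}, got {actual}")


def restore_if_valid(path, expected_sha256, restore_fn):
    """Verify the backup first, then hand it to a service-specific restore.

    `restore_fn` stands in for whatever performs the actual restore
    (for example, a wrapper around pg_restore); it is never called
    when verification fails.
    """
    verify_checksum(path, expected_sha256)
    return restore_fn(path)
```

Verifying before any service-specific step runs means a corrupted archive is rejected before it can partially overwrite live data.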
Phase 2+:
- Regular Testing:
  - Test Restorations: Perform regular test restorations to ensure that backup files are not corrupted and can be restored successfully.
  - Documentation: Maintain detailed documentation of the restoration procedures and update it regularly.
- Automated Restoration:
  - Automation Tools: Use automated tools and scripts to facilitate quick and efficient restoration of backups.
  - Monitoring: Implement monitoring systems to detect and alert on backup failures or issues.
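A lightweight form of the test restoration described above is to read every file back out of an archive and compare it against a manifest of expected digests. The sketch below assumes, purely for illustration, that a backup is a tar archive and that a manifest mapping member names to SHA256 digests is available; neither the archive format nor the manifest is defined in this ADR, and `check_archive` is a hypothetical name. It does not replace a full restore into a scratch environment.

```python
import hashlib
import tarfile


def check_archive(archive_path, manifest):
    """Return a list of problems found; an empty list means all files matched.

    `manifest` is an assumed dict of {member name: expected SHA256 hex digest}.
    """
    problems = []
    seen = set()
    # "r:*" lets tarfile detect the compression (none, gzip, bz2, xz).
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # directories, links, etc. carry no content to hash
            seen.add(member.name)
            expected = manifest.get(member.name)
            if expected is None:
                problems.append(f"unexpected file: {member.name}")
                continue
            digest = hashlib.sha256()
            # Stream the member's content without writing it to disk.
            with tar.extractfile(member) as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            if digest.hexdigest() != expected:
                problems.append(f"checksum mismatch: {member.name}")
    # Anything listed in the manifest but absent from the archive is missing.
    for name in sorted(set(manifest) - seen):
        problems.append(f"missing file: {name}")
    return problems
```

A scheduled job could run such a check against the most recent backup and feed a non-empty result into the monitoring and alerting described above.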
Continuous Improvement
- Feedback Loop:
  - User Feedback: Establish a feedback loop with users and administrators to continuously improve the backup strategy based on real-world usage and feedback.
  - Performance Monitoring: Monitor the performance and reliability of the backup processes to identify and address any issues promptly.
- Community Engagement:
  - Hop3 Community: Encourage contributions from the Hop3 community to refine and enhance the backup strategy.
Consequences
Benefits
- Data Protection: Ensures the availability and integrity of critical data.
- Quick Recovery: Facilitates quick recovery in case of data loss or corruption.
- Security: Enhances security through encryption and strict access control measures (Phase 3).
Drawbacks
- Resource Intensive: Requires significant storage resources and network bandwidth for regular backups.
- Management Complexity: Adds complexity to system management, requiring careful planning and monitoring.
- Phased Delivery: The advanced capabilities (scheduling, remote storage, encryption, incremental backups) depend on operational machinery that the foundational phase does not provide, so they become available only as later phases are built.
Risks
- Backup Failures: Potential risk of backup failures or corruption. Mitigation involves regular testing and monitoring.
- Security Breaches: Risk of unauthorized access to backup files. Mitigation includes strong encryption (Phase 3) and access control measures.
References
- Implementation: ADR 024: Backup and Restore System
- Code: packages/hop3-server/src/hop3/core/backup.py
- User Docs: docs/src/backup-restore.md
Related ADRs: ADR 010: Security and Resilience (Umbrella), ADR 024: Backup and Restore System, ADR 036: CLI Ergonomics and Command Surface