
ADR 024: Backup and Restore System

Status: Final
Type: Feature
Created: 2025-11-08
Related-ADRs: 016, 020

Relationship to ADR 016

This ADR specifies the foundational implementation of Hop3's backup system. ADR 016 defines the long-term backup strategy, including features that build on this foundation (automated scheduling, remote storage, encryption, incremental backups). This ADR focuses on the file-based core that enables those enhancements.

Context

Hop3 needs a comprehensive backup and restore system to protect user applications and data. This is essential for:

  1. Disaster Recovery: Quickly recover from server failures, data corruption, or accidental deletions
  2. Deployment Safety: Allow rollback to previous versions if deployments fail
  3. Application Cloning: Enable creating staging/test environments from production
  4. Migration: Facilitate moving applications between servers
  5. User Confidence: Give users peace of mind that their data is protected

The backup system must be:

  • Complete: Capture all necessary data (code, data, config, services)
  • Reliable: Ensure data integrity with verification
  • Simple: Easy to use via CLI commands
  • Efficient: Minimize storage use and backup time
  • Extensible: Support future enhancements (encryption, remote storage, etc.)

Decision

Hop3 uses a file-based backup system with the following design:

Backup Format

Each backup is stored as a directory containing:

/home/hop3/backups/apps/<app-name>/<backup-id>/
├── metadata.json         # Backup manifest with checksums
├── source.tar.gz        # Source tree (src/) + bare git repo (git/)
├── data.tar.gz          # Application data archive
├── env.json             # Environment variables (JSON)
└── addons/              # Per-addon backups (e.g. postgres dumps)
    └── postgres_<name>.sql

The backup root is HopConfig.BACKUP_ROOT, which defaults to HOP3_ROOT/backups. source.tar.gz archives both the deployed working copy (src/) and the bare git repo (git/). This keeps backups meaningful for both deploy paths Hop3 supports: git-push, which populates the bare repo, and the JSON-RPC tarball API, which writes directly to src/.
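Below is a minimal sketch of producing source.tar.gz. It assumes, hypothetically, that an app's working copy and bare repo live at <app_root>/src and <app_root>/git. The name archive_source and the write-to-temp-then-rename step are illustrative and are not taken from hop3/core/backup.py. Missing directories are skipped because a tarball-API deploy may never have populated git/.

```python
import tarfile
from pathlib import Path


def archive_source(app_root: Path, out_file: Path) -> list[str]:
    """Pack app_root/src and app_root/git into out_file (.tar.gz); return the dirs included."""
    tmp = out_file.with_name(out_file.name + ".tmp")
    included = []
    with tarfile.open(tmp, "w:gz") as tar:
        for name in ("src", "git"):
            path = app_root / name
            # Skip a missing tree: e.g. a tarball-API deploy may never have populated git/.
            if path.is_dir():
                tar.add(path, arcname=name)  # recursive; members stored as src/..., git/...
                included.append(name)
    # Rename only after the archive is fully written and closed.
    tmp.replace(out_file)
    return included
```

Writing to a temporary name and renaming at the end is one common way to ensure an interrupted run never leaves a truncated archive under the final name.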

Key Design Choices

  1. Directory-Based Storage
     • Each backup is a self-contained directory
     • Easy to inspect, verify, and manage manually if needed
     • Simplifies integrity checking (each file has an independent checksum)
     • Alternative considered: single archive file (rejected: harder to inspect and verify)

  2. Tar.gz Compression
     • Standard, well-supported format
     • Good compression ratio (typically 50-80%)
     • Fast compression/decompression
     • Can stream large files without loading them into memory
     • Alternatives considered: zip (rejected: less efficient), xz (rejected: slower)

  3. JSON Metadata
     • Human-readable and inspectable
     • Standard format with excellent tooling
     • Easy to parse and validate
     • Contains a complete inventory with checksums
     • Alternative considered: binary format (rejected: not human-readable)

  4. SHA256 Checksums
     • Industry-standard cryptographic hash
     • Detects file corruption. It detects tampering only if metadata.json itself is trusted, because the manifest is stored alongside the files it describes
     • Fast to compute
     • Stored in metadata.json for each file
     • Alternatives considered: MD5 (rejected: cryptographically broken), SHA512 (rejected: overkill)

  5. Service Plugin Integration
     • Leverages the existing Addon protocol
     • Each service implements backup() and restore() methods
     • Service-specific backup format (e.g., PostgreSQL uses pg_dump)
     • Extensible: any new service that implements these methods is covered by backups automatically
     • Alternative considered: generic service backup (rejected: loses service-specific optimizations)

  6. Unique Backup IDs
     • Format: YYYYMMDD_HHMMSS_<random-6-chars>
     • Sortable by creation time (to one-second resolution)
     • Collision-resistant (random suffix)
     • Human-readable timestamp
     • Alternatives considered: UUID (rejected: not human-friendly), sequential numbers (rejected: not globally unique)
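The ID format can be sketched in a few lines. This sketch assumes UTC timestamps and a lowercase hex suffix from secrets.token_hex(3), whereas the ADR only specifies six random characters. make_backup_id and backup_created_at are hypothetical names.

```python
import re
import secrets
from datetime import datetime, timezone
from typing import Optional

# YYYYMMDD_HHMMSS_ followed by six lowercase hex characters (assumed alphabet).
BACKUP_ID_RE = re.compile(r"^\d{8}_\d{6}_[0-9a-f]{6}$")


def make_backup_id(now: Optional[datetime] = None) -> str:
    """Return an ID like 20251108_143022_a8f3d9 (UTC timestamp + 6 random hex chars)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return f"{now:%Y%m%d_%H%M%S}_{secrets.token_hex(3)}"


def backup_created_at(backup_id: str) -> datetime:
    """Recover the second-resolution creation time encoded in an ID."""
    if not BACKUP_ID_RE.match(backup_id):
        raise ValueError(f"not a backup ID: {backup_id!r}")
    stamp = datetime.strptime(backup_id[:15], "%Y%m%d_%H%M%S")
    return stamp.replace(tzinfo=timezone.utc)
```

The timestamp is zero-padded and comes first, so plain string sorting orders IDs chronologically. IDs created within the same second fall back to ordering by their random suffix.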

Metadata Schema

The metadata.json includes:

{
  "backup_id": "20251108_143022_a8f3d9",
  "app_name": "my-app",
  "created_at": "2025-11-08T14:30:22Z",
  "format_version": "1.0",
  "hop3_version": "0.8.0",
  "size_bytes": 15728640,
  "checksums": {
    "source.tar.gz": "sha256:abc123...",
    "data.tar.gz": "sha256:def456...",
    "env.json": "sha256:ghi789..."
  },
  "app_metadata": {
    "hostname": "myapp.example.com",
    "port": 8000,
    "run_state": "RUNNING"
  },
  "addons": [
    {
      "type": "postgres",
      "name": "my-database",
      "backup_file": "addons/postgres_my-database.sql",
      "size_bytes": 5242880,
      "checksum": "sha256:jkl012..."
    }
  ],
  "env_vars_count": 12,
  "expires_after": 0
}
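The checksums map above could be produced with a streaming hash like the sketch below. manifest_checksums is a hypothetical helper. It also returns a byte total, but whether the real size_bytes field counts addon dumps is not specified here.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large archives never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return f"sha256:{digest.hexdigest()}"


def manifest_checksums(backup_dir: Path, names: list[str]) -> tuple[dict[str, str], int]:
    """Return ({name: "sha256:<hex>"}, total bytes) for files relative to backup_dir."""
    checksums = {}
    total = 0
    for name in names:
        path = backup_dir / name
        checksums[name] = sha256_file(path)
        total += path.stat().st_size
    return checksums, total
```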

Database Integration

Backups are tracked in the database via the existing Backup model:

class Backup(BigIntAuditBase):
    app_id: int
    state: BackupStateEnum  # SCHEDULED/STARTED/COMPLETED/FAILED
    remote_path: str        # Path to backup directory
    size: int               # Total size in bytes
    expires_after: int      # Retention time (0 = never)

This provides:

  • State tracking for backup operations
  • Integration with Hop3's audit trail
  • Future support for scheduled backups
  • Retention policy enforcement (future)
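Here is one possible lifecycle for a Backup row, offered as a sketch only. The ADR names the four states and says that an expires_after of 0 means "never". The allowed transitions below, and the reading of expires_after as seconds, are assumptions.

```python
from datetime import datetime, timedelta
from enum import Enum


class BackupState(Enum):
    # Mirrors the four states named for BackupStateEnum; values are illustrative.
    SCHEDULED = "SCHEDULED"
    STARTED = "STARTED"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


# Assumed lifecycle: scheduled -> started -> exactly one terminal state.
ALLOWED = {
    BackupState.SCHEDULED: {BackupState.STARTED, BackupState.FAILED},
    BackupState.STARTED: {BackupState.COMPLETED, BackupState.FAILED},
    BackupState.COMPLETED: set(),
    BackupState.FAILED: set(),
}


def transition(current: BackupState, new: BackupState) -> BackupState:
    """Return the new state, or raise if the move is not in ALLOWED."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new


def is_expired(created_at: datetime, expires_after: int, now: datetime) -> bool:
    # expires_after == 0 means "keep forever"; treating other values as seconds is an assumption.
    if expires_after == 0:
        return False
    return now >= created_at + timedelta(seconds=expires_after)
```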

Restore Behaviour

hop3 backup restore <id> repopulates the source, data, environment variables, and addons, then runs the build+spawn pipeline. When the command returns, the app is running again in the state captured by the backup. This matters for cross-instance restore onto a fresh host, where there is no prior build state to reuse.

Pass --target-app <new-name> to restore as a clone alongside the original, instead of in-place.
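Restore unpacks archives that may have been copied in from another host. Extraction should therefore refuse any member that would land outside the destination directory. Below is a minimal sketch using Python's tarfile. safe_extract is a hypothetical helper, not code from the Hop3 code base.

```python
import tarfile
from pathlib import Path


def safe_extract(archive: Path, dest: Path) -> list[str]:
    """Extract a .tar.gz into dest, refusing any member whose path escapes dest."""
    dest = dest.resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"refusing unsafe archive member: {member.name}")
        # Pythons with extraction filters (3.12+, plus some security backports) also
        # apply the stricter "data" filter, which additionally vets links and file modes.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, members=members, **extra)
    return [m.name for m in members]
```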

Cross-Instance Migration

Backups are portable across Hop3 instances. The operator workflow for moving an app from instance A to instance B:

  1. On A: hop3 backup create <app> produces a directory under BACKUP_ROOT/apps/<app>/<id>/.
  2. Transport: copy that directory to instance B (e.g. scp -r).
  3. On B: hop3 backup register <path> reads the manifest, ensures an app row exists for the original app name, and inserts a Backup row pointing at the directory — making it findable by restore.
  4. On B: hop3 backup restore <id> (or ... --target-app NAME to restore under a different name).

backup register is idempotent and verifies the manifest checksums before registering. A corrupted backup is therefore rejected up front with a clear error, instead of causing restore to fail later with a less actionable message. Registration is required: without it, the restore_backup database lookup on the destination finds no Backup row and never sees the transferred files.

Consequences

Positive

  1. Simple and Transparent
     • Users can inspect backups with standard tools
     • Easy to debug issues
     • No proprietary formats

  2. Reliable
     • SHA256 checksums detect corruption
     • Atomic operations prevent partial backups
     • Backups are verified before restore

  3. Complete
     • Captures all application components
     • Includes service data
     • Preserves environment variables

  4. Extensible
     • Easy to add new backup targets
     • Service plugins handle service-specific logic
     • Metadata format supports versioning (format_version)

  5. Efficient
     • Compression reduces storage
     • Streaming for large files
     • No unnecessary copies

Negative

  1. Local Storage Only
     • No remote backup support yet
     • Mitigated by: future enhancement (S3, B2, etc.)

  2. No Encryption
     • Environment variables (which often hold secrets) and addon dumps are stored in plaintext
     • Mitigated by: file permissions (600), future encryption support

  3. No Incremental Backups
     • All backups are full backups
     • Mitigated by: good compression, future incremental support

  4. Manual Retention
     • No automatic cleanup
     • Mitigated by: simple delete command, future automated policies

Trade-offs

  1. Directory vs Single Archive
     • Chose: directory-based
     • Trade-off: slightly more complex to copy (many files vs one)
     • Benefit: much easier to inspect and verify

  2. JSON vs Binary Metadata
     • Chose: JSON
     • Trade-off: slightly larger size
     • Benefit: human-readable, debuggable

  3. Service-Specific vs Generic Backup
     • Chose: service-specific (via Addon)
     • Trade-off: each service needs its own backup implementation
     • Benefit: optimal backup format per service (e.g., PostgreSQL dump vs Redis RDB)

Alternatives Considered

Single Archive File

Considered: Store entire backup as one .tar.gz file

Rejected because:

  • Harder to inspect contents
  • Must extract everything to verify one file
  • Checksumming is less granular
  • Harder to implement partial restore (future)

Database-Stored Backups

Considered: Store backup data in PostgreSQL/SQLite

Rejected because:

  • BLOB storage is inefficient
  • Harder to move/copy backups
  • Potential database bloat
  • Backup data should not depend on the database (the database only tracks backups)

Cloud-First Approach

Considered: Store backups directly in S3/B2

Rejected for initial version because:

  • Adds complexity and dependencies
  • Requires configuration (API keys, etc.)
  • Not all users have cloud access
  • Can be added as an enhancement

Incremental Backups

Considered: Store only changed files since last backup

Rejected for initial version because:

  • Significantly more complex
  • Requires a reference to the previous backup
  • Harder to verify integrity
  • Can be added as an enhancement

Encrypted Backups

Considered: Encrypt all backup files

Rejected for initial version because:

  • Adds key management complexity
  • Not all users need encryption
  • Can be added as an opt-in enhancement

Implementation Notes

Code Organization

  • Core Logic: hop3/core/backup.py - BackupManager class
  • Commands: hop3/commands/backup.py - CLI commands
  • Models: hop3/orm/backup.py - Database schema
  • Config: hop3/config.py - BACKUP_ROOT path

Testing Strategy

  • Unit Tests: BackupManifest, checksums, ID generation
  • Integration Tests: All CLI commands with mocked filesystem
  • System Tests: Real PostgreSQL in Docker
  • E2E (single-instance): round-trip create / list / info / restore / destroy, plus same-instance clone via --target-app.
  • E2E (cross-instance migration): two independent Docker instances paired by a fixture; covers register, restore equivalence (registry / env vars / HTTP body byte-equality), name collisions, cross-instance clone, manifest checksum round-trip, and corrupted-manifest refusal.

Service Integration

Services must implement:

from pathlib import Path
from typing import Protocol


class Addon(Protocol):
    def backup(self) -> Path:
        """Create backup, return path to backup file."""
        ...

    def restore(self, backup_path: Path) -> None:
        """Restore from backup file."""
        ...

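PostgreSQL example (writing the dump into a fresh temporary directory is illustrative):

import os
import subprocess
import tempfile
import time
from pathlib import Path

def backup(self) -> Path:
    stamp = time.strftime("%Y%m%d_%H%M%S")
    backup_file = Path(tempfile.mkdtemp()) / f"{self.addon_name}_{stamp}.sql"
    # Extend os.environ (a bare {"PGPASSWORD": ...} would drop PATH); check=True raises on failure.
    subprocess.run([
        "pg_dump", "-h", "localhost",
        "-U", self.db_user, "-d", self.db_name,
        "-f", str(backup_file),
    ], env={**os.environ, "PGPASSWORD": self.db_password}, check=True)
    return backup_file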
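To show the protocol from the implementer's side, here is a hypothetical addon for a SQLite database file. It is not a Hop3 addon. It declares a local, runtime-checkable copy of the two methods so the sketch is self-contained, and it uses sqlite3's Connection.backup() as its service-specific dump mechanism.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import Protocol, runtime_checkable


@runtime_checkable
class Addon(Protocol):
    """Local, runtime-checkable copy of the backup/restore methods shown above."""

    def backup(self) -> Path: ...

    def restore(self, backup_path: Path) -> None: ...


def _copy_db(src_path: Path, dst_path: Path) -> None:
    """Copy a whole SQLite database into dst_path, replacing its contents."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)  # sqlite3's online backup API
    finally:
        src.close()
        dst.close()


class SqliteAddon:
    """Hypothetical addon wrapping an app-local SQLite database file."""

    def __init__(self, addon_name: str, db_path: Path) -> None:
        self.addon_name = addon_name
        self.db_path = db_path

    def backup(self) -> Path:
        out = Path(tempfile.mkdtemp()) / f"sqlite_{self.addon_name}.db"
        _copy_db(self.db_path, out)
        return out

    def restore(self, backup_path: Path) -> None:
        _copy_db(backup_path, self.db_path)
```

As with pg_dump, the service's own tooling produces the copy. That is the reason for delegating backup to each addon rather than copying raw files generically.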

Future Enhancements

  1. Automated Backups
     • Scheduled backups with cron-like syntax
     • Configurable in hop3.toml
     • Retention policies with automatic cleanup

  2. Remote Storage
     • S3, Backblaze B2, Azure Blob support
     • Pluggable storage backends
     • Automatic replication

  3. Encryption
     • age or GPG encryption
     • Key management
     • Optional per-backup or global

  4. Incremental Backups
     • rsync-based incremental backups
     • Hard-link unchanged files
     • Space-efficient

  5. Verification Scheduler
     • Periodic checksum verification
     • Alert on corruption
     • Automatic re-backup

  6. Backup Browsing
     • View backup contents without restoring
     • Extract individual files
     • Search across backups

References

  • Strategy: ADR 016: Backup Strategy (long-term vision, phases 2-3)
  • Implementation: packages/hop3-server/src/hop3/core/backup.py
  • Commands: packages/hop3-server/src/hop3/commands/backup.py
  • Tests: packages/hop3-server/tests/{a_unit,b_integration,d_e2e}/test_backup*.py
  • User Documentation: docs/src/backup-restore.md
  • Service Protocol: packages/hop3-server/src/hop3/core/protocols.py

Related ADRs: ADR 016: Backup Strategy, ADR 020: Pluggable Architecture for Core Deployment Workflow