ADR 024: Backup and Restore System

Status: Final
Type: Feature
Created: 2025-11-08
Related-ADRs: 016, 020

Relationship to ADR 016

This ADR specifies the foundational implementation of Hop3's backup system. ADR 016 defines the long-term backup strategy, including features that build on this foundation (automated scheduling, remote storage, encryption, incremental backups). This ADR focuses on the file-based core that enables those enhancements.

Context

Hop3 needs a comprehensive backup and restore system to protect user applications and data. This is essential for:

Disaster Recovery: Quickly recover from server failures, data corruption, or accidental deletions
Deployment Safety: Allow rollback to previous versions if deployments fail
Application Cloning: Enable creating staging/test environments from production
Migration: Facilitate moving applications between servers
User Confidence: Give users peace of mind that their data is protected

The backup system must be:

- Complete: Capture all necessary data (code, data, config, services)
- Reliable: Ensure data integrity with verification
- Simple: Easy to use via CLI commands
- Efficient: Minimize storage use and backup time
- Extensible: Support future enhancements (encryption, remote storage, etc.)

Decision

Hop3 uses a file-based backup system with the following design:

Backup Format

Each backup is stored as a directory containing:

/home/hop3/backups/apps/<app-name>/<backup-id>/
├── metadata.json         # Backup manifest with checksums
├── source.tar.gz        # Source tree (src/) + bare git repo (git/)
├── data.tar.gz          # Application data archive
├── env.json             # Environment variables (JSON)
└── addons/              # Per-addon backups (e.g. postgres dumps)
    └── postgres_<name>.sql

The root of this tree is HopConfig.BACKUP_ROOT, which defaults to HOP3_ROOT/backups. source.tar.gz archives both the deployed working copy (src/) and the bare git repo (git/), so a backup remains meaningful for both deploy paths Hop3 supports: git-push, which populates the bare repo, and the JSON-RPC tarball API, which writes directly to src/.
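
As a rough illustration of how source.tar.gz could be produced, the sketch below uses Python's standard tarfile module to archive whichever of src/ and git/ exist under an app directory. The function name archive_source and the app_root layout are assumptions made for this example, not the actual BackupManager code.

```python
import tarfile
from pathlib import Path


def archive_source(app_root: Path, dest: Path) -> list[str]:
    """Write src/ and git/ (whichever exist) into a gzip-compressed tarball.

    Returns the top-level directory names that were archived.
    """
    archived = []
    with tarfile.open(dest, "w:gz") as tar:
        for name in ("src", "git"):
            path = app_root / name
            if path.is_dir():
                # arcname keeps member paths relative to the app root,
                # so the archive can be extracted under any directory.
                tar.add(path, arcname=name)
                archived.append(name)
    return archived
```

An app deployed only through the tarball API would have no git/ directory, in which case the archive simply contains src/.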

Key Design Choices

Directory-Based Storage
- Each backup is a self-contained directory
- Easy to inspect, verify, and manage manually if needed
- Simplifies integrity checking (each file has an independent checksum)
- Alternative considered: Single archive file (rejected - harder to inspect/verify)

Tar.gz Compression
- Standard, well-supported format
- Good compression on text-heavy content (typically a 50-80% size reduction); already-compressed files gain little
- Fast compression/decompression
- Can stream large files without loading them into memory
- Alternative considered: zip (rejected - less efficient), xz (rejected - slower)

JSON Metadata
- Human-readable and inspectable
- Standard format with excellent tooling
- Easy to parse and validate
- Contains complete inventory with checksums
- Alternative considered: Binary format (rejected - not human-readable)

SHA256 Checksums
- Industry-standard cryptographic hash
- Detects accidental file corruption; detects tampering only as long as metadata.json itself is trusted, since the checksums are stored alongside the files
- Fast to compute
- Stored in metadata.json for each file
- Alternative considered: MD5 (rejected - cryptographically broken), SHA512 (rejected - overkill)

Service Plugin Integration
- Leverages existing Addon protocol
- Each service implements backup() and restore() methods
- Service-specific backup format (e.g., PostgreSQL uses pg_dump)
- Extensible: a new service gains backup support by implementing these two methods
- Alternative considered: Generic service backup (rejected - loses service-specific optimizations)

Unique Backup IDs
- Format: YYYYMMDD_HHMMSS_<random-6-chars>
- Sortable by creation time
- Collision-resistant (random suffix)
- Human-readable timestamp
- Alternative considered: UUID (rejected - not human-friendly), sequential numbers (rejected - not globally unique)
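
The ID format above can be sketched in a few lines. This is a minimal illustration, not the actual Hop3 code: generate_backup_id is a hypothetical name, and the use of UTC and of a hexadecimal suffix are assumptions (the ADR only specifies six random characters).

```python
import secrets
from datetime import datetime, timezone
from typing import Optional


def generate_backup_id(now: Optional[datetime] = None) -> str:
    """Return an ID shaped like YYYYMMDD_HHMMSS_<6 random hex chars>."""
    # Assumption: timestamps are taken in UTC.
    now = now or datetime.now(timezone.utc)
    # 3 random bytes encode to 6 hexadecimal characters.
    suffix = secrets.token_hex(3)
    return f"{now:%Y%m%d_%H%M%S}_{suffix}"
```

Because the zero-padded timestamp comes first, sorting IDs as plain strings orders them by creation time, to one-second resolution. The random suffix only matters when two backups start within the same second.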

Metadata Schema

The metadata.json includes:

{
  "backup_id": "20251108_143022_a8f3d9",
  "app_name": "my-app",
  "created_at": "2025-11-08T14:30:22Z",
  "format_version": "1.0",
  "hop3_version": "0.8.0",
  "size_bytes": 15728640,
  "checksums": {
    "source.tar.gz": "sha256:abc123...",
    "data.tar.gz": "sha256:def456...",
    "env.json": "sha256:ghi789..."
  },
  "app_metadata": {
    "hostname": "myapp.example.com",
    "port": 8000,
    "run_state": "RUNNING"
  },
  "addons": [
    {
      "type": "postgres",
      "name": "my-database",
      "backup_file": "addons/postgres_my-database.sql",
      "size_bytes": 5242880,
      "checksum": "sha256:jkl012..."
    }
  ],
  "env_vars_count": 12,
  "expires_after": 0
}
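
To show how a manifest of this shape can drive verification, here is a minimal sketch that recomputes each SHA256 digest in streaming fashion and compares it with the recorded value. The helper names sha256_file and verify_backup are hypothetical; only the field names (checksums, addons, backup_file, checksum) and the sha256: prefix are taken from the example above.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large archives never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return f"sha256:{digest.hexdigest()}"


def verify_backup(backup_dir: Path) -> list[str]:
    """Return the relative paths that are missing or whose checksum differs."""
    manifest = json.loads((backup_dir / "metadata.json").read_text())
    # Top-level files and per-addon files are recorded in different places.
    expected = dict(manifest.get("checksums", {}))
    for addon in manifest.get("addons", []):
        expected[addon["backup_file"]] = addon["checksum"]

    failures = []
    for rel_path, checksum in expected.items():
        target = backup_dir / rel_path
        if not target.is_file() or sha256_file(target) != checksum:
            failures.append(rel_path)
    return failures
```

An empty result means every listed file is present and matches its recorded digest. metadata.json itself is not covered by any checksum in this sketch.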

Database Integration

Backups are tracked in the database via the existing Backup model:

class Backup(BigIntAuditBase):
    app_id: int
    state: BackupStateEnum  # SCHEDULED/STARTED/COMPLETED/FAILED
    remote_path: str        # Path to backup directory
    size: int              # Total size in bytes
    expires_after: int     # Retention time (0 = never)

This provides:

- State tracking for backup operations
- Integration with Hop3's audit trail
- Future support for scheduled backups
- Retention policy enforcement (future)

Restore Behaviour

hop3 backup restore <id> repopulates source, data, env, and addons, then invokes the build+spawn pipeline. When the command returns, the app is running again in a state equivalent to the one captured by the backup. Running the full pipeline matters for cross-instance restore on a fresh host, where there is no prior build state to reuse.

Pass --target-app <new-name> to restore as a clone alongside the original instead of in place.

Cross-Instance Migration

Backups are portable across Hop3 instances. To move an app from a source instance A to a destination instance B, the operator workflow is:

On A: hop3 backup create <app> produces a directory under BACKUP_ROOT/apps/<app>/<id>/.
Transport: copy that directory to instance B (e.g. scp -r).
On B: hop3 backup register <path> reads the manifest, ensures an app row exists for the original app name, and inserts a Backup row pointing at the directory — making it findable by restore.
On B: hop3 backup restore <id> (or ... --target-app NAME to restore under a different name).

backup register is idempotent and verifies the manifest checksums before registering: a corrupted backup is rejected with a clear error rather than letting restore fail later with a less actionable message. Registration is required because restore_backup looks backups up in the database; without it, the lookup on the destination misses the transferred files entirely.

Consequences

Positive

Simple and Transparent
- Users can inspect backups with standard tools
- Easy to debug issues
- No proprietary formats

Reliable
- SHA256 checksums ensure integrity
- Atomic operations prevent partial backups
- Verification before restore

Complete
- Captures all application components
- Includes service data
- Preserves environment variables

Extensible
- Easy to add new backup targets
- Service plugins handle service-specific logic
- Metadata format supports versioning

Efficient
- Compression reduces storage
- Streaming for large files
- No unnecessary copies

Negative

Local Storage Only
- Currently no remote backup support
- Mitigated by: Future enhancement (S3, B2, etc.)

No Encryption
- Environment variables stored in plaintext
- Mitigated by: File permissions (600), future encryption support

No Incremental Backups
- All backups are full backups
- Mitigated by: Good compression, future incremental support

Manual Retention
- No automatic cleanup
- Mitigated by: Simple delete command, future automated policies
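
The file-permission mitigation for the plaintext env.json can be illustrated as follows. This is a sketch that assumes a POSIX host; write_env_file is a hypothetical helper, not the function Hop3 uses.

```python
import json
import os
from pathlib import Path


def write_env_file(path: Path, env: dict[str, str]) -> None:
    """Write env vars as JSON, readable and writable by the owner only."""
    # The mode given to os.open applies only when the file is created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        # Tighten permissions on a pre-existing file before writing secrets.
        os.fchmod(f.fileno(), 0o600)
        json.dump(env, f, indent=2, sort_keys=True)
```

Setting the mode at creation time, rather than calling chmod after the write, avoids a window in which the secrets are on disk with broader permissions.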

Trade-offs

Directory vs Single Archive
- Chose: Directory-based
- Trade-off: Slightly more complex to copy (many files vs one)
- Benefit: Much easier to inspect and verify

JSON vs Binary Metadata
- Chose: JSON
- Trade-off: Slightly larger size
- Benefit: Human-readable, debuggable

Service-Specific vs Generic Backup
- Chose: Service-specific (via Addon)
- Trade-off: Each service needs backup implementation
- Benefit: Optimal backup format per service (e.g., PostgreSQL dump vs Redis RDB)

Alternatives Considered

Single Archive File

Considered: Store entire backup as one .tar.gz file

Rejected because:

- Harder to inspect contents
- Must extract everything to verify one file
- Checksumming less granular
- Harder to implement partial restore (future)

Database-Stored Backups

Considered: Store backup data in PostgreSQL/SQLite

Rejected because:

- BLOB storage inefficient
- Harder to move/copy backups
- Potential database bloat
- Backup system should not depend on database

Cloud-First Approach

Considered: Store backups directly in S3/B2

Rejected for initial version because:

- Adds complexity and dependencies
- Requires configuration (API keys, etc.)
- Not all users have cloud access
- Can be added as enhancement

Incremental Backups

Considered: Store only changed files since last backup

Rejected for initial version because:

- Significantly more complex
- Requires reference to previous backup
- Harder to verify integrity
- Can be added as enhancement

Encrypted Backups

Considered: Encrypt all backup files

Rejected for initial version because:

- Adds key management complexity
- Not all users need encryption
- Can be added as opt-in enhancement

Implementation Notes

Code Organization

Core Logic: hop3/core/backup.py - BackupManager class
Commands: hop3/commands/backup.py - CLI commands
Models: hop3/orm/backup.py - Database schema
Config: hop3/config.py - BACKUP_ROOT path

Testing Strategy

Unit Tests: BackupManifest, checksums, ID generation
Integration Tests: All CLI commands with mocked filesystem
System Tests: Real PostgreSQL in Docker
E2E (single-instance): round-trip create / list / info / restore / destroy, plus same-instance clone via --target-app.
E2E (cross-instance migration): two independent Docker instances paired by a fixture; covers register, restore equivalence (registry / env vars / HTTP body byte-equality), name collisions, cross-instance clone, manifest checksum round-trip, and corrupted-manifest refusal.

Service Integration

Services must implement:

from pathlib import Path
from typing import Protocol

class Addon(Protocol):
    def backup(self) -> Path:
        """Create backup, return path to backup file."""
        ...

    def restore(self, backup_path: Path) -> None:
        """Restore from backup file."""
        ...

PostgreSQL example:

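import os
import subprocess
from pathlib import Path

def backup(self) -> Path:
    # backup_dir and timestamp come from the surrounding addon code (not shown).
    backup_file = backup_dir / f"{self.addon_name}_{timestamp}.sql"
    cmd = ["pg_dump", "-h", "localhost", "-U", self.db_user,
           "-d", self.db_name, "-f", str(backup_file)]
    # Extend the inherited environment; a bare dict would drop PATH.
    env = {**os.environ, "PGPASSWORD": self.db_password}
    subprocess.run(cmd, env=env, check=True)  # raises if pg_dump fails
    return backup_file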
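
To make the protocol concrete, the following sketch shows a toy addon that satisfies it through structural typing, without inheriting from anything. FileAddon and its constructor arguments are invented for illustration, and the @runtime_checkable decorator is added here only so that isinstance can be used in the example; this ADR does not say whether Hop3's Addon protocol carries it.

```python
import shutil
from pathlib import Path
from typing import Protocol, runtime_checkable


@runtime_checkable
class Addon(Protocol):
    def backup(self) -> Path: ...
    def restore(self, backup_path: Path) -> None: ...


class FileAddon:
    """Hypothetical addon whose entire state lives in one file."""

    def __init__(self, state_file: Path, backup_dir: Path) -> None:
        self.state_file = state_file
        self.backup_dir = backup_dir

    def backup(self) -> Path:
        # Copy the state file into the backup directory and report where it went.
        self.backup_dir.mkdir(parents=True, exist_ok=True)
        dest = self.backup_dir / f"file_{self.state_file.name}"
        shutil.copy2(self.state_file, dest)
        return dest

    def restore(self, backup_path: Path) -> None:
        # Overwrite the live state with the backed-up copy.
        shutil.copy2(backup_path, self.state_file)
```

A backup manager written against the protocol can call backup() and restore() on any such object, which is what lets each service choose its own on-disk format.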

Future Enhancements

Automated Backups
- Scheduled backups with cron-like syntax
- Configurable in hop3.toml
- Retention policies with automatic cleanup

Remote Storage
- S3, Backblaze B2, Azure Blob support
- Pluggable storage backends
- Automatic replication

Encryption
- Age or GPG encryption
- Key management
- Optional per-backup or global

Incremental Backups
- rsync-based incremental
- Hard-link unchanged files
- Space-efficient

Verification Scheduler
- Periodic checksum verification
- Alert on corruption
- Automatic re-backup

Backup Browsing
- View backup contents without restoring
- Extract individual files
- Search across backups

References

Strategy: ADR 016: Backup Strategy (long-term vision, phases 2-3)
Implementation: packages/hop3-server/src/hop3/core/backup.py
Commands: packages/hop3-server/src/hop3/commands/backup.py
Tests: packages/hop3-server/tests/{a_unit,b_integration,d_e2e}/test_backup*.py
User Documentation: docs/src/backup-restore.md
Service Protocol: packages/hop3-server/src/hop3/core/protocols.py