ADR 023: Runtime Stack Replacement¶
Status: Draft
Type: Feature
Created: 2024-11-01
Related-ADRs: 021, 036
Introduction¶
This ADR proposes replacing Hop3's current runtime stack (uWSGI + nginx + supervisor) with a modernized, simplified architecture that maintains all functionality while reducing complexity and improving maintainability. The goal is to achieve hot reconfiguration capabilities, eliminate unmaintained dependencies, and provide a cleaner separation of concerns.
Summary¶
We propose replacing the current three-component stack (uWSGI for application serving, nginx for reverse proxy, supervisor for process management) with a streamlined architecture consisting of:
- Granian - Modern Rust-based ASGI/WSGI server (replacing uWSGI)
- Caddy - Modern reverse proxy with automatic HTTPS (replacing nginx)
- Custom Process Manager - Lightweight Python daemon (replacing supervisor)
This new stack will provide hot reconfiguration (add/remove apps without restarts), automatic HTTPS via ACME, and eliminate dependency on unmaintained projects (uWSGI development has ceased).
Context and Goals¶
Context¶
Hop3 currently relies on three major components for its runtime:
- uWSGI - Application server
  - Complex C codebase (10x more features than needed)
  - Development has ceased (no longer actively maintained)
  - Emperor mode provides hot reload capability
  - Supports multiple protocols (WSGI, RWSGI, JWSGI)
- nginx - Reverse proxy
  - Battle-tested and reliable
  - Requires reload for configuration changes
  - Static configuration files
  - Hop3 generates configs via Python templates
- Supervisor - Process manager (in test/dev environments)
  - Manages multiple processes
  - Limited hot reconfiguration
  - Requires supervisorctl commands for changes
This stack works but has significant drawbacks:
- uWSGI is unmaintained - Security and compatibility risks
- No hot reconfiguration - Adding apps requires nginx reloads
- Complexity - uWSGI has 100+ features, we use ~7
- Tight coupling - Components are interdependent
Goals¶
- Eliminate unmaintained dependencies - Replace uWSGI with actively maintained alternatives
- Enable hot reconfiguration - Add/remove apps, update SSL, change routing without any restarts
- Reduce complexity - Use simpler components that do one thing well
- Maintain functionality - Support all current features (WSGI, Node.js, Ruby, SSL, virtual hosts, etc.)
- Improve developer experience - Simpler architecture = easier to understand and debug
- Automatic HTTPS - Built-in ACME/Let's Encrypt support without external tools
Tenets¶
- Simplicity over features - Use components that provide exactly what we need, no more
- Modern, maintained software - Actively developed projects with strong communities
- Hot reconfiguration everywhere - No restarts required for routine operations
- Clear separation of concerns - Each component has one job
- Convention over configuration - Smart defaults with explicit overrides when needed
Decision¶
We will replace the current runtime stack with:
Option 1: Caddy + Custom Process Manager + Granian (Recommended)
Components:
1. Granian - Rust-based ASGI/WSGI server (app execution layer)
2. Caddy - Reverse proxy with automatic HTTPS (edge layer)
3. Custom Process Manager - Python daemon for lifecycle management (control layer)
Architecture:
┌─────────────────────────────────────────┐
│ Caddy (Reverse Proxy + SSL)             │
│ - Automatic HTTPS (ACME)                │
│ - Hot reload via JSON API               │
│ - Virtual host routing                  │
└────────────┬────────────────────────────┘
             │ Unix sockets
             ↓
┌─────────────────────────────────────────┐
│ Hop3 Process Manager (Python daemon)    │
│ - Spawns/monitors Granian processes     │
│ - Health checks & auto-restart          │
│ - Lifecycle API (start/stop/reload)     │
└────────────┬────────────────────────────┘
             │
             ↓
┌─────────────────────────────────────────┐
│ Granian Processes (one per app)         │
│ - Runs application code (WSGI/ASGI)     │
│ - Unix socket per app                   │
│ - Process-per-app isolation             │
└─────────────────────────────────────────┘
Alternative Option 2: All-Python Stack
Components:
1. Hypercorn/Uvicorn - Python ASGI server (replaces both nginx and Granian)
2. Custom Reverse Proxy - Python ASGI middleware (routing layer)
3. acme library - Python ACME client (SSL/TLS)
4. Same Custom Process Manager - Python daemon
Detailed Design¶
Component Selection Rationale¶
Granian (replacing uWSGI)¶
Why Granian?
- ✅ Rust-based - Fast, memory-safe, actively maintained
- ✅ Already a dependency - hop3-server uses it
- ✅ ASGI/WSGI support - Handles Python apps
- ✅ Hot reload - Process-level reload support
- ✅ Unix sockets - Native support for nginx-style deployment
- ✅ A small fraction of uWSGI's complexity - Does what we need, nothing more

Granian features we'll use:
- WSGI/ASGI interface
- Unix socket communication
- Multi-worker support
- Threading support
- Process management
- Environment variables

What we lose from uWSGI:
- Ruby (RWSGI), Java (JWSGI) plugins → Can proxy to standalone Ruby/Java processes
- Emperor mode → Replaced by custom process manager
- Built-in cron → Use system cron or scheduled tasks
- Complex routing → Don't need it
Caddy (replacing nginx)¶
Why Caddy?
- ✅ Automatic HTTPS - Built-in ACME, zero configuration
- ✅ Hot reload via API - JSON API for config changes without restart
- ✅ Single binary - Easy to install and update
- ✅ Modern protocols - HTTP/2, HTTP/3, WebSocket support
- ✅ Battle-tested - Production-grade, widely deployed
- ✅ Simple configuration - JSON or Caddyfile
Caddy API example:
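# Append a route for a new app (no restart).
# POSTing to an array path appends one element. POSTing a whole document to
# /config/ would instead replace the running configuration, dropping the
# routes of every other app. Assumes the "hop3" server already exists.
curl -X POST http://localhost:2019/config/apps/http/servers/hop3/routes \
  -H "Content-Type: application/json" \
  -d '{
    "match": [{"host": ["newapp.example.com"]}],
    "handle": [{
      "handler": "reverse_proxy",
      "upstreams": [{"dial": "unix//home/hop3/sockets/newapp.sock"}]
    }]
  }'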
What we lose from nginx:
- Extensive tuning options → Caddy has sensible defaults
- nginx-specific modules → Hop3 does not use any
- Familiarity → Learning curve for Caddy
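Appending a single route through the admin API could be wrapped in a small helper along the following lines. This is a minimal sketch, not the actual `proxy/caddy/api.py`: the function names `build_route` and `add_route_request` are hypothetical, and the server name `hop3` and the admin address are taken from the example in this section.

```python
import json
import urllib.request

CADDY_ADMIN = "http://localhost:2019"  # Caddy's default admin address


def build_route(host: str, socket_path: str) -> dict:
    """Return a Caddy route that proxies one hostname to one Unix socket."""
    return {
        "match": [{"host": [host]}],
        "handle": [{
            "handler": "reverse_proxy",
            # Caddy writes Unix socket addresses as "unix/" + absolute path.
            "upstreams": [{"dial": f"unix/{socket_path}"}],
        }],
    }


def add_route_request(
    host: str, socket_path: str, server: str = "hop3"
) -> urllib.request.Request:
    """Build (but do not send) the POST that appends a route to a server."""
    url = f"{CADDY_ADMIN}/config/apps/http/servers/{server}/routes"
    body = json.dumps(build_route(host, socket_path)).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Passing the returned request to `urllib.request.urlopen()` would send it. Building the request separately from sending it keeps the payload easy to unit-test without a running Caddy.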
Custom Process Manager (replacing supervisor)¶
Why custom?
- ✅ Hop3-specific - Exactly our needs, no more
- ✅ Tight integration - Direct API with hop3-server
- ✅ Hot reconfiguration - Full control over lifecycle
- ✅ Simple implementation - A small Python daemon
- ✅ asyncio-based - Modern Python patterns
Core features:
class ProcessManager:
    async def start_app(self, app_name: str, **config):
        """Start Granian for an app."""

    async def stop_app(self, app_name: str, timeout: int = 30):
        """Gracefully stop an app."""

    async def reload_app(self, app_name: str):
        """Zero-downtime reload."""

    async def scale_app(self, app_name: str, workers: int):
        """Scale workers."""

    async def get_status(self, app_name: str) -> dict:
        """Get app health status."""
What we lose from supervisor:
- Web UI (supervisord's built-in HTTP interface) → Use hop3 CLI/API
- Generic process management → Hop3-specific instead
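One way the health status could be derived is to check whether the app's Unix socket accepts connections. The sketch below is an assumption about how such a check might look, not part of the proposed API: the name `socket_is_healthy` and the one-second default timeout are illustrative.

```python
import asyncio
import contextlib


async def socket_is_healthy(socket_path: str, timeout: float = 1.0) -> bool:
    """Return True if something accepts connections on the Unix socket."""
    try:
        _reader, writer = await asyncio.wait_for(
            asyncio.open_unix_connection(socket_path), timeout=timeout
        )
    except (OSError, asyncio.TimeoutError):
        # Missing socket file, refused connection, or no answer in time.
        return False
    writer.close()
    with contextlib.suppress(OSError):
        await writer.wait_closed()
    return True
```

A successful connect only shows that the server is listening. A fuller check would also send an HTTP request and inspect the response.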
Implementation Architecture¶
packages/hop3-server/src/hop3/
├── run/
│   ├── pm/              # New: Process Manager
│   │   ├── __init__.py
│   │   ├── manager.py   # ProcessManager class
│   │   ├── process.py   # AppProcess wrapper
│   │   └── monitor.py   # Health monitoring
│   └── spawn.py         # Updated: Use Granian instead of uWSGI
├── proxy/
│   ├── caddy/           # New: Caddy integration
│   │   ├── __init__.py
│   │   ├── api.py       # Caddy JSON API client
│   │   ├── config.py    # Config generation
│   │   └── acme.py      # ACME setup
│   └── nginx/           # Deprecated: Keep for migration
└── plugins/
    └── deploy/
        ├── granian/     # New: Granian deployer
        │   ├── deployer.py
        │   └── plugin.py
        └── uwsgi/       # Deprecated: Keep for migration
Deployment Workflow¶
Current (uWSGI + nginx):
1. Deploy app
2. Generate .ini files → /home/hop3/uwsgi-enabled/
3. Emperor detects new files, spawns workers
4. Generate nginx config → /home/hop3/nginx/app.conf
5. Reload nginx (supervisorctl restart nginx)
6. App is live
Proposed (Granian + Caddy):
1. Deploy app
2. ProcessManager.start_app()
   - Spawn Granian on unix socket
   - Monitor health
3. CaddyAPI.add_route()
   - Hot add route via JSON API
   - Caddy obtains SSL cert automatically
   - No restart
4. App is live
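The ordering in the proposed workflow matters: the process has to be listening before the route is published, and a failed proxy call should not leave an unrouted process behind. A minimal sketch of that sequencing follows. The `Manager` and `Proxy` protocols and the `deploy` function are hypothetical stand-ins, and having `start_app` return the socket path is an assumption made for brevity.

```python
from typing import Protocol


class Manager(Protocol):
    """Hypothetical slice of the process manager used by deploy()."""

    async def start_app(self, app_name: str) -> str: ...  # returns socket path
    async def stop_app(self, app_name: str) -> None: ...


class Proxy(Protocol):
    """Hypothetical slice of the Caddy integration used by deploy()."""

    async def add_route(self, host: str, socket_path: str) -> None: ...


async def deploy(pm: Manager, proxy: Proxy, app_name: str, host: str) -> None:
    """Start the app, then publish its route; undo the start on failure."""
    socket_path = await pm.start_app(app_name)
    try:
        await proxy.add_route(host, socket_path)
    except Exception:
        # The route was not added, so stop the process we just started.
        await pm.stop_app(app_name)
        raise
```

Removing an app would run the same steps in reverse: drop the route first so that no new requests arrive, then stop the process.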
Migration Strategy¶
Phase 1: Parallel Implementation
- Implement ProcessManager
- Implement Caddy integration
- Add Granian deployer plugin
- Keep existing uWSGI/nginx code

Phase 2: Feature Parity
- Test all app types (Python, Node.js, Ruby, etc.)
- Verify SSL/ACME functionality
- Performance testing
- Documentation

Phase 3: Migration
- Update installer to include Caddy
- Default to new stack for new installs
- Provide migration script for existing deployments
- Update all documentation

Phase 4: Deprecation
- Mark uWSGI/nginx code as deprecated
- Remove in next major version
Examples and Interactions¶
Example 1: Deploy a Python Flask App¶
With new stack:
# User runs: hop3 deploy myapp
# 1. Build (unchanged)
builder = PythonBuilder(context)
artifact = builder.build() # Creates venv
# 2. Deploy
deployer = GranianDeployer(context, artifact)
deployment_info = deployer.deploy()
# → ProcessManager spawns Granian on unix:/home/hop3/sockets/myapp.sock
# 3. Proxy
caddy_proxy = CaddyProxy()
caddy_proxy.configure(deployment_info)
# → Caddy API: add route for myapp.example.com → unix socket
# → Caddy obtains Let's Encrypt cert automatically
# Done! App is live with HTTPS
Example 2: Scale Workers¶
# User runs: hop3 scale myapp web=4
# ProcessManager
await pm.scale_app("myapp", workers=4)
# → Graceful reload with new worker count
# → No downtime
# → Caddy continues routing during reload
Example 3: Add New App (Hot Reconfiguration)¶
# User runs: hop3 deploy newapp
# No restarts anywhere!
# 1. ProcessManager starts new Granian
# 2. Caddy API adds route
# 3. ACME obtains SSL cert
# 4. Done - both apps running simultaneously
Example 4: Non-Python App (Node.js)¶
# For non-WSGI apps, ProcessManager runs them directly
await pm.start_app(
    app_name="nodeapp",
    command=["node", "server.js"],
    socket_path="/home/hop3/sockets/nodeapp.sock",
    env={"PORT": "8000"},
)
# Caddy proxies to the socket (app must listen on unix socket)
# Or use HTTP if app doesn't support sockets
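The choice between a Unix socket and a TCP fallback comes down to which upstream address Caddy is told to dial. A small sketch of that decision follows. The helper name `caddy_dial` is hypothetical, and binding TCP upstreams to `127.0.0.1` is an assumption.

```python
from typing import Optional


def caddy_dial(socket_path: Optional[str] = None, port: Optional[int] = None) -> str:
    """Return the Caddy upstream address for an app.

    Prefers a Unix socket; falls back to a local TCP port for apps that
    cannot listen on a socket.
    """
    if socket_path is not None:
        return f"unix/{socket_path}"
    if port is not None:
        return f"127.0.0.1:{port}"
    raise ValueError("either socket_path or port is required")


print(caddy_dial(socket_path="/home/hop3/sockets/nodeapp.sock"))
print(caddy_dial(port=8000))
```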
Consequences¶
Benefits¶
- Simplified Stack
  - Three components with clear roles, each substantially smaller and more focused than its predecessor: Granian (Rust) against uWSGI's large C codebase, and a small custom process manager against supervisor.
- Hot Reconfiguration
  - Add apps: instant (Caddy API + PM start)
  - Remove apps: instant (PM stop + Caddy API)
  - Update SSL: automatic (Caddy ACME)
  - Scale workers: graceful reload (PM)
  - Zero downtime for routine operations
- Automatic HTTPS
  - Caddy handles ACME challenges
  - Auto-renewal
  - No certbot, no custom scripts
  - Works out of the box
- Modern, Maintained
  - Granian: Active Rust project
  - Caddy: Active Go project
  - Both have strong communities
- Better Developer Experience
  - Simpler to understand (less code, clearer roles)
  - Easier to debug (fewer layers)
  - Better error messages
  - API-driven configuration
- Performance (expected, pending the benchmarks in Appendix D)
  - Granian's Rust core should at least match uWSGI with its Python plugin
  - Caddy (Go) should be comparable to nginx for this workload
  - Lower memory usage (no emperor overhead)
Drawbacks¶
- New Dependencies
  - Caddy binary must be installed
  - Learning curve for team
  - Different configuration paradigm
- Migration Effort
  - Substantial implementation and testing effort
  - Existing deployments need migration
  - Documentation updates
  - Training/onboarding
- Less Proven (for Hop3)
  - uWSGI + nginx is battle-tested in Hop3
  - New stack needs validation
  - Potential unknown issues
- Plugin Ecosystem
  - uWSGI has Ruby/Java plugins
  - Granian is Python-focused
  - Workaround: Run Ruby/Node/etc. processes directly
- Operational Changes
  - Different debugging approaches
  - Different monitoring/logging
  - New failure modes to learn
- All-in on Caddy
  - If Caddy has issues, affects all apps
  - nginx is extremely stable
  - Caddy is newer (though mature)
Lessons Learned¶
From previous experience with uWSGI:
- Emperor mode's hot reload is the "killer feature" - must replicate
- Simple config files (touch to reload) worked well - keep this pattern
- Most uWSGI features are unused - validate minimal feature set
- Process isolation is important - maintain one-process-per-app

From nginx experience:
- Static config files + reload is acceptable but suboptimal
- Virtual host routing is essential
- Unix sockets work better than TCP for local communication
- SSL automation is critical (certbot was complex)
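The "touch to reload" pattern can be kept without Emperor mode by comparing modification times between two scans of a config directory. The sketch below shows the idea. The function names, the `*.ini` suffix, and the polling approach are assumptions for illustration rather than a committed design.

```python
from pathlib import Path


def snapshot(config_dir: Path) -> dict[str, int]:
    """Map each config file name to its modification time in nanoseconds."""
    return {p.name: p.stat().st_mtime_ns for p in config_dir.glob("*.ini")}


def diff_snapshots(
    old: dict[str, int], new: dict[str, int]
) -> tuple[list[str], list[str], list[str]]:
    """Return (added, removed, touched) file names between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    touched = sorted(name for name in new.keys() & old.keys() if new[name] != old[name])
    return added, removed, touched
```

A monitor loop could call `snapshot()` on each tick and map the three lists to start, stop, and reload operations respectively.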
Alternatives¶
Alternative 1: Keep Current Stack + Patch uWSGI¶
Description: Continue using uWSGI + nginx, potentially forking uWSGI if critical bugs arise.
Pros:
- Zero migration effort
- Known quantity
- Existing documentation

Cons:
- uWSGI is unmaintained (security risk)
- No hot reconfiguration
- Technical debt accumulates
Verdict: Rejected - kicks the can down the road, doesn't solve core issues.
Alternative 2: Traefik Instead of Caddy¶
Description: Use Traefik for reverse proxy instead of Caddy.
Pros:
- Service discovery built-in
- More enterprise features
- Metrics/observability

Cons:
- More complex than Caddy
- Heavier resource usage
- More verbose configuration
Verdict: Viable alternative, but Caddy's simplicity aligns better with Hop3's philosophy.
Alternative 3: All-Python Stack (Hypercorn + Custom Proxy)¶
Description: Build everything in Python - reverse proxy, SSL management, process management.
Pros:
- No external dependencies
- Full control
- Tight integration

Cons:
- Substantial code to implement properly
- ACME is complex
- Proxy performance likely lower than Caddy's
- Reinventing the wheel
Verdict: Interesting for learning but not practical - too much effort for marginal benefit.
Alternative 4: systemd Only (No supervisor/PM)¶
Description: Use systemd templates for process management instead of custom PM.
Pros:
- Standard Linux approach
- Battle-tested
- No custom code

Cons:
- Linux-only (not macOS/BSD)
- Less flexible than programmatic API
- Harder to integrate with hop3-server
Verdict: Good for production, but custom PM provides better development experience and cross-platform support.
Alternative 5: Keep nginx, Replace Only uWSGI¶
Description: Minimal change - just replace uWSGI with Granian, keep nginx.
Pros:
- Smaller migration
- Keep proven nginx
- Lower risk

Cons:
- No hot reconfiguration
- No automatic HTTPS
- Doesn't solve all problems
Verdict: Possible incremental approach, but doesn't achieve goal of hot reconfiguration.
Prior Art¶
Heroku¶
- Uses nginx + custom routing layer
- Dynamic routing without restarts
- Automatic SSL
Fly.io¶
- Uses Caddy-like approach
- Hot configuration updates
- Built-in SSL
Railway¶
- Uses Caddy
- Automatic HTTPS
- Simple deployment model
Dokku¶
- Uses nginx + custom scripts
- Process management via systemd
- Static configuration (requires reload)
CapRover¶
- Uses nginx + Docker
- Let's Encrypt integration
- Web UI for management
Common patterns:
- Modern PaaS platforms use hot reconfiguration
- Automatic SSL is table stakes
- Simplicity over features
- API-driven configuration
Unresolved Questions¶
- Caddy License
  - Apache 2.0 - compatible with Hop3's license ✓
- Granian Maturity
  - How stable is Granian for production?
  - Need to validate with load testing
- Migration Path
  - How to migrate existing deployments without downtime?
  - Can we run old and new stack simultaneously?
- Performance
  - Will Granian match uWSGI performance?
  - Need benchmarks
- Non-Python Apps
  - How to handle Ruby/Node.js/Go apps elegantly?
  - Proxy to standalone processes? Direct execution?
- Monitoring
  - How to expose process manager metrics?
  - Integration with existing monitoring tools?
Future Work¶
- Advanced Features
  - Load balancing across multiple Granian instances
  - Blue-green deployments
  - Canary deployments
  - A/B testing support
- Observability
  - Metrics collection from ProcessManager
  - Distributed tracing
  - Centralized logging
- High Availability
  - Multi-server deployments
  - Shared state management
  - Failover mechanisms
- Developer Experience
  - Local development mode without Caddy
  - Better error messages
  - Interactive debugging tools
- Ecosystem
  - Alternative process managers (systemd plugin?)
  - Alternative proxies (Traefik plugin?)
  - Monitoring integrations
Related¶
- ADR-020: Pluggable Architecture - This proposal fits within the plugin architecture, with Granian and Caddy as new plugins
- ADR-010: Security and Resilience - ACME integration improves security posture
- ADR-002: Config Format - hop3.toml can specify runtime preferences
References¶
- Granian Documentation
- Caddy Documentation
- Caddy JSON Config API
- uWSGI Emperor Mode
- ACME Protocol (RFC 8555)
- Let's Encrypt Documentation
Appendix¶
A. Current uWSGI Features Actually Used¶
From analysis of hop3/run/uwsgi/worker.py:
Core features:
- Master process
- Process management (workers, threads)
- Environment variables
- Virtualenv support
- Unix sockets
- Logging with rotation
- Idle timeout

Worker types:
- WSGI (Python) - plugin=python3
- RWSGI (Ruby) - plugin=rack
- JWSGI (Java) - plugin=jvm
- Web (attach-daemon) - generic processes
- Cron - scheduled tasks

NOT used:
- Static file serving (nginx does this)
- Caching (nginx does this)
- Load balancing
- Clustering
- Legion subsystem
- Advanced routing
- Roughly 90 other features
B. ProcessManager Prototype¶
import asyncio
import signal
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class AppProcess:
    process: asyncio.subprocess.Process
    app_name: str
    socket_path: Path
    workers: int
    # Launch settings are kept so scale_app() can restart with the same config.
    wsgi_module: str
    threads: int
    cwd: Optional[Path]
    env: Optional[dict]


class ProcessManager:
    def __init__(self):
        self.processes: dict[str, AppProcess] = {}
        self._running = False
        self._monitor_task: Optional[asyncio.Task] = None

    async def start(self):
        """Start the process manager daemon."""
        self._running = True
        # Keep a reference: the event loop only holds weak references to tasks.
        self._monitor_task = asyncio.create_task(self._monitor_loop())

    async def start_app(
        self,
        app_name: str,
        wsgi_module: str,
        workers: int = 4,
        threads: int = 4,
        cwd: Optional[Path] = None,
        env: Optional[dict] = None,
    ):
        """Start Granian for an app."""
        socket_path = Path(f"/home/hop3/sockets/{app_name}.sock")
        socket_path.parent.mkdir(parents=True, exist_ok=True)
        socket_path.unlink(missing_ok=True)

        # NOTE: draft command line. Check every flag against `granian --help`
        # for the pinned release, in particular how a Unix socket is specified
        # and how the thread options are named.
        cmd = [
            "granian",
            "--interface", "wsgi",
            "--host", f"unix:{socket_path}",
            "--workers", str(workers),
            "--threads", str(threads),
            f"{wsgi_module}:application",
        ]

        # The child inherits the daemon's stdout/stderr. An unread PIPE would
        # eventually fill up and block the child.
        proc = await asyncio.create_subprocess_exec(*cmd, cwd=cwd, env=env)

        self.processes[app_name] = AppProcess(
            process=proc,
            app_name=app_name,
            socket_path=socket_path,
            workers=workers,
            wsgi_module=wsgi_module,
            threads=threads,
            cwd=cwd,
            env=env,
        )

        # Wait up to 3 seconds for the socket to exist
        for _ in range(30):
            if socket_path.exists():
                return
            if proc.returncode is not None:
                break  # Granian exited during startup
            await asyncio.sleep(0.1)

        # Startup failed: do not leave a half-started process registered.
        await self.stop_app(app_name, timeout=5)
        raise TimeoutError(f"Socket {socket_path} never appeared")

    async def stop_app(self, app_name: str, timeout: int = 30):
        """Gracefully stop an app."""
        app_proc = self.processes.pop(app_name, None)
        if app_proc is None:
            return
        try:
            # Signalling a process that has already exited would raise.
            if app_proc.process.returncode is None:
                app_proc.process.send_signal(signal.SIGTERM)
            await asyncio.wait_for(app_proc.process.wait(), timeout=timeout)
        except asyncio.TimeoutError:
            app_proc.process.kill()
            await app_proc.process.wait()
        finally:
            app_proc.socket_path.unlink(missing_ok=True)

    async def reload_app(self, app_name: str):
        """Zero-downtime reload (future work)."""
        # Start new process on temp socket
        # Update Caddy to point to new socket
        # Wait for connections to drain
        # Kill old process
        pass

    async def scale_app(self, app_name: str, workers: int):
        """Scale workers by restarting with new count."""
        if app_proc := self.processes.get(app_name):
            # Stop old process. Requests arriving before the new process is
            # up will fail; reload_app() is meant to close that gap.
            await self.stop_app(app_name)
            # Start new with updated worker count and the stored config
            await self.start_app(
                app_name,
                app_proc.wsgi_module,
                workers=workers,
                threads=app_proc.threads,
                cwd=app_proc.cwd,
                env=app_proc.env,
            )

    async def _monitor_loop(self):
        """Health monitoring and auto-restart."""
        while self._running:
            for app_name, app_proc in list(self.processes.items()):
                if app_proc.process.returncode is not None:
                    print(f"App {app_name} died with code {app_proc.process.returncode}")
                    # Auto-restart logic would go here
            await asyncio.sleep(5)
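The auto-restart placeholder in the monitor loop needs some policy to avoid restarting a crashing app in a tight loop. One common choice, shown here as an assumption rather than a decided design, is capped exponential backoff with a maximum number of attempts. The class name `RestartPolicy` and its defaults are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RestartPolicy:
    """Capped exponential backoff, tracked per app."""

    base_delay: float = 1.0   # seconds before the first restart
    max_delay: float = 60.0   # upper bound for any single delay
    max_attempts: int = 5     # give up after this many consecutive crashes
    attempts: dict[str, int] = field(default_factory=dict)

    def next_delay(self, app_name: str) -> Optional[float]:
        """Return seconds to wait before restarting, or None to give up."""
        count = self.attempts.get(app_name, 0)
        if count >= self.max_attempts:
            return None
        self.attempts[app_name] = count + 1
        return min(self.max_delay, self.base_delay * 2 ** count)

    def reset(self, app_name: str) -> None:
        """Forget past crashes, for example once the app has been healthy."""
        self.attempts.pop(app_name, None)
```

The monitor loop could call `next_delay()` when it sees a non-`None` return code, sleep for the returned delay, and then call `start_app()` again. A `None` result would mark the app as failed and leave it stopped.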
C. Caddy Configuration Example¶
Caddyfile format:
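{
    admin localhost:2019
    # Automatic HTTPS is enabled by default. The auto_https option only
    # disables parts of it and does not accept "on".
}

myapp.example.com {
    reverse_proxy unix//home/hop3/sockets/myapp.sock
}

otherapp.example.com {
    reverse_proxy unix//home/hop3/sockets/otherapp.sock
}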
JSON API format:
{
  "apps": {
    "http": {
      "servers": {
        "hop3": {
          "listen": [":80", ":443"],
          "routes": [
            {
              "match": [{"host": ["myapp.example.com"]}],
              "handle": [
                {
                  "handler": "reverse_proxy",
                  "upstreams": [
                    {"dial": "unix//home/hop3/sockets/myapp.sock"}
                  ]
                }
              ]
            }
          ]
        }
      }
    },
    "tls": {
      "automation": {
        "policies": [
          {
            "issuers": [{"module": "acme"}]
          }
        ]
      }
    }
  }
}
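A document of this shape could be generated from the set of deployed apps, which is roughly the role of the planned `config.py`. The sketch below is one possible generator under stated assumptions: the function name `build_caddy_config` and the `{hostname: socket_path}` input shape are hypothetical.

```python
import json


def build_caddy_config(apps: dict[str, str], server: str = "hop3") -> dict:
    """Build a full Caddy JSON config from a {hostname: socket_path} mapping."""
    routes = [
        {
            "match": [{"host": [host]}],
            "handle": [{
                "handler": "reverse_proxy",
                "upstreams": [{"dial": f"unix/{socket_path}"}],
            }],
        }
        # Sort for a stable output that is easy to diff and test.
        for host, socket_path in sorted(apps.items())
    ]
    return {
        "apps": {
            "http": {
                "servers": {server: {"listen": [":80", ":443"], "routes": routes}}
            },
            "tls": {"automation": {"policies": [{"issuers": [{"module": "acme"}]}]}},
        }
    }


config = build_caddy_config({
    "myapp.example.com": "/home/hop3/sockets/myapp.sock",
    "otherapp.example.com": "/home/hop3/sockets/otherapp.sock",
})
print(json.dumps(config, indent=2))
```

A full document like this suits initial startup or recovery, where it could be sent to Caddy's `/load` admin endpoint. Routine additions are better served by appending individual routes.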
D. Performance Comparison Target¶
Metrics to benchmark:
- Requests/second (should be ≥ uWSGI)
- Latency p50, p95, p99 (should be ≤ uWSGI)
- Memory usage (target: ≤ uWSGI)
- CPU usage (target: ≤ uWSGI)
- Time to start app (target: ≤ uWSGI)
- Time to reload app (target: faster than uWSGI)

Test scenarios:
1. Single app, low load (100 req/s)
2. Single app, high load (1000 req/s)
3. Multiple apps (10 apps, 100 req/s each)
4. WebSocket connections
5. Large request bodies
6. Static file serving (if applicable)
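The latency targets can be checked mechanically once samples are collected. The sketch below uses the nearest-rank percentile definition, which is one of several conventions. The function names and the sample data are illustrative only, not measured results.

```python
import math


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_summary(samples: list[float]) -> dict[str, float]:
    """Summarise latencies at the three percentiles named in the targets."""
    return {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}


def within_target(candidate: dict[str, float], baseline: dict[str, float]) -> bool:
    """True if every candidate percentile is at or below the baseline's."""
    return all(candidate[key] <= baseline[key] for key in baseline)


# Illustrative data only: latencies of 1..100 ms.
summary = latency_summary([float(ms) for ms in range(1, 101)])
print(summary)
```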
Related ADRs: ADR 021: Proxy Plugin System for Reverse Proxy Configuration, ADR 036: CLI Ergonomics and Command Surface