How Hop3 is Tested — The Demos: One Artifact, Three Jobs¶
Part 2 of the series How Hop3 is Tested
- How Hop3 is Tested
- The Demos: One Artifact, Three Jobs
- Testable Docs: Tutorials That Run Themselves
- The Test Runner: Why hop3-test Exists
- The Test Lab: Making the Nightly Suite Legible
Most projects keep demos and tests in separate worlds. Demos are pretty, hand-curated, and rot quietly; tests are green and unreadable. Hop3 refuses the split. A demo here is three things at once, and is never treated as dead code:
- Teaching — a readable, runnable walk-through of how a feature works.
- Demonstration — what you show in a screencast or to an evaluator.
- Test — run in CI to catch regressions end-to-end.
That decision is recorded in ADR 043 (v0.3): a demo is simultaneously an educational walkthrough, a live demonstration, and a test, so a broken demo is a first-class regression. The demo engine is kept precisely because removing any one of those three jobs would make the others worse.
Capabilities first¶
There's a clean division of labour with the test runner:
Demos showcase capabilities — builders, toolchains, addons, scaling, backups, the CLI surface — each deploying a tiny sample app to exercise one feature. Real third-party applications (WordPress, Gitea, Miniflux) live in the real-apps catalog, packaged in multiple variants with content-checked validations.
A demo's app is deliberately small and boring. What matters is the platform edge it pokes at.
What a demo looks like¶
A demo is a directory under demos/ containing a demo-script.py and a small app/. It is auto-discovered — no registration:
# demos/demoXX/demo-script.py
from pathlib import Path

TITLE = "Demo XX: My Feature"
DESCRIPTION = "One paragraph: what this demonstrates."
APP_NAME = "demoXX"
APP_DIR = Path(__file__).parent / "app"
REQUIRES: list[str] = []  # e.g. ["docker"] — the demo is skipped if unmet


def run(ctx: DemoContext) -> None:
    # Imported inside run(): the launcher sets up sys.path first.
    from lib import deploy_app, set_hostname, test_app_via_curl, cleanup_app

    host = ctx.get_app_hostname(APP_NAME)
    deploy_app(ctx, APP_NAME, APP_DIR)  # packs APP_DIR, `hop3 deploy --app …`
    set_hostname(ctx, APP_NAME, host)
    test_app_via_curl(ctx, f"https://{host}", expected_content="…")
    cleanup_app(ctx, APP_NAME, f"https://{host}")  # honours --keep
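Auto-discovery of this layout could work roughly as follows. This is a minimal sketch, not the launcher's actual code: `discover_demos` and the `demo_<name>` module naming are hypothetical, and the only facts taken from the text are the `demos/<name>/demo-script.py` layout and the `run` entry point. Because the file name contains a hyphen, the sketch loads each script by path instead of using a plain `import`.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def discover_demos(demos_root: Path) -> dict[str, ModuleType]:
    """Map each demo directory name to its loaded demo-script.py module."""
    found: dict[str, ModuleType] = {}
    for script in sorted(demos_root.glob("*/demo-script.py")):
        name = script.parent.name
        # "demo-script" is not a valid module name, so load the file by path.
        spec = importlib.util.spec_from_file_location(f"demo_{name}", script)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Only directories whose script exposes run() count as demos.
        if callable(getattr(module, "run", None)):
            found[name] = module
    return found
```

With a scheme like this, dropping a new directory under `demos/` is enough for it to show up; nothing needs to be registered by hand.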
The verification is content-checked. test_app_via_curl asserts that the body contains app-specific text — because a 200 can be a placeholder, an error page, or, memorably, another app's content leaking through a misconfigured proxy. A green status code proves almost nothing on a PaaS.
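To make the distinction concrete, here is a small sketch of a content check. The function name `check_response` and its return convention are illustrative assumptions; the real `test_app_via_curl` may be structured quite differently.

```python
from typing import Optional


def check_response(status: int, body: str, expected_content: str) -> Optional[str]:
    """Return None when the response is acceptable, else a short failure reason."""
    if status != 200:
        return f"unexpected status {status}"
    # A 200 alone is not enough: the body must carry app-specific text.
    if expected_content not in body:
        return f"status 200 but body lacks {expected_content!r}"
    return None
```

Under this check, a proxy that answers 200 with a default placeholder page still fails the demo, because the expected text is missing from the body.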
Running and inspecting them¶
The launcher is a single entry point, demos/demo.py, with two subcommands and two backends:
# Local Docker container — no remote server needed (what CI uses)
python demos/demo.py run --backend docker demo01
# Remote server over SSH
python demos/demo.py run --host <server_ip> demo01
# Test your LOCAL hop3-server changes (rsync the working tree before deploying)
python demos/demo.py run --backend docker --local demo01
# See what's available, with capability tags
python demos/demo.py list -v
Capability tags and feature filters¶
Every demo carries namespaced capability tags, computed from its hop3.toml (builder, toolchain, addons) plus anything the script declares — builder:docker, toolchain:go, addon:postgres, extra:backup. You can slice the suite by them:
python demos/demo.py run --select toolchain:python --skip extra:backup
python demos/demo.py list --select addon:postgres
This makes the demos usable as a targeted probe: "run everything that touches Postgres", "skip the slow backup demos", "only the Go toolchain". --select is AND across flags (OR within a comma-separated value); --skip is OR.
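The selection rule can be sketched as a small predicate. The names `parse_groups` and `is_selected` are hypothetical, and treating `--skip` values as comma-separable is an assumption; only the AND/OR semantics come from the description above.

```python
def parse_groups(flags: list[str]) -> list[set[str]]:
    """Turn each flag value ("a,b") into a set of tags ({"a", "b"})."""
    return [{t.strip() for t in flag.split(",") if t.strip()} for flag in flags]


def is_selected(tags: set[str], select: list[str], skip: list[str]) -> bool:
    # --select: every flag must match (AND); inside one flag, any tag is enough (OR).
    for group in parse_groups(select):
        if not group & tags:
            return False
    # --skip: a match on any skip tag excludes the demo (OR).
    for group in parse_groups(skip):
        if group & tags:
            return False
    return True
```

For a demo tagged `toolchain:python` and `addon:postgres`, passing `--select toolchain:python --select toolchain:go` would exclude it, while `--select toolchain:python,toolchain:go` would keep it.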
What a run actually does¶
The launcher runs four phases, then summarises pass/fail/skip with timings:
- Prerequisites — reach the target (start the Docker container or SSH in), check the OS, install/update Hop3.
- Configure CLI — create/log in an admin user and point an isolated CLI config at the target, so demos never touch your real ~/.config/hop3-cli.
- Run the selected demos — each run(ctx); a failure in one doesn't stop the others.
- Summary — results, durations, and (with --keep) the admin credentials so you can poke at the live app.
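The third and fourth phases can be sketched as a loop that isolates each demo's failure and records timings. This is a simplified illustration under stated assumptions: `DemoResult`, `run_selected`, `summarise`, and the `available` set used to resolve requirements are all hypothetical names, not the launcher's API.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class DemoResult:
    name: str
    status: str  # "pass", "fail" or "skip"
    seconds: float = 0.0
    detail: str = ""


def run_selected(
    demos: list[tuple[str, list[str], Callable[[object], None]]],
    ctx: object,
    available: set[str],
) -> list[DemoResult]:
    """Run each (name, requires, run) entry; one failure never stops the rest."""
    results = []
    for name, requires, run in demos:
        missing = [req for req in requires if req not in available]
        if missing:
            results.append(DemoResult(name, "skip", detail="missing " + ", ".join(missing)))
            continue
        start = time.monotonic()
        try:
            run(ctx)
            status, detail = "pass", ""
        except Exception as exc:  # a broken demo is recorded, not propagated
            status, detail = "fail", str(exc)
        results.append(DemoResult(name, status, time.monotonic() - start, detail))
    return results


def summarise(results: list[DemoResult]) -> str:
    """One line per demo, then pass/fail/skip totals."""
    lines = [f"{r.name}... {r.status.upper()} ({r.seconds:.1f}s)" for r in results]
    counts = {s: sum(r.status == s for r in results) for s in ("pass", "fail", "skip")}
    lines.append(", ".join(f"{n} {s}" for s, n in counts.items()))
    return "\n".join(lines)
```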
Robustness is the whole point¶
Because a demo is also a test, the launcher has to be boringly reliable — and getting there taught some lessons worth stating, since they're the kind of thing that bites any "run real deployments in a loop" harness:
- Non-interactive by construction. The runner has no human to answer a prompt. Destructive commands (addon destroy, app destroy) pass -y, and every command runs with stdin closed — so a command that tries to prompt gets EOF and fails loud. The alternative is a run wedged forever on an invisible "Are you sure?".
- Bounded. Every command has a timeout, so a hung RPC becomes a loud, actionable failure within seconds.
- Serialized. Demo runs share an isolated CLI config home and mutate a shared target server, so two runs at once would clobber each other's context and collide on server resources. The launcher takes a machine-wide lock and refuses to start a second run — a confound that otherwise produces baffling, "eventually-consistent" flakiness.
- App-scoped commands take the target as a --app flag (per ADR 036). The demos are also the first place CLI-ergonomics regressions get caught, because they exercise the command surface the way a user would.
- Clean failure output. In quiet mode each demo is one line — demo10 (PostgreSQL Addon)... FAIL — with the actionable cause and a log pointer underneath. The raw multi-line command dump stays out of the progress flow.
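The first three properties (no stdin, a timeout on every command, one run at a time) can be sketched with the standard library alone. The helper names `run_command` and `single_run_lock`, the default timeout, and the lock-file approach are assumptions for illustration; the sketch redirects stdin from the null device rather than literally closing it, and `fcntl.flock` limits it to Unix-like systems.

```python
import fcntl
import subprocess
from contextlib import contextmanager
from pathlib import Path


def run_command(argv: list[str], timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run argv with no usable stdin and a hard timeout; raise on any failure."""
    try:
        return subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,  # a prompt reads EOF instead of waiting
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"timed out after {timeout}s: {' '.join(argv)}") from exc
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(f"exit {exc.returncode}: {' '.join(argv)}\n{exc.stderr}") from exc


@contextmanager
def single_run_lock(lock_path: Path):
    """Hold an exclusive lock for the duration of a run; fail fast if it is taken."""
    with open(lock_path, "w") as handle:
        try:
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            raise RuntimeError(f"another demo run holds {lock_path}") from exc
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

A launcher built this way would wrap the whole run in `single_run_lock(...)` and route every CLI invocation through `run_command(...)`, so a prompt or a hang surfaces as an ordinary error with the offending command in the message.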
All of this exists because a test harness that deploys real apps to real servers has to fail loudly and legibly when the platform misbehaves — which is the entire reason the demos exist.
How they run in CI¶
The test runner exercises each demo in place: a meta-runner (DemoTestRunner) drives demos/demo.py and surfaces a non-zero exit as a failed test. So the same script you read to learn a feature is the one CI runs to guard it — and make test-demos-docker runs the whole suite against a fresh local container.
Writing a good one¶
If you add a demo, the conventions that matter (learned the hard way) are: import lib inside run() (the launcher sets up sys.path first); always pass --app <name> to app-scoped commands; tear down whatever you create so re-runs are reproducible and apps don't coexist by accident; and verify content (a 200 proves almost nothing). Existing demos make good templates: demo01 is the simplest, and there's a broad "CLI surface tour" demo that exercises as much of the command surface as possible in one go.
Part of a five-part series on how Hop3 is tested. See also the test runner that runs the demos in CI, and testable docs with validoc for the documentation-as-test counterpart. The demo strategy is decided in ADR 043 §9.