ADR 026: Dashboard UI Test Classification¶

Status: Superseded
Type: Guideline
Created: 2025-11-20
Superseded-By: ADR 043
Related-ADRs: 020, 024, 043

Superseding Context¶

This guideline relied on a four-layer test pyramid (a_unit/b_integration/c_system/d_e2e) and classified tests by whether their dependencies were real or mocked. ADR 043 replaces that pyramid with three layers (a_unit/b_integration/c_e2e) and dissolves c_system. ADR 043 classifies tests by whether they need Docker, root, or host-mutation rather than by real-vs-mocked dependencies. Under that rule the dashboard file-system tests are hermetic — they run a real App.create() in tmp_path with no root and no Docker — and therefore belong in b_integration, reversing the placement decision recorded here.
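
As a rough illustration of the ADR 043 rule summarized above, the placement decision can be sketched as a small function. The `Requirements` dataclass, the `classify` function, and the `single_unit_in_isolation` flag are hypothetical names introduced here. This document does not say how ADR 043 separates a_unit from b_integration, nor does it spell out that non-hermetic tests land in c_e2e, so both branches are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """What a test needs from its environment (hypothetical model)."""

    docker: bool = False
    root: bool = False
    mutates_host: bool = False
    # Assumption: a_unit is reserved for a single unit tested in isolation.
    single_unit_in_isolation: bool = False


def classify(req: Requirements) -> str:
    """Return the test directory suggested by the three-layer rule."""
    # Assumption: anything needing Docker, root, or host mutation goes to c_e2e.
    if req.docker or req.root or req.mutates_host:
        return "c_e2e"
    if req.single_unit_in_isolation:
        return "a_unit"
    # Hermetic tests, including a real App.create() inside tmp_path.
    return "b_integration"


# The dashboard file-system tests need no Docker, no root, and only touch tmp_path.
dashboard_fs_tests = Requirements()
print(classify(dashboard_fs_tests))  # b_integration
```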

Introduction¶

This ADR addresses how to classify and implement tests for the Hop3 dashboard web UI, and in particular where the boundary lies between integration tests and system tests for web application features.

Summary¶

Dashboard UI tests that involve file system operations belong in system tests (c_system/) with a real App.create() implementation, rather than as integration tests with mocked file operations.

Context and Goals¶

Context¶

The dashboard app creation feature (/dashboard/apps/new) is exercised by tests that verify:

- Form rendering and validation
- App creation with database persistence

A mocked-integration arrangement (packages/hop3-server/tests/b_integration/test_dashboard_app_create.py) takes this shape:

@pytest.fixture
def test_client(tmp_path: Path, monkeypatch):
    # Patches HOP3_ROOT for database location
    monkeypatch.setattr(hop3.config, "HOP3_ROOT", tmp_path)

    # Mocks App.create() to avoid real file system operations
    def mock_app_create(self):
        app_path = tmp_path / "apps" / self.name
        # parents=True: the "apps" directory may not exist yet in a fresh tmp_path
        app_path.mkdir(parents=True, exist_ok=True)
        # Creates subdirectories...

    monkeypatch.setattr(App, "create", mock_app_create)
    monkeypatch.setattr(SessionAuthBackend, "authenticate", mock_authenticate)

What this arrangement verifies:

- ✅ Real Starlette HTTP request/response cycle
- ✅ Real route handlers (@router.get, @router.post)
- ✅ Real Jinja2 template rendering
- ✅ Real SQLAlchemy database operations (SQLite in tmp_path)
- ✅ Real form validation logic
- ❌ Mocked App.create() method (file system operations)
- ❌ Mocked authentication

Goals¶

Maintain fast feedback loops - Tests should run quickly during development
Ensure adequate coverage - Critical business logic must be tested with real implementations
Follow testing pyramid principles - Clear separation between test layers
Avoid test brittleness - Tests shouldn't be overly complex or fragile
Document clear guidelines - Future developers should know where to place tests

Tenets¶

From Hop3's testing strategy (docs/src/dev/testing-strategy.md):

Unit Tests (a_unit/) - Individual functions/classes in isolation, mock all dependencies
Integration Tests (b_integration/) - Multiple components within subsystems, without external dependencies
System Tests (c_system/) - Full application with real dependencies (databases, file systems)
E2E Tests (d_e2e/) - Complete workflows in Docker containers

Key principle from the documentation:

Integration Tests: "Test multiple components working together within subsystems. Uses real database (in-memory SQLite). No external network dependencies."

The question: Is the file system an "external dependency" or part of the subsystem under test?

Decision¶

DECISION: Option 2 - System Tests

Dashboard UI tests for the app creation feature live in c_system/ and use the real App.create() implementation without mocks, rather than in b_integration/ with App.create() mocked.

Rationale:

1. App.create() is core business logic that creates the application's directory structure
2. The performance overhead is negligible
3. Removes mock maintenance burden - tests use real code
4. Better bug detection - catches issues a mock would hide (e.g. a log vs logs directory-name discrepancy)
5. Clearer semantics - "system test" clearly means "full stack with real dependencies"

What's tested:

- Real HTTP request/response cycle (Starlette)
- Real route handlers and form validation
- Real database operations (SQLAlchemy + SQLite)
- Real template rendering (Jinja2)
- Real file system operations (App.create() directory creation)
- Only authentication is mocked for test convenience

Option 1: Integration Tests with Mocks¶

Decision: Dashboard UI tests remain in b_integration/ with App.create() mocked.

Rationale:

- The file system is considered an "external dependency" like network I/O
- Integration tests focus on web framework + database integration
- File system operations are implementation details of the domain layer
- Faster test execution (no real file operations)

Test Structure:

# Location: packages/hop3-server/tests/b_integration/test_dashboard_app_create.py

@pytest.fixture
def test_client(tmp_path: Path, monkeypatch):
    monkeypatch.setattr(hop3.config, "HOP3_ROOT", tmp_path)
    monkeypatch.setattr(App, "create", mock_app_create)  # MOCKED
    # Real database, real web framework, real templates

What gets tested:

- HTTP routing and request handling
- Form validation logic
- Template rendering
- Database CRUD operations
- Session management
- Response redirects and status codes

What gets mocked:

- File system operations (App.create())
- Authentication (SessionAuthBackend.authenticate)

Option 2: Move to System Tests (Full Integration)¶

Decision: Move dashboard UI tests to c_system/ and remove mocks.

Rationale:

- App.create() is core business logic, not just I/O
- File system operations are part of the application's contract
- System tests should verify the full stack including persistence
- The file system is not truly "external" - it's a primary storage mechanism

Test Structure:

# Location: packages/hop3-server/tests/c_system/test_dashboard_app_create.py

@pytest.fixture
def test_client(tmp_path: Path, monkeypatch):
    # Configure full test environment
    monkeypatch.setattr(hop3.config, "HOP3_ROOT", tmp_path)
    monkeypatch.setattr(hop3.orm.app.c, "HOP3_ROOT", tmp_path)

    # NO MOCKS - use real App.create() implementation
    # Real database, real file system, real web framework

What gets tested:

- Everything from Option 1, plus:
  - Real App.create() directory structure creation
  - Real file system operations
  - Integration of domain logic with persistence layer

What gets mocked:

- Only authentication (for test convenience)
- External network calls (if any)

Detailed Design¶

Option 1 Implementation¶

File: packages/hop3-server/tests/b_integration/test_dashboard_app_create.py

Test Characteristics:

- Speed: Fastest (the mock does little more than a few mkdir calls in tmp_path)
- Isolation: High - no file system side effects outside tmp_path
- Maintainability: Requires maintaining mock implementation parallel to real code
- Coverage: Web layer + database layer only

Mock Implementation:

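def mock_app_create(self):
    """Simplified mock that just creates directories."""
    # tmp_path comes from the enclosing fixture's scope
    app_path = tmp_path / "apps" / self.name
    # parents=True so the mock also works when "apps" does not exist yet
    app_path.mkdir(parents=True, exist_ok=True)

    for subdir in ["git", "src", "data", "logs"]:
        (app_path / subdir).mkdir(exist_ok=True)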

Risk: If App.create() evolves (e.g., creates additional files, sets permissions, initializes git repo), the mock diverges from reality.
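
One way to make this divergence visible is to run the mock and the real implementation side by side and compare what each leaves on disk. The sketch below is self-contained and uses two hypothetical stand-in functions, `real_create` and `mock_create`, rather than Hop3's actual code. The ADR mentions a log vs logs naming discrepancy but does not say which side used which name, so the assignment here is arbitrary.

```python
import tempfile
from pathlib import Path
from typing import Callable


def real_create(root: Path, name: str) -> None:
    """Hypothetical stand-in for the real App.create()."""
    for subdir in ("git", "src", "data", "log"):
        (root / "apps" / name / subdir).mkdir(parents=True, exist_ok=True)


def mock_create(root: Path, name: str) -> None:
    """Hypothetical stand-in for the test mock, with a different directory name."""
    for subdir in ("git", "src", "data", "logs"):
        (root / "apps" / name / subdir).mkdir(parents=True, exist_ok=True)


def created_dirs(create: Callable[[Path, str], None], name: str = "demo") -> set[str]:
    """Run a create function in a throwaway directory and list what it made."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        create(root, name)
        return {p.name for p in (root / "apps" / name).iterdir()}


# Names produced by only one of the two implementations.
drift = created_dirs(real_create) ^ created_dirs(mock_create)
print(sorted(drift))  # ['log', 'logs']
```

A check like this only covers directory names; permissions, file contents, or git initialization would need their own comparisons.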

Option 2 Implementation¶

File: packages/hop3-server/tests/c_system/test_dashboard_app_create.py

Test Characteristics:

- Speed: Slightly slower due to real I/O
- Isolation: Medium - creates real directories in tmp_path
- Maintainability: No mock to maintain, tests use real code
- Coverage: Full stack including file system layer

Configuration Required:

@pytest.fixture
def test_client(tmp_path: Path, monkeypatch):
    # Must patch ALL locations where HOP3_ROOT is imported
    import hop3.config
    import hop3.orm.app

    monkeypatch.setattr(hop3.config, "HOP3_ROOT", tmp_path)
    monkeypatch.setattr(hop3.orm.app.c, "HOP3_ROOT", tmp_path)
    monkeypatch.setattr(hop3.config, "APP_ROOT", tmp_path / "apps")

    # Create required directories
    (tmp_path / "apps").mkdir(exist_ok=True)
    # ... other setup

    # NO App.create() mock - use real implementation

Benefit: Tests verify the actual behavior that users will experience.

Examples and Interactions¶

Example Test: App Creation Success¶

Option 1 (Integration with Mock):

def test_app_create_success(test_client, tmp_path):
    response = test_client.post("/dashboard/apps/new", data={
        "app_name": "test-app",
        "builder": "python",
        "env_vars": "DEBUG=true"
    })

    assert response.status_code == 303
    assert response.headers["location"] == "/dashboard/apps/test-app?created=true"

    # Database verification
    with get_session() as session:
        app = session.query(App).filter_by(name="test-app").first()
        assert app is not None
        # ⚠️ File system NOT verified - mock was called instead

Option 2 (System without Mock):

def test_app_create_success(test_client, tmp_path):
    response = test_client.post("/dashboard/apps/new", data={
        "app_name": "test-app",
        "builder": "python",
        "env_vars": "DEBUG=true"
    })

    assert response.status_code == 303

    # Database verification (same as before)
    with get_session() as session:
        app = session.query(App).filter_by(name="test-app").first()
        assert app is not None

    # ✅ File system verification (NEW)
    app_path = tmp_path / "apps" / "test-app"
    assert app_path.exists()
    assert (app_path / "src").exists()
    assert (app_path / "data").exists()
    assert (app_path / "logs").exists()
    assert (app_path / "git").exists()

Example Test Flow Comparison¶

Option 1 (Integration):

User Request → Starlette → Route Handler → Form Validation
              ↓
          Database Save (SQLite, real)
              ↓
          App.create() [MOCKED - just mkdir]
              ↓
          HTTP Redirect

Option 2 (System):

User Request → Starlette → Route Handler → Form Validation
              ↓
          Database Save (SQLite, real)
              ↓
          App.create() [REAL - creates full directory structure]
              ↓
          HTTP Redirect

Consequences¶

Option 1: Integration Tests with Mocks¶

Benefits¶

Fast Execution: No disk I/O overhead
No Side Effects: Tests don't leave artifacts in file system
Easy Setup: Minimal fixture configuration required
Focused Scope: Tests specifically target web layer concerns
Parallel Execution: Can run multiple tests simultaneously without file conflicts

Drawbacks¶

Mock Maintenance: Must keep mock_app_create() in sync with real implementation
Limited Coverage: Doesn't test actual App.create() behavior
False Confidence: Tests might pass while real code has bugs in file operations
Divergence Risk: If App.create() adds logic (permissions, git init, etc.), mock won't catch it
Unclear Semantics: "Integration without file system" is ambiguous - where's the boundary?

Option 2: System Tests without Mocks¶

Benefits¶

Full Coverage: Tests actual behavior users will experience
No Mock Maintenance: Tests use real code, auto-updated when implementation changes
Catch Real Bugs: Will detect issues like permission errors, path problems, etc.
Clear Semantics: "System test" clearly means "full stack with real dependencies"
Better Confidence: Tests verify the complete integration

Drawbacks¶

Slightly Slower: Real I/O adds overhead, though it remains fast
More Complex Setup: Need to patch multiple config locations
Potential Brittleness: Tests depend on more moving parts
Cleanup Required: Must ensure tmp_path cleanup works properly
Less Isolation: File system state could theoretically affect tests (mitigated by tmp_path)

Lessons Learned¶

From the Mocked Arrangement¶

Monkeypatching is Tricky: Patching HOP3_ROOT requires finding all import locations
Module-Level Imports: Config values imported at module level are hard to mock
Test Classification Matters: The choice affects where tests live and what they verify
Authentication Mock is Universal: Both options need to mock auth for convenience
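
The first two lessons follow from how Python binds names on import: `from config import HOP3_ROOT` copies the value into the importing module, so patching the config module afterwards does not reach that copy. The sketch below reproduces the effect with two throwaway modules; `demo_config` and `demo_consumer` are hypothetical names, not Hop3 modules.

```python
import sys
import types
from pathlib import Path

# A stand-in config module with a module-level constant.
config = types.ModuleType("demo_config")
config.HOP3_ROOT = Path("/home/hop3")
sys.modules["demo_config"] = config

# A stand-in consumer that imports the constant by name at import time.
consumer = types.ModuleType("demo_consumer")
exec(
    "from demo_config import HOP3_ROOT\n"
    "def app_root():\n"
    "    return HOP3_ROOT / 'apps'\n",
    consumer.__dict__,
)

# Patching only the config module leaves the consumer's copy untouched.
config.HOP3_ROOT = Path("/srv/test")
stale = consumer.app_root()

# Patching the consumer's own binding is what actually takes effect.
consumer.HOP3_ROOT = Path("/srv/test")
patched = consumer.app_root()

print(stale)    # still rooted at /home/hop3
print(patched)  # now rooted at /srv/test

del sys.modules["demo_config"]  # clean up the throwaway module
```

This is why a fixture has to call `monkeypatch.setattr` once per module that imported the constant, not only on the module that defines it.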

From Testing Strategy Document¶

The testing strategy says:

Integration Tests: "Uses real database (in-memory SQLite). No external network dependencies."

Note: the strategy rules out network dependencies but says nothing about file system dependencies, so it does not settle whether the file system counts as external.

From Test Development Process¶

Mocking App.create() and HOP3_ROOT is awkward: environment-level patching and importlib.reload() do not work, and only monkeypatching multiple import locations plus the App.create() method makes the mocked tests pass. The system fights against mocking, which suggests that Option 2 (system tests with real implementations) is the more natural fit.

Alternatives¶

Alternative 1: Hybrid Approach¶

Description: Keep most tests in b_integration/ with mocks, add a few smoke tests in c_system/ without mocks.

Example:

- b_integration/test_dashboard_app_create.py - 10 tests with App.create() mocked
- c_system/test_dashboard_app_create_smoke.py - 2-3 tests using real App.create()

Pros:

- Fast feedback for common cases
- Real verification for critical paths
- Best of both worlds

Cons:

- Duplicated test logic
- Maintenance burden (two test suites)
- Unclear which approach to use for new tests

Alternative 2: Refactor Domain Layer First¶

Description: Before deciding, refactor App class to separate concerns:

class App:
    def create(self, file_system: "FileSystemInterface") -> None:
        # Inject the file system dependency instead of touching disk directly
        ...

Then both options become easier:

- Integration tests: Inject mock file system
- System tests: Inject real file system

Pros:

- Better architecture (dependency injection)
- Easier testing at all levels
- Clearer separation of concerns

Cons:

- Significant refactoring required before implementing tests
- Not addressing the immediate question
- May not be worth the effort for this use case
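
A minimal sketch of what this refactoring could look like, under stated assumptions: `FileSystemInterface` is modeled as a `typing.Protocol` with a single hypothetical `make_dir` method, `RealFileSystem` and `RecordingFileSystem` are illustrative implementations, and `App` here is a plain class rather than the actual Hop3 model. The subdirectory names are taken from the mock shown earlier in this ADR.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class FileSystemInterface(Protocol):
    def make_dir(self, path: Path) -> None: ...


class RealFileSystem:
    """Creates directories on disk (for system-level tests and production)."""

    def make_dir(self, path: Path) -> None:
        path.mkdir(parents=True, exist_ok=True)


class RecordingFileSystem:
    """Records requested directories without touching disk (for mocked tests)."""

    def __init__(self) -> None:
        self.created: list[Path] = []

    def make_dir(self, path: Path) -> None:
        self.created.append(path)


class App:
    SUBDIRS = ("git", "src", "data", "logs")

    def __init__(self, name: str, app_root: Path) -> None:
        self.name = name
        self.app_root = app_root

    def create(self, file_system: FileSystemInterface) -> None:
        for subdir in self.SUBDIRS:
            file_system.make_dir(self.app_root / self.name / subdir)


# Mocked style: nothing is written, but the requested layout can be inspected.
fake = RecordingFileSystem()
App("test-app", Path("/unused/apps")).create(fake)
recorded = [p.name for p in fake.created]

# Real style: the same App code creates directories in a temporary location.
with tempfile.TemporaryDirectory() as tmp:
    App("test-app", Path(tmp)).create(RealFileSystem())
    on_disk = sorted(p.name for p in (Path(tmp) / "test-app").iterdir())
```

Because both implementations go through the same `App.create()` body, the list of subdirectories lives in one place and cannot drift between test and production code.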

Alternative 3: Use E2E Tests Only¶

Description: Skip both integration and system tests, rely on E2E tests in Docker.

Pros:

- Maximum confidence (full production-like environment)
- No mocking or configuration complexity

Cons:

- Very slow feedback compared to in-process tests
- Poor developer experience
- Violates testing pyramid principles

Rejected: E2E tests are too slow for rapid development.

Prior Art¶

Django Testing Practices¶

Django's test framework uses:

- Unit tests: Pure Python logic
- TestCase with database: Similar to our integration tests
- LiveServerTestCase: Spins up real server (like our system tests)
- Selenium tests: Full browser automation (like our E2E)

Django's TestCase uses a real database but is still considered "integration" level, not "system" level.

Rails Testing Practices¶

Rails uses:

- Model tests: Pure model logic
- Controller tests: Request/response with database
- Integration tests: Multi-controller workflows
- System tests: Full browser automation

Rails "controller tests" mock views but use a real database - similar to Option 1.

FastAPI Testing Practices¶

FastAPI documentation recommends:

- Unit tests: Individual route handlers with mocked dependencies
- Integration tests: TestClient with real database
- No official "system" layer: Relies on E2E for full stack

FastAPI's approach aligns with Option 1 (integration tests with some mocking).

Unresolved Questions¶

Where is the boundary? What makes a dependency "external" vs "internal to the subsystem"?
What about other UI features? If we choose Option 1 for app creation, does the same apply to:
  - App deletion UI
  - App settings UI
  - Deploy/redeploy UI
  - Service management UI
Configuration Management: Should we refactor hop3.config to make testing easier, or accept that config patching is complex?
Mock Drift: How do we prevent mocks from diverging from real implementations? Should we have tests that verify mocks match real behavior?
Performance Trade-off: Is the slowdown from real I/O acceptable, or should we prioritize fastest possible feedback?

Future Work¶

Potential Config Refactoring¶

import os
from pathlib import Path

# Current: Module-level constants (evaluated once, at import time)
HOP3_ROOT = Path(os.environ.get("HOP3_ROOT", "/home/hop3"))
APP_ROOT = HOP3_ROOT / "apps"

# Future: Lazy evaluation or dependency injection
class Config:
    @property
    def HOP3_ROOT(self) -> Path:
        # Read the environment on every access instead of at import time
        return Path(os.environ.get("HOP3_ROOT", "/home/hop3"))

    @property
    def APP_ROOT(self) -> Path:
        return self.HOP3_ROOT / "apps"

# Easier to mock in tests
config = Config()
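
To illustrate why lazy evaluation would help, the following sketch repeats the `Config` class so that it runs on its own and overrides the environment with `unittest.mock.patch.dict`. The path `/tmp/hop3-test` is an arbitrary example value.

```python
import os
from pathlib import Path
from unittest.mock import patch


class Config:
    @property
    def HOP3_ROOT(self) -> Path:
        # Read on every access, so later environment changes are visible.
        return Path(os.environ.get("HOP3_ROOT", "/home/hop3"))

    @property
    def APP_ROOT(self) -> Path:
        return self.HOP3_ROOT / "apps"


config = Config()

# Override the environment for the duration of the block only.
with patch.dict(os.environ, {"HOP3_ROOT": "/tmp/hop3-test"}):
    inside = config.APP_ROOT

# With the variable unset, the default applies again.
with patch.dict(os.environ):
    os.environ.pop("HOP3_ROOT", None)
    default = config.APP_ROOT
```

In a pytest fixture the same effect would come from a single `monkeypatch.setenv("HOP3_ROOT", str(tmp_path))` call, with no need to patch each module that imported the constant.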

Test Infrastructure Improvements¶

Shared Fixtures: Create reusable fixtures for common test setup
Factory Pattern: Use factories for creating test data (apps, users, etc.)
Custom Assertions: Add domain-specific assertions for app state verification
Test Utilities: Helper functions for common test operations
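
As one possible shape for a custom assertion, the sketch below checks an app's directory layout and reports every missing directory in a single failure message. `assert_app_dirs_exist` and `EXPECTED_SUBDIRS` are hypothetical names; the subdirectory list is taken from the example tests in this ADR.

```python
import tempfile
from pathlib import Path

EXPECTED_SUBDIRS = ("git", "src", "data", "logs")


def assert_app_dirs_exist(app_path: Path, subdirs=EXPECTED_SUBDIRS) -> None:
    """Fail with one message listing all missing subdirectories."""
    missing = [name for name in subdirs if not (app_path / name).is_dir()]
    if missing:
        raise AssertionError(f"{app_path.name}: missing directories {missing}")


with tempfile.TemporaryDirectory() as tmp:
    app_path = Path(tmp) / "apps" / "test-app"
    for name in ("git", "src", "data"):  # "logs" deliberately left out
        (app_path / name).mkdir(parents=True)
    try:
        assert_app_dirs_exist(app_path)
        message = ""
    except AssertionError as exc:
        message = str(exc)

print(message)  # test-app: missing directories ['logs']
```

Compared with a series of separate `assert (app_path / "x").exists()` lines, a helper like this reports all missing directories at once instead of stopping at the first.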

Documentation Updates¶

Add flowchart showing when to use each test layer
Provide decision tree for test placement
Document common patterns for UI testing
Add examples of all four test layers for similar features

Related¶

ADR 020: Pluggable Architecture - Discusses separation of concerns
ADR 024: Backup/Restore System - Another feature that needs testing classification
Testing Strategy (docs/src/dev/testing-strategy.md) - Current guidelines

References¶

Hop3 Testing Strategy: docs/src/dev/testing-strategy.md
Martin Fowler's Testing Pyramid: https://martinfowler.com/articles/practical-test-pyramid.html
Google Testing Blog: https://testing.googleblog.com/
Django Testing Documentation: https://docs.djangoproject.com/en/stable/topics/testing/
FastAPI Testing: https://fastapi.tiangolo.com/tutorial/testing/

Notes¶

Recommendation from ADR Author¶

I lean toward Option 2 (System Tests): App.create() is core business logic rather than mere I/O, the slowdown is minimal, there is no mock to maintain, real implementations catch real bugs, and the mental model ("system tests = full stack") is simpler. Option 1 is acceptable only if the team prioritizes the fastest possible feedback, commits to maintaining the mock carefully, and adds system-level smoke tests separately.

Resulting Layout (Option 2)¶

packages/hop3-server/tests/
├── a_unit/
│   └── test_app_model.py              ← Pure validation logic
├── b_integration/
│   └── test_dashboard_forms.py        ← Only form validation
├── c_system/
│   └── test_dashboard_app_create.py   ← Full stack tests
└── d_e2e/
    └── test_app_lifecycle.py          ← Complete workflows

Removing the App.create() mock simplifies the fixture rather than complicating it.

Superseded by: ADR 043: Unified Testing Architecture
Related ADRs: ADR 020: Pluggable Architecture for Core Deployment Workflow, ADR 024: Backup and Restore System
