
Architectural Blueprint: Programmatic CI/CD via Dagger.io

May 2026 · Part 5 of a 5-part series on migrating CI/CD from GitHub to Codeberg

Root Cause Analysis

The 21 migration gotchas documented in Part 3 share a single root cause: the limits of declarative YAML for complex pipeline orchestration.

Without native support for workflow_run or workflow_call, logic-heavy workflows have to be inlined, which bloats the configuration. The current production ci.yml spans 524 lines across 8 jobs, with duplicated Buildx cache declarations, fragile shell escaping rules, and silent execution failures (e.g., hyphenated job-ID parsing bugs).

The technical debt cannot be resolved by further YAML refactoring. The pipeline logic must be migrated to an imperative execution model.

Technical Overview: Dagger.io SDK

Dagger.io replaces YAML-based steps with an application-level SDK. Pipelines are written as ordinary programs (Python, Go, or TypeScript), which reduces the runner’s workflow file to a thin bootstrap that invokes the program:

dagger run python ci/main.py

Comparison Matrix

Capability	Forgejo YAML (Current)	Dagger Python (Proposed)
Pipeline Definitions	8 YAML jobs via standard DAG configuration	Single bootstrap job executing a Python script
Step Execution	Inline POSIX shell scripts inside YAML strings	Structured Python functions using the Dagger SDK
Caching Layer	Forgejo custom cache actions (`uv`/`bun`)	Native Dagger engine `cache_volume()` API
Concurrency	Declarative `needs` dependency mapping	Asynchronous scheduling via `anyio.create_task_group()`
Tool Availability	Step-by-step runner environment setup	Execution inside pre-configured Dagger containers
Secret Injection	Heredoc interpolation via environment blocks	`with_new_file()` SDK method (bypasses the shell)
Validation Loop	Remote runtime observation only	Local compilation, debugging, and unit testing
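The caching and tool-availability rows can be illustrated with a minimal sketch. It assumes a connected Dagger client is passed in; the base image, cache key, mount path, and test command are hypothetical placeholders rather than values from the production pipeline. The client is typed as Any so the module imports even where the SDK is not installed.

```python
from typing import Any


def backend_test_container(client: Any, source_dir: str = ".") -> Any:
    """Describe a container that runs the backend tests with a persistent uv cache.

    `client` is expected to be a connected dagger.Client. Nothing executes
    until the caller awaits a result such as `.stdout()` on the return value.
    """
    # Hypothetical cache key; the engine keeps the volume between runs.
    uv_cache = client.cache_volume("uv-cache")
    return (
        client.container()
        .from_("python:3.13-slim")  # assumed base image
        .with_mounted_cache("/root/.cache/uv", uv_cache)
        .with_directory("/src", client.host().directory(source_dir))
        .with_workdir("/src")
        .with_exec(["pip", "install", "uv"])
        .with_exec(["uv", "run", "pytest", "-q"])  # assumed test command
    )
```

Inside the real script this would typically be awaited from within an open client connection, for example `await backend_test_container(client).stdout()`. Because the function only chains calls on the object it receives, it can also be exercised in a unit test with a stand-in client.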

Pipeline Topology

The refactored runtime logic consists of approximately 370 lines of asynchronous Python, split across three phases that run one after another. The first two phases run their own steps concurrently; the third is strictly sequential.

Phase 1 (Concurrent execution): Launches lint_backend, test_backend, and test_frontend simultaneously within an anyio task group.
Phase 2 (Concurrent execution): Initiates parallel Docker builds for the backend and staging frontend service layers, pushing the final build artifacts directly to Google Artifact Registry (GAR).
Phase 3 (Sequential execution): Handles deployment steps. Order of operations is strictly enforced to ensure Kubernetes secrets are applied before running database migrations, Helm upgrades, and final HTTP health checks.
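The three phases above can be sketched as a small orchestration skeleton. To keep the example dependency-free it uses the standard library's asyncio.gather as a stand-in for anyio.create_task_group(); unlike a task group, gather does not cancel sibling steps when one fails. The stub helper and the step names passed to it are illustrative only.

```python
import asyncio
from typing import Awaitable, Callable

Step = Callable[[], Awaitable[None]]


async def run_concurrently(*steps: Step) -> None:
    """Start all steps together; the first exception propagates to the caller."""
    await asyncio.gather(*(step() for step in steps))


async def run_pipeline(
    checks: list[Step], builds: list[Step], deploy_steps: list[Step]
) -> None:
    # Phase 1: lint and tests run side by side.
    await run_concurrently(*checks)
    # Phase 2: image builds start only after every check has passed.
    await run_concurrently(*builds)
    # Phase 3: deployment steps run strictly in the order given.
    for step in deploy_steps:
        await step()


def stub(name: str, log: list[str]) -> Step:
    """Hypothetical placeholder step that only records that it ran."""
    async def step() -> None:
        await asyncio.sleep(0)
        log.append(name)
    return step
```

A failing check raises out of run_pipeline before any build starts, which is the property the phase ordering is meant to guarantee.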

Key Engineering Decisions

Runtime Engine Language Selection

The Python SDK was selected to align with the core backend services (Python 3.13). This allows the engineering team to reuse existing asynchronous patterns (anyio) already present in our application components (such as pydantic-ai) without adding a new language runtime to the tech stack.

Secret Isolation via File Injection

To eliminate the double-base64 encoding issue exposed during kubectl create secret calls, the pipeline removes shell-based interpolation entirely. Codeberg repository secrets are loaded into the initial execution context as environment variables (os.environ) and injected directly into the target deployment containers as files via Dagger’s with_new_file() method.
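A minimal sketch of this flow, assuming the container argument is a dagger.Container (typed as Any here so the example runs without the SDK). The function names, the /run/secrets directory, and the 0o400 permission are illustrative assumptions, not the pipeline's actual values.

```python
import os
from typing import Any, Mapping


def read_secrets(names: list[str], env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect the named secrets, failing early and without echoing any values."""
    missing = [name for name in names if name not in env]
    if missing:
        raise KeyError(f"missing secrets: {', '.join(missing)}")
    return {name: env[name] for name in names}


def inject_secret_files(
    container: Any, secrets: Mapping[str, str], directory: str = "/run/secrets"
) -> Any:
    """Write each secret to its own file inside the container.

    The value travels as an SDK argument, so no shell ever parses, quotes,
    or re-encodes it.
    """
    for name, value in secrets.items():
        container = container.with_new_file(
            f"{directory}/{name}", contents=value, permissions=0o400
        )
    return container
```

A later kubectl create secret generic step can then read the raw files with --from-file, so the value is encoded once by kubectl rather than once in the shell and again by the API.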

Shared DinD Daemon Socket Architecture

By default, the Dagger CLI provisions its own container running BuildKit to manage execution steps. To avoid the overhead of another nesting level (Docker-in-Docker-in-Docker), the execution environment is explicitly configured to attach to the runner’s existing daemon socket through an environment variable:

DOCKER_HOST=unix:///var/run/docker.sock

This binds Dagger directly to the existing GKE node container daemon, reducing resource consumption and containing the network attack surface.
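Because a wrong or unmounted socket path tends to surface only as a connection error deep inside a run, a small preflight check can fail fast before the pipeline starts. The sketch below uses only the standard library; the function names are hypothetical and not part of Dagger.

```python
import os
import stat
from typing import Mapping
from urllib.parse import urlparse


def unix_socket_path(docker_host: str) -> str:
    """Extract the filesystem path from a unix:// DOCKER_HOST value."""
    parsed = urlparse(docker_host)
    if parsed.scheme != "unix" or not parsed.path:
        raise ValueError(f"expected a unix:// URL, got {docker_host!r}")
    return parsed.path


def is_unix_socket(path: str) -> bool:
    """Return True only if the path exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


def check_docker_host(env: Mapping[str, str] = os.environ) -> str:
    """Validate DOCKER_HOST and return the socket path, or raise with a clear message."""
    path = unix_socket_path(env.get("DOCKER_HOST", ""))
    if not is_unix_socket(path):
        raise RuntimeError(f"{path} is missing or is not a socket; check the runner's volume mounts")
    return path
```

This only confirms that the socket is mounted. Whether the CI user is allowed to talk to the daemon still has to be verified by an actual request.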

Deterministic Build Verification

Infrastructure code such as configuration compilers, environment file generation scripts, and asset argument maps is written as isolated Python utilities. These are verified locally with standard unit tests, while integration routines that need an active Dagger engine are marked with @pytest.mark.e2e so they can be excluded from fast local runs.
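As an illustration of the kind of utility meant here, the sketch below shows a hypothetical environment file generator together with the plain unit tests that cover it. The function name and its rules (sorted keys, no multi-line values) are assumptions made for the example, not the project's actual helper.

```python
def render_env_file(values: dict[str, str]) -> str:
    """Render KEY=value lines in sorted order so the output is deterministic."""
    lines = []
    for key in sorted(values):
        value = values[key]
        if "\n" in value:
            raise ValueError(f"{key}: multi-line values are not supported")
        lines.append(f"{key}={value}\n")
    return "".join(lines)


def test_render_env_file_is_sorted() -> None:
    rendered = render_env_file({"B": "2", "A": "1"})
    assert rendered == "A=1\nB=2\n"


def test_render_env_file_rejects_multiline_values() -> None:
    try:
        render_env_file({"KEY": "line1\nline2"})
    except ValueError:
        return
    raise AssertionError("expected ValueError for a multi-line value")
```

Tests like these need no engine, network, or container runtime, so they can run on every save. An engine-dependent test would carry the e2e marker instead and be deselected locally with a marker expression such as `pytest -m "not e2e"`.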

Implementation & Migration Strategy

The transition plan uses a 5-phase canary model designed to keep rollback costs near zero. If a failure occurs during any validation step, reverting to the legacy pipeline requires a single Git command: git checkout HEAD~1 -- ci.yml.

[Phase A: Implement] ──> [Phase B: Validate] ──> [Phase C: Coexistence] ──> [Phase D: Cutover] ──> [Phase E: Cleanup]

Phase A: Core Implementation (Estimated: 2–3 Days): Write the ci/main.py pipeline logic and accompanying unit tests. The live production ci.yml is completely unchanged. Risk Profile: None.
Phase B: Runtime Validation (Estimated: 2–3 Dispatches): Manually trigger a standalone test workflow (ci-dagger.yml) on the active GKE spot pool. Verify mount permissions for the docker.sock interface, check secret resolution formatting, and run isolated test deployments. Risk Profile: Minimal.
Phase C: Parallel Coexistence (Estimated: 1 Week): Configure GitHub pushes to trigger both CI systems concurrently. The Dagger pipeline will build and push to GAR but skip final target cluster delivery. Execution duration and asset consistency will be monitored over 7 calendar days. Risk Profile: Moderate (resource contention).
Phase D: Cutover (Estimated: 1 Dispatch): Overwrite the main ci.yml file with the final production Dagger bootstrap sequence. Move the legacy YAML file to ci.yml.old for disaster recovery. Risk Profile: Moderate.
Phase E: Resource Cleanup (Estimated: 2 Weeks Post-Cutover): Confirm deployment stability over a two-week window, then remove ci-dagger.yml and ci.yml.old from the repository tree. Keep manual deploy fallbacks as emergency recovery paths. Risk Profile: None.
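The duration monitoring in Phase C can be reduced to a simple comparison of medians. The sketch below is one possible shape for that check; the 10% tolerance and the function names are assumptions chosen for illustration, and the inputs would be wall-clock durations (in seconds) collected from both pipelines over the coexistence week.

```python
from statistics import median


def relative_slowdown(baseline_s: list[float], candidate_s: list[float]) -> float:
    """Fractional change of the candidate's median duration against the baseline's.

    A positive result means the candidate pipeline is slower.
    """
    base = median(baseline_s)
    return (median(candidate_s) - base) / base


def is_regression(
    baseline_s: list[float], candidate_s: list[float], tolerance: float = 0.10
) -> bool:
    """Flag the candidate if its median is slower than the baseline by more than `tolerance`."""
    return relative_slowdown(baseline_s, candidate_s) > tolerance
```

Medians are used instead of means so that a single cold-cache run or a preempted spot node does not dominate the comparison.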

Infrastructure Configuration Management

    [Legacy Context]                            [Target Context]
   524 Lines of YAML                          ~100 Lines of Bootstrap YAML
   (Direct Orchestration)                        (Spins up Dagger CLI)
                                                         │
                                                         ▼
                                                370 Lines of Python
                                               (Programmatic Pipeline)

The list below summarizes the structural components before and after the architecture shift:

Pipeline Orchestrator: 524 lines of declarative YAML are replaced by approximately 370 lines of structured Python.
YAML Wrapper Layer: A new entrypoint configuration of roughly 100 lines whose only job is to bootstrap the execution engine.
GKE Nodes, Secret Vaults, Helm Structure, & K8s Manifest Specs: Completely unchanged.

Risk Assessment & Mitigations

1. Dagger Socket Connection Refusal

Likelihood: Moderate.
Mitigation: Phase B includes explicit verification criteria for the DOCKER_HOST Unix socket path before altering production code paths.

2. Cache Performance Regression vs. Native Buildx

Likelihood: Moderate.
Mitigation: Build speed benchmarks will be compared during Phase C parallel execution loops prior to modifying primary production environments.

3. Pipeline Runtime Transparency Breakdown

Likelihood: Low.
Mitigation: Have the script write structured logs to stdout, and use the --debug flag of dagger run for detailed API request traces.

4. SDK Version Drift

Likelihood: Low.
Mitigation: Pin the SDK to an exact version (dagger-io==0.12.7) in the project's requirements.txt.
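The version pin only protects against drift if it stays exact, so a small guard in the unit test suite can flag loosened requirements. The sketch below is a deliberately strict, hypothetical checker: it accepts only plain `name==version` lines and reports everything else, including wildcards, ranges, and option lines.

```python
import re

# Accepts "name==version" with an optional extras block, and nothing else.
_EXACT_PIN = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?==[A-Za-z0-9.!+_-]+$")


def unpinned_requirements(text: str) -> list[str]:
    """Return every requirement line that is not an exact `==` pin."""
    loose = []
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if not _EXACT_PIN.match(line.replace(" ", "")):
            loose.append(line)
    return loose
```

For example, calling `unpinned_requirements(open("requirements.txt").read())` in a test and asserting that the result is empty would catch a pin that was relaxed to a range during an unrelated change.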

Operational Impact

This re-architecture trades standard declarative configuration files for an execution model that values unit-testability, predictable secret parsing, and maintainable exception handling. Programmatic pipeline code can be refactored, shared, and version-controlled with the same degree of rigor applied to standard enterprise software assets.