
The Plan: Programmatic CI/CD via Dagger.io

May 2026 · 5-part series: migrating CI/CD from GitHub to Codeberg

Why Another Migration?

After 21 gotchas on Forgejo's act runner (see part III), we hit a wall. Forgejo's workflow YAML is limited: no workflow_run, no workflow_call, secret double-encoding bugs, parsing failures on hyphenated job IDs, and Helm lock contention. Our current ci.yml is 524 lines of YAML with 8 jobs and duplicated buildx/cache logic.

Every CI bug we hit traces back to the same root cause: we're trying to orchestrate a complex pipeline in a declarative YAML format that wasn't designed for what we're doing. The fix isn't more YAML — it's code.

What Dagger Gives Us

Dagger.io is an SDK that lets you write CI/CD pipelines in Python, Go, or TypeScript. Instead of 524 lines of YAML, we write a Python script. The YAML becomes a thin wrapper that installs Dagger and calls dagger run python ci/main.py.

| Forgejo YAML (current) | Dagger Python (planned) |
|---|---|
| 8 YAML jobs with needs DAG | 1 job: install deps + run script |
| Shell scripts for build/push/deploy | Python functions with Dagger SDK |
| Forgejo cache action for uv/bun | Dagger container caching + host mount |
| Parallel lint+test via YAML needs | anyio.create_task_group() in Python |
| Docker buildx in DinD | Dagger's built-in container builds |
| kubectl installed per step | Dagger containers with tools baked in |
| Secret heredocs (double-encoding risk) | with_new_file(), no shell interpolation |
| No testability | Unit-testable pipeline code |

What the Pipeline Looks Like

The core pipeline is ~370 lines of Python with three phases:

| Phase | What runs | Parallelism |
|---|---|---|
| Phase 1 | lint_backend + test_backend + test_frontend | 3 concurrent via anyio |
| Phase 2 | build_backend + build_frontend → GAR push | 2 concurrent via anyio |
| Phase 3 | deploy (k8s secrets → migration → helm upgrade → health check) | sequential |

Each phase is a composable async Python function. Lint and test run in parallel via anyio.create_task_group(). Builds run in the same task group pattern. Deploy is sequential — secrets must exist before helm runs.
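The phase structure described above can be sketched roughly as follows. This is a minimal illustration, not the real ci/main.py: the step bodies are hypothetical placeholders that only record their names, and the standard library's asyncio.gather stands in for anyio.create_task_group() so the sketch runs with no dependencies.

```python
import asyncio


async def step(name: str, log: list[str], delay: float = 0.01) -> None:
    """Hypothetical stand-in for a pipeline step; it only records that it ran."""
    await asyncio.sleep(delay)  # placeholder for real container work
    log.append(name)


async def run_phase(names: list[str], log: list[str]) -> None:
    """Start every step of a phase at once and wait for all of them."""
    await asyncio.gather(*(step(name, log) for name in names))


async def pipeline() -> list[str]:
    log: list[str] = []
    # Phase 1: lint and tests run concurrently.
    await run_phase(["lint_backend", "test_backend", "test_frontend"], log)
    # Phase 2: both image builds run concurrently, after phase 1 has finished.
    await run_phase(["build_backend", "build_frontend"], log)
    # Phase 3: deploy steps run one after another, in a fixed order.
    for name in ["k8s_secrets", "migration", "helm_upgrade", "health_check"]:
        await step(name, log)
    return log


if __name__ == "__main__":
    print(asyncio.run(pipeline()))
```

The order of steps inside a concurrent phase is not guaranteed, but no phase starts before the previous one has completed.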

Key Design Decisions

Python SDK

Matches our existing backend stack (Python 3.13, anyio already familiar from pydantic-ai). No new language to learn.

Secrets via with_new_file()

No envsubst on secrets — no shell interpolation at all. Secrets flow: Codeberg repo secrets → YAML env block → os.environ → Python dict → with_new_file() inside deploy container. This eliminates the double-encoding bug entirely.
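As a rough sketch of the os.environ → Python dict → file-contents part of that flow: the secret names, the KEY=value file format, and both function names below are assumptions for illustration, not taken from the actual pipeline.

```python
from typing import Mapping

# Hypothetical secret names; the real list lives in the pipeline config.
SECRET_NAMES = ("DATABASE_URL", "API_KEY")


def collect_secrets(env: Mapping[str, str], names=SECRET_NAMES) -> dict[str, str]:
    """Copy the named secrets out of the environment, failing on any gap."""
    missing = [name for name in names if name not in env]
    if missing:
        raise KeyError(f"missing secrets: {', '.join(missing)}")
    return {name: env[name] for name in names}


def build_env_file(secrets: Mapping[str, str]) -> str:
    """Render KEY=value lines verbatim: no quoting, no expansion, no shell."""
    lines = []
    for key, value in secrets.items():
        if "\n" in value:
            raise ValueError(f"{key} contains a newline; env files are line-based")
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo_env = {"DATABASE_URL": "postgres://u:p$ss@db/app", "API_KEY": "k$(id)"}
    print(build_env_file(collect_secrets(demo_env)), end="")
    # In the deploy step the resulting string would be passed to Dagger, e.g.
    #   container.with_new_file("/secrets/app.env", contents=contents)
    # where the target path is an assumption.
```

Because the value is only ever handled as a Python string, characters such as `$` or backticks reach the file unchanged.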

Dagger Engine Reuses DinD Socket

By default, Dagger provisions its own BuildKit engine as a Docker container, which on our runner would amount to Docker-in-Docker-in-Docker. Instead, we set DOCKER_HOST=unix:///var/run/docker.sock so that Dagger talks to the existing DinD daemon. No additional Docker daemon, no security context issues.
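Since everything depends on that socket being reachable, a preflight check at the top of the script could fail early with a readable message. The sketch below is an assumption about how such a check might look; the function names are hypothetical and nothing here calls Dagger itself.

```python
import os
import stat
from typing import Mapping, Optional


def unix_socket_path(docker_host: str) -> Optional[str]:
    """Return the filesystem path of a unix:// DOCKER_HOST value, else None."""
    prefix = "unix://"
    if not docker_host.startswith(prefix):
        return None
    return docker_host[len(prefix):] or None


def check_docker_host(env: Mapping[str, str]) -> str:
    """Raise RuntimeError unless DOCKER_HOST points at an existing unix socket."""
    host = env.get("DOCKER_HOST", "")
    path = unix_socket_path(host)
    if path is None:
        raise RuntimeError(f"DOCKER_HOST is not a unix socket URL: {host!r}")
    if not os.path.exists(path):
        raise RuntimeError(f"socket not found: {path}")
    if not stat.S_ISSOCK(os.stat(path).st_mode):
        raise RuntimeError(f"not a socket: {path}")
    return path
```

Calling check_docker_host(os.environ) before any pipeline work would turn a confusing engine connection failure into a one-line error.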

Container Caching

Dagger's cache_volume() persists uv and bun caches across pipeline runs. After the first build populates the cache, subsequent builds reuse the downloaded packages instead of fetching them again.
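A small helper along these lines could wire a cache volume into a container. The helper name, the volume key, and the mount path are assumptions; the only SDK methods it relies on are with_mounted_cache() and with_env_variable() on a Dagger container, and it is written against plain objects so it can be exercised without an engine.

```python
from typing import Any


def with_tool_cache(container: Any, volume: Any, path: str, env_name: str) -> Any:
    """Mount a cache volume at `path` and point the tool's cache env var at it."""
    return (
        container
        .with_mounted_cache(path, volume)
        .with_env_variable(env_name, path)
    )


# With the real SDK the arguments would come from Dagger, roughly:
#   uv_cache = dag.cache_volume("uv-cache")   # key name is an assumption
#   backend = with_tool_cache(base, uv_cache, "/root/.cache/uv", "UV_CACHE_DIR")
```

The same helper would cover bun by passing a different volume, path, and environment variable name.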

Pipeline Is Testable

The secret parsing, env-file building, and vite arg logic are all pure Python functions that can be unit-tested without a live Dagger engine. Integration tests (requiring Dagger) are marked @pytest.mark.e2e and skipped in CI.
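To illustrate what such a test might look like, here is a hypothetical version of the vite arg logic with a pytest-style test next to it. The function name and the sample variables are invented for the example; the only fact relied on is that Vite exposes variables prefixed with VITE_ to client code.

```python
from typing import Mapping


def vite_build_args(env: Mapping[str, str]) -> dict[str, str]:
    """Keep only VITE_-prefixed variables, sorted by name for stable build args."""
    return {key: env[key] for key in sorted(env) if key.startswith("VITE_")}


def test_vite_build_args_filters_and_sorts() -> None:
    env = {
        "VITE_APP_ENV": "staging",
        "PATH": "/usr/bin",
        "VITE_API_URL": "https://api.example.test",
    }
    args = vite_build_args(env)
    # Non-VITE_ variables never reach the frontend build.
    assert "PATH" not in args
    # Keys come back in sorted order regardless of input order.
    assert list(args) == ["VITE_API_URL", "VITE_APP_ENV"]
    assert args["VITE_APP_ENV"] == "staging"
```

A test like this needs neither a container runtime nor network access, so it can run on every push.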

Migration Strategy (5 Phases)

| Phase | What happens | Risk | Duration |
|---|---|---|---|
| A: Implement | Create ci/main.py + tests. Old ci.yml untouched. | Zero | 2-3 days |
| B: Validate | Manual dispatch of ci-dagger.yml on GKE runner. Check DinD socket, secret encoding, deploys. | Low (old pipeline still primary) | 2-3 dispatches |
| C: Side-by-side | Both run on push. Dagger builds/pushes but skips deploy. Compare results for 1 week. | Medium (two pipelines compete) | 1 week |
| D: Cutover | ci.yml becomes the thin Dagger wrapper. Old YAML saved as ci.yml.old for instant revert. | Medium (primary flips) | 1 dispatch |
| E: Cleanup | Delete ci-dagger.yml and ci.yml.old. Keep deploy-staging/production as emergency fallback. | None | After 2 weeks stable |

The key: old ci.yml never gets deleted until Dagger has been running in production for 2 weeks. If Dagger fails at any point, revert is a single command, git checkout HEAD~1 -- ci.yml, as long as the cutover is the most recent commit that touched the file.

What Changes vs What Stays

| Component | Before | After |
|---|---|---|
| Pipeline entry point | ci.yml (524 lines YAML) | ci/main.py (~370 lines Python) |
| YAML wrapper | (none) | ci.yml (~100 lines YAML) |
| GKE runner | Same (Forgejo, ci-spot-pool) | Unchanged |
| Secrets | Same (Codeberg org secrets) | Unchanged |
| Helm charts | Same | Unchanged |
| K8s manifests | Same | Unchanged |
| Fallback workflows | deploy-staging.yml, deploy-production.yml | Kept as-is |

Risks

| Risk | Likelihood | Mitigation |
|---|---|---|
| Dagger engine fails to connect to DinD socket | Medium | Test DOCKER_HOST=unix:///var/run/docker.sock in Phase B; keep fallback workflows |
| Build caching worse than buildx | Medium | Benchmark build times in Phase C before cutover |
| Pipeline harder to debug than YAML | Medium | Structured logging; dagger run --debug flag |
| SDK version incompatibility | Low | Pin dagger-io==0.12.7 in requirements.txt |

Timeline

Phase A (implementation): 2-3 days. Phase B (validation): 3-5 dispatches. Phase C (side-by-side): 1 week minimum. Phase D (cutover): 1 dispatch + monitoring. Total: ~2 weeks to full cutover + 2 weeks of validation before cleanup.

What We're Actually Fixing

This isn't about Dagger. This is about:

  1. 21 YAML gotchas that all trace back to "we need code, not YAML"
  2. No testability: you can't unit-test a YAML pipeline
  3. Secret handling: with_new_file() eliminates the entire class of double-encoding bugs
  4. Maintainability: 524 lines of declarative shell-in-YAML is not maintainable. 370 lines of Python with tests is.

The goal isn't to adopt Dagger. It's to stop fighting the runner and start writing pipelines like we write the rest of our code.
