Challenges & Gotchas

May 2026 · 5-part series: migrating CI/CD from GitHub to Codeberg

Migrating from GitHub Actions to Forgejo Actions seems like a drop-in replacement — the YAML syntax is nearly identical. But Forgejo's act runner has enough behavioral differences that you'll hit surprises. Here are all the gotchas we discovered, including the ones that only surfaced days or weeks into production.

Expected Gotchas (the ones you'll plan for)

1. No github.event_name

Forgejo's act runner doesn't reliably populate github.event_name, so deploy conditions that depend on it break. Gate jobs on a workflow input such as inputs.deploy_to instead.

2. No workflow_run

Multi-workflow triggers are unsupported. Merge everything into a single ci.yml with needs dependencies.

3. No workflow_call (Reusable Workflows)

Forgejo doesn't support workflow_call at all. Use composite actions via action.yml files, or inline the steps.

4. No Environment-Scoped Secrets

Codeberg secrets are organization-level: one flat namespace, no environments. Prefix each secret with STAGING_ or PRODUCTION_. This doubles your secret count and adds the risk of deploying with the wrong prefix.

5. Secret Double-Encoding (The Silent Killer)

kubectl create secret --from-literal with ${{ secrets.X }} double-base64-encodes values with special characters. PostgreSQL URLs become gibberish. Use --from-env-file with a single-quoted heredoc instead.
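A minimal sketch of the env-file approach, written as it might appear in the body of a workflow run step. The secret name backend-env and the sample URL are assumptions for illustration; in a real workflow the sample value would be replaced by the runner-substituted secret expression. The quoted heredoc delimiter ('EOF') is what stops the shell from expanding $ or backticks inside the value.

```shell
# Write the secret to a private env file without any shell expansion.
# The function body runs in a subshell so the umask change stays local.
write_env_file() (
  umask 077
  cat > "$1" <<'EOF'
DATABASE_URL=postgresql://app:s3cr$t&pw!@db.example.internal:5432/app?sslmode=require
EOF
)

# Create or update the secret from that file (backend-env is a hypothetical name).
create_secret() {
  kubectl create secret generic backend-env \
    --from-env-file="$1" --dry-run=client -o yaml | kubectl apply -f -
}

# Usage: write_env_file /tmp/backend.env && create_secret /tmp/backend.env
```

Deleting the env file once the secret is applied keeps the plaintext value off the runner's disk.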

6. GKE Spot Preemption

Spot nodes can be preempted at any time, killing active runner pods and failing jobs. If spot capacity is unavailable entirely, CI is blocked. Keep an on-demand fallback pool.

Production Surprises (the ones you discover at 2 AM)

7. Hyphenated Job-ID Parsing Bug

The Forgejo act runner fails to parse runs-on for job IDs containing hyphens: 'runs-on' key not defined in ci/test-backend. Rename all job IDs from kebab-case to snake_case across every job, needs reference, and concurrency group.
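To find the job IDs that need renaming, a rough heuristic like the following may help. It assumes a conventionally formatted workflow file with two-space indentation under a top-level jobs: key, and it only lists candidates; needs references and concurrency groups still have to be updated by hand.

```shell
# Print job IDs under "jobs:" that contain a hyphen.
# Heuristic only: relies on two-space indentation, not a real YAML parser.
find_hyphenated_job_ids() {
  awk '
    /^jobs:[ \t]*$/ { in_jobs = 1; next }   # entering the jobs block
    /^[^ \t#]/      { in_jobs = 0 }         # any other top-level key ends it
    in_jobs && /^  [A-Za-z0-9_]+-[A-Za-z0-9_-]*:[ \t]*$/ {
      sub(/^  /, ""); sub(/:[ \t]*$/, ""); print
    }
  ' "$1"
}

# Usage: find_hyphenated_job_ids .forgejo/workflows/ci.yml
```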

8. Docker 29.5.1 Regression (Path Escape)

Docker 29.5.1 introduced a security fix that uses Go 1.24's os.Root to restrict path operations. Forgejo's runner copies files into /var/run/act, but /var/run is a symlink to /run — Docker's stricter validation rejects this as "path escapes from parent." Fix: pin DinD to docker:29.5.0-dind (which has known CVEs) until 29.5.2 ships.

9. tea CLI "Success Error"

The tea CLI reports unexpected end of JSON input on every successful dispatch, because Forgejo returns 204 No Content and tea tries to parse the empty body as JSON. You learn to ignore the error and confirm the run with tea actions runs list.

10. Missing needs → Empty Image Tags

Any job that references needs.build_backend.outputs.image_sha must explicitly list build_backend under needs. Failing to do so resolves outputs to empty strings, causing InvalidImageName: kikitoru-backend:-staging and deployment crashes.
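A small guard in the deploy step can turn this silent failure into a loud one. The sketch below assumes the tag format implied by the error above (SHA followed by the environment name); the function name is hypothetical.

```shell
# Build an image reference, refusing to continue if the SHA is empty.
image_ref() {
  repo="$1"; sha="$2"; env_name="$3"
  if [ -z "$sha" ]; then
    echo "image SHA is empty; is the build job listed under needs?" >&2
    return 1
  fi
  printf '%s:%s-%s\n' "$repo" "$sha" "$env_name"
}

# Usage: IMAGE=$(image_ref kikitoru-backend "$IMAGE_SHA" staging) || exit 1
```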

11. Helm Release Lock Contention

Two workflow triggers firing the deploy concurrently produce UPGRADE FAILED: another operation is in progress. Always include a "Recover stuck Helm releases" step that auto-rolls back releases in pending-upgrade or failed states before attempting deploy.
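One possible shape for such a recovery step, using helm list filters and helm rollback. The release and namespace names are placeholders, and this is a sketch rather than the exact step we run. Note that a release stuck on its very first install has no previous revision to roll back to, so that case needs separate handling.

```shell
# Roll back a release if Helm reports it as pending or failed.
recover_stuck_release() {
  release="$1"; namespace="$2"
  stuck=$( {
    helm list -n "$namespace" --pending -q
    helm list -n "$namespace" --failed -q
  } | grep -Fx -- "$release" || true )
  if [ -n "$stuck" ]; then
    echo "release $release is stuck, rolling back"
    # With no revision argument, helm rolls back to the previous release.
    helm rollback "$release" -n "$namespace"
  fi
}

# Usage: recover_stuck_release backend staging
```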

12. SHA Race Condition (Build vs Deploy)

If your deploy checks out main at HEAD (now a newer commit) but images were built at the previous SHA, tags won't match. Pods get ImagePullBackOff. Deploy must use the same SHA as build.
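A cheap way to enforce this is to have the deploy step compare its checkout against the SHA the build job reported and stop on a mismatch. In the sketch below, the first argument is assumed to come from the build job's output; the function name is hypothetical.

```shell
# Fail the deploy if the working tree is not at the commit the images were built from.
verify_deploy_sha() {
  build_sha="$1"
  head_sha=$(git rev-parse HEAD)
  if [ "$head_sha" != "$build_sha" ]; then
    echo "HEAD is $head_sha but images were built at $build_sha" >&2
    return 1
  fi
}

# Usage: verify_deploy_sha "$BUILD_SHA" || exit 1
```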

13. envsubst Empty Values

envsubst silently produces an empty string if the env var isn't exported, causing production crashes with BackoffLimitExceeded. Read from secretRef instead of injecting via templates.

14. uv run Not in Production Image

The uv binary is builder-stage-only in Docker multi-stage builds. Use /app/.venv/bin/alembic upgrade head inside K8s job containers — never uv run alembic.

15. *.png Gitignore Breaks Docker Builds

A root .gitignore rule like *.png silently excludes icon PNGs from git. Files exist locally (Vite copies from public/ to dist/) but are never in the Docker build context. Only way to catch this: curl -sI <staging-url>/icon-192.png and verify content-type: image/png.
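The curl check is easy to script as a post-deploy smoke test. This sketch assumes the server sends a content-type header for the icon; the function name is hypothetical.

```shell
# Succeed only if the URL's response headers declare a PNG.
check_icon_content_type() {
  curl -sI "$1" | tr -d '\r' | grep -iq '^content-type:[ ]*image/png'
}

# Usage: check_icon_content_type "$STAGING_URL/icon-192.png" || echo "icon missing"
```

A missing icon typically comes back as the single-page app's HTML fallback, which is why the check looks at the content type rather than only the status code.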

16. Alembic Autogenerate Duplicate Tables

Running alembic revision --autogenerate on an out-of-sync local database generates op.create_table for tables that already exist in production. Causes DuplicateTable crash. Always run alembic upgrade head locally before autogenerating.
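Wrapping the two commands together makes the ordering hard to forget. A minimal sketch, with a hypothetical function name:

```shell
# Bring the local database up to date, then autogenerate against it.
# The revision is skipped if the upgrade fails.
autogenerate_migration() {
  alembic upgrade head && alembic revision --autogenerate -m "$1"
}

# Usage: autogenerate_migration "add playlist table"
```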

17. Ruff Fails on Migration Files

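Commenting out code in a migration file leaves its imports unused, and our lint step fails the build on unused imports. Give Ruff an explicit exception for Alembic migration files, for example a per-file-ignores entry for F401 (unused import) scoped to the migrations directory.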

18. External Base Image Dependency

All CI jobs depend on catthehacker/ubuntu:act-22.04 from Docker Hub. If Docker Hub rate-limits you or the image is pulled, every job fails. Consider mirroring to your own registry.
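Mirroring is a pull, tag, and push. The sketch below uses a placeholder registry host (registry.example.com); the runner's label-to-image mapping would then need to point at the mirrored name.

```shell
# Copy an image from one registry reference to another.
mirror_image() {
  source_image="$1"; target_image="$2"
  docker pull "$source_image" &&
    docker tag "$source_image" "$target_image" &&
    docker push "$target_image"
}

# Usage (target registry is hypothetical):
#   mirror_image catthehacker/ubuntu:act-22.04 registry.example.com/ci/ubuntu:act-22.04
```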

Security Concerns

19. DinD Runs Privileged

The DinD container requires privileged: true — full kernel capability access. Mitigated by running on an isolated CI spot pool with node taint so no other workloads co-reside.

20. GCP Auth Downgrade from Workload Identity

GitHub Actions used Workload Identity Federation (keyless). Forgejo on GKE uses the node's attached service account via metadata server curl — still better than a raw JSON key, but a step down from WIF.
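For reference, fetching a token from the metadata server looks roughly like this. The endpoint and header are the standard GCE metadata ones; the sed-based JSON extraction is a simplification to avoid a dependency on jq, and the function name is hypothetical.

```shell
# Ask the metadata server for the node service account's access token.
fetch_access_token() {
  curl -sf -H 'Metadata-Flavor: Google' \
    'http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token' |
    sed -n 's/.*"access_token"[ ]*:[ ]*"\([^"]*\)".*/\1/p'
}

# Usage: TOKEN=$(fetch_access_token)
```

Any pod on the node that can reach the metadata server can request the same token, which is part of why this counts as a step down from WIF.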

21. Pinned Vulnerable DinD Image

docker:29.5.0-dind has known CVEs that 29.5.1 patched. Pinned to 29.5.0 because 29.5.1 broke Forgejo compatibility. Acceptable only because the CI pool is isolated and processes no untrusted code.

Summary

Six gotchas we planned for. Fifteen we discovered the hard way. The migration took two full weeks of debugging, not one weekend. Budget accordingly.

Gotcha | Impact | Fix
No event_name | Deploy conditions broken | Use inputs.deploy_to
No workflow_run | Multi-workflow triggers broken | Merge into single ci.yml
No workflow_call | No reusable workflows | Composite actions
No env secrets | Environment isolation missing | STAGING_/PRODUCTION_ prefix
Secret encoding | Garbled secrets, deploy failures | --from-env-file + heredoc
Spot preemption | Lost CI jobs | On-demand fallback pool
Hyphenated job IDs | All CI broken | snake_case job names
Docker 29.5.1 regression | All CI broken | Pin to 29.5.0-dind
tea CLI error | Confusing false errors | Learn to ignore it
Missing needs deps | InvalidImageName crashes | Declare all needs
Helm lock contention | Deploy hangs | Auto-recover stuck releases
SHA race condition | ImagePullBackOff | Same SHA for build+deploy
envsubst empty values | Migration crash | Use secretRef
uv run not in prod | Migration crash | /app/.venv/bin/alembic
*.png gitignore | Missing icons in prod | Check content-type
Duplicate tables | Production crash | alembic upgrade head first
Ruff on migrations | CI false positive | Configure ruff exceptions
External base image | Supply chain risk | Mirror to own registry
DinD privileged | Security risk | Isolated CI pool
GCP auth downgrade | Security downgrade | Dedicated CI SA, rotate key
Pinned vulnerable DinD | Known CVEs | Isolated pool, temp pin

Bottom line: The move was worth it — same speed, 10x cheaper, more reliable by design. But Forgejo's act runner is not a drop-in replacement. Test everything. Budget two weeks, not a weekend. And always test secret encoding with a postgresql:// URL on day one.
