Senior AI Engineer - Harness Engineering (Kimchi)

Cast AI
Bulgaria; Croatia; Estonia; Greece; Hungary; Latvia; Lithuania; Poland; Romania; Slovakia; Slovenia; UK
Full-time
Posted 11 Jun 2026

About the role

<h4>Why Kimchi?</h4> <p>Kimchi is the AI platform inside CAST AI. We started by helping companies run LLMs on their own Kubernetes clusters; now we're building the execution layer where agents do real work.</p> <p><strong>Our infrastructure today</strong>: multi-model inference (MiniMax, Kimi, GLM-5, Nemotron, DeepSeek) with intelligent routing, an OpenAI-compatible API, and deployment flexibility from our GPUs to your VPC. The inference layer is the foundation. What we're hiring for sits on top of it: coding agents, agent runtimes, orchestration systems, and the reliability engineering that makes them actually finish things.</p> <p><strong>Tech stack:</strong> TypeScript, Go, Kubernetes, AWS/GCP/Azure, MCP, Prometheus/Grafana/Loki, GitLab CI, ArgoCD.</p> <p><span style="text-decoration: underline;">Why harness engineering matters here<br></span>OpenAI and Anthropic ship models. They also ship one harness each - the scaffolding that turns a raw model into something that can plan, execute, recover, and complete work. We ship a different kind of harness: one built for cost-conscious, long-horizon autonomy, running on inference infrastructure we control end to end.<br>A decent model with a great harness beats a great model with a bad harness. We've watched this play out. The gap between what today's models can do and what you see them doing is largely a harness gap - and that gap is where we operate.</p> <p><span style="text-decoration: underline;">What you'll build<br></span>The ratchet.<br>Every time our agent makes a mistake, we engineer a fix so it never makes that mistake again. That means hooks that enforce constraints the model "knows" but forgets: pre-commit lint checks, permission gates, context compaction before the window fills.
Success is silent; failures are verbose.</p> <p>Long-horizon execution.<br>Our harness is built around spec-driven autonomy: meta-prompting, fresh context per task, a worktree-per-slice git strategy, automatic replanning, crash recovery, and stuck detection. We're implementing Ralph loops - when the model tries to exit, we intercept and reinject the goal into a fresh context. The agent reads state from disk and continues. Multi-session, multi-day work, without context rot.</p> <p>Planner/executor splits.<br>We plan with a reasoning model, execute with a fast one, and evaluate with a third. Separating generation from evaluation beats self-verification because agents reliably skew positive when grading their own work.</p> <p>The harness surface.<br>CLI, TUI, MCP integration, sandboxed execution, telemetry. Our AGENTS.md is short - every line traces to a specific thing that went wrong. TypeScript on the surface, Go where it matters.</p> <p>Memory and context.<
