Senior AI Engineer - Harness Engineering (Kimchi)

Cast AI

Bulgaria; Croatia; Estonia; Greece; Hungary; Latvia; Lithuania; Poland; Romania; Slovakia; Slovenia;, UKfull_timePosted 11 Jun 2026

About the role

<h4>Why Kimchi?</h4> Kimchi is the AI platform inside CAST AI. We started by helping companies run LLMs on their own Kubernetes clusters - now we're building the execution layer where agents do real work. Our Infrastructure today: multi-model inference (MiniMax, Kimi, GLM-5, Nemotron, DeepSeek) with intelligent routing, an OpenAI-compatible API, and deployment flexibility from our GPUs to your VPC. The inference layer is the foundation. What we're hiring for sits on top of it: coding agents, agent runtimes, orchestration systems, and the reliability engineering that makes them actually finish things. Tech Stack: TypeScript, Go, Kubernetes, AWS/GCP/Azure, MCP, Prometheus/Grafana/Loki, GitLab CI, ArgoCD. Why harness engineering matters here OpenAI and Anthropic ship models. They also ship one harness each - the scaffolding that turns a raw model into something that can plan, execute, recover, and complete work. We ship a different kind of harness: one built for cost-conscious, long-horizon autonomy, running on inference infrastructure we control end-to-end. A decent model with a great harness beats a great model with a bad harness. We've watched this play out. The gap between what today's models can do and what you see them doing is largely a harness gap - and that gap is where we operate. What you'll build The ratchet. Every time our agent makes a mistake, we engineer a solution so it never makes that mistake again. That means hooks that enforce constraints the model "knows" but forgets: pre-commit lint checks, permission gates, context compaction before the window fills. Success is silent, failures are verbose. Long-horizon execution. Our harness is built around spec-driven autonomy: meta-prompting, fresh context per task, worktree-per-slice git strategy, automatic replanning, crash recovery, stuck detection. We're implementing Ralph loops - when the model tries to exit, we intercept and reinject the goal into a fresh context. The agent reads state from disk and continues. Multi-session, multi-day work, without context rot. Planner/executor splits. Planning with a reasoning model, executing with a fast one, evaluating with a third. Separating generation from evaluation beats self-verification because agents reliably skew positive when grading their own work. The harness surface. CLI, TUI, MCP integration, sandboxed execution, telemetry. Our AGENTS.md is short - every line traces to a specific thing that went wrong. TypeScript on the surface, Go where it matters. Memory and context.<

Apply for this role

Generate a tailored application kit with a matched cover letter, interview prep, and CV highlights — in under 60 seconds.

Generate Application Kit

Free account required — sign up in 30s

Company

Cast AI

View all open roles →

Similar roles

Senior Broker Technician

G J Associates Ltd

EC3M1AA

£60,000/yr

Senior Risk Engineer - Property

Avencia Consulting

London

Senior Product Manager

Harnham - Data & Analytics Recruitment

London

£90,000/yr