About the role
<h2><strong>This is Adyen</strong></h2> <p><span style="font-weight: 400;">Adyen provides payments, data, and financial products in a single solution for customers like Meta, Uber, H&amp;M, and Microsoft, making us the financial technology platform of choice. At Adyen, everything we do is engineered for ambition.</span></p> <p><span style="font-weight: 400;">For our teams, we create an environment with opportunities to succeed, backed by the culture and support that enable our people to truly own their careers. We are motivated individuals who tackle unique technical challenges at scale and solve them as a team. Together, we deliver innovative and ethical solutions that help businesses achieve their ambitions faster.</span></p> <h2>AI Research</h2> <p>Adyen is building a world-class AI team to redefine what intelligent systems can do in financial technology. As a <strong>Senior AI Research Engineer</strong>, you will take on some of the most technically demanding work in applied AI: designing agents that reason over complex, multi-step tasks; building the evaluation infrastructure that makes those systems trustworthy in production; and shaping how humans and AI collaborate at scale within a global payments company.</p> <p>This is not a narrow research role.
You will take full ownership of your work, from early research through deployed production systems, influence the team's technical direction, and act as a force multiplier for the broader AI organization. That includes contributing to custom model development for structured financial data and working toward our longer-term ambition of defining how humans and AI collaborate at scale across the company.</p> <h2>What You'll Do</h2> <ul> <li><strong>Design and Deploy AI Agents for Complex Tasks: </strong>Lead the research, design, and deployment of AI agents built for long-horizon, multi-step tasks in real-world financial contexts, including data analysis pipelines, operational workflows, and integrity risk scenarios. Architect robust agentic systems covering multi-agent orchestration, tool dispatch, context and memory management, and error recovery for long-running workflows. Design human-in-the-loop mechanisms that define when agents act autonomously, when they surface uncertainty, and when they escalate or defer to humans.</li> <li><strong>Own Evaluation and Benchmarking: </strong>Define and lead the evaluation strategy for the agentic systems and LLMs your team builds and deploys. Design internal benchmarks grounded in real domain complexity, probing for genuine capabilities, edge cases, and failure modes that standard metrics miss. Build reusable evaluation infrastructure that is embedded in the development process, not bolted on after the fact.</li>