About the role
<p>We’re a global team of over 400 people, working together to push the boundaries of open-source technology and multi-cloud solutions. Our vision is to help developers, builders, and creators bring their ideas to life with speed and simplicity by providing a cloud data platform that makes open-source databases, search, streaming, and application infrastructure easily accessible to everyone.</p> <h3><strong>The Role:</strong></h3> <p>We are seeking a Director of Site Reliability Engineering to lead a global organization responsible for the reliability and operational excellence of the Aiven platform. You will lead a high-performing SRE team, setting the vision and strategy to ensure resilient, scalable, and highly automated systems across our 24/7/365 operations.</p> <p>Your team will proactively manage platform health, lead incident response and cross-functional coordination, and drive continuous improvement in reliability and performance. As a senior leader, you will partner closely with engineering, product, and support teams worldwide, influence system architecture, and invest in tooling and automation to reduce toil and enhance production reliability.</p> <p>This role combines strategic leadership, customer centricity, and deep operational accountability, with a focus on delivering reliable services at global scale while developing strong technical leaders within your organization.</p> <h3><strong>What You'll Do:</strong></h3> <ul> <li>Define and drive global SRE operating strategy in partnership with regional SRE leaders across EMEA, AMER and APAC, ensuring alignment on reliability goals, operating models, and execution across a 24/7/365 follow-the-sun organization.</li> <li>Build and lead a multi-regional SRE organization through managers, developing leadership capability, mentoring teams, and ensuring consistent performance, culture, and delivery across geographies.</li> <li>Set the vision and roadmap for reliability engineering, enabling 
teams to deliver high-impact tools, automation, and process initiatives that improve platform resilience, scalability, and efficiency.</li> <li>Own global incident management strategy and operating model, including on-call design, coverage, and escalation frameworks, ensuring seamless coordination and high availability across regions.</li> <li>Establish a metrics-driven operating cadence, defining KPIs, SLIs, SLOs, and error budgets, driving data-informed prioritization, and embedding operational rigor and continuous improvement across the SRE organization.</li> </ul> <h3><strong>What We're Looking For:</strong></h3> <ul> <li>Proven experience leading and scaling global SRE or infrastructure organizations through managers, ideally across multiple regions and time zones.</li> <