Role Description
Responsibilities
- Define and evolve Dropbox’s company-wide technical reliability strategy to support the changing engineering environment created by AI-assisted and agentic software development.
- Set multi-year reliability goals, standards, and roadmaps across observability, debugging, incident management, service health, and operational readiness.
- Lead cross-team initiatives that reduce reliability risk as software delivery velocity, pull request volume, service complexity, and incident volume increase.
- Partner with engineering leaders and platform teams to improve monitoring, alerting, debugging, SLOs, SLAs, and incident response systems at company scale.
- Identify emerging reliability risks introduced by AI-enabled development workflows and design scalable systems, processes, and guardrails to mitigate them.
- Provide technical leadership and mentorship to engineers across teams, raising engineering quality, reliability judgment, and operational excellence.
- Drive clear communication and alignment with senior stakeholders on reliability priorities, tradeoffs, risks, and execution progress.
Requirements
- BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent technical experience.
- 12+ years of experience in software engineering, site reliability engineering, infrastructure engineering, or related technical roles.
- Proven ability to define and deliver multi-year, multi-team reliability, infrastructure, or platform strategies with measurable business and customer impact.
- Deep experience with distributed systems, production operations, observability, incident response, SLOs/SLAs, debugging, and reliability risk management.
- Demonstrated ability to diagnose complex technical problems, debug production systems, automate operational workflows, and design resilient software components.
- Experience influencing engineering roadmaps across multiple teams and making technical decisions that optimize for the broader engineering organization.
- Strong communication and collaboration skills, with the ability to align cross-functional stakeholders through ambiguity and drive execution across teams.
Preferred Qualifications
- Experience adapting reliability strategies, developer tooling, or operational processes for AI-assisted software development workflows.
- Experience building or scaling observability, debugging, incident management, or developer productivity platforms for large engineering organizations.
- Experience leading reliability improvements in environments with high deployment velocity, complex service dependencies, and large-scale production systems.
- Track record of mentoring senior engineers, setting technical standards, and spreading reliability best practices through documentation, reviews, talks, or architecture guidance.
- Familiarity with AI-enabled tooling, agentic development workflows, or operational risks introduced by rapid automation in the software development lifecycle.
Compensation