Engineering Manager, Infrastructure & Operations at Thrive Global
San Francisco, CA, US

Thrive Global is the leading behavior change technology company helping individuals and companies reach peak performance, ultimately ending the stress and burnout epidemic. We’re leading the global conversation about well-being and performance and creating tools and programs that help people go from knowing what to do to actually doing it.

As our Engineering Manager for Infrastructure & Operations, you will lead the engineering team responsible for cloud infrastructure, DevOps, and site reliability at Thrive Global. This role reports to Thrive’s Chief Technology Officer.

About Us
Thrive Global’s mission is to end the stress and burnout epidemic by offering companies and individuals sustainable, science-based solutions to enhance well-being, performance, and purpose, and create a healthier relationship with technology. Recent science has shown that the pervasive belief that burnout is the price we must pay for success is a delusion. We know, instead, that when we prioritize our well-being, our decision-making, creativity, and productivity improve dramatically. Thrive Global is committed to accelerating the culture shift that allows people to reclaim their lives and move from merely surviving to thriving.

Who We Are Looking For
This role requires an engineering leader who can think strategically and carry company-wide initiatives from definition through implementation. It will be your job to build and run an effective team of developers that’s responsible for the end-to-end construction and operation of the infrastructure for our cloud-based B2B SaaS platform. In addition to providing technical leadership and management, you will be hands-on as an architect and an infrastructure developer. Here are a few examples of the work you’ll be performing:

  • Choosing Thrive’s AWS Organizations architecture and account management strategy
  • Selecting and captaining the implementation of the Thrive platform service mesh
  • Defining key reliability metrics such as SLAs, determining the methods of measurement, and architecting our core infrastructure to meet those SLAs
  • Working with the rest of the product development leadership to define reliability and infrastructure roadmap initiatives, and then directing your team to execute on those initiatives
  • Partnering with our Security and Compliance team to ensure that we have platform security baked into the infrastructure layer

Required Experience 
  • 7+ years of combined experience in DevOps, Infrastructure Engineering, or Site Reliability Engineering, in hands-on and/or leadership roles
  • 4+ years working in cloud-based environments
  • 2+ years building on AWS
  • 2+ years in a management role
  • Demonstrable programming skills (Python, Ruby, Go, etc.)
  • Expert knowledge of automation through Infrastructure as Code, preferably with Terraform
  • Extensive experience automating cloud environments with configuration management platforms (Ansible, Salt, etc.)
  • Deep knowledge of networking (specifically cloud networking and technologies such as VPCs)
  • Excellent written and oral communication, with the ability to write and maintain clear documentation

Desired Skills
  • Extensive experience with Terraform and multi-account AWS architectures
  • Experience working with complex data processing platforms and data lakes
  • Knowledge of application server development, with the ability to dive into the platform codebase to suggest reliability-related solutions
  • Prior experience with Prometheus and Grafana for monitoring

What We Offer
  • Being part of a mission-driven company that’s truly making a difference in the lives of people around the world
  • Ability to develop within the company and shape our growth strategy
  • A human-centric culture with a range of wellness perks and benefits
  • A competitive compensation package
  • Medical, dental, and vision coverage, plus a 401(k) program with company match
  • Generous paid time-off programs