Site Reliability Engineering Manager (Remote)

  • Hexagon PPM
  • Remote * (Atlanta, GA, USA)
  • Jan 15, 2022
Engineering Management Telecommuting

Job Description

Overview:

Hexagon is a global leader in sensor, software, and autonomous solutions. We are putting data to work to boost efficiency, productivity, and quality across industrial, manufacturing, infrastructure, safety, and mobility applications. Our technologies are shaping urban and production ecosystems to become increasingly connected and autonomous, ensuring a scalable, sustainable future. Hexagon (Nasdaq Stockholm: HEXA B) has approximately 20,000 employees in 50 countries and net sales of approximately 3.8bn EUR. Learn more at hexagon.com.

Responsibilities:

Hexagon's Information Technology team is looking for an experienced Site Reliability Engineering Manager who will be an integral member of the team. This is a great opportunity to work in a fun, collaborative environment with a great team of technology professionals across the globe. We're looking for someone who will help lead ownership of Hexagon's global web presence, including the flagship hexagon.com, which drives ~9 million visits annually. The ideal candidate has strong experience and expertise in running best-in-class, modern web infrastructure, operations, and observability.

Team Management:

  • Direct management of full-time, contract, and agency teams to support resiliency initiatives
  • Support and mentorship of team members in career and skill growth to push best-of-breed reliability technologies

Technical Leadership:

  • Maintaining site infrastructure and networking, ensuring proper service level indicators and service level objectives
  • Definition and execution of operational playbooks for disaster recovery, data restoration, and ongoing maintenance
  • Collaboration with development teams to minimize risk through automation and process execution
  • Management of a supporting global team to enable 24/7 coverage for infrastructure health
  • Evangelism of reliability and security as key factors for business success
  • Thought leadership and best practices for site reliability across the organization

Qualifications:
  • 6-8 years of proven site reliability experience, preferably in a global matrixed technology organization
  • Expertise in Microsoft Azure, Kubernetes (AKS), observability tooling (Prometheus/Grafana/Logstash), and Azure DevOps/Pipelines for deployments
  • Strong knowledge of the architecture and deployment of the Sitecore Experience Platform
  • Working knowledge of Wrike (planning) and Microsoft Teams (collaboration)
  • Experience working within the software development life cycle and management of projects for reliability solutions

This role can be fully remote within the United States (CST or EST time zone preferred).