Site Reliability Engineer, Cloud Operations

Site Reliability Engineer, Cloud Operations Job Description Template

Our company is looking for a Site Reliability Engineer, Cloud Operations to join our team.

Responsibilities:

  • Running and maintaining our production infrastructure hosted on AWS;
  • Analysis of complex system behavior, performance and application issues;
  • Development of monitoring solutions and analysis across multiple datacenters;
  • Development, maintenance, and administration of cutting-edge infrastructure deployment tools;
  • Capacity analysis and planning, traffic routing, and security policies for Ping’s market-leading Single Sign-On SaaS applications;
  • Linux systems administration, configuration, troubleshooting and automation;
  • Administration of virtualized platforms on various cloud providers (public and private);
  • Participation in a 24/7 on-call rotation.

Requirements:

  • Experience using Git in a team environment (merge requests, branching, pushes, and pulls);
  • Experience in a high-volume or critical production service environment;
  • 2+ years of experience with Amazon Web Services (AWS);
  • Experience with Apache, Tomcat, Cassandra, Kafka, and MySQL;
  • Strong Jenkins background and experience with Artifactory and build pipelines;
  • 5+ years of experience with Linux/UNIX systems administration;
  • Proven technical troubleshooting and performance tuning experience;
  • Knowledge of IP networking, including familiarity with the functionality, operation, and failure modes of networks;
  • Strong understanding of security design principles;
  • Solid experience with server configuration via Puppet/Chef/Salt;
  • Experience with Docker and container orchestration (Kubernetes) preferred;
  • Solid scripting skills (Python/Ruby/Bash/Go/etc.).
