Site Reliability Engineer, Cloud Operations Job Description Template
Our company is looking for a Site Reliability Engineer, Cloud Operations to join our team.
Responsibilities:
- Running and maintaining our production infrastructure hosted on AWS;
- Analysis of complex system behavior, performance and application issues;
- Development of monitoring solutions and analysis across multiple datacenters;
- Development, maintenance and administration of cutting-edge infrastructure deployment tools;
- Capacity analysis and planning, traffic routing, and security policies for Ping’s market-leading Single Sign-On SaaS applications;
- Linux systems administration, configuration, troubleshooting and automation;
- Administration of virtualized platforms on various cloud providers (public and private);
- This is a 24/7 on-call position with a rotation schedule.
Requirements:
- Experience using Git in a team environment (merge requests, branching, pushes, and pulls);
- Experience in a high-volume or critical production service environment;
- 2+ years’ experience with Amazon Web Services (AWS);
- Experience with Apache, Tomcat, Cassandra, Kafka, and MySQL;
- Strong Jenkins background and experience with Artifactory and build pipelines;
- 5+ years’ experience with Linux/UNIX systems administration;
- Proven technical troubleshooting and performance tuning experience;
- Knowledge of IP networking, including familiarity with the functionality, operation, and failure modes of networks;
- Strong understanding of security design principles;
- Solid experience with server configuration via Puppet/Chef/Salt;
- Experience with Docker and container orchestration (Kubernetes) preferred;
- Solid scripting skills (Python/Ruby/Bash/Go/etc.).