Our client is looking for a Site Reliability Engineer who wants to be creative, build and script solutions from top to bottom, and become part of an A+ team. The Site Reliability Engineer is part of an innovative team on a continuous mission to build bulletproof, scalable, secure private and public cloud environments for our customers and users.
If you think hard is fun, and get bored easily if you aren't challenged, this might be the place for you. We want someone who has an insatiable thirst for technology and a desire to learn and grow individually, with the team, and with the business. This is a challenging position but would be the perfect fit for someone who wants to contribute, grow or get started in their career.
DUTIES & RESPONSIBILITIES
The SRE is responsible for any and all tasks related to the performance, stability, reliability, efficiency, and security of both the sites and the general team operations. Responsibility also extends to how incidents are managed.
Design and develop a complete end-to-end automation environment using configuration and auto-scaling tools.
Define standards for configuration, monitoring, reliability, scalability, performance optimization and capacity planning of new infrastructure focused on 99.9%+ uptime.
Respond to off-hours and weekend emergency alerts, alarms, and requests, in keeping with the team's on-call rotation schedule.
Document solutions and create diagrams.
Strategize with the teams to develop new technology initiatives with a primary focus on availability, supportability, scalability, security, and performance.
Configure and tune enterprise monitoring and instrumentation systems to efficiently detect existing issues and predict future issues based on trends.
Stay up-to-date with technology. Continually advance your technical skill set.
Continuously improve by taking justifiable risks and not being afraid to fail.
Be flexible and at the same time push back respectfully to ensure we are doing what is best for the company in the long run.
Challenge the status quo by recommending and pushing for changes that improve reliability and velocity.
REQUIREMENTS FOR THE ROLE
College or University graduate with a strong desire to learn!
Proficiency in Python.
The rest of the requirements are nice to have. You will get to learn all of this and more:
Experience with configuration management systems such as Ansible.
Understanding of end-to-end technology stacks, including but not limited to OS, network, application, relational and non-relational databases, interacting with APIs, and security (network and application).
Understanding of cloud-based architectures and concepts. Knowledge of and hands-on experience with AWS and GCP (including serverless technologies, APIs, Kubernetes, etc.).
Treat infrastructure as code: you will build infrastructure inside AWS/GCP via code. All our environments are expected to be scripted and checked in, so familiarity with tools such as Terraform and CloudFormation will come in handy here.
Experience implementing self-service solutions to reduce workload on the DevOps team and allow development and business teams to be more self-sufficient.
Experience working in a collaborative environment such as Bitbucket or Git.