Senior Site Reliability Engineer

Santa Monica, CA 90401

Employment Type: Perm
Job Category: DevOps
Job Number: 19829
Site Reliability Engineer

As a member of our team, you will work with Engineers, Product Owners, and Technical Leads to build new experiences and improve existing products, developing robust software solutions and dependable products for our client's users and customers. You'll help estimate engineering effort, prioritize projects, plan implementations, and triage production issues. You need to be dynamic, collaborative, and curious, as you'll work in a fast-paced environment where continuous experimentation and innovation are a given.

Your responsibilities will also include:
  • Operating existing infrastructure and services.
  • Assisting in the design, development, and testing of features delivered as applications and/or services, with a strong focus on ensuring and improving reliability and robustness.
  • Triaging and troubleshooting escalations across a wide variety of our client's products.
  • Monitoring, detecting, and troubleshooting issues during code rollouts on the live site; analyzing real-time data to determine issue severity and impact, and advising Product Development and Release Management on GO/NO-GO release decisions.
  • Advising management and relevant groups on customer-impacting issues, providing recognized technical and business leadership, and recommending appropriate actions.
  • Identifying process gaps and implementing process improvements to increase operational efficiency.
  • Participating in the development of tools, systems, and processes aimed at improving product supportability and overall support productivity.
  • Working with different groups to develop and improve monitors for our client's products and infrastructure.
  • Providing direct support to our client's users and customers when needed.
  • Identifying, verifying, and documenting irregularities in our client's functionality, including filing bugs and potentially fixing them with pull requests.
  • Collaborating with peers and leads both within the team and across the organization.
  • Working with operations teams to ensure applications and services are highly available and reliable.
  • Supporting applications and/or services on a 24x7 basis as required.

Qualifications:
  • We welcome candidates of all experience levels for a variety of roles.
  • BS in Computer Science or a related technical discipline (or equivalent experience).
  • Competent in design and implementation for reliability, availability, scalability, and performance.
  • Competent in software engineering languages and tools (e.g., Golang) and best practices (e.g., unit testing, test automation, continuous integration).
  • Competent in state-of-the-art orchestration and containerization infrastructure (k8s and Docker).
  • Experience with algorithms, data structures, complexity analysis and software design.
  • Strong understanding and working knowledge of networking principles and OS operation and maintenance.
  • Expertise with the Linux command line.
  • Familiarity with load balancing principles.
  • Development skills in at least one scripting language.
  • Strong debugging and problem-solving skills.
  • Extra credit: Private and public cloud experience (e.g., Azure, AWS).
  • Extra credit: Provide your GitHub account or code samples with your resume!