Job Summary:
We are seeking a talented and motivated Site Reliability Engineer (SRE) with strong expertise in Python or Golang, proficiency in Bash or other shell scripting, and extensive experience working with Linux. The ideal candidate has a solid background in SRE principles. This role focuses on maintaining and improving the reliability, availability, and performance of our systems and infrastructure.
Responsibilities:
- Design, implement, and maintain scalable, reliable, and secure infrastructure.
- Develop and manage monitoring, alerting, and incident response processes.
- Automate tasks using Python or Golang, together with Bash or other shell scripting, to streamline operations and reduce manual intervention.
- Conduct root cause analysis and implement proactive measures to minimize future incidents.
- Collaborate with cross-functional teams to ensure infrastructure stability and efficiency.
- Participate in on-call rotations and manage incident escalations to ensure prompt response to issues.
Requirements:
- Experience in SRE: Proven track record in a Site Reliability Engineering role, with deep knowledge of SRE principles and practices.
- Programming Skills: Proficiency in either Python or Golang for automation and tool development.
- Scripting Skills: Strong experience with Bash or other shell scripting for task automation.
- Linux Expertise: Extensive experience with Linux OS, including troubleshooting, performance tuning, and system administration.
- Problem Solving: Strong analytical and troubleshooting skills with the ability to resolve complex technical issues.