Reliability is an extremely important aspect of our app. As a Site Reliability Engineer, you will play a crucial role in ensuring high availability, reliability, and stability for both our internal and external user-facing systems. Additionally, SREs will keep an ever-watchful eye on the capacity and performance of our systems.
Responsibilities:
- Collaborate with key stakeholders across Product, Engineering, IT Security, and other teams on initiatives and capabilities related to the operational health, security, growth, and design of our applications.
- Monitor, analyze, and address potential reliability issues by implementing comprehensive monitoring tools and metrics across our systems.
- During troubleshooting, lead your team in root cause analysis, pattern identification, and continuous improvement to optimize application performance, resilience, and reliability.
- Drive service reliability by developing tooling that provides metric visibility through SLIs and SLOs.
- Develop safe rollout plans for our services to prevent potential outages.
Requirements:
- 4+ years of experience in a site reliability engineering, DevOps, or cloud architect role.
- Hands-on experience in defining processes and implementing best practices for enterprise-scale infrastructure.
- Experience working with Docker and Kubernetes is a must.
- Experience working with modern infrastructure and monitoring stacks for enterprise-scale applications, e.g. New Relic, Datadog, ELK, cloud service platforms, and CI/CD pipelines.
- Knowledge of storage options such as SQL, NoSQL, and distributed storage.
- Exemplary leadership and communication skills (both verbal and written) are a must.
- Experience with activities such as architecture reviews, code reviews, building platforms and frameworks, and capacity planning.
- Proactive in identifying problems, areas for improvement, and performance bottlenecks in distributed systems.
- Skilled in Bash scripting and general Linux commands. Experience with other programming languages such as Go, Python, or Java is a big plus.
Benefits:
- Supportive teammates
- Health insurance benefits
- Self-development activities that support your career growth
- Flexible working hours
- Remote working during the pandemic
- Broaden your knowledge of financial investment