Stockbit is hiring an

SRE Manager

Reliability is an extremely important aspect of our app. As a Site Reliability Engineering Manager, you will play a crucial role in leading a team responsible for building and running large-scale, massively distributed, fault-tolerant systems. The team’s main task is to ensure the high availability, reliability, and stability of both our internal and external user-facing systems. Additionally, SREs will keep an ever-watchful eye on the capacity and performance of our systems.


Responsibilities :

  • Collaborate with key stakeholders across Product, Engineering, IT Security, and other teams on initiatives and capabilities related to the operational health, security, growth, and design of our applications.
  • Monitor, analyze and tackle potential reliability issues by implementing comprehensive monitoring tools & metrics across different systems.
  • Provide observability and insights to Business, Product, and other non-tech teams regarding service reliability and customer experience, as part of the tooling that supports OKRs.
  • During troubleshooting, lead your team in root cause analysis, pattern identification, and continuous improvement to optimize application performance, resilience, and reliability.
  • Drive service reliability by developing tooling that enables metric visibility using SLIs, SLOs, and SLAs
  • Provide advice and solutions, and lead infrastructure team initiatives to improve our availability, performance, efficiency, change management, monitoring, emergency response, and capacity planning.
  • Mentor, coordinate, and guide the SRE team.
  • Develop safe rollout plans for our services to prevent potential outages.

Requirements :

  • 6+ years of experience in a site reliability engineering, DevOps, or cloud architect role, with hands-on experience in defining processes and implementing best practices for enterprise-scale infrastructure.
  • Experienced in working with modern infrastructure and monitoring tech stacks for enterprise-scale applications, e.g. New Relic, Datadog, ELK, Kubernetes, cloud service platforms, CI/CD pipelines, etc.
  • Experienced in successfully managing a distributed team of 5-8 engineers on large-scale projects that included technical deep-dives and production troubleshooting in the areas of: distributed systems, code, networking, storage, and operating systems.
  • Exemplary leadership and communication abilities (both verbal and written) are a must
  • Experience in activities like architecture reviews, code reviews, creating platforms and frameworks, capacity planning, etc.
  • Skilled in Bash scripting and general Linux commands. Experience with other programming languages such as Go, Python, or Java is a big plus.

Benefits :

  • Supportive teammates
  • Health insurance benefits
  • Self-development activities that support careers
  • Flexible working hours
  • Remote working during the pandemic
  • Opportunity to broaden your knowledge of financial investment
Contract type
Full Time
Location
Posting date
16 November, 2022