Stockbit is hiring an SRE Manager

This job was posted more than 2 months ago

Reliability is an extremely important aspect of our app. As a Site Reliability Engineering Manager, you will play a crucial role in leading a team responsible for building and running large-scale, massively distributed, fault-tolerant systems. The team's main task is to ensure the high availability, reliability, and stability of both our internal and external user-facing systems. Additionally, SREs keep an ever-watchful eye on the capacity and performance of our systems.

Responsibilities:

Collaborate with key stakeholders across Product, Engineering, IT Security, and other teams on initiatives and capabilities related to the operational health, security, growth, and design of our applications.
Monitor, analyze, and tackle potential reliability issues by implementing comprehensive monitoring tools and metrics across different systems.
Provide observability and insights to Business, Product, and other non-tech teams regarding service reliability and customer experience as part of the OKR supporting tool.
During troubleshooting, lead your team in root cause analysis, pattern identification, and continuous improvement to optimize application performance, resilience, and reliability.
Drive service reliability by developing tooling that enables metric visibility using SLIs, SLOs, and SLAs.
Provide advice and solutions, and lead infrastructure team initiatives to improve our availability, performance, efficiency, change management, monitoring, emergency response, and capacity planning.

Mentor, coordinate, and provide guidance to the SRE team.
Develop safe rollout plans for our services to prevent potential outages.

Requirements:

6+ years of experience in a site reliability engineering, DevOps, or cloud architect role. Hands-on experience in defining processes and implementing best practices for enterprise-scale infrastructure.
Experience working with modern infrastructure and monitoring tech stacks for enterprise-scale applications, e.g. New Relic, Datadog, ELK, Kubernetes, cloud service platforms, CI/CD pipelines, etc.
Experience successfully managing a distributed team of 5-8 engineers on large-scale projects that included technical deep-dives and production troubleshooting in the areas of distributed systems, code, networking, storage, and operating systems.
Exemplary leadership and communication abilities (both verbal and written) are a must.
Experience in activities such as architecture reviews, code reviews, creating platforms and frameworks, capacity planning, etc.
Skilled in Bash scripting and general Linux commands. Experience with other programming languages such as Go, Python, or Java is a big plus.

Benefits:

Supportive teammates
Health insurance benefits
Self-development activities that support your career
Flexible working hours
Remote working during the pandemic
Broaden your knowledge in financial investment

Contract type: Full Time
Location: Jakarta
Posting date: 16 November 2022
Category: 👑 Manager
