remote

Senior Network & Site Reliability Engineer - Alembic Pharmaceuticals Ltd

Site Reliability Engineer

As Senior Network & Site Reliability Engineer, you will build and run highly available, scalable infrastructure for a cutting‑edge AI platform, using Kubernetes, Prometheus, Grafana, AWS, Docker, and CI/CD pipelines to keep services robust, secure, and performant.

About the role

Key Responsibilities

Design, implement, and maintain highly available network and infrastructure solutions across on‑prem and cloud environments.
Lead SRE initiatives: monitoring, alerting, incident response, and post‑mortem analysis using Prometheus, Grafana, and custom dashboards.
Automate deployment pipelines with Docker, Kubernetes, Helm, and CI/CD tools to accelerate feature delivery and reduce manual toil.
Collaborate with security, compliance, and DevOps teams to enforce best practices, harden systems, and manage access controls.
Drive capacity planning, performance tuning, and cost optimization for large‑scale AI workloads on AWS and private supercomputing resources.

Requirements

5+ years of experience in network engineering and site reliability roles.
Proficient with Kubernetes, Docker, Helm, and cloud platforms (AWS preferred).
Strong scripting skills (Python, Bash) and experience with CI/CD pipelines.
Hands‑on experience with monitoring and alerting tools such as Prometheus, Grafana, and the ELK stack.
Excellent problem‑solving, communication, and collaboration skills in a fast‑paced, high‑impact environment.

Skills

Kubernetes, Prometheus, Grafana, AWS, Docker, CI/CD

Company: Alembic Pharmaceuticals Ltd

Department: Engineering

Location: San Francisco, CA, United States

Experience: 5+ years

Tenure: full-time

Level: Senior

Salary: 240,000

Posted June 19, 2026