Remote
Site Reliability Engineer - NASDAQ
This Site Reliability Engineer role focuses on designing, deploying, and maintaining highly available cloud‑based services for FinTech clients, using Kubernetes, CI/CD pipelines, and monitoring to keep those services performant and reliable at scale.
About the role
Key Responsibilities
- Design, test, and deploy infrastructure changes to production systems, ensuring zero downtime and high availability.
- Implement and maintain CI/CD pipelines using tools such as GitHub Actions, Jenkins, or ArgoCD to automate deployments.
- Configure and manage Kubernetes clusters, including autoscaling, rolling updates, and resource optimization.
- Develop and maintain monitoring, alerting, and logging solutions with Prometheus, Grafana, ELK, or similar stacks.
- Collaborate with cross‑functional teams to troubleshoot incidents, perform root‑cause analysis, and implement preventive measures.
Requirements
- 3+ years of experience in Site Reliability Engineering or DevOps roles.
- Proficiency with cloud platforms (AWS, GCP, or Azure) and container orchestration (Kubernetes).
- Strong scripting skills in Python and Bash for automation and tooling.
- Hands‑on experience with CI/CD, monitoring, and incident response practices.
- Excellent communication skills and ability to work in a fast‑paced, global environment.
Skills
Kubernetes, CI/CD, Python, Bash