GAQ127R41
Location: San Francisco, CA (Hybrid)
About the Role
At Databricks Information Technology, we are a product-led organization transforming the way we work, from how easy it is to use our IT services to the applications we develop that help us scale seamlessly in the face of incredible growth.
As a Senior Software Engineer (Infrastructure), you will be a core technical contributor on the IT Infrastructure team, owning and driving the evolution of our core infrastructure and observability platforms. This role requires a strong software engineering mindset, deep technical breadth across the SRE and infrastructure domains, and the ability to deliver high-quality, scalable solutions for systems that are still immature. You will build resilient, scalable, and automated infrastructure that empowers our development teams. As a senior member of the team, you will bridge the gap between software engineering and systems architecture, ensuring that our AWS environment is cost-optimized, secure, and highly available.
The Impact You Will Have
- Architect and Automate: Design and deploy production-grade infrastructure on AWS using Terraform or Pulumi.
- Orchestration: Manage and scale containerized workloads using AKS (Azure Kubernetes Service) or EKS (Amazon Elastic Kubernetes Service), focusing on cluster security and resource efficiency.
- CI/CD Excellence: Architect robust deployment pipelines using GitHub Actions, managing both GitHub-hosted and self-hosted runners for specialized build requirements.
- Drive "Observable by Default" Frameworks: Build the underlying infrastructure that ensures new internal applications are secure and have logging and metrics enabled by default.
- Tooling, Scripting & AI: Build internal CLI tools, AI plugins, and automation scripts that streamline developer workflows and improve operational efficiency.
- Partner Cross-Functionally: Collaborate with stakeholders across Security, Engineering, Infrastructure, and Support to deliver impactful projects with real business outcomes.
- Mentor and Document: Participate in code reviews, document solutions and failure-triage playbooks, and mentor junior engineers on the platforms you own.
What We Look For
- Software Engineering Expertise: 5+ years of production-level software engineering experience, with strong proficiency in Python (non-negotiable).
- IaC: Expert-level proficiency in Terraform (modules, state management) or Pulumi (preferred).
- Cloud & Infrastructure Breadth: Hands-on experience with AWS (or Azure/GCP), Kubernetes, Docker, and containerization concepts.
- Automation & Integration Mindset: Experience building and troubleshooting integrations between infrastructure, data pipelines, and observability platforms.
- CI/CD: Advanced knowledge of GitHub Actions and GitHub runners.
- Strong Observability Mindset: Understanding of observabilit