Site Reliability Engineer
Lead Site Reliability Engineer with 8+ years of experience driving observability, performance, and reliability for large-scale systems. Expert in APM, IaC, automation, and distributed tracing with OpenTelemetry, guiding teams to implement robust, scalable SRE practices.
Seeking an Experienced Site Reliability Engineer (SRE) / Lead Engineer for Exciting Projects, Remote in Guadalajara, Jalisco
We are looking for a skilled Site Reliability Engineer (SRE) / Lead Engineer with a minimum of 8 years of experience to join a dynamic team within a leading organization. The role requires deep expertise in Application Performance Monitoring (APM), Infrastructure as Code (IaC), automation, and distributed tracing using OpenTelemetry.
As an SRE lead, you will guide the design, implementation, and continuous improvement of observability solutions, ensuring system reliability, performance, and scalability while fostering best practices in SRE and DevOps.
Key Responsibilities:
· Lead the strategic development and management of observability and reliability frameworks across the organization, ensuring alignment with business goals and technical requirements.
· Design and implement monitoring and observability solutions, collaborating with engineering teams to define standards and best practices.
· Manage Infrastructure as Code (IaC) initiatives using Terraform, coordinating with cloud and infrastructure teams to ensure scalable and secure deployments.
· Drive automation strategies for monitoring, alerting, and logging pipelines, focusing on process improvements and operational efficiency.
· Develop and maintain comprehensive observability roadmaps, including distributed tracing, logging, and metrics collection strategies.
· Collaborate with product management, sales, and pre-sales teams to provide technical expertise and support during solution design and customer engagements.
· Lead cross-functional teams to enhance CI/CD pipelines and deployment reliability, ensuring smooth integration of observability tools and practices.
· Engage with vendors and strategic partners to evaluate, select, and integrate observability and monitoring solutions, ensuring alignment with organizational needs and fostering strong collaborative relationships.
· Mentor and develop junior engineers and analysts, fostering a culture of reliability, observability, and operational excellence.
Technical Skills Required:
· 8-10+ years of experience in SRE, Observability, or DevOps roles, with leadership responsibilities.
· Hands-on experience with OpenTelemetry for distributed tracing and observability instrumentation.
· Proven expertise with Application Performance Monitoring (APM) tools such as New Relic, Datadog, AppDynamics, or Dynatrace.
· Strong proficiency in Infrastructure as Code (IaC) using Terraform.
· Solid understanding of cloud platforms including AWS, GCP, or Azure.
· Experience with automation/configuration management tools like Ansible, Chef, or Puppet.
Posted June 18, 2026