Software Engineer - Observability Platform
Senior Software Engineer - Observability Platform position — see original posting for full details.
Join a globally diverse team that both builds and finds best-of-breed tools to bring critical Observability services to all of Adobe. Our team embodies DevOps, as our responsibilities range from crafting new tools and UIs to maintaining and supporting one of the largest logging deployments in the industry, in partnership with other observability tools.
We’re a close-knit team dedicated to providing a robust platform, supporting both Adobe’s engineering teams and each other. We need a new Developer to help shape and implement Adobe’s observability strategy.
If you enjoy owning complex, high-impact problems where your work directly moves the needle for Adobe’s engineering community, come talk to us.
Job Requirements
5-8+ years of production-level experience with distributed applications at scale in public and/or private cloud
Proven experience designing and contributing to the architecture of large-scale Observability platforms
Must Have
Deep hands-on experience with internally hosted logging systems such as Splunk, ClickHouse, Loki, or Elastic; track record of improving environment performance, stability, and cost efficiency at scale
Experience with OpenTelemetry (including collector configuration, pipelines, and instrumentation) as a core requirement given Adobe’s OTel-native observability strategy
Ability to own and drive ingestion cost optimization end-to-end: analyzing pipeline data, designing guardrails, and engaging directly with customer engineering teams to identify and reduce unnecessary log volume
Experience integrating AI workflows into large-scale deployments; ability to design and implement AI-assisted tooling that automates user interactions and surfaces actionable insights from high-volume log datasets
Strong programming skills in Go and/or Python; experience building production-grade integrations and applications for large-scale Observability environments
Experience developing, deploying, and operating distributed applications on cloud platforms; strong command of container and orchestration technologies (Docker, Kubernetes)
Proven ability to design systems for fault tolerance, scalability, and stability, and to lead resolution of high-complexity performance and reliability issues
Experience defining service level objectives (SLOs) and service level indicators (SLIs); able to translate platform health into meaningful, measurable quality indicators
Knowledge of public and/or private cloud deployments — AWS, Azure, Data Center
Comfortable owning on-call coverage across a multi-tool observability stack, including leading incident response for high-severity issues
Good to Have
Experience evaluating or prototyping alternative storage/processing backends (e.g., ClickHouse, Loki) as part of platform cost reduction and scalability strategy; ability to contribute to a phased
Posted June 10, 2026