On-site
Senior Principal Core Infrastructure Engineer - Oracle
DevOps Engineer
About the role
Lead the design and operation of hyper‑scale, fault‑tolerant distributed systems using Python, Go, Kubernetes, and AWS, ensuring high availability, performance, and data integrity through advanced telemetry and rigorous verification.
Key Responsibilities
- Architect and lead the design of elastic, interdependent distributed systems that meet hyper‑scale performance and reliability goals.
- Define scalability requirements, identify bottlenecks, and implement data‑plane solutions for large‑scale operations.
- Engineer fault‑tolerant designs that support in‑service updates, handle network partitions, and implement load‑shedding, throttling, and rate‑limiting.
- Set SLO‑aligned durability and availability standards, establish KPIs, and deploy advanced telemetry for continuous monitoring.
- Formally verify complex features, define replication and synchronization strategies, and lead critical incident resolution and operational readiness.
Requirements
- Extensive experience designing and operating large‑scale distributed systems in production.
- Proficiency in Python, Go, Kubernetes, and AWS services.
- Deep knowledge of fault tolerance, partition handling, and data consistency models.
- Strong background in telemetry, monitoring, and performance optimization.
- Excellent communication skills and the ability to lead cross‑functional teams.
Skills
Python, Go, Kubernetes, AWS