Job Summary:
We are hiring a Lead Infrastructure & Cloud Engineer with a strong Wintel infrastructure foundation and current, hands-on capability in modern cloud infrastructure across Azure (primary) and AWS. This role exists to close a capability gap: we have deep on-prem expertise, and we need a leader who can define and drive modern cloud standards, guide technical direction, and uplift the team.
You’ll operate as a technical lead with an architecture mindset: creating reference designs, setting guardrails, making pragmatic trade-offs (security, resilience, cost), and leading delivery across infrastructure and hybrid cloud. This is not a DevOps role. You will collaborate with DevOps and engineers, but your focus is infrastructure/platform, governance, reliability, and technical leadership.
Job Responsibilities:
Cloud & Hybrid Architecture (Azure & AWS)
- Own the target-state hybrid cloud architecture and roadmap (12–24 months), aligning security, resilience, and cost requirements.
- Define reference architectures and standards: landing zones, network patterns, identity patterns, logging/monitoring, backup/DR, and environment separation.
- Lead design and implementation of secure cloud networking: VNets/VPCs, routing, VPN, ExpressRoute/Direct Connect, Private Link/Endpoints, load balancers, and WAF where needed.
- Own cloud governance foundations: subscriptions/accounts, management groups, RBAC, naming/tagging, logging, budgets, and policy guardrails.
Modern Cloud Operations (Hands-on Leadership)
- Ensure cloud platforms, services, and workloads remain on supported, secure versions; implement drift detection and lifecycle management.
- Establish platform observability: Azure Monitor/Log Analytics/App Insights, CloudWatch, and OpenTelemetry where used; improve alert quality and operational readiness.
- Build and maintain backup/DR posture with tested RTO/RPO, runbooks, and regular restore/DR exercises.
- Drive FinOps discipline: cost allocation, tagging compliance, rightsizing, reservations/savings plans, and cost anomaly detection.
Security, Governance & Incident Readiness
- Ensure security controls are in place and effective (least privilege, secure baselines, encryption, key management, vulnerability/patch posture).
- Log & telemetry onboarding: own onboarding of data/log sources and integration with the SIEM (e.g., Microsoft Sentinel/Splunk) in partnership with Security.
- Lead incident response for infrastructure/cloud events: triage, investigation, reporting, RCA, and implementation of preventative controls and guardrails.
- Manage, document, and audit configuration changes; champion “repeatable by design” changes and reduce configuration drift.
Wintel & Core Infrastructure Leadership