This is a remote position.
Location: Currently remote but may transition to onsite in Future
Competency Weighting
Core Accountability
- Own end-to-end cost governance, intelligence, and risk control across all CloudLabs Azure, AWS, and GCP environments.
- Single throat to choke for: no surprise spikes, real-time visibility, anomaly detection, root-cause closure, and refund/credit recovery.
- Accountable directly to the COO.
Key Responsibilities
1. Multi-cloud cost ownership (primary)
- Own total cloud spends across Azure (lead), AWS, and GCP -every tenant, account, project, subscription, and ODL.
- Maintain a single source of truth : daily/weekly/monthly spend, with every dollar classified as Expected (revenue-linked), Internal (non-billable), or Unexpected (leak/anomaly).
- Be able to explain any line item — across any of the three clouds — on demand, without convening a war room.
2. Real-time monitoring & anomaly detection
- Azure (50–60%): Azure Cost Management + Partner Center, threshold/budget alerts per tenant, per deployment, per lab; native anomaly detection.
- AWS (20–30%): Cost Explorer, Cost & Usage Reports (CUR) , AWS Budgets, Cost Anomaly Detection, Organizations consolidated billing.
- GCP (~10%): Billing export to BigQuery , budget alerts, billing-account-level monitoring.
- Hard SLO: no anomaly on any cloud goes undetected beyond 24 hours.
3. Incident ownership & RCA
- Own every cost incident end-to-end across clouds - root cause, quantified impact, and the engineering fix.
- Maintain an RCA repository with recurring patterns (e.g., billing-after-deletion, orphaned resources, marketplace/AI-unit leakage) and preventive controls.
4. Platform ↔ cloud cost integrity
- Drive engineering to close lifecycle gaps where CloudLabs "deleted" status diverges from actual cloud state — on all three providers .
- Institute reconciliation checks and audit logs so deletion-to-billing integrity is verifiable, not assumed.
5. Data & pipeline ownership
- Own all cost data pipelines - Partner Center, AWS CUR, GCP BigQuery export, Power BI/SQL reporting layers.
- Guarantee data completeness (no repeat of the missing-May-usage gap) and schema resilience (no repeat of the Tier2MPNID pipeline break).
6. Refund, credit & recovery
- Identify all refund/credit-eligible scenarios across Azure (MS support escalations), AWS, and GCP; drive tickets to closure; maintain a recovery tracker with status and ownership.
7. Forecasting & optimization
- Build spend-prediction models tied to deployments, labs, and usage patterns across clouds.