hybrid

Senior AI/ML Engineer

Flo is seeking a Senior AI/ML Engineer to join their AI Platform team. This role involves building and scaling a GenAI platform, developing LLM judge ecosystems, fine-tuning and serving health-domain models, and maintaining robust data and infrastructure pipelines. The engineer will work with Python, Databricks, and AWS, and collaborate across various teams.

About the role

The job

We are looking for a Senior Software Engineer with deep expertise in AI/ML infrastructure to join our AI Platform team and help build the GenAI platform that powers every AI feature at Flo.

You will bridge core infrastructure, data engineering, and LLM development to deliver production-grade medical safety judges, fine-tuning pipelines, evaluation frameworks, and real-time personalisation. The team operates 60+ LLM-based evaluation judges, develops proprietary fine-tuned health models, and maintains active partnerships with Databricks, Google, OpenAI, Anthropic, and AWS.

What you’ll do

LLM Judge Ecosystem: build and scale Judge-as-a-Service, prompt registries, calibration pipelines, and evaluation orchestration using MLflow 3.x
Fine-Tuning and Serving: develop LoRA/SFT/preference optimisation pipelines for health-domain models (Llama, Gemma, MedGemma) and manage model serving at scale on Databricks
Data and Evaluation Pipelines: build synthetic Q&A generation, golden test sets, reward function engineering, and Delta table schemas in Unity Catalog for reliable, reproducible evaluation data
Infrastructure: maintain Terraform-managed AWS infrastructure (EKS, S3, IAM), Databricks AI Gateway, and CI/CD pipelines (GitHub Actions) with evaluation gates and progressive rollout
Cross-Functional Impact: collaborate with Product, Security, Analytics, and Medical teams, develop internal SDKs and APIs consumed by 5+ teams, and engage directly with technology partners on pre-release capabilities

Experience and skills

Must have:

Engineering maturity: 7+ years of software engineering, 4+ years focused on ML/AI platforms
LLM experience: recent hands-on work in at least one of fine-tuning, prompt engineering, LLM evaluation, or model serving
Technical stack: strong Python across production services and data pipelines, data engineering fundamentals (Spark, Delta tables, Parquet)
Platform and infrastructure: Databricks (MLflow, Unity Catalog, Model Serving), AWS (EKS/Kubernetes, IAM), Terraform, GitHub Actions
Cross-domain flexibility: comfort working across ML, data engineering, and infrastructure. You don’t need to be an expert in all three, but you contribute wherever the team needs it

Nice to have:

LLM evaluation frameworks (judges, graders, calibration methodology) or fine-tuning techniques (LoRA, RLHF/DPO, model distillation)
ML data engineering: synthetic data generation, evaluation dataset design, annotation pipelines
Healthcare, regulated industry, or safety-critical AI systems experience
Prompt optimisation frameworks (DSPy or similar), feature stores (Tecton)
