The job
We are looking for a Senior Software Engineer with deep expertise in AI/ML infrastructure to join our AI Platform team and help build the GenAI platform that powers every AI feature at Flo.
You will bridge core infrastructure, data engineering, and LLM development to deliver production-grade medical safety judges, fine-tuning pipelines, evaluation frameworks, and real-time personalisation. The team operates 60+ LLM-based evaluation judges, develops proprietary fine-tuned health models, and maintains active partnerships with Databricks, Google, OpenAI, Anthropic, and AWS.
What you’ll do
- LLM Judge Ecosystem: build and scale Judge-as-a-Service, prompt registries, calibration pipelines, and evaluation orchestration using MLflow 3.x
- Fine-Tuning and Serving: develop LoRA/SFT/preference optimisation pipelines for health-domain models (Llama, Gemma, MedGemma) and manage model serving at scale on Databricks
- Data and Evaluation Pipelines: build synthetic Q&A generation pipelines, golden test sets, reward functions, and Delta table schemas in Unity Catalog for reliable, reproducible evaluation data
- Infrastructure: maintain Terraform-managed AWS infrastructure (EKS, S3, IAM), Databricks AI Gateway, and CI/CD pipelines (GitHub Actions) with evaluation gates and progressive rollout
- Cross-Functional Impact: collaborate with Product, Security, Analytics, and Medical teams; develop internal SDKs and APIs consumed by 5+ teams; and engage directly with technology partners on pre-release capabilities
Experience and skills
Must have:
- Engineering maturity: 7+ years of software engineering, 4+ years focused on ML/AI platforms
- LLM experience: recent hands-on work with at least one of: fine-tuning, prompt engineering, LLM evaluation, or model serving
- Technical stack: strong Python across production services and data pipelines, plus data engineering fundamentals (Spark, Delta tables, Parquet)
- Platform and infrastructure: Databricks (MLflow, Unity Catalog, Model Serving), AWS (EKS/Kubernetes, IAM), Terraform, GitHub Actions
- Cross-domain flexibility: comfort working across ML, data engineering, and infrastructure. You don’t need to be an expert in all three, but you should be able to contribute wherever the team needs it
Nice to have:
- LLM evaluation frameworks (judges, graders, calibration methodology) or fine-tuning techniques (LoRA, RLHF/DPO, model distillation)
- ML data engineering: synthetic data generation, evaluation dataset design, annotation pipelines
- Healthcare, regulated industry, or safety-critical AI systems experience
- Prompt optimisation frameworks (DSPy or similar), feature stores (Tecton)