ABOUT THE ROLE
We are building an AI-native platform for regulated financial services. AI and machine learning are not bolted-on features; they are how the product works. We are looking for a lead engineer to own the AI/ML track end-to-end and grow it into a core capability of the company.
You will lead the team that builds and runs our models in production — language models, neural networks, classical ML where it earns its place. You will set the technical direction, run the experiments that matter, and ship the systems that go live for enterprise customers. This is a hands-on lead role — you write code, you read papers, and you stay close to the model.
WHAT YOU'LL DO
- Lead all AI/ML work — model selection, fine-tuning, neural-network design where appropriate, evaluation and deployment. Own the track and the outcomes.
- Build production AI systems — take models from notebook to live, monitored deployment inside enterprise environments (fine-tuning, distillation/quantisation, drift detection, rollback playbooks).
- Set the evaluation discipline — golden datasets, regression suites, and quality gates that decide when a model version ships and when it does not.
- Decide where AI wins — identify which tasks call for an LLM, a smaller fine-tuned model, classical ML, or deterministic code, and document the reasoning.
- Make it deployable — work with SRE on inference (CPU/GPU), batching, latency targets and the runbook for shipping a new model version.
- Grow the practice — mentor AI/ML engineers, set the bar for experiments and reproducibility, and hire as the team expands.
WHAT YOU BRING
- 5+ years in ML / applied AI, with hands-on work shipping models, not just notebooks or POCs. Something you built must be running in production today, and you must be able to describe what it does, how it is evaluated and how it is monitored.
- Strong machine-learning and neural-network foundations — you can design, train and debug a neural network from scratch, and understand the trade-offs vs classical ML or off-the-shelf LLMs.
- LLM production experience — fine-tuning (SFT, LoRA/QLoRA, DPO or equivalent), prompt engineering, RAG, evaluation and guardrails.
- Strong Python — production-quality code, comfort with PyTorch and distributed training (DeepSpeed / FSDP / Accelerate).
- Working knowledge of inference optimisation — quantisation (GPTQ/AWQ/bitsandbytes), serving stacks (vLLM, TGI, TensorRT-LLM), and latency/cost trade-offs.
- Clear technical communicator — able to brief a CTO or compliance head on what a model can and cannot do without hand-waving.
WHAT WOULD BE GREAT TO HAVE
- Experience shipping an on-prem or VPC-deployed model into a regulated industry (banking, healthcare, defence).
- Data-curation pipelines — synthetic data generation, deduplication, contamination detection for instruction tuning.
- ML guardrails — output filtering, jailbreak resistance, RAG-grounded factuality scoring.
- Hands-on experience with agent orchestration (LangGraph, custom DAGs).
TECH STACK
- Languages & ML: Python 3.11+, PyTorch, Hugging Face Transformers, scikit-learn, XGBoost
- Fine-tuning & training: PEFT (LoRA/QLoRA), TRL, DeepSpeed / FSDP / Accelerate, bitsandbytes
- Inference & serving: vLLM, TGI, TensorRT-LLM, ONNX, quantisation (GPTQ/AWQ)
- Orchestration & RAG: LangGraph, LangChain, custom DAGs, pgvector / Qdrant / Weaviate
- Evaluation: RAGAS, DeepEval, custom golden-set harnesses
- Backend & data: FastAPI, PostgreSQL, Redis, Celery, Docker
- Cloud: AWS (SageMaker, EC2/GPU, S3), on-prem inference for regulated deployments
- Observability: MLflow / Weights & Biases, Prometheus, Grafana