onsite

Lead AI/ML Engineer

New Street Technologies is seeking a Lead AI/ML Engineer to own the AI/ML track end-to-end for their AI-native platform in financial services. This hands-on role involves leading a team to build and run production-grade models, setting technical direction, and shipping systems for enterprise customers. The ideal candidate will have 5+ years of experience in ML/applied AI, strong foundations in machine learning and neural networks, and production experience with LLMs.

ABOUT THE ROLE

We are building an AI-native platform for regulated financial services. AI and machine learning are not features bolted on — they are how the product works. We are looking for a lead engineer to own the AI/ML track end-to-end and grow it into a capability.

You will lead the team that builds and runs our models in production — language models, neural networks, classical ML where it earns its place. You will set the technical direction, run the experiments that matter, and ship the systems that go live for enterprise customers. This is a hands-on lead role — you write code, you read papers, and you stay close to the model.

WHAT YOU'LL DO

Lead all AI/ML work — model selection, fine-tuning, neural-network design where appropriate, evaluation and deployment. Own the track and the outcomes.
Build production AI systems — take models from notebook to live, monitored deployment inside enterprise environments (fine-tuning, distillation/quantisation, drift detection, rollback playbooks).
Set the evaluation discipline — golden datasets, regression suites, and quality gates that decide when a model version ships and when it does not.
Decide where AI wins — identify which tasks call for an LLM, a smaller fine-tuned model, classical ML, or deterministic code, and document the reasoning.
Make it deployable — work with SRE on inference (CPU/GPU), batching, latency targets and the runbook for shipping a new model version.
Grow the practice — mentor AI/ML engineers, set the bar for experiments and reproducibility, and hire as the team expands.

WHAT YOU BRING

5+ years in ML / applied AI, with hands-on work shipping models, not just notebooks or POCs. Something you built must be running in production today, and you must be able to describe what it does, how it is evaluated and how it is monitored.
Strong machine-learning and neural-network foundations — you can design, train and debug a neural network from scratch, and understand the trade-offs vs classical ML or off-the-shelf LLMs.
LLM production experience — fine-tuning (SFT, LoRA/QLoRA, DPO or equivalent), prompt engineering, RAG, evaluation and guardrails.
Strong Python — production-quality code, comfort with PyTorch and distributed training (DeepSpeed / FSDP / Accelerate).
Working knowledge of inference optimisation — quantisation (GPTQ/AWQ/bitsandbytes), serving stacks (vLLM, TGI, TensorRT-LLM), and latency/cost trade-offs.
Clear technical communicator — able to brief a CTO or compliance head on what a model can and cannot do without hand-waving.

WHAT WOULD BE GREAT TO HAVE

Experience shipping an on-prem or VPC-deployed model into a regulated industry (banking, healthcare, defence).
Data-curation pipelines — synthetic data generation, deduplication, contamination detection for instruction tuning.
ML guardrails — output filtering, jailbreak resistance, RAG-grounded factuality scoring.
Hands-on experience with agent orchestration frameworks (LangGraph, custom DAGs).

TECH STACK

Languages & ML: Python 3.11+, PyTorch, Hugging Face Transformers, scikit-learn, XGBoost
Fine-tuning & training: PEFT (LoRA/QLoRA), TRL, DeepSpeed / FSDP / Accelerate, bitsandbytes
Inference & serving: vLLM, TGI, TensorRT-LLM, ONNX, quantisation (GPTQ/AWQ)
Orchestration & RAG: LangGraph, LangChain, custom DAGs, pgvector / Qdrant / Weaviate
Evaluation: RAGAS, DeepEval, custom golden-set harnesses
Backend & data: FastAPI, PostgreSQL, Redis, Celery, Docker
Cloud: AWS (SageMaker, EC2/GPU, S3), on-prem inference for regulated deployments
Observability: MLflow / Weights & Biases, Prometheus, Grafana
