onsite
Backend Software Engineer, Machine Learning Systems
As a Backend Software Engineer, Machine Learning Systems at TikTok, you will join the team that develops and maintains TikTok's machine learning systems and platforms. The role involves building large-scale distributed systems for ML that integrate GPUs, RDMA networks, and storage to improve TikTok's product experiences.
About the role
About the Team
As part of the machine learning system team at TikTok, you will contribute to building an advanced system that combines high-performance compute, networking, and storage into a powerful computing cluster. The team's mission is to provide an ML system and platform to help research scientists and engineers improve TikTok's products and user experiences.
Responsibilities
- Develop and maintain the machine learning system and platform, including training, inference, and pipeline orchestration, to support core products.
- Build large-scale systems for ML that integrate GPUs, RDMA networks, and storage systems.
- Enrich the end-to-end machine learning experience and provide machine learning resources for all products of TikTok and its affiliates.
Qualifications
Minimum Qualifications
- Bachelor's degree or above, majoring in Computer Science, Engineering, or related fields.
- Programming experience with at least one modern language such as C/C++, Golang, or Python.
- Experience contributing to large-scale distributed systems and multi-tenant systems (architecture, reliability, and scaling).
- Experience contributing to orchestration systems such as Kubernetes, Kubeflow, YARN, or Mesos.
- Strong analytical abilities and problem-solving skills.
- Good communication skills, self-motivation, and sound engineering practices.
Preferred Qualifications
- Familiarity with GPU architecture and GPU clusters.
- Familiarity with at least one deep learning framework (TensorFlow, PyTorch, MXNet, or other).
- Familiarity with back-end web frameworks such as Django or Flask.