About ByteDance
Founded in 2012, ByteDance is a technology company operating a range of content platforms that inform, educate, entertain and inspire people across languages, cultures and geographies. With a suite of more than a dozen products, including TikTok, Douyin and Toutiao, ByteDance now has a portfolio of applications available in over 150 markets and 75 languages.
ByteDance's big-data Computing Engine team is responsible for ByteDance's offline, streaming and real-time computing engines. We support many core businesses and teams, including applied machine learning, recommendation, data warehousing, search, advertising, streaming media, and security and risk control.
- Offline computing is mainly based on Spark, with hundreds of thousands of jobs per day on average, covering ETL, offline data processing, ad-hoc queries and other scenarios. We support large-scale data processing for many of ByteDance's core businesses, including recommendation, advertising and search.
- Streaming computing is mainly based on Flink, with tens of thousands of tasks in total, covering business scenarios such as ETL, real-time monitoring and real-time features. We also support the construction of ByteDance's internal real-time data warehouse and scenarios that integrate streaming and batch processing.
- The real-time compute engine is based on a ByteDance in-house solution, covering real-time data warehousing, real-time online services, high-frequency updates, online feature stores and other machine learning scenarios. We also support internal ByteDance use cases such as advertising, live broadcast and recommendation that require data processing and real-time online services.
Responsibilities
- Build an efficient, real-time and reliable distributed computing engine and machine learning scheduling system. Continuously optimize system stability and performance, and drive the evolution of next-generation infrastructure.
- Play a key role in driving project objectives, execution management, architecture design and code reviews. Define the long-term technical roadmap for the team and demonstrate technical leadership.
- Recruit and mentor engineers at various levels of experience to build a high-performing team. Foster team culture and cohesion, and help team members grow their technical capabilities.
- Work closely with stakeholder teams to identify business pain points, continuously improve the customer experience, and strengthen the team's reputation and impact.
Qualifications
- Bachelor's or higher degree in Computer Science or related fields.
- 6+ years of professional software development experience.
- Strong programming skills in at least one of Java, Golang or Python, with a commitment to high-quality code and sound engineering practices.
- Solid knowledge of Linux systems; proficient in multi-threading, network programming and distributed development in any programming language.
- Expertise in the design and implementation of large-scale distributed compute systems; able to think clearly, dig into problems quickly and solve them.
- Good communication skills, able to lead and drive projects effectively. Familiar with team and project management methodologies and approaches.
Preferred Qualifications
- In-depth research and industry experience in large-scale compute engine design and implementation, for example with YARN, Mesos, Kubernetes, MapReduce, Spark, Flink or Storm.
- Experience as a contributor, committer or PMC member in an open-source community is a plus.