In many ML use cases, model performance depends heavily on the quality of the features the model is trained on and served with at inference time. One important dimension of feature quality is the freshness of the data. It is therefore critical to keep features up to date and relevant to the problem being solved.
This presentation covers the impact of feature freshness on model performance, based on experiments with both training data and inference data. It also discusses strategies and techniques for improving feature freshness in both streaming and batch feature processing, along with the challenges and tradeoffs of implementing them in large-scale machine learning systems, such as computational cost and scalability.
By keeping the features fresh and relevant, organizations can achieve better results and stay ahead of the competition in today's rapidly evolving data-driven landscape.
What's the focus of your work these days?
My current focus is developing techniques to prepare data for machine learning inference at large scale, while also improving reliability and efficiency and minimizing latency.
What's the motivation for your talk at QCon New York 2023?
I would like to share with the industry what we have learned while working on these projects.
How would you describe your main persona and target audience for this session?
The target audience is experienced technologists who work on large-scale data processing for machine learning.
Is there anything specific that you'd like people to walk away with after watching your session?
There are a few key takeaways:
- Improving data freshness is becoming more and more important in ML tasks
- However, not all of your data needs to be very fresh. Optimize for ROI rather than freshness alone
- Design your system end to end, instead of focusing on localized optimization
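As a rough illustration of the second takeaway above, the sketch below ranks candidate freshness upgrades by estimated gain per unit of added cost and picks greedily within a budget. It is a minimal sketch, not the method used in the talk: the feature names, gain and cost figures, and the `FreshnessUpgrade` and `pick_upgrades` names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FreshnessUpgrade:
    feature: str
    metric_gain: float  # hypothetical: estimated model-metric gain from fresher data
    added_cost: float   # hypothetical: extra compute cost (must be > 0), same units as budget


def pick_upgrades(candidates, budget):
    """Greedily pick upgrades with the best gain per unit cost that fit the budget."""
    ranked = sorted(
        candidates,
        key=lambda c: c.metric_gain / c.added_cost,
        reverse=True,
    )
    chosen, spent = [], 0.0
    for c in ranked:
        if spent + c.added_cost <= budget:
            chosen.append(c.feature)
            spent += c.added_cost
    return chosen


# Illustrative numbers only; real values would come from offline experiments.
candidates = [
    FreshnessUpgrade("recent_clicks", 0.020, 10.0),
    FreshnessUpgrade("user_country", 0.001, 8.0),
    FreshnessUpgrade("session_length", 0.008, 2.0),
]

print(pick_upgrades(candidates, budget=12.0))
```

With these made-up numbers, the slowly changing `user_country` feature is left on its existing refresh schedule because making it fresher buys little gain for its cost.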
Engineering Manager @Facebook AI Infra
Zhongliang has over a decade of experience in big data and large-scale distributed systems. His most recent focus is on developing advanced data infrastructure for ML data processing at Meta, which powers state-of-the-art recommendation systems.
Previously, Zhongliang worked at LinkedIn, Microsoft BingAds and Vertica Systems, where he built distributed online and offline systems as well as high-speed analytical databases. Zhongliang also serves as a member of the Steering Committee for the Machine Learning Platform Meetup, where he facilitates the sharing of the latest technology advancements in the ML platform community.