Accelerate Development And Delivery Of Your Machine Learning Projects With A Comprehensive Feature Platform

Summary

For a machine learning model to build connections and context across the data fed into it, the raw data needs to be engineered into semantic features. This process can be tedious and full of toil, requiring constant upkeep and often leading to rework across projects and teams. To reduce that wasted effort and speed up experimentation and training iterations, a new generation of services is being developed. Tecton first built a feature store to serve as a central repository of engineered features and keep them up to date for training and inference. Since then the company has expanded its set of tools and services into a full-fledged feature platform. In this episode Kevin Stumpf explains the different capabilities and activities related to features that are necessary to maintain velocity in your machine learning projects.


Announcements

  • Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.
  • Building good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to themachinelearningpodcast.com/deepchecks today to get started!
  • Do you wish you could use artificial intelligence to drive your business the way Big Tech does, but don’t have a money printer? Graft™ is a cloud-native platform that aims to make the AI of the 1% accessible to the 99%. Wield the most advanced techniques for unlocking the value of data, including text, images, video, audio, and graphs. No machine learning skills required, no team to hire, and no infrastructure to build or maintain. For more information on Graft or to schedule a demo, visit themachinelearningpodcast.com/graft today and tell them Tobias sent you.
  • Data powers machine learning, but poor data quality is the largest impediment to effective ML today. Galileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix, and track their data across the ML workflow (pre-training, post-training, and post-production) – no more Excel sheets or ad hoc Python scripts. Get meaningful gains in your model performance fast and dramatically reduce data labeling and procurement costs, while seeing 10x faster ML iterations. Galileo is offering listeners a free 30 day trial and a 30% discount on the product thereafter. This offer is available until Aug 31, so go to themachinelearningpodcast.com/galileo and request a demo today!
  • Your host is Tobias Macey and today I’m interviewing Kevin Stumpf about the role of feature platforms in your ML engineering workflow

Interview

  • Introduction
  • How did you get involved in machine learning?
  • Can you describe what you mean by the term "feature platform"?
    • What are the components and supporting capabilities that are needed for such a platform?
  • How does the availability of engineered features impact the ability of an organization to put ML into production?
  • What are the points of friction that teams encounter when trying to build and maintain ML projects in the absence of a fully integrated feature platform?
  • Who are the target personas for the Tecton platform?
    • What stages of the ML lifecycle does it address?
  • Can you describe how you have designed the Tecton feature platform?
    • How have the goals and capabilities of the product evolved since you started working on it?
  • What is the workflow for an ML engineer or data scientist to build and maintain features and use them in the model development workflow?
  • What are the responsibilities of the MLOps stack that you have intentionally decided not to address?
    • What are the interfaces and extension points that you offer for integrating with the other utilities needed to manage a full ML system?
  • You wrote a post about the need to establish a DevOps approach to ML data. In keeping with that theme, can you describe how to think about the approach to testing and validation techniques for features and their outputs?
  • What are the most interesting, innovative, or unexpected ways that you have seen Tecton/Feast used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Tecton?
  • When is Tecton the wrong choice?
  • What do you have planned for the future of the Tecton feature platform?

Contact Info

Parting Question

  • From your perspective, what is the biggest barrier to adoption of machine learning today?

Links

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra / [CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/)

Liked it? Take a second to support tmacey on Patreon!