Host of The Machine Learning Podcast
Tobias Macey is a dedicated engineer with experience spanning many years and even more domains. He currently manages and leads the Data & Infrastructure Platform Engineering team at MIT Open Learning where he designs and builds cloud infrastructure to power online access to education for the global MIT community. He also owns and operates Boundless Notions, LLC where he offers design, review, and implementation advice on data infrastructure and cloud automation.
Besides the Machine Learning Podcast, Tobias hosts the Data Engineering Podcast, which tackles a new approach to data management every week. He also hosts Podcast.__init__ where he explores the universe of ways that the Python language is being used. By applying his experience in building and scaling data infrastructure and processing workflows, he helps the audience explore and understand the challenges inherent to machine learning and data management.
Tobias Macey has hosted 18 Episodes.
Real-Time Machine Learning Has Entered The Realm Of The Possible
March 9th, 2023 | 34 mins 29 secs
Machine learning models have predominantly been built and updated in a batch modality. While this is operationally simpler, it doesn't always provide the best experience or capabilities for end users of the model. Tecton has been investing in the infrastructure and workflows that enable building and updating ML models with real-time data to allow you to react to real-world events as they happen. In this episode CTO Kevin Stumpf explores the benefits of real-time machine learning and the systems that are necessary to support the development and maintenance of those models.
How Shopify Built A Machine Learning Platform That Encourages Experimentation
February 2nd, 2023 | 1 hr 6 mins
Shopify uses machine learning to power multiple features in their platform. In order to reduce the amount of effort required to develop and deploy models they have invested in building an opinionated platform for their engineers. They have gone through multiple iterations of the platform and their most recent version is called Merlin. In this episode Isaac Vidas shares the use cases that they are optimizing for, how it integrates into the rest of their data platform, and how they have designed it to let machine learning engineers experiment freely and safely.
Applying Machine Learning To The Problem Of Bad Data At Anomalo
January 23rd, 2023 | 59 mins 24 secs
All data systems are subject to the "garbage in, garbage out" problem. For machine learning applications bad data can lead to unreliable models and unpredictable results. Anomalo is a product designed to alert on bad data by applying machine learning models to various storage and processing systems. In this episode Jeremy Stanley discusses the various challenges that are involved in building useful and reliable machine learning models with unreliable data and the interesting problems that they are solving in the process.
Build More Reliable Machine Learning Systems With The Dagster Orchestration Engine
December 1st, 2022 | 45 mins 43 secs
data orchestration, mlops
Building a machine learning model one time can be done in an ad-hoc manner, but if you ever want to update it and serve it in production you need a way of repeating a complex sequence of operations. Dagster is an orchestration engine that understands the data that it is manipulating so that you can move beyond coarse task-based representations of your dependencies. In this episode Sandy Ryza explains how his background in machine learning has informed his work on the Dagster project and the foundational principles that it is built on to allow for collaboration across data engineering and machine learning concerns.
Solve The Cold Start Problem For Machine Learning By Letting Humans Teach The Computer With Aitomatic
September 27th, 2022 | 52 mins 7 secs
Machine learning is a data-hungry approach to problem solving. Unfortunately, there are a number of problems that would benefit from the automation provided by artificial intelligence capabilities that don’t come with troves of data to build from. Christopher Nguyen and his team at Aitomatic are working to address the "cold start" problem for ML by letting humans generate models by sharing their expertise through natural language. In this episode he explains how that works, the various ways that we can start to layer machine learning capabilities on top of each other, as well as the risks involved in doing so without incorporating lessons learned in the growth of the software industry.
Convert Your Unstructured Data To Embedding Vectors For More Efficient Machine Learning With Towhee
September 21st, 2022 | 51 mins 53 secs
Data is one of the core ingredients for machine learning, but the format in which it is understandable to humans is not a useful representation for models. Embedding vectors structure data in a form that is native to how models interpret and manipulate information. In this episode Frank Liu shares how the Towhee library simplifies the work of translating your unstructured data assets (e.g. images, audio, video) into embeddings that you can use efficiently for machine learning, and how it fits into your workflow for model development.
Shedding Light On Silent Model Failures With NannyML
September 13th, 2022 | 1 hr 3 mins
An interview with Wojtek Kuberski about the open source NannyML project and how it combines predicted performance of your model with observed outputs to identify silent model failures.
How To Design And Build Machine Learning Systems For Reasonable Scale
September 10th, 2022 | 54 mins 9 secs
An interview with Jacopo Tagliabue about how to design machine learning systems to support operations at the scale required by a majority of companies.
Building A Business Powered By Machine Learning At Assembly AI
September 8th, 2022 | 58 mins 42 secs
An interview with Dylan Fox about the unique challenges and potential involved in building a business with machine learning as the core capability that drives the product and the approach that he has taken at Assembly AI.
Update Your Model's View Of The World In Real Time With Streaming Machine Learning Using River
August 25th, 2022 | 1 hr 15 mins
An interview with Max Halford about the benefits of streaming machine learning for systems that need to learn continuously without being taken offline and how the River library supports building those models.
Using AI To Transform Your Business Without The Headache Using Graft
August 15th, 2022 | 1 hr 7 mins
Accelerate Development And Delivery Of Your Machine Learning Projects With A Comprehensive Feature Platform
August 6th, 2022 | 50 mins 37 secs
An interview with Kevin Stumpf about the impact of a comprehensive feature platform on the development and serving of machine learning models and how they are addressing that need at Tecton.
Build Better Models Through Data Centric Machine Learning Development With Snorkel AI
July 28th, 2022 | 53 mins 49 secs
An interview with Alex Ratner about Snorkel AI's platform for data-centric machine learning development that accelerates the rate at which teams can build high quality training data sets with the help of domain experts.
Declarative Machine Learning For High Performance Deep Learning Models With Predibase
July 21st, 2022 | 1 hr 19 secs
An interview with Travis Addair about the platform that he and his team at Predibase are building to empower everyone to build and deploy deep learning models through a low-code, declarative approach to machine learning development, and how they are extending the capabilities of the open source Ludwig and Horovod frameworks.
Stop Feeding Garbage Data To Your ML Models, Clean It Up With Galileo
July 13th, 2022 | 47 mins 3 secs
An interview with Galileo co-founder Vikram Chatterji about the challenges of managing unstructured data assets for machine learning projects and how their platform is designed to ease the burden of maintaining clean data sets.
Build Better Machine Learning Models With Confidence By Adding Validation With Deepchecks
July 5th, 2022 | 48 mins 40 secs
An interview with Shir Chorev and Philip Tannor about model validation and testing with the open source deepchecks library and the challenges of testing machine learning projects.