Convert Your Unstructured Data To Embedding Vectors For More Efficient Machine Learning With Towhee

Data is one of the core ingredients for machine learning, but the format in which it is understandable to humans is not a useful representation for models. Embedding vectors are a way to structure data in a way that is native to how models interpret and manipulate information. In this episode Frank Liu shares how the Towhee library simplifies the work of translating your unstructured data assets (e.g. images, audio, video, etc.) into embeddings that you can use efficiently for machine learning, and how it fits into your workflow for model development.

Read More

Shedding Light On Silent Model Failures With NannyML

Because machine learning models are constantly interacting with inputs from the real world they are subject to a wide variety of failures. The most commonly discussed error condition is concept drift, but there are numerous other ways that things can go wrong. In this episode Wojtek Kuberski explains how NannyML is designed to compare the predicted performance of your model against its actual behavior to identify silent failures and provide context to allow you to determine whether and how urgently to address them.

Read More

How To Design And Build Machine Learning Systems For Reasonable Scale

Using machine learning in production requires a sophisticated set of cooperating technologies. A majority of resources that are available for understanding how to design and operate these platforms are focused on either simple examples that don’t scale, or over-engineered technologies designed for the massive scale of big tech companies. In this episode Jacopo Tagliabue shares his vision for “ML at reasonable scale” and how you can adopt these patterns for building your own platforms.

Read More

Building A Business Where Machine Learning Is The Product At Assembly AI

The increasing sophistication of machine learning has enabled dramatic transformations of businesses and introduced new product categories. At Assembly AI they are offering advanced speech recognition and natural language models as an API service. In this episode founder Dylan Fox discusses the unique challenges of building a business with machine learning as the core product.

Read More

Update Your Model’s View Of The World In Real Time With Streaming Machine Learning Using River

The majority of machine learning projects that you read about or work on are built around batch processes. The model is trained, and then validated, and then deployed, with each step being a discrete and isolated task. Unfortunately, the real world is rarely static, leading to concept drift and model failures. River is a framework for building streaming machine learning projects that can constantly adapt to new information. In this episode Max Halford explains how the project works, why you might (or might not) want to consider streaming ML, and how to get started building with River.

Read More

Using AI To Transform Your Business Without The Headache Using Graft

Machine learning is a transformative tool for the organizations that can take advantage of it. While the frameworks and platforms for building machine learning applications are becoming more powerful and broadly available, there is still a significant investment of time, money, and talent required to take full advantage of it. In order to reduce that barrier further Adam Oliner and Brian Calvert, along with their other co-founders, started Graft. In this episode Adam and Brian explain how they have built a platform designed to empower everyone in the business to take part in designing and building ML projects, while managing the end-to-end workflow required to go from data to production.

Read More

Accelerate Development And Delivery Of Your Machine Learning Projects With A Comprehensive Feature Platform

In order for a machine learning model to build connections and context across the data that is fed into it the raw data needs to be engineered into semantic features. This is a process that can be tedious and full of toil, requiring constant upkeep and often leading to rework across projects and teams. In order to reduce the amount of wasted effort and speed up experimentation and training iterations a new generation of services are being developed. Tecton first built a feature store to serve as a central repository of engineered features and keep them up to date for training and inference. Since then they have expanded the set of tools and services to be a full-fledged feature platform. In this episode Kevin Stumpf explains the different capabilities and activities related to features that are necessary to maintain velocity in your machine learning projects.

Read More

Build Better Models Through Data-Centric Machine Learning Development With Snorkel AI

Machine learning is a data hungry activity, and the quality of the resulting model is highly dependent on the quality of the inputs that it receives. Generating sufficient quantities of high quality labeled data is an expensive and time consuming process. In order to reduce that time and cost Alex Ratner and his team at Snorkel AI have built a system for powering data-centric machine learning development. In this episode he explains how the Snorkel platform allows domain experts to create labeling functions that translate their expertise into reusable logic that dramatically reduces the time needed to build training data sets and drives down the total cost.

Read More

Declarative Machine Learning For High Performance Deep Learning Models With Predibase

Deep learning is a revolutionary category of machine learning that accelerates our ability to build powerful inference models. Along with that power comes a great deal of complexity in determining what neural architectures are best suited to a given task, engineering features, scaling computation, etc. Predibase is building on the successes of the Ludwig framework for declarative deep learning and Horovod for horizontally distributing model training. In this episode CTO and co-founder of Predibase, Travis Addair, explains how they are reducing the burden of model development even further with their managed service for declarative and low-code ML and how they are integrating with the growing ecosystem of solutions for the full ML lifecycle.

Read More

Stop Feeding Garbage Data To Your ML Models, Clean It Up With Galileo

Machine learning is a force multiplier that can generate an outsized impact on your organization. Unfortunately, if you are feeding your ML model garbage data, then you will get orders of magnitude more garbage out of it. The team behind Galileo experienced that pain for themselves and have set out to make data management and cleaning for machine learning a first class concern in your workflow. In this episode Vikram Chatterji shares the story of how Galileo got started and how you can use their platform to fix your ML data so that you can get back to the fun parts.

Read More