Data Cleaning

Convert Your Unstructured Data To Embedding Vectors For More Efficient Machine Learning With Towhee

Data is one of the core ingredients for machine learning, but the format that makes it understandable to humans is not a useful representation for models. Embedding vectors structure data in a form that is native to how models interpret and manipulate information. In this episode Frank Liu shares how the Towhee library simplifies the work of translating your unstructured data assets (e.g. images, audio, and video) into embeddings that you can use efficiently for machine learning, and how it fits into your workflow for model development.


Build Better Models Through Data-Centric Machine Learning Development With Snorkel AI

Machine learning is a data-hungry activity, and the quality of the resulting model is highly dependent on the quality of the inputs that it receives. Generating sufficient quantities of high-quality labeled data is an expensive and time-consuming process. To reduce that time and cost, Alex Ratner and his team at Snorkel AI have built a system for powering data-centric machine learning development. In this episode he explains how the Snorkel platform allows domain experts to create labeling functions that translate their expertise into reusable logic, which dramatically reduces the time needed to build training data sets and drives down the total cost.


Stop Feeding Garbage Data To Your ML Models, Clean It Up With Galileo

Machine learning is a force multiplier that can generate an outsized impact on your organization. Unfortunately, if you are feeding your ML model garbage data, then you will get orders of magnitude more garbage out of it. The team behind Galileo experienced that pain for themselves and have set out to make data management and cleaning for machine learning a first-class concern in your workflow. In this episode Vikram Chatterji shares the story of how Galileo got started and how you can use their platform to fix your ML data so that you can get back to the fun parts.
