Machine learning is a data-hungry approach to problem solving. Unfortunately, a number of problems that would benefit from the automation that artificial intelligence provides don’t come with troves of data to build from. Christopher Nguyen and his team at Aitomatic are working to address the "cold start" problem for ML by letting humans generate models by sharing their expertise through natural language. In this episode he explains how that works, the various ways that we can start to layer machine learning capabilities on top of each other, and the risks of doing so without incorporating the lessons learned during the growth of the software industry.
- Hello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.
- Predibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack through REST and PQL, an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out!
- Your host is Tobias Macey and today I’m interviewing Christopher Nguyen about how to address the cold start problem for ML/AI projects.
- How did you get involved in machine learning?
- Can you describe what the "cold start" or "small data" problem is and its impact on an organization’s ability to invest in machine learning?
- What are some examples of use cases where ML is a viable solution but there is a corresponding lack of usable data?
- How does the model design influence the data requirements to build it? (e.g. statistical model vs. deep learning, etc.)
- What are the available options for addressing a lack of data for ML?
- What are the characteristics of a given data set that make it suitable for ML use cases?
- Can you describe what you are building at Aitomatic and how it helps to address the cold start problem?
- How have the design and goals of the product changed since you first started working on it?
- What are some of the education challenges that you face when working with organizations to help them understand how to think about ML/AI investment and practical limitations?
- What are the most interesting, innovative, or unexpected ways that you have seen Aitomatic/H1st used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Aitomatic/H1st?
- When is a human/knowledge-driven approach to ML development the wrong choice?
- What do you have planned for the future of Aitomatic?
- From your perspective, what is the biggest barrier to adoption of machine learning today?
- Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email email@example.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
- Human First AI
- Knowledge First World Symposium
- Atari 800
- Cold start problem
- Scale AI
- Snorkel AI
- Anomaly Detection
- Expert Systems
- ICML == International Conference on Machine Learning
- NIST == National Institute of Standards and Technology
- Multi-modal Model
- SVM == Support Vector Machine
- OSS Capital
Predibase’s founders saw how painful it was to get ML models developed and into production, a process that took up to a year even at leading tech companies like Uber, so they built internal platforms that drastically lowered the time-to-value and increased access. The key was taking a “declarative approach” to machine learning, which Piero Molino (CEO) introduced with Ludwig, an open source framework for creating deep learning models that has 8,400+ GitHub stars, more than 100 contributors, and thousands of monthly downloads. With Ludwig, tasks that had taken months to years could be handed off to teams in thirty minutes, with just six lines of human-readable configuration defining an entire machine learning pipeline.
Now with Predibase, we are bringing the power of declarative machine learning built on top of Ludwig to broader organizations with our enterprise platform. Just as Infrastructure as Code simplified IT, Predibase’s machine learning (ML) platform allows users to focus on the “what” of their ML models rather than the “how”, breaking free of the usual limits of low-code systems and bringing down the time-to-value of ML projects from years to days.