{"version":"https://jsonfeed.org/version/1","title":"The Machine Learning Podcast","home_page_url":"https://www.themachinelearningpodcast.com","feed_url":"https://www.themachinelearningpodcast.com/json","description":"This show goes behind the scenes for the tools, techniques, and applications of machine learning. Model training, feature engineering, running in production, career development... Everything that you need to know to deliver real impact and value with machine learning and artificial intelligence.","_fireside":{"subtitle":"Detailed and technical explorations of machine learning and artificial intelligence with the researchers, engineers, and entrepreneurs who are shaping the industry","pubdate":"2024-03-03T10:00:00.000-05:00","explicit":false,"copyright":"2024 by Boundless Notions, LLC.","owner":"Tobias Macey","image":"https://assets.fireside.fm/file/fireside-images/podcasts/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/cover.jpg?v=2"},"items":[{"id":"f2db83e1-f565-4e25-aacb-0e50d463f055","title":"Strategies For Building A Product Using LLMs At DataChat","url":"https://www.themachinelearningpodcast.com/datachat-llm-product-business-episode-31","content_text":"Summary\n\nLarge Language Models (LLMs) have rapidly captured the attention of the world with their impressive capabilities. Unfortunately, they are often unpredictable and unreliable. This makes building a product based on their capabilities a unique challenge. Jignesh Patel is building DataChat to bring the capabilities of LLMs to organizational analytics, allowing anyone to have conversations with their business data. 
In this episode he shares the methods that he is using to build a product on top of this constantly shifting set of technologies.\n\nAnnouncements\n\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nYour host is Tobias Macey and today I'm interviewing Jignesh Patel about working with LLMs; understanding how they work and how to build your own\n\n\nInterview\n\n\nIntroduction\nHow did you get involved in machine learning?\nCan you start by sharing some of the ways that you are working with LLMs currently?\nWhat are the business challenges involved in building a product on top of an LLM model that you don't own or control?\n\n\nIn the current age of business, your data is often your strategic advantage. How do you avoid losing control of, or leaking that data while interfacing with a hosted LLM API?\n\nWhat are the technical difficulties related to using an LLM as a core element of a product when they are largely a black box?\n\n\nWhat are some strategies for gaining visibility into the inner workings or decision making rules for these models?\n\nWhat are the factors, whether technical or organizational, that might motivate you to build your own LLM for a business or product?\n\n\nCan you unpack what it means to \"build your own\" when it comes to an LLM?\n\nIn your work at DataChat, how has the progression of sophistication in LLM technology impacted your own product strategy?\nWhat are the most interesting, innovative, or unexpected ways that you have seen LLMs/DataChat used?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working with LLMs?\nWhen is an LLM the wrong choice?\nWhat do you have planned for the future of DataChat?\n\n\nContact Info\n\n\nWebsite\nLinkedIn\n\n\nParting Question\n\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\n\nClosing Announcements\n\n\nThank you for 
listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers.\n\n\nLinks\n\n\nDataChat\nCMU == Carnegie Mellon University\nSVM == Support Vector Machine\nGenerative AI\nGenomics\nProteomics\nParquet\nOpenAI Codex\nLLama\nMistral\nGoogle Vertex\nLangchain\nRetrieval Augmented Generation\nPrompt Engineering\nEnsemble Learning\nXGBoost\nCatboost\nLinear Regression\nCOGS == Cost Of Goods Sold\nBruce Schneier - AI And Trust\n\n\nThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0","content_html":"

Summary

\n\n

Large Language Models (LLMs) have rapidly captured the attention of the world with their impressive capabilities. Unfortunately, they are often unpredictable and unreliable. This makes building a product based on their capabilities a unique challenge. Jignesh Patel is building DataChat to bring the capabilities of LLMs to organizational analytics, allowing anyone to have conversations with their business data. In this episode he shares the methods that he is using to build a product on top of this constantly shifting set of technologies.

\n\n

Announcements

\n\n\n\n

Interview

\n\n\n\n

Contact Info

\n\n\n\n

Parting Question

\n\n\n\n

Closing Announcements

\n\n\n\n

Links

\n\n\n\n

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

","summary":"Large Language Models (LLMs) have rapidly captured the attention of the world with their impressive capabilities. Unfortunately, they are often unpredictable and unreliable. This makes building a product based on their capabilities a unique challenge. Jignesh Patel is building DataChat to bring the capabilities of LLMs to organizational analytics, allowing anyone to have conversations with their business data. In this episode he shares the methods that he is using to build a product on top of this constantly shifting set of technologies.","date_published":"2024-03-03T10:00:00.000-05:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/f2db83e1-f565-4e25-aacb-0e50d463f055.mp3","mime_type":"audio/mpeg","size_in_bytes":31485610,"duration_in_seconds":2920}]},{"id":"005ebb79-3acb-4b13-b61f-9eb477728504","title":"Improve The Success Rate Of Your Machine Learning Projects With bizML","url":"https://www.themachinelearningpodcast.com/bizml-machine-learning-business-process-episode-30","content_text":"Summary\n\nMachine learning is a powerful set of technologies, holding the potential to dramatically transform businesses across industries. Unfortunately, ML projects often fail to achieve their intended goals. This failure is due to a lack of collaboration and investment across technological and organizational boundaries. To help improve the success rate of machine learning projects Eric Siegel developed the six step bizML framework, outlining the process to ensure that everyone understands the whole process of ML deployment. 
In this episode he shares the principles and promise of that framework and his motivation for encapsulating it in his book \"The AI Playbook\".\n\nAnnouncements\n\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nYour host is Tobias Macey and today I'm interviewing Eric Siegel about how the bizML approach can help improve the success rate of your ML projects\n\n\nInterview\n\n\nIntroduction\nHow did you get involved in machine learning?\nCan you describe what bizML is and the story behind it?\n\n\nWhat are the key aspects of this approach that are different from the \"industry standard\" lifecycle of an ML project?\n\nWhat are the elements of your personal experience as an ML consultant that helped you develop the tenets of bizML?\nWho are the personas that need to be involved in an ML project to increase the likelihood of success?\n\n\nWho do you find to be best suited to \"own\" or \"lead\" the process?\n\nWhat are the organizational patterns that might hinder the work of delivering on the goals of an ML initiative?\nWhat are some of the misconceptions about the work involved in/capabilities of an ML model that you commonly encounter?\nWhat is your main goal in writing your book \"The AI Playbook\"?\nWhat are the most interesting, innovative, or unexpected ways that you have seen the bizML process in action?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on ML projects and developing the bizML framework?\nWhen is bizML the wrong choice?\nWhat are the future developments in organizational and technical approaches to ML that will improve the success rate of AI projects?\n\n\nContact Info\n\n\nLinkedIn\n\n\nParting Question\n\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\n\nClosing Announcements\n\n\nThank you for listening! Don't forget to check out our other shows. 
The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers.\n\n\nLinks\n\n\nThe AI Playbook: Mastering the Rare Art of Machine Learning Deployment by Eric Siegel\nPredictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die by Eric Siegel\nColumbia University\nMachine Learning Week Conference\nGenerative AI World\nMachine Learning Leadership and Practice Course\nRexer Analytics\nKD Nuggets\nCRISP-DM\nRandom Forest\nGradient Descent\n\n\nThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0","content_html":"

Summary

\n\n

Machine learning is a powerful set of technologies, holding the potential to dramatically transform businesses across industries. Unfortunately, ML projects often fail to achieve their intended goals. This failure is due to a lack of collaboration and investment across technological and organizational boundaries. To help improve the success rate of machine learning projects Eric Siegel developed the six step bizML framework, outlining the process to ensure that everyone understands the whole process of ML deployment. In this episode he shares the principles and promise of that framework and his motivation for encapsulating it in his book \"The AI Playbook\".

\n\n

Announcements

\n\n\n\n

Interview

\n\n\n\n

Contact Info

\n\n\n\n

Parting Question

\n\n\n\n

Closing Announcements

\n\n\n\n

Links

\n\n\n\n

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

","summary":"Machine learning is a powerful set of technologies, holding the potential to dramatically transform businesses across industries. Unfortunately, ML projects often fail to achieve their intended goals. This failure is due to a lack of collaboration and investment across technological and organizational boundaries. To help improve the success rate of machine learning projects Eric Siegel developed the six step bizML framework, outlining the process to ensure that everyone understands the whole process of ML deployment. In this episode he shares the principles and promise of that framework and his motivation for encapsulating it in his book \"The AI Playbook\".","date_published":"2024-02-18T09:15:00.000-05:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/005ebb79-3acb-4b13-b61f-9eb477728504.mp3","mime_type":"audio/mpeg","size_in_bytes":35033450,"duration_in_seconds":3022}]},{"id":"ab8081e5-be14-4cef-b591-d644487f3702","title":"Using Generative AI To Accelerate Feature Engineering At FeatureByte","url":"https://www.themachinelearningpodcast.com/featurebyte-generative-ai-ml-pipelines-episode-29","content_text":"Summary\n\nOne of the most time consuming aspects of building a machine learning model is feature engineering. Generative AI offers the possibility of accelerating the discovery and creation of feature pipelines. 
In this episode Colin Priest explains how FeatureByte is applying generative AI models to the challenge of building and maintaining machine learning pipelines.\n\nAnnouncements\n\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nYour host is Tobias Macey and today I'm interviewing Colin Priest about applying generative AI to the task of building and deploying AI pipelines\n\n\nInterview\n\n\nIntroduction\nHow did you get involved in machine learning?\nCan you start by giving the 30,000 foot view of the steps involved in an AI pipeline?\n\n\nUnderstand the problem\nFeature ideation\nFeature engineering\nExperiment\nOptimize\nProductionize\n\nWhat are the stages of that process that are prone to repetition?\n\n\nWhat are the ways that teams typically try to automate those steps?\n\nWhat are the features of generative AI models that can be brought to bear on the design stage of an AI pipeline?\n\n\nWhat are the validation/verification processes that engineers need to apply to the generated suggestions?\nWhat are the opportunities/limitations for unit/integration style tests?\n\nWhat are the elements of developer experience that need to be addressed to make the gen AI capabilities an enhancement instead of a distraction?\n\n\nWhat are the interfaces through which the AI functionality can/should be exposed?\n\nWhat are the aspects of pipeline and model deployment that can benefit from generative AI functionality?\n\n\nWhat are the potential risk factors that need to be considered when evaluating the application of this functionality?\n\nWhat are the most interesting, innovative, or unexpected ways that you have seen generative AI used in the development and maintenance of AI pipelines?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on the application of generative AI to the ML workflow?\nWhen is generative AI the wrong choice?\nWhat 
do you have planned for the future of FeatureByte's AI copilot capabilities?\n\n\nContact Info\n\n\nLinkedIn\n\n\nParting Question\n\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\n\nClosing Announcements\n\n\nThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers.\n\n\nLinks\n\n\nFeatureByte\nGenerative AI\nThe Art of War\nOCR == Optical Character Recognition\nGenetic Algorithm\nSemantic Layer\nPrompt Engineering\n\n\nThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0","content_html":"

Summary

\n\n

One of the most time consuming aspects of building a machine learning model is feature engineering. Generative AI offers the possibility of accelerating the discovery and creation of feature pipelines. In this episode Colin Priest explains how FeatureByte is applying generative AI models to the challenge of building and maintaining machine learning pipelines.

\n\n

Announcements

\n\n\n\n

Interview

\n\n\n\n

Contact Info

\n\n\n\n

Parting Question

\n\n\n\n

Closing Announcements

\n\n\n\n

Links

\n\n\n\n

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

","summary":"One of the most time consuming aspects of building a machine learning model is feature engineering. Generative AI offers the possibility of accelerating the discovery and creation of feature pipelines. In this episode Colin Priest explains how FeatureByte is applying generative AI models to the challenge of building and maintaining machine learning pipelines.","date_published":"2024-02-11T17:00:00.000-05:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/ab8081e5-be14-4cef-b591-d644487f3702.mp3","mime_type":"audio/mpeg","size_in_bytes":23949188,"duration_in_seconds":2699}]},{"id":"d67fd8cb-e8e5-4360-81be-c89a9ee249d3","title":"Learn And Automate Critical Business Workflows With 8Flow","url":"https://www.themachinelearningpodcast.com/8flow-business-workflow-automation-episode-28","content_text":"Summary\n\nEvery business develops their own specific workflows to address their internal organizational needs. Not all of them are properly documented, or even visible. Workflow automation tools have tried to reduce the manual burden involved, but they are rigid and require substantial investment of time to discover and develop the routines. 
Boaz Hecht co-founded 8Flow to iteratively discover and automate pieces of workflows, bringing visibility and collaboration to the internal organizational processes that keep the business running.\n\nAnnouncements\n\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nYour host is Tobias Macey and today I'm interviewing Boaz Hecht about using AI to automate customer support at 8Flow\n\n\nInterview\n\n\nIntroduction\nHow did you get involved in machine learning?\nCan you describe what 8Flow is and the story behind it?\nHow does 8Flow compare to RPA tools that companies are using today?\n\n\nWhat are the opportunities for augmenting or integrating with RPA frameworks?\n\nWhat are the key selling points for the solution that you are building? (does AI sell? Or is it about the realized savings?)\nWhat are the sources of signal that you are relying on to build model features?\nGiven the heterogeneity in tools and processes across customers, what are the common focal points that let you address the widest possible range of functionality?\nCan you describe how 8Flow is implemented?\n\n\nHow have the design and goals evolved since you first started working on it?\n\nWhat are the model categories that are most relevant for process automation in your product?\nHow have you approached the design and implementation of your MLOps workflow? 
(model training, deployment, monitoring, versioning, etc.)\nWhat are the open questions around product focus and system design that you are still grappling with?\nGiven the relative recency of ML/AI as a profession and the massive growth in attention and activity, how are you addressing the challenge of obtaining and maximizing human talent?\nWhat are the most interesting, innovative, or unexpected ways that you have seen 8Flow used?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on 8Flow?\nWhen is 8Flow the wrong choice?\nWhat do you have planned for the future of 8Flow?\n\n\nContact Info\n\n\nLinkedIn\nPersonal Website\n\n\nParting Question\n\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\n\nClosing Announcements\n\n\nThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers.\n\n\nLinks\n\n\n8Flow\nRobotic Process Automation\n\n\nThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0","content_html":"

Summary

\n\n

Every business develops their own specific workflows to address their internal organizational needs. Not all of them are properly documented, or even visible. Workflow automation tools have tried to reduce the manual burden involved, but they are rigid and require substantial investment of time to discover and develop the routines. Boaz Hecht co-founded 8Flow to iteratively discover and automate pieces of workflows, bringing visibility and collaboration to the internal organizational processes that keep the business running.

\n\n

Announcements

\n\n\n\n

Interview

\n\n\n\n

Contact Info

\n\n\n\n

Parting Question

\n\n\n\n

Closing Announcements

\n\n\n\n

Links

\n\n\n\n

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

","summary":"Every business develops their own specific workflows to address their internal organizational needs. Not all of them are properly documented, or even visible. Workflow automation tools have tried to reduce the manual burden involved, but they are rigid and require substantial investment of time to discover and develop the routines. Boaz Hecht co-founded 8Flow to iteratively discover and automate pieces of workflows, bringing visibility and collaboration to the internal organizational processes that keep the business running.\r\n","date_published":"2024-01-28T18:00:00.000-05:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/d67fd8cb-e8e5-4360-81be-c89a9ee249d3.mp3","mime_type":"audio/mpeg","size_in_bytes":27083758,"duration_in_seconds":2582}]},{"id":"76cadfbb-f0c5-429c-808d-512de30c9ec4","title":"Considering The Ethical Responsibilities Of ML And AI Engineers","url":"https://www.themachinelearningpodcast.com/ml-ai-ethical-considerations-episode-27","content_text":"Summary\n\nMachine learning and AI applications hold the promise of drastically impacting every aspect of modern life. With that potential for profound change comes a responsibility for the creators of the technology to account for the ramifications of their work. 
In this episode Nicholas Cifuentes-Goodbody guides us through the minefields of social, technical, and ethical considerations that are necessary to ensure that this next generation of technical and economic systems are equitable and beneficial for the people that they impact.\n\nAnnouncements\n\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nYour host is Tobias Macey and today I'm interviewing Nicholas Cifuentes-Goodbody about the different elements of the machine learning workflow where ethics need to be considered\n\n\nInterview\n\n\nIntroduction\nHow did you get involved in machine learning?\nTo start with, who is responsible for addressing the ethical concerns around AI?\nWhat are the different ways that AI can have positive or negative outcomes from an ethical perspective?\n\n\nWhat is the role of practitioners/individual contributors in the identification and evaluation of ethical impacts of their work?\n\nWhat are some utilities that are helpful in identifying and addressing bias in training data?\nHow can practitioners address challenges of equity and accessibility in the delivery of AI products?\nWhat are some of the options for reducing the energy consumption for training and serving AI?\nWhat are the most interesting, innovative, or unexpected ways that you have seen ML teams incorporate ethics into their work?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on ethical implications of ML?\nWhat are some of the resources that you recommend for people who want to invest in their knowledge and application of ethics in the realm of ML?\n\n\nContact Info\n\n\nWorldQuant University's Applied Data Science Lab\nLinkedIn\n\n\nParting Question\n\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\n\nClosing Announcements\n\n\nThank you for listening! 
Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers.\n\n\nLinks\n\n\nUNESCO Recommendation on the Ethics of Artificial Intelligence\nEuropean Union AI Act\nHow machine learning helps advance access to human rights information\nDisinformation, Team Jorge\nChina, AI, and Human Rights\nHow China Is Using A.I. to Profile a Minority\nWeapons of Math Destruction\nFairlearn\nAI Fairness 360\nAllen Institute for AI NYT\nAllen Institute for AI\nTransformers\nAI4ALL\nWorldQuant University\nHow to Make Generative AI Greener\nMachine Learning Emissions Calculator\nPracticing Trustworthy Machine Learning\nEnergy and Policy Considerations for Deep Learning\nNatural Language Processing\nTrolley Problem\nProtected Classes\nfairlearn (scikit-learn)\nBERT Model\n\n\nThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0","content_html":"

Summary

\n\n

Machine learning and AI applications hold the promise of drastically impacting every aspect of modern life. With that potential for profound change comes a responsibility for the creators of the technology to account for the ramifications of their work. In this episode Nicholas Cifuentes-Goodbody guides us through the minefields of social, technical, and ethical considerations that are necessary to ensure that this next generation of technical and economic systems are equitable and beneficial for the people that they impact.

\n\n

Announcements

\n\n\n\n

Interview

\n\n\n\n

Contact Info

\n\n\n\n

Parting Question

\n\n\n\n

Closing Announcements

\n\n\n\n

Links

\n\n\n\n

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

","summary":"Machine learning and AI applications hold the promise of drastically impacting every aspect of modern life. With that potential for profound change comes a responsibility for the creators of the technology to account for the ramifications of their work. In this episode Nicholas Cifuentes-Goodbody guides us through the minefields of social, technical, and ethical considerations that are necessary to ensure that this next generation of technical and economic systems are equitable and beneficial for the people that they impact.","date_published":"2024-01-28T14:00:00.000-05:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/76cadfbb-f0c5-429c-808d-512de30c9ec4.mp3","mime_type":"audio/mpeg","size_in_bytes":24531347,"duration_in_seconds":2366}]},{"id":"4b4bcf34-2eae-4fc2-9876-c16404db24af","title":"Build Intelligent Applications Faster With RelationalAI","url":"https://www.themachinelearningpodcast.com/relational-ai-data-coprocessor-episode-26","content_text":"Summary\n\nBuilding machine learning systems and other intelligent applications is a complex undertaking. This often requires retrieving data from a warehouse engine, adding an extra barrier to every workflow. The RelationalAI engine was built as a co-processor for your data warehouse that adds a greater degree of flexibility in the representation and analysis of the underlying information, simplifying the work involved. 
In this episode CEO Molham Aref explains how RelationalAI is designed, the capabilities that it adds to your data clouds, and how you can start using it to build more sophisticated applications on your data.\n\nAnnouncements\n\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nYour host is Tobias Macey and today I'm interviewing Molham Aref about RelationalAI and the principles behind it for powering intelligent applications\n\n\nInterview\n\n\nIntroduction\nHow did you get involved in machine learning?\nCan you describe what RelationalAI is and the story behind it?\n\n\nOn your site you call your product an \"AI Co-processor\". Can you explain what you mean by that phrase?\n\nWhat are the primary use cases that you address with the RelationalAI product?\n\n\nWhat are the types of solutions that teams might build to address those problems in the absence of something like the RelationalAI engine?\n\nCan you describe the system design of RelationalAI?\n\n\nHow have the design and goals of the platform changed since you first started working on it?\n\nFor someone who is using RelationalAI to address a business need, what does the onboarding and implementation workflow look like?\nWhat is your design philosophy for identifying the balance between automating the implementation of certain categories of application (e.g. NER) vs. 
providing building blocks and letting teams assemble them on their own?\nWhat are the data modeling paradigms that teams should be aware of to make the best use of the RKGS platform and Rel language?\nWhat are the aspects of customer education that you find yourself spending the most time on?\nWhat are some of the most under-utilized or misunderstood capabilities of the RelationalAI platform that you think deserve more attention?\nWhat are the most interesting, innovative, or unexpected ways that you have seen the RelationalAI product used?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on RelationalAI?\nWhen is RelationalAI the wrong choice?\nWhat do you have planned for the future of RelationalAI?\n\n\nContact Info\n\n\nLinkedIn\n\n\nParting Question\n\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\n\nClosing Announcements\n\n\nThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers.\n\n\nLinks\n\n\nRelationalAI\nSnowflake\nAI Winter\nBigQuery\nGradient Descent\nB-Tree\nNavigational Database\nHadoop\nTeradata\nWorst Case Optimal Join\nSemantic Query Optimization\nRelational Algebra\nHyperGraph\nLinear Algebra\nVector Database\nPathway\n\n\nData Engineering Podcast Episode\n\nPinecone\n\n\nData Engineering Podcast Episode\n\n\n\nThe intro and outro music is from Hitman's Lovesong feat. 
Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0","content_html":"

Summary

\n\n

Building machine learning systems and other intelligent applications is a complex undertaking. This often requires retrieving data from a warehouse engine, adding an extra barrier to every workflow. The RelationalAI engine was built as a co-processor for your data warehouse that adds a greater degree of flexibility in the representation and analysis of the underlying information, simplifying the work involved. In this episode CEO Molham Aref explains how RelationalAI is designed, the capabilities that it adds to your data clouds, and how you can start using it to build more sophisticated applications on your data.

\n\n

Announcements

\n\n\n\n

Interview

\n\n\n\n

Contact Info

\n\n\n\n

Parting Question

\n\n\n\n

Closing Announcements

\n\n\n\n

Links

\n\n\n\n

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

","summary":"Building machine learning systems and other intelligent applications are a complex undertaking. This often requires retrieving data from a warehouse engine, adding an extra barrier to every workflow. The RelationalAI engine was built as a co-processor for your data warehouse that adds a greater degree of flexibility in the representation and analysis of the underlying information, simplifying the work involved. In this episode CEO Molham Aref explains how RelationalAI is designed, the capabilities that it adds to your data clouds, and how you can start using it to build more sophisticated applications on your data.","date_published":"2023-12-30T22:00:00.000-05:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/4b4bcf34-2eae-4fc2-9876-c16404db24af.mp3","mime_type":"audio/mpeg","size_in_bytes":33171306,"duration_in_seconds":3504}]},{"id":"dcafedc8-60b7-4382-abd9-1737054b76bd","title":"Building Better AI While Preserving User Privacy With TripleBlind","url":"https://www.themachinelearningpodcast.com/tripleblind-ai-user-privacy-episode-25","content_text":"Summary\n\nMachine learning and generative AI systems have produced truly impressive capabilities. Unfortunately, many of these applications are not designed with the privacy of end-users in mind. TripleBlind is a platform focused on embedding privacy preserving techniques in the machine learning process to produce more user-friendly AI products. 
In this episode Gharib Gharibi explains how the current generation of applications can be susceptible to leaking user data and how to counteract those trends.\n\nAnnouncements\n\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nYour host is Tobias Macey and today I'm interviewing Gharib Gharibi about the challenges of bias and data privacy in generative AI models\n\n\nInterview\n\n\nIntroduction\nHow did you get involved in machine learning?\nGenerative AI has been gaining a lot of attention and speculation about its impact. What are some of the risks that these capabilities pose?\n\n\nWhat are the main contributing factors to their existing shortcomings?\nWhat are some of the subtle ways that bias in the source data can manifest?\n\nIn addition to inaccurate results, there is also a question of how user interactions might be re-purposed and potential impacts on data and personal privacy. What are the main sources of risk?\nWith the massive attention that generative AI has created and the perspectives that are being shaped by it, how do you see that impacting the general perception of other implementations of AI/ML?\n\n\nHow can ML practitioners improve and convey the trustworthiness of their models to end users?\nWhat are the risks for the industry if generative models fall out of favor with the public?\n\nHow does your work at Tripleblind help to encourage a conscientious approach to AI?\nWhat are the most interesting, innovative, or unexpected ways that you have seen data privacy addressed in AI applications?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on privacy in AI?\nWhen is TripleBlind the wrong choice?\nWhat do you have planned for the future of TripleBlind?\n\n\nContact Info\n\n\nLinkedIn\n\n\nParting Question\n\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\n\nClosing 
Announcements\n\n\nThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers.\n\n\nLinks\n\n\nTripleBlind\nImageNet Geoffrey Hinton Paper\nBERT language model\nGenerative AI\nGPT == Generative Pre-trained Transformer\nHIPAA Safe Harbor Rules\nFederated Learning\nDifferential Privacy\nHomomorphic Encryption\n\n\nThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0","content_html":"

Summary

\n\n

Machine learning and generative AI systems have produced truly impressive capabilities. Unfortunately, many of these applications are not designed with the privacy of end-users in mind. TripleBlind is a platform focused on embedding privacy preserving techniques in the machine learning process to produce more user-friendly AI products. In this episode Gharib Gharibi explains how the current generation of applications can be susceptible to leaking user data and how to counteract those trends.

\n\n

Announcements

\n\n\n\n

Interview

\n\n\n\n

Contact Info

\n\n\n\n

Parting Question

\n\n\n\n

Closing Announcements

\n\n\n\n

Links

\n\n\n\n

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

","summary":"Machine learning and generative AI systems have produced truly impressive capabilities. Unfortunately, many of these applications are not designed with the privacy of end-users in mind. TripleBlind is a platform focused on embedding privacy preserving techniques in the machine learning process to produce more user-friendly AI products. In this episode Gharib Gharibi explains how the current generation of applications can be susceptible to leaking user data and how to counteract those trends.","date_published":"2023-11-21T20:00:00.000-05:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/dcafedc8-60b7-4382-abd9-1737054b76bd.mp3","mime_type":"audio/mpeg","size_in_bytes":31189816,"duration_in_seconds":2814}]},{"id":"2c55baa9-9509-4cff-99ff-7fc4cd7b76e8","title":"Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine","url":"https://www.themachinelearningpodcast.com/tabnine-generative-ai-developer-assistant-episode-24","content_text":"Summary\n\nSoftware development involves an interesting balance of creativity and repetition of patterns. Generative AI has accelerated the ability of developer tools to provide useful suggestions that speed up the work of engineers. Tabnine is one of the main platforms offering an AI powered assistant for software engineers. 
In this episode Eran Yahav shares the journey that he has taken in building this product and the ways that it enhances the ability of humans to get their work done, and when the humans have to adapt to the tool.\n\nAnnouncements\n\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nYour host is Tobias Macey and today I'm interviewing Eran Yahav about building an AI powered developer assistant at Tabnine\n\n\nInterview\n\n\nIntroduction\nHow did you get involved in machine learning?\nCan you describe what Tabnine is and the story behind it?\nWhat are the individual and organizational motivations for using AI to generate code?\n\n\nWhat are the real-world limitations of generative AI for creating software? (e.g. size/complexity of the outputs, naming conventions, etc.)\nWhat are the elements of skepticism/oversight that developers need to exercise while using a system like Tabnine?\n\nWhat are some of the primary ways that developers interact with Tabnine during their development workflow?\n\n\nAre there any particular styles of software for which an AI is more appropriate/capable? (e.g. webapps vs. data pipelines vs. exploratory analysis, etc.)\n\nFor natural languages there is a strong bias toward English in the current generation of LLMs. How does that translate into computer languages? (e.g. 
Python, Java, C++, etc.)\nCan you describe the structure and implementation of Tabnine?\n\n\nDo you rely primarily on a single core model, or do you have multiple models with subspecialization?\nHow have the design and goals of the product changed since you first started working on it?\n\nWhat are the biggest challenges in building a custom LLM for code?\n\n\nWhat are the opportunities for specialization of the model architecture given the highly structured nature of the problem domain?\n\nFor users of Tabnine, how do you assess/monitor the accuracy of recommendations?\n\n\nWhat are the feedback and reinforcement mechanisms for the model(s)?\n\nWhat are the most interesting, innovative, or unexpected ways that you have seen Tabnine's LLM powered coding assistant used?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on AI assisted development at Tabnine?\nWhen is an AI developer assistant the wrong choice?\nWhat do you have planned for the future of Tabnine?\n\n\nContact Info\n\n\nLinkedIn\nWebsite\n\n\nParting Question\n\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\n\nClosing Announcements\n\n\nThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you've learned something or tried out a project from the show then tell us about it! 
Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers.\n\n\nLinks\n\n\nTabNine\nTechnion University\nProgram Synthesis\nContext Stuffing\nElixir\nDependency Injection\nCOBOL\nVerilog\nMidJourney\n\n\nThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0","content_html":"

Summary

\n\n

Software development involves an interesting balance of creativity and repetition of patterns. Generative AI has accelerated the ability of developer tools to provide useful suggestions that speed up the work of engineers. Tabnine is one of the main platforms offering an AI powered assistant for software engineers. In this episode Eran Yahav shares the journey that he has taken in building this product and the ways that it enhances the ability of humans to get their work done, and when the humans have to adapt to the tool.

\n\n

Announcements

\n\n\n\n

Interview

\n\n\n\n

Contact Info

\n\n\n\n

Parting Question

\n\n\n\n

Closing Announcements

\n\n\n\n

Links

\n\n\n\n

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

","summary":"Software development involves an interesting balance of creativity and repetition of patterns. Generative AI has accelerated the ability of developer tools to provide useful suggestions that speed up the work of engineers. Tabnine is one of the main platforms offering an AI powered assistant for software engineers. In this episode Eran Yahav shares the journey that he has taken in building this product and the ways that it enhances the ability of humans to get their work done, and when the humans have to adapt to the tool.","date_published":"2023-11-12T21:00:00.000-05:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/2c55baa9-9509-4cff-99ff-7fc4cd7b76e8.mp3","mime_type":"audio/mpeg","size_in_bytes":30178884,"duration_in_seconds":3887}]},{"id":"dd60b69a-2115-4c3c-b00f-b470011523fe","title":"Validating Machine Learning Systems For Safety Critical Applications With Ketryx","url":"https://www.themachinelearningpodcast.com/ketryx-safety-critical-machine-learning-systems-episode-23","content_text":"Summary\n\nSoftware systems power much of the modern world. For applications that impact the safety and well-being of people there is an extra set of precautions that need to be addressed before deploying to production. If machine learning and AI are part of that application then there is a greater need to validate the proper functionality of the models. 
In this episode Erez Kaminski shares the work that he is doing at Ketryx to make that validation easier to implement and incorporate into the ongoing maintenance of software and machine learning products.\n\nAnnouncements\n\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nYour host is Tobias Macey and today I'm interviewing Erez Kaminski about using machine learning in safety critical and highly regulated medical applications\n\n\nInterview\n\n\nIntroduction\nHow did you get involved in machine learning?\nCan you start by describing some of the regulatory burdens placed on ML teams who are building solutions for medical applications?\n\n\nHow do these requirements impact the development and validation processes of model design and development?\n\nWhat are some examples of the procedural and record-keeping aspects of the machine learning workflow that are required for FDA compliance?\n\n\nWhat are the opportunities for automating pieces of that overhead?\n\nCan you describe what you are doing at Ketryx to streamline the development/training/deployment of ML/AI applications for medical use cases?\n\n\nWhat are the ideas/assumptions that you had at the start of Ketryx that have been challenged/updated as you work with customers?\n\nWhat are the most interesting, innovative, or unexpected ways that you have seen ML used in medical applications?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on Ketryx?\nWhen is Ketryx the wrong choice?\nWhat do you have planned for the future of Ketryx?\n\n\nContact Info\n\n\nEmail\nLinkedIn\n\n\nParting Question\n\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\n\nClosing Announcements\n\n\nThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. 
Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers.\n\n\nLinks\n\n\nKetryx\nWolfram Alpha\nMathematica\nTensorflow\nSBOM == Software Bill Of Materials\nAir-gapped Systems\nAlexNet\nShapley Values\nSHAP\n\n\nPodcast.__init__ Episode\n\nBayesian Statistics\nCausal Modeling\nProphet\nFDA Principles Of Software Validation\n\n\nThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0","content_html":"

Summary

\n\n

Software systems power much of the modern world. For applications that impact the safety and well-being of people there is an extra set of precautions that need to be addressed before deploying to production. If machine learning and AI are part of that application then there is a greater need to validate the proper functionality of the models. In this episode Erez Kaminski shares the work that he is doing at Ketryx to make that validation easier to implement and incorporate into the ongoing maintenance of software and machine learning products.

\n\n

Announcements

\n\n\n\n

Interview

\n\n\n\n

Contact Info

\n\n\n\n

Parting Question

\n\n\n\n

Closing Announcements

\n\n\n\n

Links

\n\n\n\n

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

","summary":"Software systems power much of the modern world. For applications that impact the safety and well-being of people there is an extra set of precautions that need to be addressed before deploying to production. If machine learning and AI are part of that application then there is a greater need to validate the proper functionality of the models. In this episode Erez Kaminski shares the work that he is doing at Ketryx to make that validation easier to implement and incorporate into the ongoing maintenance of software and machine learning products.","date_published":"2023-11-07T21:15:00.000-05:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/dd60b69a-2115-4c3c-b00f-b470011523fe.mp3","mime_type":"audio/mpeg","size_in_bytes":28562867,"duration_in_seconds":3072}]},{"id":"0a46e74a-ba2f-4163-b217-bfa5d06c670d","title":"Applying Declarative ML Techniques To Large Language Models For Better Results","url":"https://www.themachinelearningpodcast.com/predibase-declarative-ml-large-language-models-episode-22","content_text":"Summary\n\nLarge language models have gained a substantial amount of attention in the area of AI and machine learning. While they are impressive, there are many applications where they are not the best option. 
In this episode Piero Molino explains how declarative ML approaches allow you to make the best use of the available tools across use cases and data formats.\n\nAnnouncements\n\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nYour host is Tobias Macey and today I'm interviewing Piero Molino about the application of declarative ML in a world being dominated by large language models\n\n\nInterview\n\n\nIntroduction\nHow did you get involved in machine learning?\nCan you start by summarizing your perspective on the effect that LLMs are having on the AI/ML industry?\n\n\nIn a world where LLMs are being applied to a growing variety of use cases, what are the capabilities that they still lack?\nHow does declarative ML help to address those shortcomings?\n\nThe majority of current hype is about commercial models (e.g. GPT-4). Can you summarize the current state of the ecosystem for open source LLMs?\n\n\nFor teams who are investing in ML/AI capabilities, what are the sources of platform risk for LLMs?\nWhat are the comparative benefits of using a declarative ML approach?\n\nWhat are the most interesting, innovative, or unexpected ways that you have seen LLMs used?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on declarative ML in the age of LLMs?\nWhen is an LLM the wrong choice?\nWhat do you have planned for the future of declarative ML and Predibase?\n\n\nContact Info\n\n\nLinkedIn\nWebsite\n\n\nClosing Announcements\n\n\nThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. 
Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers.\n\n\nParting Question\n\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\n\nLinks\n\n\nPredibase\n\n\nPodcast Episode\n\nLudwig\n\n\nPodcast.__init__ Episode\n\nRecommender Systems\nInformation Retrieval\nVector Database\nTransformer Model\nBERT\nContext Windows\nLLAMA\n\n\nThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0","content_html":"

Summary

\n\n

Large language models have gained a substantial amount of attention in the area of AI and machine learning. While they are impressive, there are many applications where they are not the best option. In this episode Piero Molino explains how declarative ML approaches allow you to make the best use of the available tools across use cases and data formats.

\n\n

Announcements

\n\n\n\n

Interview

\n\n\n\n

Contact Info

\n\n\n\n

Closing Announcements

\n\n\n\n

Parting Question

\n\n\n\n

Links

\n\n\n\n

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

","summary":"Large language models have gained a substantial amount of attention in the area of AI and machine learning. While they are impressive, there are many applications where they are not the best option. In this episode Piero Molino explains how declarative ML approaches allow you to make the best use of the available tools across use cases and data formats.","date_published":"2023-10-24T19:00:00.000-04:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/0a46e74a-ba2f-4163-b217-bfa5d06c670d.mp3","mime_type":"audio/mpeg","size_in_bytes":28028788,"duration_in_seconds":2771}]},{"id":"a0bc589f-21ef-4f72-b094-248c122ae367","title":"Surveying The Landscape Of AI and ML From An Investor's Perspective","url":"https://www.themachinelearningpodcast.com/mad-landscape-2023-ml-ai-episode-21","content_text":"Summary\n\nArtificial Intelligence is experiencing a renaissance in the wake of breakthrough natural language models. With new businesses sprouting up to address the various needs of ML and AI teams across the industry, it is a constant challenge to stay informed. Matt Turck has been compiling a report on the state of ML, AI, and Data for his work at FirstMark Capital. In this episode he shares his findings on the ML and AI landscape and the interesting trends that are developing.\n\nAnnouncements\n\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nAs more people start using AI for projects, two things are clear: It’s a rapidly advancing field, but it’s tough to navigate. How can you get the best results for your use case? Instead of being subjected to a bunch of buzzword bingo, hear directly from pioneers in the developer and data science space on how they use graph tech to build AI-powered apps. . 
Attend the dev and ML talks at NODES 2023, a free online conference on October 26 featuring some of the brightest minds in tech. Check out the agenda and register today at Neo4j.com/NODES.\nYour host is Tobias Macey and today I'm interviewing Matt Turck about his work on the MAD (ML, AI, and Data) landscape and the insights he has gained on the ML ecosystem\n\n\nInterview\n\n\nIntroduction\nHow did you get involved in machine learning?\nCan you describe what the MAD landscape project is and the story behind it?\nWhat are the major changes in the ML ecosystem that you have seen since you first started compiling the landscape?\n\n\nHow have the developments in consumer-grade AI in recent years changed the business opportunities for ML/AI?\n\nWhat are the coarse divisions that you see as the boundaries that define the different categories for ML/AI in the landscape?\nFor ML infrastructure products/companies, what are the biggest challenges that they face in engineering and customer acquisition?\nWhat are some of the challenges in building momentum for startups in AI (existing moats around data access, talent acquisition, etc.)?\n\n\nFor products/companies that have ML/AI as their core offering, what are some strategies that they use to compete with \"big tech\" companies that already have a large corpus of data?\n\nWhat do you see as the societal vs. 
business importance of open source models as AI becomes more integrated into consumer facing products?\nWhat are the most interesting, innovative, or unexpected ways that you have seen ML/AI used in business and social contexts?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on the ML/AI elements of the MAD landscape?\nWhen is ML/AI the wrong choice for businesses?\nWhat are the areas of ML/AI that you are paying closest attention to in your own work?\n\n\nContact Info\n\n\nWebsite\n@mattturck on Twitter\n\n\nParting Question\n\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\n\nClosing Announcements\n\n\nThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers.\n\n\nLinks\n\n\nMAD Landscape\n\n\nData Engineering Podcast Episode\n\nFirst Mark Capital\nBayesian Techniques\nHadoop\nChatGPT\nAutoGPT\nDataiku\nGenerative AI\nDatabricks\nMLOps\nOpenAI\nAnthropic\nDeepMind\nBloombergGPT\nHuggingFace\nJexi Movie\n\"Her\" Movie\nSynthesia\n\n\nThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0","content_html":"

Summary

\n\n

Artificial Intelligence is experiencing a renaissance in the wake of breakthrough natural language models. With new businesses sprouting up to address the various needs of ML and AI teams across the industry, it is a constant challenge to stay informed. Matt Turck has been compiling a report on the state of ML, AI, and Data for his work at FirstMark Capital. In this episode he shares his findings on the ML and AI landscape and the interesting trends that are developing.

\n\n

Announcements

\n\n\n\n

Interview

\n\n\n\n

Contact Info

\n\n\n\n

Parting Question

\n\n\n\n

Closing Announcements

\n\n\n\n

Links

\n\n\n\n

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

","summary":"Artificial Intelligence is experiencing a renaissance in the wake of breakthrough natural language models. With new businesses sprouting up to address the various needs of ML and AI teams across the industry, it is a constant challenge to stay informed. Matt Turck has been compiling a report on the state of ML, AI, and Data for his work at FirstMark Capital. In this episode he shares his findings on the ML and AI landscape and the interesting trends that are developing.","date_published":"2023-10-15T13:00:00.000-04:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/a0bc589f-21ef-4f72-b094-248c122ae367.mp3","mime_type":"audio/mpeg","size_in_bytes":30421002,"duration_in_seconds":3754}]},{"id":"57f350c9-4d8f-47be-8958-b5f96de28c21","title":"Applying Federated Machine Learning To Sensitive Healthcare Data At Rhino Health","url":"https://www.themachinelearningpodcast.com/rhino-health-federated-machine-learning-episode-20","content_text":"Summary\n\nA core challenge of machine learning systems is getting access to quality data. This often means centralizing information in a single system, but that is impractical in highly regulated industries, such as healthchare. To address this hurdle Rhino Health is building a platform for federated learning on health data, so that everyone can maintain data privacy while benefiting from AI capabilities. 
In this episode Ittai Dayan explains the barriers to ML in healthcare and how they have designed the Rhino platform to overcome them.\n\nAnnouncements\n\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nYour host is Tobias Macey and today I'm interviewing Ittai Dayan about using federated learning at Rhino Health to bring AI capabilities to the tightly regulated healthcare industry\n\n\nInterview\n\n\nIntroduction\nHow did you get involved in machine learning?\nCan you describe what Rhino Health is and the story behind it?\nWhat is federated learning and what are the trade-offs that it introduces?\n\n\nWhat are the benefits to healthcare and pharmalogical organizations from using federated learning?\n\nWhat are some of the challenges that you face in validating that patient data is properly de-identified in the federated models?\nCan you describe what the Rhino Health platform offers and how it is implemented?\n\n\nHow have the design and goals of the system changed since you started working on it?\n\nWhat are the technological capabilities that are needed for an organization to be able to start using Rhino Health to gain insights into their patient and clinical data?\n\n\nHow have you approached the design of your product to reduce the effort to onboard new customers and solutions?\n\nWhat are some examples of the types of automation that you are able to provide to your customers? (e.g. 
medical diagnosis, radiology review, health outcome predictions, etc.)\nWhat are the ethical and regulatory challenges that you have had to address in the development of your platform?\nWhat are the most interesting, innovative, or unexpected ways that you have seen Rhino Health used?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on Rhino Health?\nWhen is Rhino Health the wrong choice?\nWhat do you have planned for the future of Rhino Health?\n\n\nContact Info\n\n\nLinkedIn\n\n\nParting Question\n\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\n\nClosing Announcements\n\n\nThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers.\n\n\nLinks\n\n\nRhino Health\nFederated Learning\nNvidia Clara\nNvidia DGX\nMelloddy\nFlair NLP\n\n\nThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0","content_html":"

Summary

\n\n

A core challenge of machine learning systems is getting access to quality data. This often means centralizing information in a single system, but that is impractical in highly regulated industries, such as healthcare. To address this hurdle Rhino Health is building a platform for federated learning on health data, so that everyone can maintain data privacy while benefiting from AI capabilities. In this episode Ittai Dayan explains the barriers to ML in healthcare and how they have designed the Rhino platform to overcome them.

\n\n

Announcements

\n\n\n\n

Interview

\n\n\n\n

Contact Info

\n\n\n\n

Parting Question

\n\n\n\n

Closing Announcements

\n\n\n\n

Links

\n\n\n\n

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

","summary":"A core challenge of machine learning systems is getting access to quality data. This often means centralizing information in a single system, but that is impractical in highly regulated industries, such as healthcare. To address this hurdle Rhino Health is building a platform for federated learning on health data, so that everyone can maintain data privacy while benefiting from AI capabilities. In this episode Ittai Dayan explains the barriers to ML in healthcare and how they have designed the Rhino platform to overcome them.","date_published":"2023-09-10T21:00:00.000-04:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/57f350c9-4d8f-47be-8958-b5f96de28c21.mp3","mime_type":"audio/mpeg","size_in_bytes":32457804,"duration_in_seconds":2994}]},{"id":"8c85d226-2b6c-49f5-a849-a2d99c16896f","title":"Using Machine Learning To Keep An Eye On The Planet","url":"https://www.themachinelearningpodcast.com/iceye-ml-on-synthetic-aperture-radar-episode-19","content_text":"Summary\n\nSatellite imagery has given us a new perspective on our world, but it is limited by the field of view for the cameras. Synthetic Aperture Radar (SAR) allows for collecting images through clouds and in the dark, giving us a more consistent means of collecting data. In order to identify interesting details in such a vast amount of data it is necessary to use the power of machine learning. ICEYE has a fleet of satellites continuously collecting information about our planet. 
In this episode Tapio Friberg shares how they are applying ML to that data set to provide useful insights about fires, floods, and other terrestrial phenomena.\n\nAnnouncements\n\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nYour host is Tobias Macey and today I'm interviewing Tapio Friberg about building machine learning applications on top of SAR (Synthetic Aperture Radar) data to generate insights about our planet\n\n\nInterview\n\n\nIntroduction\nHow did you get involved in machine learning?\nCan you describe what ICEYE is and the story behind it?\nWhat are some of the applications of ML at ICEYE?\nWhat are some of the ways that SAR data poses a unique challenge to ML applications?\nWhat are some of the elements of the ML workflow that you are able to use \"off the shelf\" and where are the areas that you have had to build custom solutions?\nCan you share the structure of your engineering team and the role that the ML function plays in the larger organization?\nWhat does the end-to-end workflow for your ML model development and deployment look like?\n\n\nWhat are the operational requirements for your models? (e.g. batch execution, real-time, interactive inference, etc.)\n\nIn the model definitions, what are the elements of the source domain that create the largest challenges? (e.g. noise from backscatter, variance in resolution, etc.)\nOnce you have an output from an ML model how do you manage mapping between data domains to reflect insights from SAR sources onto a human understandable representation?\nGiven that SAR data and earth imaging is still a very niche domain, how does that influence your ability to hire for open positions and the ways that you think about your contributions to the overall ML ecosystem?\nHow can your work on using SAR as a representation of physical attributes help to improve capabilities in e.g. 
LIDAR, computer vision, etc.?\nWhat are the most interesting, innovative, or unexpected ways that you have seen ICEYE and SAR data used?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on ML for SAR data?\nWhat do you have planned for the future of ML applications at ICEYE?\n\n\nContact Info\n\n\nLinkedIn\n\n\nParting Question\n\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\n\nClosing Announcements\n\n\nThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers\n\n\nLinks\n\n\nICEYE\nSAR == Synthetic Aperture Radar\nTransfer Learning\n\n\nThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0","content_html":"

Summary

\n\n

Satellite imagery has given us a new perspective on our world, but it is limited by the field of view for the cameras. Synthetic Aperture Radar (SAR) allows for collecting images through clouds and in the dark, giving us a more consistent means of collecting data. In order to identify interesting details in such a vast amount of data it is necessary to use the power of machine learning. ICEYE has a fleet of satellites continuously collecting information about our planet. In this episode Tapio Friberg shares how they are applying ML to that data set to provide useful insights about fires, floods, and other terrestrial phenomena.

\n\n

Announcements

\n\n\n\n

Interview

\n\n\n\n

Contact Info

\n\n\n\n

Parting Question

\n\n\n\n

Closing Announcements

\n\n\n\n

Links

\n\n\n\n

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

","summary":"Satellite imagery has given us a new perspective on our world, but it is limited by the field of view for the cameras. Synthetic Aperture Radar (SAR) allows for collecting images through clouds and in the dark, giving us a more consistent means of collecting data. In order to identify interesting details in such a vast amount of data it is necessary to use the power of machine learning. ICEYE has a fleet of satellites continuously collecting information about our planet. In this episode Tapio Friberg shares how they are applying ML to that data set to provide useful insights about fires, floods, and other terrestrial phenomena.","date_published":"2023-06-17T10:00:00.000-04:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/8c85d226-2b6c-49f5-a849-a2d99c16896f.mp3","mime_type":"audio/mpeg","size_in_bytes":34737954,"duration_in_seconds":2552}]},{"id":"87095652-f90b-4ea1-9534-644010cbd32c","title":"The Role Of Model Development In Machine Learning Systems","url":"https://www.themachinelearningpodcast.com/gantry-ml-model-development-episode-18","content_text":"Summary\n\nThe focus of machine learning projects has long been the model that is built in the process. As AI powered applications grow in popularity and power, the model is just the beginning. 
In this episode Josh Tobin shares his experience from his time as a machine learning researcher up to his current work as a founder at Gantry, and the shift in focus from model development to machine learning systems.\n\nAnnouncements\n\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nYour host is Tobias Macey and today I'm interviewing Josh Tobin about the state of industry best practices for designing and building ML models\n\n\nInterview\n\n\nIntroduction\nHow did you get involved in machine learning?\nCan you start by describing what a \"traditional\" process for building a model looks like?\n\n\nWhat are the forces that shaped those \"best practices\"?\n\nWhat are some of the practices that are still necessary/useful and what is becoming outdated? \n\n\nWhat are the changes in the ecosystem (tooling, research, communal knowledge, etc.) that are forcing teams to reconsider how they think about modeling?\n\nWhat are the most critical practices/capabilities for teams who are building services powered by ML/AI?\n\n\nWhat systems do they need to support them in those efforts?\n\nCan you describe what you are building at Gantry and how it aids in the process of developing/deploying/maintaining models with \"modern\" workflows?\nWhat are the most challenging aspects of building a platform that supports ML teams in their workflows?\nWhat are the most interesting, innovative, or unexpected ways that you have seen teams approach model development/validation?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on Gantry?\nWhen is Gantry the wrong choice?\nWhat are some of the resources that you find most helpful to stay apprised of how modeling and ML practices are evolving?\n\n\nContact Info\n\n\nLinkedIn\nWebsite\n\n\nParting Question\n\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning 
today?\n\n\nClosing Announcements\n\n\nThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers\n\n\nLinks\n\n\nGantry\nFull Stack Deep Learning\nOpenAI\nKaggle\nNeurIPS == Neural Information Processing Systems Conference\nCaffe\nTheano\nDeep Learning\nRegression Model\nscikit-learn\nLarge Language Model\nFoundation Models\nCohere\nFederated Learning\nFeature Store\ndbt\n\n\nThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0","content_html":"

Summary

\n\n

The focus of machine learning projects has long been the model that is built in the process. As AI powered applications grow in popularity and power, the model is just the beginning. In this episode Josh Tobin shares his experience from his time as a machine learning researcher up to his current work as a founder at Gantry, and the shift in focus from model development to machine learning systems.

\n\n

Announcements

\n\n\n\n

Interview

\n\n\n\n

Contact Info

\n\n\n\n

Parting Question

\n\n\n\n

Closing Announcements

\n\n\n\n

Links

\n\n\n\n

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

","summary":"The focus of machine learning projects has long been the model that is built in the process. As AI powered applications grow in popularity and power, the model is just the beginning. In this episode Josh Tobin shares his experience from his time as a machine learning researcher up to his current work as a founder at Gantry, and the shift in focus from model development to machine learning systems.","date_published":"2023-05-28T21:00:00.000-04:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/87095652-f90b-4ea1-9534-644010cbd32c.mp3","mime_type":"audio/mpeg","size_in_bytes":33457163,"duration_in_seconds":2801}]},{"id":"4fdaba79-a772-428b-a47c-e35a32a77606","title":"Real-Time Machine Learning Has Entered The Realm Of The Possible","url":"https://www.themachinelearningpodcast.com/tecton-real-time-machine-learning-episode-17","content_text":"Summary\n\nMachine learning models have predominantly been built and updated in a batch modality. While this is operationally simpler, it doesn't always provide the best experience or capabilities for end users of the model. Tecton has been investing in the infrastructure and workflows that enable building and updating ML models with real-time data to allow you to react to real-world events as they happen. 
In this episode CTO Kevin Stumpf explores the benefits of real-time machine learning and the systems that are necessary to support the development and maintenance of those models.\n\nAnnouncements\n\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nYour host is Tobias Macey and today I'm interviewing Kevin Stumpf about the challenges and promise of real-time ML applications\n\n\nInterview\n\n\nIntroduction\nHow did you get involved in machine learning?\nCan you describe what real-time ML is and some examples of where it might be applied?\nWhat are the operational and organizational requirements for being able to adopt real-time approaches for ML projects?\nWhat are some of the ways that real-time requirements influence the scale/scope/architecture of an ML model?\nWhat are some of the failure modes for real-time vs analytical or operational ML?\nGiven the low latency between source/input data being generated or received and a prediction being generated, how does that influence susceptibility to e.g. data drift?\n\n\nData quality and accuracy also become more critical. What are some of the validation strategies that teams need to consider as they move to real-time?\n\nWhat are the most interesting, innovative, or unexpected ways that you have seen real-time ML applied?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on real-time ML systems?\nWhen is real-time the wrong choice for ML?\nWhat do you have planned for the future of real-time support for ML in Tecton?\n\n\nContact Info\n\n\nLinkedIn\n@kevinmstumpf on Twitter\n\n\nParting Question\n\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\n\nClosing Announcements\n\n\nThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. 
Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers\n\n\nLinks\n\n\nTecton\n\n\nPodcast Episode\nData Engineering Podcast Episode\n\nUber Michelangelo\nReinforcement Learning\nOnline Learning\nRandom Forest\nChatGPT\nXGBoost\nLinear Regression\nTrain-Serve Skew\nFlink\n\n\nData Engineering Podcast Episode\n\n\n\nThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0Sponsored By:Data Council: ![Data Council Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/Bz3JJvtU.png)\r\nJoin us at the event for the global data community, Data Council Austin. From March 28-30th 2023, we'll play host to hundreds of attendees, 100 top speakers, and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data. As a listener to the Data Engineering Podcast you can get a special discount off tickets by using the promo code dataengpod20. Don't miss out on our only event this year! Visit: [themachinelearningpodcast.com/data-council](https://www.themachinelearningpodcast.com/data-council) Promo Code: dataengpod20","content_html":"

Summary

\n\n

Machine learning models have predominantly been built and updated in a batch modality. While this is operationally simpler, it doesn't always provide the best experience or capabilities for end users of the model. Tecton has been investing in the infrastructure and workflows that enable building and updating ML models with real-time data to allow you to react to real-world events as they happen. In this episode CTO Kevin Stumpf explores the benefits of real-time machine learning and the systems that are necessary to support the development and maintenance of those models.

\n\n

Announcements

\n\n\n\n

Interview

\n\n\n\n

Contact Info

\n\n\n\n

Parting Question

\n\n\n\n

Closing Announcements

\n\n\n\n

Links

\n\n\n\n

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

Sponsored By:

","summary":"Machine learning models have predominantly been built and updated in a batch modality. While this is operationally simpler, it doesn't always provide the best experience or capabilities for end users of the model. Tecton has been investing in the infrastructure and workflows that enable building and updating ML models with real-time data to allow you to react to real-world events as they happen. In this episode CTO Kevin Stumpf explores the benefits of real-time machine learning and the systems that are necessary to support the development and maintenance of those models.","date_published":"2023-03-09T17:00:00.000-05:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/4fdaba79-a772-428b-a47c-e35a32a77606.mp3","mime_type":"audio/mpeg","size_in_bytes":18363865,"duration_in_seconds":2069}]},{"id":"bb5034b4-ddaf-4f4f-b682-2bbfdeb18bd7","title":"How Shopify Built A Machine Learning Platform That Encourages Experimentation","url":"https://www.themachinelearningpodcast.com/shopify-merlin-ml-platform-episode-16","content_text":"Summary\n\nShopify uses machine learning to power multiple features in their platform. In order to reduce the amount of effort required to develop and deploy models they have invested in building an opinionated platform for their engineers. They have gone through multiple iterations of the platform and their most recent version is called Merlin. 
In this episode Isaac Vidas shares the use cases that they are optimizing for, how it integrates into the rest of their data platform, and how they have designed it to let machine learning engineers experiment freely and safely.\n\nAnnouncements\n\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nYour host is Tobias Macey and today I'm interviewing Isaac Vidas about his work on the ML platform used by Shopify\n\n\nInterview\n\n\nIntroduction\nHow did you get involved in machine learning?\nCan you describe what Shopify is and some of the ways that you are using ML at Shopify?\n\n\nWhat are the challenges that you have encountered as an organization in applying ML to your business needs?\n\nCan you describe how you have designed your current technical platform for supporting ML workloads?\n\n\nWho are the target personas for this platform?\nWhat does the workflow look like for a given data scientist/ML engineer/etc.?\n\nWhat are the capabilities that you are trying to optimize for in your current platform?\n\n\nWhat are some of the previous iterations of ML infrastructure and process that you have built?\nWhat are the most useful lessons that you gathered from those previous experiences that informed your current approach?\n\nHow have the capabilities of the Merlin platform influenced the ways that ML is viewed and applied across Shopify?\nWhat are the most interesting, innovative, or unexpected ways that you have seen Merlin used?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on Merlin?\nWhen is Merlin the wrong choice?\nWhat do you have planned for the future of Merlin?\n\n\nContact Info\n\n\n@kazuaros on Twitter\nLinkedIn\nkazuar on GitHub\n\n\nParting Question\n\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\n\nClosing Announcements\n\n\nThank you for listening! 
Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers\n\n\nLinks\n\n\nShopify\nShopify Merlin\nVertex AI\nscikit-learn\nXGBoost\nRay\n\n\nPodcast.__init__ Episode\n\nPySpark\nGPT-3\nChatGPT\nGoogle AI\nPyTorch\n\n\nPodcast.__init__ Episode\n\nDask\nModin\n\n\nPodcast.__init__ Episode\n\nFlink\n\n\nData Engineering Podcast Episode\n\nFeast Feature Store\nKubernetes\n\n\nThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0","content_html":"

Summary

\n\n

Shopify uses machine learning to power multiple features in their platform. In order to reduce the amount of effort required to develop and deploy models they have invested in building an opinionated platform for their engineers. They have gone through multiple iterations of the platform and their most recent version is called Merlin. In this episode Isaac Vidas shares the use cases that they are optimizing for, how it integrates into the rest of their data platform, and how they have designed it to let machine learning engineers experiment freely and safely.

\n\n

Announcements

\n\n\n\n

Interview

\n\n\n\n

Contact Info

\n\n\n\n

Parting Question

\n\n\n\n

Closing Announcements

\n\n\n\n

Links

\n\n\n\n

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

","summary":"Shopify uses machine learning to power multiple features in their platform. In order to reduce the amount of effort required to develop and deploy models they have invested in building an opinionated platform for their engineers. They have gone through multiple iterations of the platform and their most recent version is called Merlin. In this episode Isaac Vidas shares the use cases that they are optimizing for, how it integrates into the rest of their data platform, and how they have designed it to let machine learning engineers experiment freely and safely.","date_published":"2023-02-02T10:00:00.000-05:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/bb5034b4-ddaf-4f4f-b682-2bbfdeb18bd7.mp3","mime_type":"audio/mpeg","size_in_bytes":42522235,"duration_in_seconds":3971}]},{"id":"bea02319-c666-436b-9fe1-11ff1d4ed6ec","title":"Applying Machine Learning To The Problem Of Bad Data At Anomalo","url":"https://www.themachinelearningpodcast.com/anomalo-data-quality-monitoring-episode-15","content_text":"Summary\n\nAll data systems are subject to the \"garbage in, garbage out\" problem. For machine learning applications bad data can lead to unreliable models and unpredictable results. Anomalo is a product designed to alert on bad data by applying machine learning models to various storage and processing systems. 
In this episode Jeremy Stanley discusses the various challenges that are involved in building useful and reliable machine learning models with unreliable data and the interesting problems that they are solving in the process.\n\nAnnouncements\n\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nYour host is Tobias Macey and today I'm interviewing Jeremy Stanley about his work at Anomalo, applying ML to the problem of data quality monitoring\n\n\nInterview\n\n\nIntroduction\nHow did you get involved in machine learning?\nCan you describe what Anomalo is and the story behind it?\nWhat are some of the ML approaches that you are using to address challenges with data quality/observability?\nWhat are some of the difficulties posed by your application of ML technologies on data sets that you don't control?\n\n\nHow does the scale and quality of data that you are working with influence/constrain the algorithmic approaches that you are using to build and train your models?\n\nHow have you implemented the infrastructure and workflows that you are using to support your ML applications?\nWhat are some of the ways that you are addressing data quality challenges in your own platform?\n\n\nWhat are the opportunities that you have for dogfooding your product?\n\nWhat are the most interesting, innovative, or unexpected ways that you have seen Anomalo used?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on Anomalo?\nWhen is Anomalo the wrong choice?\nWhat do you have planned for the future of Anomalo?\n\n\nContact Info\n\n\n@jeremystan on Twitter\nLinkedIn\n\n\nParting Question\n\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\n\nClosing Announcements\n\n\nThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. 
Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you've learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers\n\n\nLinks\n\n\nAnomalo\n\n\nData Engineering Podcast Episode\n\nPartial Differential Equations\nNeural Network\nNeural Networks For Pattern Recognition by Christopher M. Bishop (affiliate link)\nGradient Boosted Decision Trees\nShapley Values\nSentry\ndbt\nAltair\n\n\nThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0","content_html":"

Summary

\n\n

All data systems are subject to the "garbage in, garbage out" problem. For machine learning applications bad data can lead to unreliable models and unpredictable results. Anomalo is a product designed to alert on bad data by applying machine learning models to various storage and processing systems. In this episode Jeremy Stanley discusses the various challenges that are involved in building useful and reliable machine learning models with unreliable data and the interesting problems that they are solving in the process.

\n\n

Announcements

\n\n\n\n

Interview

\n\n\n\n

Contact Info

\n\n\n\n

Parting Question

\n\n\n\n

Closing Announcements

\n\n\n\n

Links

\n\n\n\n

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

","summary":"All data systems are subject to the \"garbage in, garbage out\" problem. For machine learning applications bad data can lead to unreliable models and unpredictable results. Anomalo is a product designed to alert on bad data by applying machine learning models to various storage and processing systems. In this episode Jeremy Stanley discusses the various challenges that are involved in building useful and reliable machine learning models with unreliable data and the interesting problems that they are solving in the process.","date_published":"2023-01-23T21:00:00.000-05:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/bea02319-c666-436b-9fe1-11ff1d4ed6ec.mp3","mime_type":"audio/mpeg","size_in_bytes":30000554,"duration_in_seconds":3564}]},{"id":"a5f61e41-1d38-4835-b3be-469cd4aff668","title":"Build More Reliable Machine Learning Systems With The Dagster Orchestration Engine","url":"https://www.themachinelearningpodcast.com/dagster-ml-orchestration-episode-14","content_text":"Summary\n\nBuilding a machine learning model one time can be done in an ad-hoc manner, but if you ever want to update it and serve it in production you need a way of repeating a complex sequence of operations. Dagster is an orchestration engine that understands the data that it is manipulating so that you can move beyond coarse task-based representations of your dependencies. 
In this episode Sandy Ryza explains how his background in machine learning has informed his work on the Dagster project and the foundational principles that it is built on to allow for collaboration across data engineering and machine learning concerns.\n\nInterview\n\n\nIntroduction\nHow did you get involved in machine learning?\nCan you start by sharing a definition of \"orchestration\" in the context of machine learning projects?\nWhat is your assessment of the state of the orchestration ecosystem as it pertains to ML?\nmodeling cycles and managing experiment iterations in the execution graph\nhow to balance flexibility with repeatability \nWhat are the most interesting, innovative, or unexpected ways that you have seen orchestration implemented/applied for machine learning?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on orchestration of ML workflows?\nWhen is Dagster the wrong choice?\nWhat do you have planned for the future of ML support in Dagster?\n\n\nContact Info\n\n\nLinkedIn\n@s_ryz on Twitter\nsryza on GitHub\n\n\nParting Question\n\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\n\nClosing Announcements\n\n\nThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you've learned something or tried out a project from the show then tell us about it! 
Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers\n\n\nLinks\n\n\nDagster\n\n\nData Engineering Podcast Episode\n\nCloudera\nHadoop\nApache Spark\nPeter Norvig\nJosh Wills\nREPL == Read Eval Print Loop\nRStudio\nMemoization\nMLFlow\nKedro\n\n\nData Engineering Podcast Episode\n\nMetaflow\n\n\nPodcast.__init__ Episode\n\nKubeflow\ndbt\n\n\nData Engineering Podcast Episode\n\nAirbyte\n\n\nData Engineering Podcast Episode\n\n\n\nThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0","content_html":"

Summary

\n\n

Building a machine learning model one time can be done in an ad-hoc manner, but if you ever want to update it and serve it in production you need a way of repeating a complex sequence of operations. Dagster is an orchestration engine that understands the data that it is manipulating so that you can move beyond coarse task-based representations of your dependencies. In this episode Sandy Ryza explains how his background in machine learning has informed his work on the Dagster project and the foundational principles that it is built on to allow for collaboration across data engineering and machine learning concerns.

\n\n

Interview

\n\n\n\n

Contact Info

\n\n\n\n

Parting Question

\n\n\n\n

Closing Announcements

\n\n\n\n

Links

\n\n\n\n

The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

","summary":"Building a machine learning model one time can be done in an ad-hoc manner, but if you ever want to update it and serve it in production you need a way of repeating a complex sequence of operations. Dagster is an orchestration engine that understands the data that it is manipulating so that you can move beyond coarse task-based representations of your dependencies. In this episode Sandy Ryza explains how his background in machine learning has informed his work on the Dagster project and the foundational principles that it is built on to allow for collaboration across data engineering and machine learning concerns.","date_published":"2022-12-01T19:00:00.000-05:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/a5f61e41-1d38-4835-b3be-469cd4aff668.mp3","mime_type":"audio/mpeg","size_in_bytes":30804690,"duration_in_seconds":2743}]},{"id":"podlove-2022-09-28t02:22:24+00:00-21e8a843165c1b0","title":"Solve The Cold Start Problem For Machine Learning By Letting Humans Teach The Computer With Aitomatic","url":"https://www.themachinelearningpodcast.com/aitomatic-machine-learning-cold-start-episode-13","content_text":"Summary\nMachine learning is a data-hungry approach to problem solving. Unfortunately, there are a number of problems that would benefit from the automation provided by artificial intelligence capabilities that don’t come with troves of data to build from. Christopher Nguyen and his team at Aitomatic are working to address the \"cold start\" problem for ML by letting humans generate models by sharing their expertise through natural language. 
In this episode he explains how that works, the various ways that we can start to layer machine learning capabilities on top of each other, as well as the risks involved in doing so without incorporating lessons learned in the growth of the software industry.\nAnnouncements\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nPredibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out!\nYour host is Tobias Macey and today I’m interviewing Christopher Nguyen about how to address the cold start problem for ML/AI projects\n\nInterview\n\nIntroduction\nHow did you get involved in machine learning?\nCan you describe what the \"cold start\" or \"small data\" problem is and its impact on an organization’s ability to invest in machine learning?\nWhat are some examples of use cases where ML is a viable solution but there is a corresponding lack of usable data?\nHow does the model design influence the data requirements to build it? (e.g. statistical model vs. 
deep learning, etc.)\nWhat are the available options for addressing a lack of data for ML?\n\nWhat are the characteristics of a given data set that make it suitable for ML use cases?\n\n\nCan you describe what you are building at Aitomatic and how it helps to address the cold start problem?\n\nHow have the design and goals of the product changed since you first started working on it?\n\n\nWhat are some of the education challenges that you face when working with organizations to help them understand how to think about ML/AI investment and practical limitations?\nWhat are the most interesting, innovative, or unexpected ways that you have seen Aitomatic/H1st used?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on Aitomatic/H1st?\nWhen is a human/knowledge driven approach to ML development the wrong choice?\nWhat do you have planned for the future of Aitomatic?\n\nContact Info\n\nLinkedIn\n@pentagoniac on Twitter\nGoogle Scholar\n\nParting Question\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\nClosing Announcements\n\nThank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you’ve learned something or tried out a project from the show then tell us about it! 
Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers\n\nLinks\n\nAitomatic\nHuman First AI\nKnowledge First World Symposium\nAtari 800\nCold start problem\nScale AI\nSnorkel AI\n\nPodcast Episode\n\n\nAnomaly Detection\nExpert Systems\nICML == International Conference on Machine Learning\nNIST == National Institute of Standards and Technology\nMulti-modal Model\nSVM == Support Vector Machine\nTensorflow\nPytorch\n\nPodcast.__init__ Episode\n\n\nOSS Capital\nDALL-E\n\nThe intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0\n\n\nSponsored By:Predibase: ![Predibase Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/bbtLDXUq.png)\r\nPredibase’s founders saw the pain of getting ML models developed and in-production, taking up to a year even at leading tech companies like Uber, so they built internal platforms that drastically lowered the time-to-value and increased access. The key was taking a “declarative approach” to machine learning, which Piero Molino (CEO) introduced with Ludwig, an open source framework to create deep learning models with 8,400+ GitHub stars, more than 100 contributors, and thousands of monthly downloads. With Ludwig, tasks that took months-to-years were handed off to teams in thirty minutes and just six lines of human-readable configuration that can define an entire machine learning pipeline.\r\n\r\nNow with Predibase, we are bringing the power of declarative machine learning built on top of Ludwig to broader organizations with our enterprise platform. 
Like Infrastructure as Code simplified IT, Predibase’s machine learning (ML) platform allows users to focus on the “what” of their ML models rather than the “how”, breaking free of the usual limits in low-code systems and bringing down the time-to-value of ML projects from years to days. \r\n\r\n[Click here](https://themachinelearningpodcast.com/predibase) to learn more and try it for yourself!","content_html":"

Summary

\n

Machine learning is a data-hungry approach to problem solving. Unfortunately, there are a number of problems that would benefit from the automation provided by artificial intelligence capabilities that don’t come with troves of data to build from. Christopher Nguyen and his team at Aitomatic are working to address the \"cold start\" problem for ML by letting humans generate models by sharing their expertise through natural language. In this episode he explains how that works, the various ways that we can start to layer machine learning capabilities on top of each other, as well as the risks involved in doing so without incorporating lessons learned in the growth of the software industry.

\n

Announcements

\n\n

Interview

\n\n

Contact Info

\n\n

Parting Question

\n\n

Closing Announcements

\n\n

Links

\n\n

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

\n
\n\n


Sponsored By:

","summary":"Machine learning is a data-hungry approach to problem solving. Unfortunately, there are a number of problems that would benefit from the automation provided by artificial intelligence capabilities that don’t come with troves of data to build from. Christopher Nguyen and his team at Aitomatic are working to address the \"cold start\" problem for ML by letting humans generate models by sharing their expertise through natural language. In this episode he explains how that works, the various ways that we can start to layer machine learning capabilities on top of each other, as well as the risks involved in doing so without incorporating lessons learned in the growth of the software industry.","date_published":"2022-09-27T22:00:00.000-04:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/ebc89099-3b05-4001-83ce-adc5baf67c5a.mp3","mime_type":"audio/mpeg","size_in_bytes":40977569,"duration_in_seconds":3127}]},{"id":"podlove-2022-09-21t02:08:04+00:00-205327875cb7472","title":"Convert Your Unstructured Data To Embedding Vectors For More Efficient Machine Learning With Towhee","url":"https://www.themachinelearningpodcast.com/towhee-embedding-vector-etl-library-episode-12","content_text":"Summary\nData is one of the core ingredients for machine learning, but the format in which it is understandable to humans is not a useful representation for models. Embedding vectors are a way to structure data in a way that is native to how models interpret and manipulate information. In this episode Frank Liu shares how the Towhee library simplifies the work of translating your unstructured data assets (e.g. images, audio, video, etc.) 
into embeddings that you can use efficiently for machine learning, and how it fits into your workflow for model development.\nAnnouncements\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nBuilding good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to themachinelearningpodcast.com/deepchecks today to get started!\nYour host is Tobias Macey and today I’m interviewing Frank Liu about how to use vector embeddings in your ML projects and how Towhee can reduce the effort involved\n\nInterview\n\nIntroduction\nHow did you get involved in machine learning?\nCan you describe what Towhee is and the story behind it?\nWhat is the problem that Towhee is aimed at solving?\nWhat are the elements of generating vector embeddings that pose the greatest challenge or require the most effort?\nOnce you have an embedding, what are some of the ways that it might be used in a machine learning project?\n\nAre there any design considerations that need to be addressed in the form that an embedding takes and how it impacts the resultant model that relies on it? 
(whether for training or inference)\n\n\nCan you describe how the Towhee framework is implemented?\n\nWhat are some of the interesting engineering challenges that needed to be addressed?\nHow have the design/goals/scope of the project shifted since it began?\n\n\nWhat is the workflow for someone using Towhee in the context of an ML project?\nWhat are some of the types of optimizations that you have incorporated into Towhee?\n\nWhat are some of the scaling considerations that users need to be aware of as they increase the volume or complexity of data that they are processing?\n\n\nWhat are some of the ways that using Towhee impacts the way a data scientist or ML engineer approaches the design and development of their model code?\nWhat are the interfaces available for integrating with and extending Towhee?\nWhat are the most interesting, innovative, or unexpected ways that you have seen Towhee used?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on Towhee?\nWhen is Towhee the wrong choice?\nWhat do you have planned for the future of Towhee?\n\nContact Info\n\nLinkedIn\nfzliu on GitHub\nWebsite\n@frankzliu on Twitter\n\nParting Question\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\nClosing Announcements\n\nThank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you’ve learned something or tried out a project from the show then tell us about it! 
Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers\n\nLinks\n\nTowhee\nZilliz\nMilvus\n\nData Engineering Podcast Episode\n\n\nComputer Vision\nTensor\nAutoencoder\nLatent Space\nDiffusion Model\nHSL == Hue, Saturation, Lightness\nWeights and Biases\n\nThe intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0\n\n\nSponsored By:Deepchecks: ![Deepchecks Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/AHorqO3V.png)\r\nBuilding good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to [themachinelearningpodcast.com/deepchecks](https://www.themachinelearningpodcast.com/deepchecks) today to get started!","content_html":"

Summary

\n

Data is one of the core ingredients for machine learning, but the format in which it is understandable to humans is not a useful representation for models. Embedding vectors are a way to structure data in a way that is native to how models interpret and manipulate information. In this episode Frank Liu shares how the Towhee library simplifies the work of translating your unstructured data assets (e.g. images, audio, video, etc.) into embeddings that you can use efficiently for machine learning, and how it fits into your workflow for model development.

\n

Announcements

\n\n

Interview

\n\n

Contact Info

\n\n

Parting Question

\n\n

Closing Announcements

\n\n

Links

\n\n

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

\n
\n\n


Sponsored By:

","summary":"Data is one of the core ingredients for machine learning, but the format in which it is understandable to humans is not a useful representation for models. Embedding vectors are a way to structure data in a way that is native to how models interpret and manipulate information. In this episode Frank Liu shares how the Towhee library simplifies the work of translating your unstructured data assets (e.g. images, audio, video, etc.) into embeddings that you can use efficiently for machine learning, and how it fits into your workflow for model development.","date_published":"2022-09-21T11:00:00.000-04:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/e902b661-9942-440a-9570-43971838c44f.mp3","mime_type":"audio/mpeg","size_in_bytes":38194748,"duration_in_seconds":3113}]},{"id":"podlove-2022-09-14t02:05:43+00:00-c51e55fa5102809","title":"Shedding Light On Silent Model Failures With NannyML","url":"https://www.themachinelearningpodcast.com/nannyml-silent-model-failure-episode-11","content_text":"Summary\nBecause machine learning models are constantly interacting with inputs from the real world they are subject to a wide variety of failures. The most commonly discussed error condition is concept drift, but there are numerous other ways that things can go wrong. In this episode Wojtek Kuberski explains how NannyML is designed to compare the predicted performance of your model against its actual behavior to identify silent failures and provide context to allow you to determine whether and how urgently to address them.\nAnnouncements\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nData powers machine learning, but poor data quality is the largest impediment to effective ML today. 
Galileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix and track their data across the ML workflow (pre-training, post-training and post-production) – no more Excel sheets or ad-hoc Python scripts. Get meaningful gains in your model performance fast, dramatically reduce data labeling and procurement costs, while seeing 10x faster ML iterations. Galileo is offering listeners a free 30-day trial and a 30% discount on the product thereafter. This offer is available until Aug 31, so go to themachinelearningpodcast.com/galileo and request a demo today!\nYour host is Tobias Macey and today I’m interviewing Wojtek Kuberski about NannyML and the work involved in post-deployment data science\n\nInterview\n\nIntroduction\nHow did you get involved in machine learning?\nCan you describe what NannyML is and the story behind it?\nWhat is \"post-deployment data science\"?\n\nHow does it differ from the metrics/monitoring approach to managing the model lifecycle?\nWho is typically responsible for this work? 
How does NannyML augment their skills?\nWhat are some of your experiences with model failure that motivated you to spend your time and focus on this problem?\n\n\nWhat are the main contributing factors to alert fatigue for ML systems?\nWhat are some of the ways that a model can fail silently?\n\nHow does NannyML detect those conditions?\n\n\nWhat are the remediation actions that might be necessary once an issue is detected in a model?\nCan you describe how NannyML is implemented?\n\nWhat are some of the technical and UX design problems that you have had to address?\nWhat are some of the ideas/assumptions that you have had to re-evaluate in the process of building NannyML?\n\n\nWhat additional capabilities are necessary for supporting less structured data?\nCan you describe what is involved in setting up NannyML and how it fits into an ML engineer’s workflow?\n\nOnce a model is deployed, what additional outputs/data can/should be collected to improve the utility of NannyML and feed into analysis of the real-world operation?\n\n\nWhat are the most interesting, innovative, or unexpected ways that you have seen NannyML used?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on NannyML?\nWhen is NannyML the wrong choice?\nWhat do you have planned for the future of NannyML?\n\nContact Info\n\nLinkedIn\n\nParting Question\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\nClosing Announcements\n\nThank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you’ve learned something or tried out a project from the show then tell us about it! 
Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers\n\nLinks\n\nNannyML\nF1 Score\nROC Curve\nConcept Drift\nA/B Testing\nJupyter Notebook\nVector Embedding\nAirflow\nEDA == Exploratory Data Analysis\nInspired book (affiliate link)\nZenML\n\nThe intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0\n\n\nSponsored By:Galileo: ![Galileo Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/3Mm5horv.png)\r\nData powers machine learning, but poor data quality is the largest impediment to effective ML today.\r\n\r\nGalileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix and track their data across the ML workflow (pre-training, post-training and post-production) – no more Excel sheets or ad-hoc Python scripts.\r\n\r\nGet meaningful gains in your model performance fast, dramatically reduce data labeling and procurement costs, while seeing 10x faster ML iterations.\r\n\r\nGalileo is offering listeners a free 30-day trial and a 30% discount on the product thereafter. This offer is available until Aug 31, so go to [themachinelearningpodcast.com/galileo](https://www.themachinelearningpodcast.com/galileo) and request a demo today!","content_html":"

Summary

\n

Because machine learning models are constantly interacting with inputs from the real world they are subject to a wide variety of failures. The most commonly discussed error condition is concept drift, but there are numerous other ways that things can go wrong. In this episode Wojtek Kuberski explains how NannyML is designed to compare the predicted performance of your model against its actual behavior to identify silent failures and provide context to allow you to determine whether and how urgently to address them.

\n

Announcements

\n\n

Interview

\n\n

Contact Info

\n\n

Parting Question

\n\n

Closing Announcements

\n\n

Links

\n\n

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

\n
\n\n


Sponsored By:

","summary":"An interview with Wojtek Kuberski about the open source NannyML project and how it combines predicted performance of your model with observed outputs to identify silent model failures.","date_published":"2022-09-13T22:00:00.000-04:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/eed43cda-63e0-4571-9a99-4eed2635e2b4.mp3","mime_type":"audio/mpeg","size_in_bytes":45234970,"duration_in_seconds":3797}]},{"id":"podlove-2022-09-10t12:57:48+00:00-15d6e33d7ed1f70","title":"How To Design And Build Machine Learning Systems For Reasonable Scale","url":"https://www.themachinelearningpodcast.com/reasonable-scale-machine-learning-systems-episode-10","content_text":"Summary\nUsing machine learning in production requires a sophisticated set of cooperating technologies. A majority of resources that are available for understanding how to design and operate these platforms are focused on either simple examples that don’t scale, or over-engineered technologies designed for the massive scale of big tech companies. In this episode Jacopo Tagliabue shares his vision for \"ML at reasonable scale\" and how you can adopt these patterns for building your own platforms.\nAnnouncements\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nDo you wish you could use artificial intelligence to drive your business the way Big Tech does, but don’t have a money printer? Graft is a cloud-native platform that aims to make the AI of the 1% accessible to the 99%. Wield the most advanced techniques for unlocking the value of data, including text, images, video, audio, and graphs. No machine learning skills required, no team to hire, and no infrastructure to build or maintain. 
For more information on Graft or to schedule a demo, visit themachinelearningpodcast.com/graft today and tell them Tobias sent you.\nYour host is Tobias Macey and today I’m interviewing Jacopo Tagliabue about building \"reasonable scale\" ML systems\n\nInterview\n\nIntroduction\nHow did you get involved in machine learning?\nHow would you describe the current state of the ecosystem for ML practitioners? (e.g. tool selection, availability of information/tutorials, etc.)\n\nWhat are some of the notable changes that you have seen over the past 2 – 5 years?\nHow have the evolutions in the data engineering space been reflected in/influenced the way that ML is being done?\n\n\nWhat are the challenges/points of friction that ML practitioners have to contend with when trying to get a model into production that isn’t just a toy?\nYou wrote a set of tutorials and accompanying code about performing ML at \"reasonable scale\". What are you aiming to represent with that phrasing?\n\nThere is a paradox of choice for any newcomer to ML. What are some of the key capabilities that practitioners should use in their decision rubric when designing a \"reasonable scale\" system?\nWhat are some of the common bottlenecks that crop up when moving from an initial test implementation to a scalable deployment that is serving customer traffic?\n\n\nHow much of an impact does the type of ML problem being addressed have on the deployment and scalability elements of the system design? (e.g. NLP vs. computer vision vs. recommender system, etc.)\nWhat are some of the misleading pieces of advice that you have seen from \"big tech\" tutorials about how to do ML that are unnecessary when running at smaller scales?\nYou also spend some time discussing the benefits of a \"NoOps\" approach to ML deployment. 
At what point do operations/infrastructure engineers need to get involved?\n\nWhat are the operational aspects of ML applications that infrastructure engineers working in product teams might be unprepared for?\n\n\nWhat are the most interesting, innovative, or unexpected system designs that you have seen for moderate scale MLOps?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on ML system design and implementation?\nWhat are the aspects of ML systems design that you are paying attention to in the current ecosystem?\nWhat advice do you have for additional references or research that ML practitioners would benefit from when designing their own production systems?\n\nContact Info\n\njacopotagliabue on GitHub\nWebsite\nLinkedIn\n\nParting Question\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\nClosing Announcements\n\nThank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you’ve learned something or tried out a project from the show then tell us about it! 
Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers\n\nLinks\n\nThe Post-Modern Stack: ML At Reasonable Scale\nCoveo\nNLP == Natural Language Processing\nRecList\nPart of speech tagging\nMarkov Model\nYDNABB (You Don’t Need A Bigger Boat)\ndbt\n\nData Engineering Podcast Episode\n\n\nSeldon\nMetaflow\n\nPodcast.__init__ Episode\n\n\nSnowflake\nInformation Retrieval\nModern Data Stack\nSQLite\nSpark SQL\nAWS Athena\nKeras\nPyTorch\nLuigi\nAirflow\nFlask\nAWS Fargate\nAWS Sagemaker\nRecommendations At Reasonable Scale\nPinecone\n\nData Engineering Podcast Episode\n\n\nRedis\nKNN == K-Nearest Neighbors\nPinterest Engineering Blog\nMaterialize\nOpenAI\n\nThe intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0\n\n\nSponsored By:Graft: ![Graft Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/LwNCPdKW.png)\r\nGraft™ is a cloud-native platform that aims to make the AI of the 1% accessible to the 99%. Wield the most advanced techniques for unlocking the value of data, including text, images, video, audio, and graphs. No machine learning skills required, no team to hire, and no infrastructure to build or maintain.\r\n\r\nFor more information on Graft or to schedule a demo go to [themachinelearningpodcast.com/graft](https://www.themachinelearningpodcast.com/graft) today! And tell them Tobias sent you.","content_html":"

Summary

\n

Using machine learning in production requires a sophisticated set of cooperating technologies. A majority of resources that are available for understanding how to design and operate these platforms are focused on either simple examples that don’t scale, or over-engineered technologies designed for the massive scale of big tech companies. In this episode Jacopo Tagliabue shares his vision for "ML at reasonable scale" and how you can adopt these patterns for building your own platforms.

\n

Announcements

\n\n

Interview

\n\n

Contact Info

\n\n

Parting Question

\n\n

Closing Announcements

\n\n

Links

\n\n

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

\n
\n\n


Sponsored By:

","summary":"An interview with Jacopo Tagliabue about how to design machine learning systems to support operations at the scale required by a majority of companies.","date_published":"2022-09-10T09:00:00.000-04:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/acdd9575-d0fd-436d-9fff-8cbef5e9172c.mp3","mime_type":"audio/mpeg","size_in_bytes":39228975,"duration_in_seconds":3249}]},{"id":"podlove-2022-09-09t00:51:53+00:00-0cf97aaacb72e1d","title":"Building A Business Powered By Machine Learning At Assembly AI","url":"https://www.themachinelearningpodcast.com/assembly-ai-machine-learning-product-episode-9","content_text":"Summary\nThe increasing sophistication of machine learning has enabled dramatic transformations of businesses and introduced new product categories. At Assembly AI they are offering advanced speech recognition and natural language models as an API service. In this episode founder Dylan Fox discusses the unique challenges of building a business with machine learning as the core product.\nAnnouncements\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nPredibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. 
Go to themachinelearningpodcast.com/predibase today to learn more and try it out!\nYour host is Tobias Macey and today I’m interviewing Dylan Fox about building and growing a business with ML as its core offering\n\nInterview\n\nIntroduction\nHow did you get involved in machine learning?\nCan you describe what Assembly is and the story behind it?\n\nFor anyone who isn’t familiar with your platform, can you describe the role that ML/AI plays in your product?\n\n\nWhat was your process for going from idea to prototype for an AI-powered business?\n\nCan you offer parallels between your own experience and that of your peers who are building businesses oriented more toward pure software applications?\n\n\nHow are you structuring your teams?\nOn the path to your current scale and capabilities how have you managed scoping of your model capabilities and operational scale to avoid getting bogged down or burnt out?\nHow do you think about scoping of model functionality to balance composability and system complexity?\nWhat is your process for identifying and understanding which problems are suited to ML and when to rely on pure software?\nYou are constantly iterating on model performance and introducing new capabilities. How do you manage prototyping and experimentation cycles?\n\nWhat are the metrics that you track to identify whether and when to move from an experimental to an operational state with a model?\nWhat is your process for understanding what’s possible and what can feasibly operate at scale?\n\n\nCan you describe your overall operational patterns and delivery process for ML?\nWhat are some of the most useful investments in tooling that you have made to manage development experience for your teams?\nOnce you have a model in operation, how do you manage performance tuning? 
(from both a model and an operational scalability perspective)\nWhat are the most interesting, innovative, or unexpected aspects of ML development and maintenance that you have encountered while building and growing the Assembly platform?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on Assembly?\nWhen is ML the wrong choice?\nWhat do you have planned for the future of Assembly?\n\nContact Info\n\n@YouveGotFox on Twitter\nLinkedIn\n\nParting Question\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\nClosing Announcements\n\nThank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers\n\nLinks\n\nAssembly AI\n\nPodcast.__init__ Episode\n\n\nLearn Python the Hard Way\nNLTK\nNLP == Natural Language Processing\nNLU == Natural Language Understanding\nSpeech Recognition\nTensorflow\nr/machinelearning\nSciPy\nPyTorch\nJax\nHuggingFace\nRNN == Recurrent Neural Network\nCNN == Convolutional Neural Network\nLSTM == Long Short Term Memory\nHidden Markov Models\nBaidu DeepSpeech\nCTC (Connectionist Temporal Classification) Loss Model\nTwilio\nGrid Search\nK80 GPU\nA100 GPU\nTPU == Tensor Processing Unit\nFoundation Models\nBLOOM Language Model\nDALL-E 2\n\nThe intro and outro music is from Hitman’s Lovesong feat. 
Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0\n\n\nSponsored By:Predibase: ![Predibase Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/bbtLDXUq.png)\r\nPredibase’s founders saw the pain of getting ML models developed and in-production, taking up to a year even at leading tech companies like Uber, so they built internal platforms that drastically lowered the time-to-value and increased access. The key was taking a “declarative approach” to machine learning, which Piero Molino (CEO) introduced with Ludwig, an open source framework to create deep learning models with 8,400+ GitHub stars, more than 100 contributors, and thousands of monthly downloads. With Ludwig, tasks that took months-to-years were handed off to teams in thirty minutes and just six lines of human-readable configuration that can define an entire machine learning pipeline.\r\n\r\nNow with Predibase, we are bringing the power of declarative machine learning built on top of Ludwig to broader organizations with our enterprise platform. Like Infrastructure as Code simplified IT, Predibase’s machine learning (ML) platform allows users to focus on the “what” of their ML models rather than the “how”, breaking free of the usual limits in low-code systems and bringing down the time-to-value of ML projects from years to days. \r\n\r\n[Click here](https://themachinelearningpodcast.com/predibase) to learn more and try it for yourself!","content_html":"

Summary

\n

The increasing sophistication of machine learning has enabled dramatic transformations of businesses and introduced new product categories. At Assembly AI they are offering advanced speech recognition and natural language models as an API service. In this episode founder Dylan Fox discusses the unique challenges of building a business with machine learning as the core product.

\n

Announcements

\n\n

Interview

\n\n

Contact Info

\n\n

Parting Question

\n\n

Closing Announcements

\n\n

Links

\n\n

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

\n
\n\n

\"\"

Sponsored By:

","summary":"An interview with Dylan Fox about the unique challenges and potential involved in building a business with machine learning as the core capability that drives the product and the approach that he has taken at Assembly AI.","date_published":"2022-09-08T20:00:00.000-04:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/cae92599-3f1e-4284-b93a-9f2379307aac.mp3","mime_type":"audio/mpeg","size_in_bytes":41267132,"duration_in_seconds":3522}]},{"id":"podlove-2022-08-26t01:44:54+00:00-e4398ef2e1f1982","title":"Update Your Model's View Of The World In Real Time With Streaming Machine Learning Using River","url":"https://www.themachinelearningpodcast.com/river-streaming-machine-learning-episode-8","content_text":"Summary\nThe majority of machine learning projects that you read about or work on are built around batch processes. The model is trained, and then validated, and then deployed, with each step being a discrete and isolated task. Unfortunately, the real world is rarely static, leading to concept drift and model failures. River is a framework for building streaming machine learning projects that can constantly adapt to new information. In this episode Max Halford explains how the project works, why you might (or might not) want to consider streaming ML, and how to get started building with River.\nAnnouncements\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nBuilding good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. 
Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to themachinelearningpodcast.com/deepchecks today to get started!\nYour host is Tobias Macey and today I’m interviewing Max Halford about River, a Python toolkit for streaming and online machine learning\n\nInterview\n\nIntroduction\nHow did you get involved in machine learning?\nCan you describe what River is and the story behind it?\nWhat is \"online\" machine learning?\n\nWhat are the practical differences with batch ML?\nWhy is batch learning so predominant?\nWhat are the cases where someone would want/need to use online or streaming ML?\n\n\nThe prevailing pattern for batch ML model lifecycles is to train, deploy, monitor, repeat. What does the ongoing maintenance for a streaming ML model look like?\n\nConcept drift is typically due to a discrepancy between the data used to train a model and the actual data being observed. How does the use of online learning affect the incidence of drift?\n\n\nCan you describe how the River framework is implemented?\n\nHow have the design and goals of the project changed since you started working on it?\n\n\nHow do the internal representations of the model differ from batch learning to allow for incremental updates to the model state?\nIn the documentation you note the use of Python dictionaries for state management and the flexibility offered by that choice. 
What are the benefits and potential pitfalls of that decision?\nCan you describe the process of using River to design, implement, and validate a streaming ML model?\n\nWhat are the operational requirements for deploying and serving the model once it has been developed?\n\n\nWhat are some of the challenges that users of River might run into if they are coming from a batch learning background?\nWhat are the most interesting, innovative, or unexpected ways that you have seen River used?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on River?\nWhen is River the wrong choice?\nWhat do you have planned for the future of River?\n\nContact Info\n\nEmail\n@halford_max on Twitter\nMaxHalford on GitHub\n\nParting Question\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\nClosing Announcements\n\nThank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers\n\nLinks\n\nRiver\nscikit-multiflow\nFederated Machine Learning\nHogwild! Google Paper\nChip Huyen concept drift blog post\nDan Crankshaw Berkeley Clipper MLOps\nRobustness Principle\nNY Taxi Dataset\nRiverTorch\nRiver Public Roadmap\nBeaver tool for deploying online models\nProdigy ML human in the loop labeling\n\nThe intro and outro music is from Hitman’s Lovesong feat. 
Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0\n\n\nSponsored By:Deepchecks: ![Deepchecks Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/AHorqO3V.png)\r\nBuilding good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to [themachinelearningpodcast.com/deepchecks](https://www.themachinelearningpodcast.com/deepchecks) today to get started!","content_html":"

Summary

\n

The majority of machine learning projects that you read about or work on are built around batch processes. The model is trained, and then validated, and then deployed, with each step being a discrete and isolated task. Unfortunately, the real world is rarely static, leading to concept drift and model failures. River is a framework for building streaming machine learning projects that can constantly adapt to new information. In this episode Max Halford explains how the project works, why you might (or might not) want to consider streaming ML, and how to get started building with River.

\n

Announcements

\n\n

Interview

\n\n

Contact Info

\n\n

Parting Question

\n\n

Closing Announcements

\n\n

Links

\n\n

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

\n
\n\n

\"\"

Sponsored By:

","summary":"An interview with Max Halford about the benefits of streaming machine learning for systems that need to learn continuously without being taken offline and how the River library supports building those models.","date_published":"2022-08-25T21:00:00.000-04:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/4e493d23-5f30-42ea-bb5f-2112844f5421.mp3","mime_type":"audio/mpeg","size_in_bytes":55208812,"duration_in_seconds":4520}]},{"id":"podlove-2022-08-13t18:33:06+00:00-8f46146c472e4cc","title":"Using AI To Transform Your Business Without The Headache Using Graft","url":"https://www.themachinelearningpodcast.com/graft-modern-ai-platform-episode-7","content_text":"Summary\nMachine learning is a transformative tool for the organizations that can take advantage of it. While the frameworks and platforms for building machine learning applications are becoming more powerful and broadly available, there is still a significant investment of time, money, and talent required to take full advantage of it. In order to reduce that barrier further Adam Oliner and Brian Calvert, along with their other co-founders, started Graft. In this episode Adam and Brian explain how they have built a platform designed to empower everyone in the business to take part in designing and building ML projects, while managing the end-to-end workflow required to go from data to production.\nAnnouncements\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nPredibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. 
We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out!\nBuilding good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to themachinelearningpodcast.com/deepchecks today to get started!\nYour host is Tobias Macey and today I’m interviewing Brian Calvert and Adam Oliner about Graft, a cloud-native platform designed to simplify the work of applying AI to business problems\n\nInterview\n\nIntroduction\nHow did you get involved in machine learning?\nCan you describe what Graft is and the story behind it?\nWhat is the core thesis of the problem you are targeting?\n\nHow does the Graft product address that problem?\nWho are the personas that you are focused on working with both now in your early stages and in the future as you evolve the product?\n\n\nWhat are the capabilities that can be unlocked in different organizations by reducing the friction and up-front investment required to adopt ML/AI?\n\nWhat are the user-facing interfaces that you are focused on providing to make that adoption curve as shallow as possible?\n\nWhat are some of the unavoidable bits of complexity that need to be surfaced to the end user?\n\n\n\n\nCan you describe the infrastructure and platform design that you are relying on for the Graft product?\n\nWhat are some of the emerging \"best practices\" around ML/AI that you have been 
able to build on top of?\n\nAs new techniques and practices are discovered/introduced how are you thinking about the adoption process and how/when to integrate them into the Graft product?\n\n\nWhat are some of the new engineering challenges that you have had to tackle as a result of your specific product?\n\n\nMachine learning can be a very data and compute intensive endeavor. How are you thinking about scalability in a multi-tenant system?\n\nDifferent model and data types can be widely divergent in terms of the cost (monetary, time, compute, etc.) required. How are you thinking about amortizing vs. passing through those costs to the end user?\n\n\nCan you describe the adoption/integration process for someone using Graft?\n\nOnce they are onboarded and they have connected to their various data sources, what is the workflow for someone to apply ML capabilities to their problems?\n\n\nOne of the challenges about the current state of ML capabilities and adoption is understanding what is possible and what is impractical. How have you designed Graft to help identify and expose opportunities for applying ML within the organization?\nWhat are some of the challenges of customer education and overall messaging that you are working through?\nWhat are the most interesting, innovative, or unexpected ways that you have seen Graft used?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on Graft?\nWhen is Graft the wrong choice?\nWhat do you have planned for the future of Graft?\n\nContact Info\n\nBrian\n\nLinkedIn\n\n\nAdam\n\nLinkedIn\n\n\n\nParting Question\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\nClosing Announcements\n\nThank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. 
Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers\n\nLinks\n\nGraft\nHigh Energy Particle Physics\nLHC\nCruise\nSlack\nSplunk\nMarvin Minsky\nPatrick Henry Winston\nAI Winter\nSebastian Thrun\nDARPA Grand Challenge\nHiggs Boson\nSupersymmetry\nKinematics\nTransfer Learning\nFoundation Models\nML Embeddings\nBERT\nAirflow\nDagster\nPrefect\nDask\nKubeflow\nMySQL\nPostgreSQL\nSnowflake\nRedshift\nS3\nKubernetes\nMulti-modal models\nMulti-task models\nMagic: The Gathering\n\nThe intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/[CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/?utm_source=rss&utm_medium=rss)\n\n\nSponsored By:Predibase: ![Predibase Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/bbtLDXUq.png)\r\nPredibase’s founders saw the pain of getting ML models developed and in-production, taking up to a year even at leading tech companies like Uber, so they built internal platforms that drastically lowered the time-to-value and increased access. The key was taking a “declarative approach” to machine learning, which Piero Molino (CEO) introduced with Ludwig, an open source framework to create deep learning models with 8,400+ GitHub stars, more than 100 contributors, and thousands of monthly downloads. 
With Ludwig, tasks that took months-to-years were handed off to teams in thirty minutes and just six lines of human-readable configuration that can define an entire machine learning pipeline.\r\n\r\nNow with Predibase, we are bringing the power of declarative machine learning built on top of Ludwig to broader organizations with our enterprise platform. Like Infrastructure as Code simplified IT, Predibase’s machine learning (ML) platform allows users to focus on the “what” of their ML models rather than the “how”, breaking free of the usual limits in low-code systems and bringing down the time-to-value of ML projects from years to days. \r\n\r\n[Click here](https://themachinelearningpodcast.com/predibase) to learn more and try it for yourself!Deepchecks: ![Deepchecks Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/AHorqO3V.png)\r\nBuilding good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to [themachinelearningpodcast.com/deepchecks](https://www.themachinelearningpodcast.com/deepchecks) today to get started!","content_html":"

Summary

\n

Machine learning is a transformative tool for the organizations that can take advantage of it. While the frameworks and platforms for building machine learning applications are becoming more powerful and broadly available, there is still a significant investment of time, money, and talent required to take full advantage of it. In order to reduce that barrier further Adam Oliner and Brian Calvert, along with their other co-founders, started Graft. In this episode Adam and Brian explain how they have built a platform designed to empower everyone in the business to take part in designing and building ML projects, while managing the end-to-end workflow required to go from data to production.

\n

Announcements

\n\n

Interview

\n\n

Contact Info

\n\n

Parting Question

\n\n

Closing Announcements

\n\n

Links

\n\n

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/[CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/?utm_source=rss&utm_medium=rss)

\n
\n\n

\"\"

Sponsored By:

","summary":"","date_published":"2022-08-15T21:00:00.000-04:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/9761c680-c151-48d8-8a4a-68736b7b3f28.mp3","mime_type":"audio/mpeg","size_in_bytes":50785152,"duration_in_seconds":4053}]},{"id":"podlove-2022-08-06t14:34:00+00:00-216b02e6499d16d","title":"Accelerate Development And Delivery Of Your Machine Learning Projects With A Comprehensive Feature Platform","url":"https://www.themachinelearningpodcast.com/tecton-machine-learning-feature-platform-episode-6","content_text":"Summary\nIn order for a machine learning model to build connections and context across the data that is fed into it the raw data needs to be engineered into semantic features. This is a process that can be tedious and full of toil, requiring constant upkeep and often leading to rework across projects and teams. In order to reduce the amount of wasted effort and speed up experimentation and training iterations a new generation of services are being developed. Tecton first built a feature store to serve as a central repository of engineered features and keep them up to date for training and inference. Since then they have expanded the set of tools and services to be a full-fledged feature platform. In this episode Kevin Stumpf explains the different capabilities and activities related to features that are necessary to maintain velocity in your machine learning projects.\nAnnouncements\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nBuilding good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. 
Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to themachinelearningpodcast.com/deepchecks today to get started!\nDo you wish you could use artificial intelligence to drive your business the way Big Tech does, but don’t have a money printer? Graft is a cloud-native platform that aims to make the AI of the 1% accessible to the 99%. Wield the most advanced techniques for unlocking the value of data, including text, images, video, audio, and graphs. No machine learning skills required, no team to hire, and no infrastructure to build or maintain. For more information on Graft or to schedule a demo, visit themachinelearningpodcast.com/graft today and tell them Tobias sent you.\nData powers machine learning, but poor data quality is the largest impediment to effective ML today. Galileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix and track their data across the ML workflow (pre-training, post-training and post-production) – no more Excel sheets or ad-hoc Python scripts. Get meaningful gains in your model performance fast, dramatically reduce data labeling and procurement costs, while seeing 10x faster ML iterations. Galileo is offering listeners a free 30 day trial and a 30% discount on the product thereafter. 
This offer is available until Aug 31, so go to themachinelearningpodcast.com/galileo and request a demo today!\nYour host is Tobias Macey and today I’m interviewing Kevin Stumpf about the role of feature platforms in your ML engineering workflow\n\nInterview\n\nIntroduction\nHow did you get involved in machine learning?\nCan you describe what you mean by the term \"feature platform\"?\n\nWhat are the components and supporting capabilities that are needed for such a platform?\n\n\nHow does the availability of engineered features impact the ability of an organization to put ML into production?\nWhat are the points of friction that teams encounter when trying to build and maintain ML projects in the absence of a fully integrated feature platform?\nWho are the target personas for the Tecton platform?\n\nWhat stages of the ML lifecycle does it address?\n\n\nCan you describe how you have designed the Tecton feature platform?\n\nHow have the goals and capabilities of the product evolved since you started working on it?\n\n\nWhat is the workflow for an ML engineer or data scientist to build and maintain features and use them in the model development workflow?\nWhat are the responsibilities of the MLOps stack that you have intentionally decided not to address?\n\nWhat are the interfaces and extension points that you offer for integrating with the other utilities needed to manage a full ML system?\n\n\nYou wrote a post about the need to establish a DevOps approach to ML data. 
In keeping with that theme, can you describe how to think about the approach to testing and validation techniques for features and their outputs?\nWhat are the most interesting, innovative, or unexpected ways that you have seen Tecton/Feast used?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on Tecton?\nWhen is Tecton the wrong choice?\nWhat do you have planned for the future of the Tecton feature platform?\n\nContact Info\n\nLinkedIn\n@kevinmstumpf on Twitter\nkevinstumpf on GitHub\n\nParting Question\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\nLinks\n\nTecton\n\nData Engineering Podcast Episode\n\n\nUber Michelangelo\nFeature Store\nSnowflake\n\nData Engineering Podcast Episode\n\n\nDynamoDB\nTrain/Serve Skew\nLambda Architecture\nRedis\n\nThe intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/[CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/?utm_source=rss&utm_medium=rss)\n\n\nSponsored By:Graft: ![Graft Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/LwNCPdKW.png)\r\nGraft™ is a cloud-native platform that aims to make the AI of the 1% accessible to the 99%. Wield the most advanced techniques for unlocking the value of data, including text, images, video, audio, and graphs. No machine learning skills required, no team to hire, and no infrastructure to build or maintain.\r\n\r\nFor more information on Graft or to schedule a demo go to [themachinelearningpodcast.com/graft](https://www.themachinelearningpodcast.com/graft) today! 
And tell them Tobias sent you.Galileo: ![Galileo Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/3Mm5horv.png)\r\nData powers machine learning, but poor data quality is the largest impediment to effective ML today.\r\n\r\nGalileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix and track their data across the ML workflow (pre-training, post-training and post-production) – no more Excel sheets or ad-hoc Python scripts.\r\n\r\nGet meaningful gains in your model performance fast, dramatically reduce data labeling and procurement costs, while seeing 10x faster ML iterations.\r\n\r\nGalileo is offering listeners a free 30 day trial and a 30% discount on the product thereafter. This offer is available until Aug 31, so go to [themachinelearningpodcast.com/galileo](https://www.themachinelearningpodcast.com/galileo) and request a demo today!Deepchecks: ![Deepchecks Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/AHorqO3V.png)\r\nBuilding good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to [themachinelearningpodcast.com/deepchecks](https://www.themachinelearningpodcast.com/deepchecks) today to get started!","content_html":"

Summary

\n

In order for a machine learning model to build connections and context across the data that is fed into it the raw data needs to be engineered into semantic features. This is a process that can be tedious and full of toil, requiring constant upkeep and often leading to rework across projects and teams. In order to reduce the amount of wasted effort and speed up experimentation and training iterations a new generation of services are being developed. Tecton first built a feature store to serve as a central repository of engineered features and keep them up to date for training and inference. Since then they have expanded the set of tools and services to be a full-fledged feature platform. In this episode Kevin Stumpf explains the different capabilities and activities related to features that are necessary to maintain velocity in your machine learning projects.

\n

Announcements

\n\n

Interview

\n\n

Contact Info

\n\n

Parting Question

\n\n

Links

\n\n

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/[CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/?utm_source=rss&utm_medium=rss)

\n
\n\n

\"\"

Sponsored By:

","summary":"An interview with Kevin Stumpf about the impact of a comprehensive feature platform on the development and serving of machine learning models and how they are addressing that need at Tecton.","date_published":"2022-08-06T10:00:00.000-04:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/fae2845b-0a8e-43aa-9546-acb2ebcb0e30.mp3","mime_type":"audio/mpeg","size_in_bytes":36899954,"duration_in_seconds":3037}]},{"id":"podlove-2022-07-29t02:17:09+00:00-df190a2e1abd670","title":"Build Better Models Through Data Centric Machine Learning Development With Snorkel AI","url":"https://www.themachinelearningpodcast.com/snorkel-ai-data-centric-machine-learning-episode-5","content_text":"Summary\nMachine learning is a data hungry activity, and the quality of the resulting model is highly dependent on the quality of the inputs that it receives. Generating sufficient quantities of high quality labeled data is an expensive and time consuming process. In order to reduce that time and cost Alex Ratner and his team at Snorkel AI have built a system for powering data-centric machine learning development. In this episode he explains how the Snorkel platform allows domain experts to create labeling functions that translate their expertise into reusable logic that dramatically reduces the time needed to build training data sets and drives down the total cost.\nAnnouncements\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nBuilding good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. 
Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to themachinelearningpodcast.com/deepchecks today to get started!\nData powers machine learning, but poor data quality is the largest impediment to effective ML today. Galileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix and track their data across the ML workflow (pre-training, post-training and post-production) – no more excel sheets or ad-hoc python scripts. Get meaningful gains in your model performance fast, dramatically reduce data labeling and procurement costs, while seeing 10x faster ML iterations. Galileo is offering listeners a free 30 day trial and a 30% discount on the product there after. This offer is available until Aug 31, so go to themachinelearningpodcast.com/galileo and request a demo today!\nPredibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. 
Go to themachinelearningpodcast.com/predibase today to learn more and try it out!\nYour host is Tobias Macey and today I’m interviewing Alex Ratner about Snorkel AI, a platform for data-centric machine learning workflows powered by programmatic data labeling techniques\n\nInterview\n\nIntroduction\nHow did you get involved in machine learning?\nCan you describe what Snorkel AI is and the story behind it?\nWhat are the problems that you are focused on solving?\n\nWhich pieces of the ML lifecycle are you focused on?\n\n\nHow did your experience building the open source Snorkel project and working with the community inform your product direction for Snorkel AI?\n\nHow has the underlying Snorkel project evolved over the past 4 years?\n\n\nWhat are the deciding factors that an organization or ML team need to consider when evaluating existing labeling strategies against the programmatic approach that you provide?\n\nWhat are the features that Snorkel provides over and above managing code execution across the source data set?\n\n\nCan you describe what you have built at Snorkel AI and how it is implemented?\n\nWhat are some of the notable developments of the ML ecosystem that had a meaningful impact on your overall product vision/viability?\n\n\nCan you describe the workflow for an individual or team who is using Snorkel for generating their training data set?\n\nHow does Snorkel integrate with the experimentation process to track how changes to labeling logic correlate with the performance of the resulting model?\n\n\nWhat are some of the complexities involved in designing and testing the labeling logic?\n\nHow do you handle complex data formats such as audio, video, images, etc. that might require their own ML models to generate labels? (e.g. 
object detection for bounding boxes)\n\n\nWith the increased scale and quality of labeled data that Snorkel AI offers, how does that impact the viability of autoML toolchains for generating useful models?\nHow are you managing the governance and feature boundaries between the open source Snorkel project and the business that you have built around it?\nWhat are the most interesting, innovative, or unexpected ways that you have seen Snorkel AI used?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on Snorkel AI?\nWhen is Snorkel AI the wrong choice?\nWhat do you have planned for the future of Snorkel AI?\n\nContact Info\n\nLinkedIn\nWebsite\n@ajratner on Twitter\n\nParting Question\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\nClosing Announcements\n\nThank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers\n\nLinks\n\nSnorkel AI\n\nData Engineering Podcast Episode\n\n\nUniversity of Washington\nSnorkel OSS\nNatural Language Processing (NLP)\nTensorflow\nPyTorch\n\nPodcast.__init__ Episode\n\n\nDeep Learning\nFoundation Models\nMLFlow\nSHAP\n\nPodcast.__init__ Episode\n\n\n\nThe intro and outro music is from Hitman’s Lovesong feat. 
Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0\n\n\nSponsored By:Predibase: ![Predibase Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/bbtLDXUq.png)\r\nPredibase’s founders saw the pain of getting ML models developed and in-production, taking up to a year even at leading tech companies like Uber, so they built internal platforms that drastically lowered the time-to-value and increased access. The key was taking a “declarative approach” to machine learning, which Piero Molino (CEO) introduced with Ludwig, an open source framework to create deep learning models with 8,400+ GitHub stars, more than 100 contributors, and thousands of monthly downloads. With Ludwig, tasks that took months-to-years were handed off to teams in thirty minutes and just six lines of human-readable configuration that can define an entire machine learning pipeline.\r\n\r\nNow with Predibase, we are bringing the power of declarative machine learning built on top of Ludwig to broader organizations with our enterprise platform. Like Infrastructure as Code simplified IT, Predibase’s machine learning (ML) platform allows users to focus on the “what” of their ML models rather than the “how”, breaking free of the usual limits in low-code systems and bringing down the time-to-value of ML projects from years to days. 
\r\n\r\n[Click here](https://themachinelearningpodcast.com/predibase) to learn more and try it for yourself!Galileo: ![Galileo Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/3Mm5horv.png)\r\nData powers machine learning, but poor data quality is the largest impediment to effective ML today.\r\n\r\nGalileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix and track their data across the ML workflow (pre-training, post-training and post-production) – no more excel sheets or ad-hoc python scripts.\r\n\r\nGet meaningful gains in your model performance fast, dramatically reduce data labeling and procurement costs, while seeing 10x faster ML iterations.\r\n\r\nGalileo is offering listeners a free 30 day trial and a 30% discount on the product there after. This offer is available until Aug 31, so go to [themachinelearningpodcast.com/galileo](https://www.themachinelearningpodcast.com/galileo) and request a demo today!Deepchecks: ![Deepchecks Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/AHorqO3V.png)\r\nBuilding good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to [themachinelearningpodcast.com/deepchecks](https://www.themachinelearningpodcast.com/deepchecks) today to get started!","content_html":"

Summary

\n

Machine learning is a data hungry activity, and the quality of the resulting model is highly dependent on the quality of the inputs that it receives. Generating sufficient quantities of high quality labeled data is an expensive and time consuming process. In order to reduce that time and cost Alex Ratner and his team at Snorkel AI have built a system for powering data-centric machine learning development. In this episode he explains how the Snorkel platform allows domain experts to create labeling functions that translate their expertise into reusable logic that dramatically reduces the time needed to build training data sets and drives down the total cost.

\n

Announcements

\n\n

Interview

\n\n

Contact Info

\n\n

Parting Question

\n\n

Closing Announcements

\n\n

Links

\n\n

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

\n
\n\n

\"\"

Sponsored By:

","summary":"An interview with Alex Ratner about Snorkel AI's platform for data-centric machine learning development that accelerates the rate at which teams can build high quality training data sets with the help of domain experts","date_published":"2022-07-28T22:00:00.000-04:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/6d19c87f-c59d-4a2d-995a-10d6bbe7ae6a.mp3","mime_type":"audio/mpeg","size_in_bytes":42772409,"duration_in_seconds":3229}]},{"id":"podlove-2022-07-21t10:09:24+00:00-a0632a614d9bc23","title":"Declarative Machine Learning For High Performance Deep Learning Models With Predibase","url":"https://www.themachinelearningpodcast.com/predibase-declarative-machine-learning-episode-4","content_text":"Summary\nDeep learning is a revolutionary category of machine learning that accelerates our ability to build powerful inference models. Along with that power comes a great deal of complexity in determining what neural architectures are best suited to a given task, engineering features, scaling computation, etc. Predibase is building on the successes of the Ludwig framework for declarative deep learning and Horovod for horizontally distributing model training. In this episode CTO and co-founder of Predibase, Travis Addair, explains how they are reducing the burden of model development even further with their managed service for declarative and low-code ML and how they are integrating with the growing ecosystem of solutions for the full ML lifecycle.\nAnnouncements\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nBuilding good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. 
Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to themachinelearningpodcast.com/deepchecks today to get started!\nData powers machine learning, but poor data quality is the largest impediment to effective ML today. Galileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix and track their data across the ML workflow (pre-training, post-training and post-production) – no more excel sheets or ad-hoc python scripts. Get meaningful gains in your model performance fast, dramatically reduce data labeling and procurement costs, while seeing 10x faster ML iterations. Galileo is offering listeners a free 30 day trial and a 30% discount on the product there after. This offer is available until Aug 31, so go to themachinelearningpodcast.com/galileo and request a demo today!\nDo you wish you could use artificial intelligence to drive your business the way Big Tech does, but don’t have a money printer? Graft is a cloud-native platform that aims to make the AI of the 1% accessible to the 99%. Wield the most advanced techniques for unlocking the value of data, including text, images, video, audio, and graphs. No machine learning skills required, no team to hire, and no infrastructure to build or maintain. 
For more information on Graft or to schedule a demo, visit themachinelearningpodcast.com/graft today and tell them Tobias sent you.\nYour host is Tobias Macey and today I’m interviewing Travis Addair about Predibase, a low-code platform for building ML models in a declarative format\n\nInterview\n\nIntroduction\nHow did you get involved in machine learning?\nCan you describe what Predibase is and the story behind it?\nWho is your target audience and how does that focus influence your user experience and feature development priorities?\nHow would you describe the semantic differences between your chosen terminology of \"declarative ML\" and the \"autoML\" nomenclature that many projects and products have adopted?\n\nAnother platform that launched recently with a promise of \"declarative ML\" is Continual. How would you characterize your relative strengths?\n\n\nCan you describe how the Predibase platform is implemented?\n\nHow have the design and goals of the product changed as you worked through the initial implementation and started working with early customers?\nThe operational aspects of the ML lifecycle are still fairly nascent. How have you thought about the boundaries for your product to avoid getting drawn into scope creep while providing a happy path to delivery?\n\n\nLudwig is a core element of your platform. What are the other capabilities that you are layering around and on top of it to build a differentiated product?\nIn addition to the existing interfaces for Ludwig you created a new language in the form of PQL. 
What was the motivation for that decision?\n\nHow did you approach the semantic and syntactic design of the dialect?\nWhat is your vision for PQL in the space of \"declarative ML\" that you are working to define?\n\n\nCan you describe the available workflows for an individual or team that is using Predibase for prototyping and validating an ML model?\n\nOnce a model has been deemed satisfactory, what is the path to production?\n\n\nHow are you approaching governance and sustainability of Ludwig and Horovod while balancing your reliance on them in Predibase?\nWhat are some of the notable investments/improvements that you have made in Ludwig during your work of building Predibase?\nWhat are the most interesting, innovative, or unexpected ways that you have seen Predibase used?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on Predibase?\nWhen is Predibase the wrong choice?\nWhat do you have planned for the future of Predibase?\n\nContact Info\n\nLinkedIn\ntgaddair on GitHub\n@travisaddair on Twitter\n\nParting Question\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\nClosing Announcements\n\nThank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you’ve learned something or tried out a project from the show then tell us about it! 
Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers\n\nLinks\n\nPredibase\nHorovod\nLudwig\n\nPodcast.__init__ Episode\n\n\nSupport Vector Machine\nHadoop\nTensorflow\nUber Michelangelo\nAutoML\nSpark ML Lib\nDeep Learning\nPyTorch\nContinual\n\nData Engineering Podcast Episode\n\n\nOverton\nKubernetes\nRay\nNvidia Triton\nWhylogs\n\nData Engineering Podcast Episode\n\n\nWeights and Biases\nMLFlow\nComet\nConfusion Matrices\ndbt\n\nData Engineering Podcast Episode\n\n\nTorchscript\nSelf-supervised Learning\n\nThe intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0\n\n\n","content_html":"

Summary

\n

Deep learning is a revolutionary category of machine learning that accelerates our ability to build powerful inference models. Along with that power comes a great deal of complexity in determining what neural architectures are best suited to a given task, engineering features, scaling computation, etc. Predibase is building on the successes of the Ludwig framework for declarative deep learning and Horovod for horizontally distributing model training. In this episode CTO and co-founder of Predibase, Travis Addair, explains how they are reducing the burden of model development even further with their managed service for declarative and low-code ML and how they are integrating with the growing ecosystem of solutions for the full ML lifecycle.

\n

Announcements

\n\n

Interview

\n\n

Contact Info

\n\n

Parting Question

\n\n

Closing Announcements

\n\n

Links

\n\n

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

\n
\n\n

\"\"

","summary":"An interview with Travis Addair about the platform that he and his team at Predibase are building to empower everyone to build and deploy deep learning models in a low code approach for declarative machine learning development and how they are extending the capabilities of the open source Ludwig and Horovod frameworks","date_published":"2022-07-21T19:00:00.000-04:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/cc2e753c-9198-44d7-8448-d398b3ee8e93.mp3","mime_type":"audio/mpeg","size_in_bytes":48179103,"duration_in_seconds":3619}]},{"id":"podlove-2022-07-14t00:52:42+00:00-37211df4d5c5ae2","title":"Stop Feeding Garbage Data To Your ML Models, Clean It Up With Galileo","url":"https://www.themachinelearningpodcast.com/galileo-machine-learning-data-management-episode-3","content_text":"Summary\nMachine learning is a force multiplier that can generate an outsized impact on your organization. Unfortunately, if you are feeding your ML model garbage data, then you will get orders of magnitude more garbage out of it. The team behind Galileo experienced that pain for themselves and have set out to make data management and cleaning for machine learning a first class concern in your workflow. In this episode Vikram Chatterji shares the story of how Galileo got started and how you can use their platform to fix your ML data so that you can get back to the fun parts.\nAnnouncements\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nPredibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. 
We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out!\nDo you wish you could use artificial intelligence to drive your business the way Big Tech does, but don’t have a money printer? Graft is a cloud-native platform that aims to make the AI of the 1% accessible to the 99%. Wield the most advanced techniques for unlocking the value of data, including text, images, video, audio, and graphs. No machine learning skills required, no team to hire, and no infrastructure to build or maintain. For more information on Graft or to schedule a demo, visit themachinelearningpodcast.com/graft today and tell them Tobias sent you.\nBuilding good ML models is hard, but testing them properly is even harder. At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. 
Go to themachinelearningpodcast.com/deepchecks today to get started!\nYour host is Tobias Macey and today I’m interviewing Vikram Chatterji about Galileo, a platform for uncovering and addressing data problems to improve your model quality\n\nInterview\n\nIntroduction\nHow did you get involved in machine learning?\nCan you describe what Galileo is and the story behind it?\nWho are the target users of the platform and what are the tools/workflows that you are replacing?\n\nHow does that focus inform and influence the design and prioritization of features in the platform?\n\n\nWhat are some of the real-world impacts that you have experienced as a result of the kinds of data problems that you are addressing with Galileo?\nCan you describe how the Galileo product is implemented?\n\nWhat are some of the assumptions that you had formed from your own experiences that have been challenged as you worked with early design partners?\n\n\nThe toolchains and model architectures of any given team is unlikely to be a perfect match across departments or organizations. What are the core principles/concepts that you have hooked into in order to provide the broadest compatibility?\n\nWhat are the model types/frameworks/etc. 
that you have had to forego support for in the early versions of your product?\n\n\nCan you describe the workflow for someone building a machine learning model and how Galileo fits across the various stages of that cycle?\n\nWhat are some of the biggest difficulties posed by the non-linear nature of the experimentation cycle in model development?\n\n\nWhat are some of the ways that you work to quantify the impact of your tool on the productivity and profit contributions of an ML team/organization?\nWhat are the most interesting, innovative, or unexpected ways that you have seen Galileo used?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on Galileo?\nWhen is Galileo the wrong choice?\nWhat do you have planned for the future of Galileo?\n\nContact Info\n\nLinkedIn\n\nParting Question\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\nClosing Announcements\n\nThank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers\n\nLinks\n\nGalileo\nF1 Score\nTensorflow\nKeras\nSpaCy\n\nPodcast.__init__ Episode\n\n\nPytorch\n\nPodcast.__init__ Episode\n\n\nMXNet\nJax\n\nThe intro and outro music is from Hitman’s Lovesong feat. 
Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0\n\n\nSponsored By:Graft: ![Graft Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/LwNCPdKW.png)\r\nGraft™ is a cloud-native platform that aims to make the AI of the 1% accessible to the 99%. Wield the most advanced techniques for unlocking the value of data, including text, images, video, audio, and graphs. No machine learning skills required, no team to hire, and no infrastructure to build or maintain.\r\n\r\nFor more information on Graft or to schedule a demo go to [themachinelearningpodcast.com/graft](https://www.themachinelearningpodcast.com/graft) today! And tell them Tobias sent you.Galileo: ![Galileo Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/3Mm5horv.png)\r\nData powers machine learning, but poor data quality is the largest impediment to effective ML today.\r\n\r\nGalileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix and track their data across the ML workflow (pre-training, post-training and post-production) – no more excel sheets or ad-hoc python scripts.\r\n\r\nGet meaningful gains in your model performance fast, dramatically reduce data labeling and procurement costs, while seeing 10x faster ML iterations.\r\n\r\nGalileo is offering listeners a free 30 day trial and a 30% discount on the product there after. This offer is available until Aug 31, so go to [themachinelearningpodcast.com/galileo](https://www.themachinelearningpodcast.com/galileo) and request a demo today!Deepchecks: ![Deepchecks Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/AHorqO3V.png)\r\nBuilding good ML models is hard, but testing them properly is even harder. 
At Deepchecks, they built an open-source testing framework that follows best practices, ensuring that your models behave as expected. Get started quickly using their built-in library of checks for testing and validating your model’s behavior and performance, and extend it to meet your specific needs as your model evolves. Accelerate your machine learning projects by building trust in your models and automating the testing that you used to do manually. Go to [themachinelearningpodcast.com/deepchecks](https://www.themachinelearningpodcast.com/deepchecks) today to get started!","content_html":"

Summary

\n

Machine learning is a force multiplier that can generate an outsized impact on your organization. Unfortunately, if you are feeding your ML model garbage data, then you will get orders of magnitude more garbage out of it. The team behind Galileo experienced that pain for themselves and have set out to make data management and cleaning for machine learning a first class concern in your workflow. In this episode Vikram Chatterji shares the story of how Galileo got started and how you can use their platform to fix your ML data so that you can get back to the fun parts.

\n

Announcements

\n\n

Interview

\n\n

Contact Info

\n\n

Parting Question

\n\n

Closing Announcements

\n\n

Links

\n\n

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

\n
\n\n

\"\"

Sponsored By:

","summary":"An interview with Galileo co-founder Vikram Chatterji about the challenges of managing unstructured data assets for machine learning projects and how their platform is designed to ease the burden of maintaining clean data sets","date_published":"2022-07-13T20:00:00.000-04:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/1388aacb-7433-4605-9c31-d2f34b7c5094.mp3","mime_type":"audio/mpeg","size_in_bytes":39255602,"duration_in_seconds":2823}]},{"id":"podlove-2022-07-06t01:35:41+00:00-e3950a082837335","title":"Build Better Machine Learning Models With Confidence By Adding Validation With Deepchecks","url":"https://www.themachinelearningpodcast.com/deepchecks-open-source-macehine-learning-testing-episode-2","content_text":"Summary\nMachine learning has the potential to transform industries and revolutionize business capabilities, but only if the models are reliable and robust. Because of the fundamental probabilistic nature of machine learning techniques it can be challenging to test and validate the generated models. The team at Deepchecks understands the widespread need to easily and repeatably check and verify the outputs of machine learning models and the complexity involved in making it a reality. In this episode Shir Chorev and Philip Tannor explain how they are addressing the problem with their open source deepchecks library and how you can start using it today to build trust in your machine learning applications.\nAnnouncements\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nDo you wish you could use artificial intelligence to drive your business the way Big Tech does, but don’t have a money printer? Graft is a cloud-native platform that aims to make the AI of the 1% accessible to the 99%. 
Wield the most advanced techniques for unlocking the value of data, including text, images, video, audio, and graphs. No machine learning skills required, no team to hire, and no infrastructure to build or maintain. For more information on Graft or to schedule a demo, visit themachinelearningpodcast.com/graft today and tell them Tobias sent you.\nPredibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. Go to themachinelearningpodcast.com/predibase today to learn more and try it out!\nData powers machine learning, but poor data quality is the largest impediment to effective ML today. Galileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix and track their data across the ML workflow (pre-training, post-training and post-production) – no more excel sheets or ad-hoc python scripts. Get meaningful gains in your model performance fast, dramatically reduce data labeling and procurement costs, while seeing 10x faster ML iterations. Galileo is offering listeners a free 30 day trial and a 30% discount on the product there after. 
This offer is available until Aug 31, so go to themachinelearningpodcast.com/galileo and request a demo today!\nYour host is Tobias Macey and today I’m interviewing Shir Chorev and Philip Tannor about Deepchecks, a Python package for comprehensively validating your machine learning models and data with minimal effort.\n\nInterview\n\nIntroduction\nHow did you get involved in machine learning?\nCan you describe what Deepchecks is and the story behind it?\nWho is the target audience for the project?\n\nWhat are the biggest challenges that these users face in bringing ML models from concept to production and how does DeepChecks address those problems?\n\n\nIn the absence of DeepChecks how are practitioners solving the problems of model validation and comparison across iterations?\n\nWhat are some of the other tools in this ecosystem and what are the differentiating features of DeepChecks?\n\n\nWhat are some examples of the kinds of tests that are useful for understanding the \"correctness\" of models?\n\nWhat are the methods by which ML engineers/data scientists/domain experts can define what \"correctness\" means in a given model or subject area?\n\n\nIn software engineering the categories of tests are tiered as unit -> integration -> end-to-end. What are the relevant categories of tests that need to be built for validating the behavior of machine learning models?\nHow do model monitoring utilities overlap with the kinds of tests that you are building with deepchecks?\nCan you describe how the DeepChecks package is implemented?\n\nHow have the design and goals of the project changed or evolved from when you started working on it?\nWhat are the assumptions that you have built up from your own experiences that have been challenged by your early users and design partners?\n\n\nCan you describe the workflow for an individual or team using DeepChecks as part of their model training and deployment lifecycle?\nTest engineering is a deep discipline in its own right. 
How have you approached the user experience and API design to reduce the overhead for ML practitioners to adopt good practices?\nWhat are the interfaces available for creating reusable tests and composing test suites together?\nWhat are the additional services/capabilities that you are providing in your commercial offering?\n\nHow are you managing the governance and sustainability of the OSS project and balancing that against the needs/priorities of the business?\n\n\nWhat are the most interesting, innovative, or unexpected ways that you have seen DeepChecks used?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on DeepChecks?\nWhen is DeepChecks the wrong choice?\nWhat do you have planned for the future of DeepChecks?\n\nContact Info\n\nShir\n\nLinkedIn\nshir22 on GitHub\n\n\nPhilip\n\nLinkedIn\n@philiptannor on Twitter\n\n\n\nParting Question\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\nClosing Announcements\n\nThank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers\n\nLinks\n\nDeepChecks\nRandom Forest\nTalpiot Program\nSHAP\n\nPodcast.__init__ Episode\n\n\nAirflow\nGreat Expectations\n\nData Engineering Podcast Episode\n\n\n\nThe intro and outro music is from Hitman’s Lovesong feat. 
Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0\n\n\nSponsored By:Predibase: ![Predibase Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/bbtLDXUq.png)\r\nPredibase’s founders saw the pain of getting ML models developed and in-production, taking up to a year even at leading tech companies like Uber, so they built internal platforms that drastically lowered the time-to-value and increased access. The key was taking a “declarative approach” to machine learning, which Piero Molino (CEO) introduced with Ludwig, an open source framework to create deep learning models with 8,400+ GitHub stars, more than 100 contributors, and thousands of monthly downloads. With Ludwig, tasks that took months-to-years were handed off to teams in thirty minutes and just six lines of human-readable configuration that can define an entire machine learning pipeline.\r\n\r\nNow with Predibase, we are bringing the power of declarative machine learning built on top of Ludwig to broader organizations with our enterprise platform. Like Infrastructure as Code simplified IT, Predibase’s machine learning (ML) platform allows users to focus on the “what” of their ML models rather than the “how”, breaking free of the usual limits in low-code systems and bringing down the time-to-value of ML projects from years to days. \r\n\r\n[Click here](https://themachinelearningpodcast.com/predibase) to learn more and try it for yourself!Graft: ![Graft Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/LwNCPdKW.png)\r\nGraft™ is a cloud-native platform that aims to make the AI of the 1% accessible to the 99%. Wield the most advanced techniques for unlocking the value of data, including text, images, video, audio, and graphs. 
No machine learning skills required, no team to hire, and no infrastructure to build or maintain.\r\n\r\nFor more information on Graft or to schedule a demo go to [themachinelearningpodcast.com/graft](https://www.themachinelearningpodcast.com/graft) today! And tell them Tobias sent you.","content_html":"

Summary

\n

Machine learning has the potential to transform industries and revolutionize business capabilities, but only if the models are reliable and robust. Because of the fundamental probabilistic nature of machine learning techniques it can be challenging to test and validate the generated models. The team at Deepchecks understands the widespread need to easily and repeatably check and verify the outputs of machine learning models and the complexity involved in making it a reality. In this episode Shir Chorev and Philip Tannor explain how they are addressing the problem with their open source deepchecks library and how you can start using it today to build trust in your machine learning applications.

\n

Announcements

\n\n

Interview

\n\n

Contact Info

\n\n

Parting Question

\n\n

Closing Announcements

\n\n

Links

\n\n

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

\n
\n\n

\"\"

Sponsored By:

","summary":"An interview with Shir Chorev and Philip Tannor about model validation and testing with the open source deepchecks library and the challenges of testing machine learning projects","date_published":"2022-07-05T22:00:00.000-04:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/b0c1825e-c89b-4d2d-b3de-0e33caa90586.mp3","mime_type":"audio/mpeg","size_in_bytes":38529166,"duration_in_seconds":2920}]},{"id":"podlove-2022-06-28t00:36:46+00:00-6e6defdac20d145","title":"Build A Full Stack ML Powered App In An Afternoon With Baseten","url":"https://www.themachinelearningpodcast.com/wrap-your-model-in-a-full-stack-application-in-an-afternoon-with-baseten","content_text":"Summary\nBuilding an ML model is getting easier than ever, but it is still a challenge to get that model in front of the people that you built it for. Baseten is a platform that helps you quickly generate a full stack application powered by your model. You can easily create a web interface and APIs powered by the model you created, or a pre-trained model from their library. In this episode Tuhin Srivastava, co-founder of Baseten, explains how the platform empowers data scientists and ML engineers to get their work in production without having to negotiate for help from their application development colleagues.\nAnnouncements\n\nHello and welcome to the Machine Learning Podcast, the podcast about machine learning and how to bring it from idea to delivery.\nData powers machine learning, but poor data quality is the largest impediment to effective ML today. Galileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix and track their data across the ML workflow (pre-training, post-training and post-production) – no more Excel sheets or ad-hoc Python scripts. 
Get meaningful gains in your model performance fast, dramatically reduce data labeling and procurement costs, while seeing 10x faster ML iterations. Galileo is offering listeners a free 30 day trial and a 30% discount on the product thereafter. This offer is available until Aug 31, so go to themachinelearningpodcast.com/galileo and request a demo today!\nDo you wish you could use artificial intelligence to drive your business the way Big Tech does, but don’t have a money printer? Graft is a cloud-native platform that aims to make the AI of the 1% accessible to the 99%. Wield the most advanced techniques for unlocking the value of data, including text, images, video, audio, and graphs. No machine learning skills required, no team to hire, and no infrastructure to build or maintain. For more information on Graft or to schedule a demo, visit themachinelearningpodcast.com/graft today and tell them Tobias sent you.\nPredibase is a low-code ML platform without low-code limits. Built on top of our open source foundations of Ludwig and Horovod, our platform allows you to train state-of-the-art ML and deep learning models on your datasets at scale. Our platform works on text, images, tabular, audio and multi-modal data using our novel compositional model architecture. We allow users to operationalize models on top of the modern data stack, through REST and PQL – an extension of SQL that puts predictive power in the hands of data practitioners. 
Go to themachinelearningpodcast.com/predibase today to learn more and try it out!\nYour host is Tobias Macey and today I’m interviewing Tuhin Srivastava about Baseten, an ML Application Builder for data science and machine learning teams\n\nInterview\n\nIntroduction\nHow did you get involved in machine learning?\nCan you describe what Baseten is and the story behind it?\nWho are the target users for Baseten and what problems are you solving for them?\nWhat are some of the typical technical requirements for an application that is powered by a machine learning model?\n\nIn the absence of Baseten, what are some of the common utilities/patterns that teams might rely on?\n\n\nWhat kinds of challenges do teams run into when serving a model in the context of an application?\nThere are a number of projects that aim to reduce the overhead of turning a model into a usable product (e.g. Streamlit, Hex, etc.). What is your assessment of the current ecosystem for lowering the barrier to product development for ML and data science teams?\nCan you describe how the Baseten platform is designed?\n\nHow have the design and goals of the project changed or evolved since you started working on it?\nHow do you handle sandboxing of arbitrary user-managed code to ensure security and stability of the platform?\n\n\nHow did you approach the system design to allow for mapping application development paradigms into a structure that was accessible to ML professionals?\nCan you describe the workflow for building an ML powered application?\nWhat types of models do you support? (e.g. NLP, computer vision, timeseries, deep neural nets vs. 
linear regression, etc.)\n\nHow do the monitoring requirements shift for these different model types?\nWhat other challenges are presented by these different model types?\n\n\nWhat are the limitations in size/complexity/operational requirements that you have to impose to ensure a stable platform?\nWhat is the process for deploying model updates?\nFor organizations that are relying on Baseten as a prototyping platform, what are the options for taking a successful application and handing it off to a product team for further customization?\nWhat are the most interesting, innovative, or unexpected ways that you have seen Baseten used?\nWhat are the most interesting, unexpected, or challenging lessons that you have learned while working on Baseten?\nWhen is Baseten the wrong choice?\nWhat do you have planned for the future of Baseten?\n\nContact Info\n\n@tuhinone on Twitter\nLinkedIn\n\nParting Question\n\nFrom your perspective, what is the biggest barrier to adoption of machine learning today?\n\nClosing Announcements\n\nThank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.\nVisit the site to subscribe to the show, sign up for the mailing list, and read the show notes.\nIf you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.\nTo help other people find the show please leave a review on iTunes and tell your friends and co-workers\n\nLinks\n\nBaseten\nGumroad\nscikit-learn\nTensorflow\nKeras\nStreamlit\n\nPodcast.__init__ Episode\n\n\nRetool\nHex\n\nPodcast.__init__ Episode\n\n\nKubernetes\nReact Monaco\nHuggingface\nAirtable\nDall-E 2\nGPT-3\nWeights and Biases\n\nThe intro and outro music is from Hitman’s Lovesong feat. 
Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0\n\n\nSponsored By:Graft: ![Graft Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/LwNCPdKW.png)\r\nGraft™ is a cloud-native platform that aims to make the AI of the 1% accessible to the 99%. Wield the most advanced techniques for unlocking the value of data, including text, images, video, audio, and graphs. No machine learning skills required, no team to hire, and no infrastructure to build or maintain.\r\n\r\nFor more information on Graft or to schedule a demo go to [themachinelearningpodcast.com/graft](https://www.themachinelearningpodcast.com/graft) today! And tell them Tobias sent you.Galileo: ![Galileo Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/3Mm5horv.png)\r\nData powers machine learning, but poor data quality is the largest impediment to effective ML today.\r\n\r\nGalileo is a collaborative data bench for data scientists building Natural Language Processing (NLP) models to programmatically inspect, fix and track their data across the ML workflow (pre-training, post-training and post-production) – no more Excel sheets or ad-hoc Python scripts.\r\n\r\nGet meaningful gains in your model performance fast, dramatically reduce data labeling and procurement costs, while seeing 10x faster ML iterations.\r\n\r\nGalileo is offering listeners a free 30 day trial and a 30% discount on the product thereafter. 
This offer is available until Aug 31, so go to [themachinelearningpodcast.com/galileo](https://www.themachinelearningpodcast.com/galileo) and request a demo today!Predibase: ![Predibase Logo](https://files.fireside.fm/file/fireside-uploads/images/8/8fd5372e-f294-4685-ac03-f48dfa3c4d02/bbtLDXUq.png)\r\nPredibase’s founders saw the pain of getting ML models developed and in-production, taking up to a year even at leading tech companies like Uber, so they built internal platforms that drastically lowered the time-to-value and increased access. The key was taking a “declarative approach” to machine learning, which Piero Molino (CEO) introduced with Ludwig, an open source framework to create deep learning models with 8,400+ GitHub stars, more than 100 contributors, and thousands of monthly downloads. With Ludwig, tasks that took months-to-years were handed off to teams in thirty minutes and just six lines of human-readable configuration that can define an entire machine learning pipeline.\r\n\r\nNow with Predibase, we are bringing the power of declarative machine learning built on top of Ludwig to broader organizations with our enterprise platform. Like Infrastructure as Code simplified IT, Predibase’s machine learning (ML) platform allows users to focus on the “what” of their ML models rather than the “how”, breaking free of the usual limits in low-code systems and bringing down the time-to-value of ML projects from years to days. \r\n\r\n[Click here](https://themachinelearningpodcast.com/predibase) to learn more and try it for yourself!","content_html":"

Summary

\n

Building an ML model is getting easier than ever, but it is still a challenge to get that model in front of the people that you built it for. Baseten is a platform that helps you quickly generate a full stack application powered by your model. You can easily create a web interface and APIs powered by the model you created, or a pre-trained model from their library. In this episode Tuhin Srivastava, co-founder of Baseten, explains how the platform empowers data scientists and ML engineers to get their work in production without having to negotiate for help from their application development colleagues.

\n

Announcements

\n\n

Interview

\n\n

Contact Info

\n\n

Parting Question

\n\n

Closing Announcements

\n\n

Links

\n\n

The intro and outro music is from Hitman’s Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0

\n
\n\n

\"\"

Sponsored By:

","summary":"An interview with Tuhin Srivastava about how the Baseten platform allows data scientists and ML engineers to build a full stack machine learning powered application by themselves in an afternoon","date_published":"2022-06-28T21:00:00.000-04:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/9396f6cf-c38f-405a-8339-bd0c4d7bf73b.mp3","mime_type":"audio/mpeg","size_in_bytes":34147585,"duration_in_seconds":2786}]},{"id":"podlove-2022-06-03t12:12:04+00:00-486b841f7253ee2","title":"Introducing The Show","url":"https://www.themachinelearningpodcast.com/introducing-the-show-episode-0","content_text":"Hello, and welcome to the Machine Learning Podcast. I’m your host, Tobias Macey. You might know me from the Data Engineering Podcast or the Python Podcast.__init__. If you work with machine learning and AI, or you’re curious about it and want to learn more, then this show is for you. We’ll go beyond the esoteric research and flashy headlines and find out how machine learning is making an impact on the world and creating value for business. Along the way we’ll be joined by the researchers, engineers, and entrepreneurs who are shaping the industry. So go to themachinelearningpodcast.com today to subscribe and stay informed on how ML/AI are being used, how it works, and how to go from idea to production.\n\n\n","content_html":"

Hello, and welcome to the Machine Learning Podcast. I’m your host, Tobias Macey. You might know me from the Data Engineering Podcast or the Python Podcast.__init__. If you work with machine learning and AI, or you’re curious about it and want to learn more, then this show is for you. We’ll go beyond the esoteric research and flashy headlines and find out how machine learning is making an impact on the world and creating value for business. Along the way we’ll be joined by the researchers, engineers, and entrepreneurs who are shaping the industry. So go to themachinelearningpodcast.com today to subscribe and stay informed on how ML/AI are being used, how it works, and how to go from idea to production.

\n
\n\n

\"\"

","summary":"Introducing the new podcast about how to go from idea to production with machine learning","date_published":"2022-06-03T08:00:00.000-04:00","attachments":[{"url":"https://op3.dev/e/dts.podtrac.com/redirect.mp3/aphid.fireside.fm/d/1437767933/8fd5372e-f294-4685-ac03-f48dfa3c4d02/42f5bc38-14a5-403f-b9df-3b7814f0f757.mp3","mime_type":"audio/mpeg","size_in_bytes":976269,"duration_in_seconds":71}]}]}