• AI Agents for Data Analysis with Shreya Shankar - #703
    Oct 1 2024
    Today, we're joined by Shreya Shankar, a PhD student at UC Berkeley to discuss DocETL, a declarative system for building and optimizing LLM-powered data processing pipelines for large-scale and complex document analysis tasks. We explore how DocETL's optimizer architecture works, the intricacies of building agentic systems for data processing, the current landscape of benchmarks for data processing tasks, how these differ from reasoning-based benchmarks, and the need for robust evaluation methods for human-in-the-loop LLM workflows. Additionally, Shreya shares real-world applications of DocETL, the importance of effective validation prompts, and building robust and fault-tolerant agentic systems. Lastly, we cover the need for benchmarks tailored to LLM-powered data processing tasks and the future directions for DocETL. The complete show notes for this episode can be found at https://twimlai.com/go/703.
    Show more Show less
    48 mins
  • Stealing Part of a Production Language Model with Nicholas Carlini - #702
    Sep 23 2024
    Today, we're joined by Nicholas Carlini, research scientist at Google DeepMind to discuss adversarial machine learning and model security, focusing on his 2024 ICML best paper winner, “Stealing part of a production language model.” We dig into this work, which demonstrated the ability to successfully steal the last layer of production language models including ChatGPT and PaLM-2. Nicholas shares the current landscape of AI security research in the age of LLMs, the implications of model stealing, ethical concerns surrounding model privacy, how the attack works, and the significance of the embedding layer in language models. We also discuss the remediation strategies implemented by OpenAI and Google, and the future directions in the field of AI security. Plus, we also cover his other ICML 2024 best paper, “Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining,” which questions the use and promotion of differential privacy in conjunction with pre-trained models. The complete show notes for this episode can be found at https://twimlai.com/go/702.
    Show more Show less
    1 hr and 4 mins
  • Supercharging Developer Productivity with ChatGPT and Claude with Simon Willison - #701
    Sep 16 2024
    Today, we're joined by Simon Willison, independent researcher and creator of Datasette to discuss the many ways software developers and engineers can take advantage of large language models (LLMs) to boost their productivity. We dig into Simon’s own workflows and how he uses popular models like ChatGPT and Anthropic’s Claude to write and test hundreds of lines of code while out walking his dog. We review Simon’s favorite prompting and debugging techniques, his strategies for sidestepping the limitations of contemporary models, how he uses Claude’s Artifacts feature for rapid prototyping, his thoughts on the use and impact of vision models, the role he sees for open source models and local LLMs, and much more. The complete show notes for this episode can be found at https://twimlai.com/go/701.
    Show more Show less
    1 hr and 14 mins
  • Automated Design of Agentic Systems with Shengran Hu - #700
    Sep 3 2024
    Today, we're joined by Shengran Hu, a PhD student at the University of British Columbia, to discuss Automated Design of Agentic Systems (ADAS), an approach focused on automatically creating agentic system designs. We explore the spectrum of agentic behaviors, the motivation for learning all aspects of agentic system design, the key components of the ADAS approach, and how it uses LLMs to design novel agent architectures in code. We also cover the iterative process of ADAS, its potential to shed light on the behavior of foundation models, the higher-level meta-behaviors that emerge in agentic systems, and how ADAS uncovers novel design patterns through emergent behaviors, particularly in complex tasks like the ARC challenge. Finally, we touch on the practical applications of ADAS and its potential use in system optimization for real-world tasks. The complete show notes for this episode can be found at https://twimlai.com/go/700.
    Show more Show less
    1 hr
  • The EU AI Act and Mitigating Bias in Automated Decisioning with Peter van der Putten - #699
    Aug 27 2024
    Today, we're joined by Peter van der Putten, director of the AI Lab at Pega and assistant professor of AI at Leiden University. We discuss the newly adopted European AI Act and the challenges of applying academic fairness metrics in real-world AI applications. We dig into the key ethical principles behind the Act, its broad definition of AI, and how it categorizes various AI risks. We also discuss the practical challenges of implementing fairness and bias metrics in real-world scenarios, and the importance of a risk-based approach in regulating AI systems. Finally, we cover how the EU AI Act might influence global practices, similar to the GDPR's effect on data privacy, and explore strategies for closing bias gaps in real-world automated decision-making. The complete show notes for this episode can be found at https://twimlai.com/go/699.
    Show more Show less
    46 mins
  • The Building Blocks of Agentic Systems with Harrison Chase - #698
    Aug 19 2024
    Today, we're joined by Harrison Chase, co-founder and CEO of LangChain to discuss LLM frameworks, agentic systems, RAG, evaluation, and more. We dig into the elements of a modern LLM framework, including the most productive developer experiences and appropriate levels of abstraction. We dive into agents and agentic systems as well, covering the “spectrum of agenticness,” cognitive architectures, and real-world applications. We explore key challenges in deploying agentic systems, and the importance of agentic architectures as a means of communication in system design and operation. Additionally, we review evolving use cases for RAG, and the role of observability, testing, and evaluation tools in moving LLM applications from prototype to production. Lastly, Harrison shares his hot takes on prompting, multi-modal models, and more! The complete show notes for this episode can be found at https://twimlai.com/go/698.
    Show more Show less
    59 mins
  • Simplifying On-Device AI for Developers with Siddhika Nevrekar - #697
    Aug 12 2024
    Today, we're joined by Siddhika Nevrekar, AI Hub head at Qualcomm Technologies, to discuss on-device AI and how to make it easier for developers to take advantage of device capabilities. We unpack the motivations for AI engineers to move model inference from the cloud to local devices, and explore the challenges associated with on-device AI. We dig into the role of hardware solutions, from powerful system-on-chips (SoC) to neural processors, the importance of collaboration between community runtimes like ONNX and TFLite and chip manufacturers, the unique challenges of IoT and autonomous vehicles, and the key metrics developers should focus on to ensure optimal on-device performance. Finally, Siddhika introduces Qualcomm's AI Hub, a platform developed to simplify the process of testing and optimizing AI models across different devices. The complete show notes for this episode can be found at https://twimlai.com/go/697.
    Show more Show less
    47 mins
  • Genie: Generative Interactive Environments with Ashley Edwards - #696
    Aug 5 2024
    Today, we're joined by Ashley Edwards, a member of technical staff at Runway, to discuss Genie: Generative Interactive Environments, a system for creating ‘playable’ video environments for training deep reinforcement learning (RL) agents at scale in a completely unsupervised manner. We explore the motivations behind Genie, the challenges of data acquisition for RL, and Genie’s capability to learn world models from videos without explicit action data, enabling seamless interaction and frame prediction. Ashley walks us through Genie’s core components—the latent action model, video tokenizer, and dynamics model—and explains how these elements collaborate to predict future frames in video sequences. We discuss the model architecture, training strategies, benchmarks used, as well as the application of spatiotemporal transformers and the MaskGIT techniques used for efficient token prediction and representation. Finally, we touched on Genie’s practical implications, its comparison to other video generation models like “Sora,” and potential future directions in video generation and diffusion models. The complete show notes for this episode can be found at https://twimlai.com/go/696.
    Show more Show less
    47 mins