Episodes

  • Evolving Data Infrastructure for the AI Era: AWS, Meta, and Beyond with Roy Ben-Alta
    Nov 21 2024

In this episode, we chat with Roy Ben-Alta, co-founder of Oakminer AI and former director at Meta AI Research, about his fascinating journey through the evolution of data infrastructure and AI. We explore his early days at AWS when cloud adoption was still controversial, his experience building large language models at Meta, and the challenges of training and deploying AI systems at scale. Roy shares valuable insights about the future of data warehouses, the emergence of knowledge-centric systems, and the critical role of data engineering in AI. We also hear his practical advice on building AI companies today, including thoughts on model evaluation frameworks, vendor lock-in, and the eternal "build vs. buy" decision. Drawing on his experience at Amazon and Meta, and now as a founder, Roy offers a unique perspective on how AI is transforming traditional data infrastructure and what it means for the future of enterprise software.

    Chapters

00:00 Introduction to Roy Ben-Alta and AI Background
    04:07 Warren Buffett Experience and MBA Insights
    06:45 Lessons from Amazon and Meta Leadership
    09:15 Early Days of AWS and Cloud Adoption
    12:12 Redshift vs. Snowflake: A Data Warehouse Perspective
    14:49 Navigating Complex Data Systems in Organizations
    31:21 The Future of Personalized Software Solutions
    32:19 Building Large Language Models at Meta
    39:27 Evolution of Data Platforms and Infrastructure
    50:50 Engineering Knowledge and LLMs
    58:27 Build vs. Buy: Strategic Decisions for Startups

    1 hr and 3 mins
  • From Functions to Full Applications: How Serverless Evolved Beyond AWS Lambda with Nitzan Shapira
    Nov 6 2024

    In this episode, we chat with Nitzan Shapira, co-founder and former CEO of Epsagon, which was acquired by Cisco in 2021. We explore Nitzan's journey from working in cybersecurity to building an observability platform for cloud applications, particularly focused on serverless architectures. We learn about the early days of serverless adoption, the challenges in making observability tools developer-friendly, and why distributed tracing was a key differentiator for Epsagon. We discuss the evolution of observability tools, the future impact of AI on both observability and software development, and the changing landscape of serverless computing. Finally, we hear Nitzan's current perspective on enterprise AI adoption from his role at Cisco, where he helps evaluate and build new AI-focused business lines.

Chapters

    03:17 Transition from Security to Observability
    09:52 Exploring Ideas and Choosing Serverless
    16:43 Adoption of Distributed Tracing
    20:54 The Future of Observability
    25:26 Building a Product that Developers Love
    31:03 Challenges in Observability and Data Costs
    32:47 The Excitement and Evolution of Serverless
    35:44 Serverless as a Horizontal Platform
    37:15 The Future of Serverless and No-Code/Low-Code Tools
    38:15 Technical Limits and the Future of Serverless
    40:38 Navigating Near-Death Moments and Go-to-Market Challenges
48:36 Cisco's Gen AI Ecosystem and New Business Lines
    50:25 The State of the AI Ecosystem and Enterprise Adoption
    53:54 Using AI to Enhance Engineering and Product Development
    55:02 Using AI in Go-to-Market Strategies

    58 mins
• From GPU Compilers to Architecting Kubernetes: A Conversation with Brian Grant
    Oct 22 2024

    From GPU computing pioneer to Kubernetes architect, Brian Grant takes us on a fascinating journey through his career at the forefront of systems engineering. In this episode, we explore his early work on GPU compilers in the pre-CUDA era, where he tackled unique challenges in high-performance computing when graphics cards weren't yet designed for general computation. Brian then shares insights from his time at Google, where he helped develop Borg and later became the original lead architect of Kubernetes. He explains key architectural decisions that shaped Kubernetes, from its extensible resource model to its approach to service discovery, and why they chose to create a rich set of abstractions rather than a minimal interface. The conversation concludes with Brian's thoughts on standardization challenges in cloud infrastructure and his vision for moving beyond infrastructure as code, offering valuable perspective on both the history and future of distributed systems.

    Links:
Brian Grant on LinkedIn

    Chapters

    00:00 Introduction and Background
    03:11 Early Work in High-Performance Computing
    06:21 Challenges of Building Compilers for GPUs
    13:14 Influential Innovations in Compilers
    31:46 The Future of Compilers
    33:11 The Rise of Niche Programming Languages
    34:01 The Evolution of Google's Borg and Kubernetes
    39:06 Challenges of Managing Applications in a Dynamically Scheduled Environment
    48:12 The Need for Standardization in Application Interfaces and Management Systems
    01:00:55 Driving Network Effects and Creating Cohesive Ecosystems

    Click here to view the episode transcript.

    1 hr and 2 mins
  • Proving Code Correctness: FizzBee and the Future of Formal Methods in Software Design with FizzBee's creator JP
    Oct 8 2024

    In this episode, we chat with JP, creator of FizzBee, about formal methods and their application in software engineering. We explore the differences between coding and engineering, discussing how formal methods can improve system design and reliability. JP shares insights from his time at Google and explains why tools like FizzBee are crucial for distributed systems. We delve into the challenges of adopting formal methods in industry, the potential of FizzBee to make these techniques more accessible, and how it compares to other tools like TLA+. Finally, we discuss the future of software development, including the role of LLMs in code generation and the ongoing importance of human engineers in system design.

    Links
    FizzBee
    FizzBee Github Repo
    FizzBee Blog

    Chapters
    00:00 Introduction and Overview
    02:42 JP's Experience at Google and the Growth of the Company
    04:51 The Difference Between Engineers and Coders
    06:41 The Importance of Rigor and Quality in Engineering
    10:08 The Limitations of QA and the Need for Formal Methods
    14:00 The Role of Best Practices in Software Engineering
    14:56 Design Specification Languages for System Correctness
    21:43 The Applicability of Formal Methods in Distributed Systems
    31:20 Getting Started with FizzBee: A Practical Example
    36:06 Common Assumptions and Misconceptions in Distributed Systems
    43:23 The Role of FizzBee in the Design Phase
    48:04 The Future of FizzBee: LLMs and Code Generation
    58:20 Getting Started with FizzBee: Tutorials and Online Playground


    Click here to view the episode transcript.

    1 hr and 1 min
  • MLOps Evolution: Data, Experiments, and AI with Dean Pleban from DagsHub
    Sep 27 2024

    In this episode, we chat with Dean Pleban, CEO of DagsHub, about machine learning operations. We explore the differences between DevOps and MLOps, focusing on data management and experiment tracking. Dean shares insights on versioning various components in ML projects and discusses the importance of user experience in MLOps tools. We also touch on DagsHub's integration of AI in their product and Dean's vision for the future of AI and machine learning in industry.

    Links

    DagsHub
    The MLOps Podcast
Dean on LinkedIn

    Chapters

    00:00 Introduction and Background
    03:03 Challenges of Managing Machine Learning Projects
    10:00 The Concept of Experiments in Machine Learning
    12:51 Data Curation and Validation for High-Quality Data
27:07 Connecting the Components of Machine Learning Projects with DagsHub
    29:12 The Importance of Data and Clear Interfaces
43:29 Incorporating Machine Learning into DagsHub
    51:27 The Future of ML and AI

    54 mins
  • How Denormalized is Building ‘DuckDB for Streaming’ with Apache DataFusion
    Sep 13 2024

In this episode, Kostas and Nitay are joined by Amey Chaugule and Matt Green, co-founders of Denormalized. They delve into how Denormalized is building an embedded stream processing engine—think “DuckDB for streaming”—to simplify real-time data workloads. Drawing from their extensive backgrounds at companies like Uber, Lyft, Stripe, and Coinbase, Amey and Matt discuss the challenges of existing stream processing systems like Spark, Flink, and Kafka. They explain how their approach leverages Apache DataFusion to create a single-node solution that reduces the complexities inherent in distributed systems.


    The conversation explores topics such as developer experience, fault tolerance, state management, and the future of stream processing interfaces. Whether you’re a data engineer, application developer, or simply interested in the evolution of real-time data infrastructure, this episode offers valuable insights into making stream processing more accessible and efficient.


    Contacts & Links
    Amey Chaugule
    Matt Green
    Denormalized
    Denormalized Github Repo

    Chapters
    00:00 Introduction and Background
    12:03 Building an Embedded Stream Processing Engine
    18:39 The Need for Stream Processing in the Current Landscape
    22:45 Interfaces for Interacting with Stream Processing Systems
    26:58 The Target Persona for Stream Processing Systems
    31:23 Simplifying Stream Processing Workloads and State Management
    34:50 State and Buffer Management
    37:03 Distributed Computing vs. Single-Node Systems
    42:28 Cost Savings with Single-Node Systems
47:04 The Power and Extensibility of DataFusion
    55:26 Integrating a Data Store with DataFusion
    57:02 The Future of Streaming Systems

    Click here to view the episode transcript.


    1 hr and 2 mins
  • Unifying structured and unstructured data for AI: Rethinking ML infrastructure with Nikhil Simha and Varant Zanoyan
    Aug 30 2024

    In this episode, we dive deep into the future of data infrastructure for AI and ML with Nikhil Simha and Varant Zanoyan, two seasoned engineers from Airbnb and Facebook. Nikhil and Varant share their journey from building real-time data systems and ML infrastructure at tech giants to launching their own venture.

    The conversation explores the intricacies of designing developer-friendly APIs, the complexities of handling both batch and streaming data, and the delicate balance between customer needs and product vision in a startup environment.

    Contacts & Links

    Nikhil Simha
    Varant Zanoyan
    Chronon project

    Chapters

    00:00 Introduction and Past Experiences
    04:38 The Challenges of Building Data Infrastructure for Machine Learning
    08:01 Merging Real-Time Data Processing with Machine Learning
    14:08 Backfilling New Features in Data Infrastructure
    20:57 Defining Failure in Data Infrastructure
    26:45 The Choice Between SQL and Data Frame APIs
    34:31 The Vision for Future Improvements
38:17 Introduction to Chronon and Open Source
    43:29 The Future of Chronon: New Computation Paradigms
    48:38 Balancing Customer Needs and Vision
    57:21 Engaging with Customers and the Open Source Community
    01:01:26 Potential Use Cases and Future Directions

    Click here to view the episode transcript.

    1 hr and 2 mins
  • Stream processing, LSMs and leaky abstractions with Chris Riccomini
    Aug 23 2024

In this episode, we chat with Chris Riccomini about the evolution of stream processing and the challenges of building applications on streaming systems. We also chat about leaky abstractions, good and bad API designs, what Chris loves and hates about Rust, and finally his exciting new project involving object storage and LSMs.

    Connect with Chris at:
    LinkedIn
    X
    Blog
Materialized View - his newsletter
    The Missing README - his book
    SlateDB - his latest OSS project

    Chapters
00:00 Introduction and Background
    04:05 The State of Stream Processing Today
    08:53 The Limitations of SQL in Streaming Systems
    14:00 Prioritizing the Developer Experience in Stream Processing
    18:15 Improving the Usability of Streaming Systems
    27:54 The Potential of State Machine Programming in Complex Systems
    32:41 The Power of Rust: Compiling and Language Bindings
    34:06 The Shift from Sidecar to Embedded Libraries Driven by Rust
    35:49 Building an LSM on Object Storage: Cost-Effective State Management
    39:47 The Unbundling and Composable Nature of Databases
    47:30 The Future of Data Systems: More Companies and Focus on Metadata

    Click here to view the episode transcript.

    53 mins