• 📆 ThursdAI - Nov 21 - The fight for the LLM throne, OSS SOTA from AllenAI, Flux new tools, Deepseek R1 reasoning & more AI news
    Nov 22 2024
    Hey folks, Alex here, and oof what a 🔥🔥🔥 show we had today! I got to use my new breaking news button 3 times this show! And not only that, some of you may know that one of the absolute biggest pleasures as a host is to feature the folks who actually make the news on the show! And now that we're in video format, you actually get to see who they are!

So this week I was honored to welcome back our friend and co-host Junyang Lin, a dev lead from the Alibaba Qwen team, who came back after launching the incredible Qwen Coder 2.5 and Qwen 2.5 Turbo with 1M context. We also had breaking news on the show that AI2 (Allen Institute for AI) has fully released SOTA Llama post-trained models, and I was very lucky to get the core contributor on the paper, Nathan Lambert, to join us live and tell us all about this amazing open-source effort! You don't want to miss this conversation!

Lastly, we chatted with the CEO of StackBlitz, Eric Simons, about the absolutely incredible lightning-in-a-bottle success of their latest bolt.new product, and how it opens a new category of code-generation tools.

00:00 Introduction and Welcome
00:58 Meet the Hosts and Guests
02:28 TLDR Overview
03:21 Tl;DR
04:10 Big Companies and APIs
07:47 Agent News and Announcements
08:05 Voice and Audio Updates
08:48 AR, Art, and Diffusion
11:02 Deep Dive into Mistral and Pixtral
29:28 Interview with Nathan Lambert from AI2
30:23 Live Reaction to Tulu 3 Release
30:50 Deep Dive into Tulu 3 Features
32:45 Open Source Commitment and Community Impact
33:13 Exploring the Released Artifacts
33:55 Detailed Breakdown of Datasets and Models
37:03 Motivation Behind Open Source
38:02 Q&A Session with the Community
38:52 Summarizing Key Insights and Future Directions
40:15 Discussion on Long Context Understanding
41:52 Closing Remarks and Acknowledgements
44:38 Transition to Big Companies and APIs
45:03 Weights & Biases: This Week's Buzz
01:02:50 Mistral's New Features and Upgrades
01:07:00 Introduction to DeepSeek and the Whale Giant
01:07:44 DeepSeek's Technological Achievements
01:08:02 Open Source Models and API Announcement
01:09:32 DeepSeek's Reasoning Capabilities
01:12:07 Scaling Laws and Future Predictions
01:14:13 Interview with Eric from Bolt
01:14:41 Breaking News: Gemini Experimental
01:17:26 Interview with Eric Simons - CEO @ StackBlitz
01:19:39 Live Demo of Bolt's Capabilities
01:36:17 Black Forest Labs AI Art Tools
01:40:45 Conclusion and Final Thoughts

As always, the show notes and TL;DR with all the links I mentioned on the show, plus the full news roundup, are below the main news recap 👇

Google & OpenAI fighting for the LMArena crown 👑

I wanted to open with this: last week I reported that Gemini Exp 1114 had taken over the #1 spot in the LMArena, and in less than a week we saw a new ChatGPT release, called GPT-4o-2024-11-20, reclaim the arena's #1 spot! Focusing specifically on creative writing, this new model, now deployed on chat.com and in the API, is definitely more creative according to many folks who've tried it, with OpenAI employees saying "expect qualitative improvements with more natural and engaging writing, thoroughness and readability", and indeed that's what my feed was reporting as well.

I also wanted to mention that we've seen this happen once before: last time Gemini peaked at the LMArena, it took less than a week for OpenAI to release and test a model that beat it. But not this time, this time Google came prepared with an answer! Just as we were wrapping up the show (again, Logan apparently loves dropping things at the end of ThursdAI), we got breaking news that there is YET another experimental model from Google, called Gemini Exp 1121, and apparently it reclaims the #1 position that ChatGPT took from Gemini... yesterday! Or at least joins it at #1.

LMArena Fatigue?

Many folks in my DMs are getting a bit frustrated with these marketing tactics. It's not only that we're getting experimental models faster than we can test them, but also that, if you think about it, this was probably a calculated move by Google: release a very powerful checkpoint, knowing that this will trigger a response from OpenAI, but don't release your most powerful one. OpenAI predictably releases their own "ready to go" checkpoint to show they are ahead, then the folks at Google wait and release what they wanted to release in the first place.

The other frustration point is the over-indexing of the major labs on the LMArena human metrics as the closest approximation of "best". For example, some analysis from Artificial Analysis shows that while the latest ChatGPT is indeed better at creative writing (and #1 in the Arena, where humans vote answers against each other), it's gotten actively worse at MATH and coding since the August version (which could be a result of being a distilled, much smaller version).

In summary, maybe LMArena is no longer the only arena you need, but the competition at the TOP scores of the Arena has never been ...
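For readers curious how those Arena rankings actually move when "humans vote answers against each other": LMArena fits a Bradley-Terry model over pairwise votes, but the classic Elo update gives the same intuition. The sketch below is illustrative only (the starting rating of 1000 and K-factor of 32 are my own arbitrary choices, not LMArena's parameters):

```python
def elo_update(r_a: float, r_b: float, a_wins: bool, k: float = 32.0) -> tuple[float, float]:
    """One pairwise-vote update: returns new ratings for models A and B.
    Rating points are zero-sum: whatever A gains, B loses."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))  # win probability implied by ratings
    score_a = 1.0 if a_wins else 0.0
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# Two models start even at 1000; model A wins one human vote.
a, b = elo_update(1000.0, 1000.0, a_wins=True)
print(a, b)  # 1016.0 984.0
```

An upset (a low-rated model beating a high-rated one) moves the ratings more than an expected win, which is why a handful of surprising votes can flip the #1 spot quickly.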
    1 hr and 45 mins
  • 📆 ThursdAI - Nov 14 - Qwen 2.5 Coder, No Walls, Gemini 1114 👑 LLM, ChatGPT OS integrations & more AI news
    Nov 15 2024
    This week is a very exciting one in the world of AI news, as we got 3 SOTA models: one in overall LLM rankings, one in OSS coding, and one in OSS voice, plus a bunch of breaking news during the show (which we reacted to live on the pod, and as we're now doing video, you can see us freak out in real time at 59:32).

00:00 Welcome to ThursdAI
00:25 Meet the Hosts
02:38 Show Format and Community
03:18 TLDR Overview
04:01 Open Source Highlights
13:31 Qwen Coder 2.5 Release
14:00 Speculative Decoding and Model Performance
22:18 Interactive Demos and Artifacts
28:20 Training Insights and Future Prospects
33:54 Breaking News: Nexus Flow
36:23 Exploring Athene v2 Agent Capabilities
36:48 Understanding ArenaHard and Benchmarking
40:55 Scaling and Limitations in AI Models
43:04 Nexus Flow and Scaling Debate
49:00 Open Source LLMs and New Releases
52:29 FrontierMath Benchmark and Quantization Challenges
58:50 Gemini Experimental 1114 Release and Performance
01:11:28 LLM Observability with Weave
01:14:55 Introduction to Tracing and Evaluations
01:15:50 Weave API Toolkit Overview
01:16:08 Buzz Corner: Weights & Biases
01:16:18 Nous Forge Reasoning API
01:26:39 Breaking News: OpenAI's New MacOS Features
01:27:41 Live Demo: ChatGPT Integration with VS Code
01:34:28 Ultravox: Real-Time AI Conversations
01:42:03 Tilde Research and Stargazer Tool
01:46:12 Conclusion and Final Thoughts

This week there was also a debate online about whether deep learning (and "scale is all you need") has hit a wall, with folks like Ilya Sutskever being cited by publications claiming it has, and folks like Yann LeCun calling "I told you so". TL;DR? Multiple huge breakthroughs later, with both Oriol from DeepMind and Sam Altman saying "what wall?" and Heiner from X.ai saying "skill issue", there are no walls in sight, despite some tech journalists' love of pretending there are. Also, what happened to Yann? 😵‍💫

Ok, back to our scheduled programming. Here's the TL;DR, after which comes a breakdown of the most important things in today's update, and as always, I encourage you to watch / listen to the show, as we cover way more than I summarize here 🙂

TL;DR and Show Notes:

* Open Source LLMs
  * Qwen Coder 2.5 32B (+5 others) - Sonnet @ home (HF, Blog, Tech Report)
  * The End of Quantization? (X, Original Thread)
  * Epoch: FrontierMath, a new benchmark for advanced MATH reasoning in AI (Blog)
  * Common Corpus: largest multilingual 2T token dataset (Blog)
  * NexusFlow - Athene v2 - open model suite (X, Blog, HF)
* Big CO LLMs + APIs
  * Gemini 1114 is the new king LLM, #1 on LMArena (X)
  * Nous Forge Reasoning API - beta (Blog, X)
  * Reuters reports "AI is hitting a wall" and it's becoming a meme (Article)
  * Cursor acquires SuperMaven (X)
* This Week's Buzz
  * Weave JS/TS support is here 🙌
* Voice & Audio
  * Fixie releases UltraVox SOTA (Demo, HF, API)
  * Suno v4 is coming and it's bonkers amazing (Alex Song, SOTA Jingle)
* Tools demoed
  * Qwen Artifacts - HF Demo
  * Tilde Galaxy - Interp Tool

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
    1 hr and 49 mins
  • 📆 ThursdAI - Nov 7 - Video version, full o1 was given and taken away, Anthropic price hike-u, halloween 💀 recap & more AI news
    Nov 8 2024
    👋 Hey all, this is Alex, coming to you from very sunny California, as I'm in SF again, while there is a complete snow storm back home in Denver (brrr).

I flew here for the hackathon I kept telling you about, and it was glorious: we had over 400 registered, over 200 approved hackers, and 21 teams submitted incredible projects 👏 You can follow some of these here.

I then decided to stick around and record the show from SF, finally pulled the plug and asked for some budget, and I present: the first ThursdAI recorded from the newly minted W&B podcast studio at our office in SF 🎉

This isn't the only first. Today, for the first time, all of the regular co-hosts of ThursdAI met on video, after over a year of hanging out weekly. We've finally made the switch to video, and you know what? Given how good AI podcasts are getting, we may have to stick with this video thing! We played one such clip from a new model called hertz-dev, which is a <10B model for full-duplex audio.

Given that today's episode is a video podcast, I would love for you to see it, so here are the timestamps for the chapters, followed by the TL;DR and show notes in raw format. I would love to hear from folks who read the longer-form style newsletters: do you miss them? Should I bring them back? Please leave me a comment 🙏 (I may send you a survey.)

This was a generally slow week (for AI!! not for... ehrm, other stuff) and it was a fun podcast! Leave me a comment about what you think about this new format.

Chapter Timestamps

00:00 Introduction and Agenda Overview
00:15 Open Source LLMs: Small Models
01:25 Open Source LLMs: Large Models
02:22 Big Companies and LLM Announcements
04:47 Hackathon Recap and Community Highlights
18:46 Technical Deep Dive: HertzDev and FishSpeech
33:11 Human in the Loop: AI Agents
36:24 Augmented Reality Lab Assistant
36:53 Hackathon Highlights and Community Vibes
37:17 Chef Puppet and Meta Ray Bans Raffle
37:46 Introducing Fester the Skeleton
38:37 Fester's Performance and Community Reactions
39:35 Technical Insights and Project Details
42:42 Big Companies API Updates
43:17 Haiku 3.5: Performance and Pricing
43:44 Comparing Haiku and Sonnet Models
51:32 XAI Grok: New Features and Pricing
57:23 OpenAI's O1 Model: Leaks and Expectations
01:08:42 Transformer ASIC: The Future of AI Hardware
01:13:18 The Future of Training and Inference Chips
01:13:52 Oasis Demo and Etched AI Controversy
01:14:37 Nisten's Skepticism on Etched AI
01:19:15 Human Layer Introduction with Dex
01:19:24 Building and Managing AI Agents
01:20:54 Challenges and Innovations in AI Agent Development
01:21:28 Human Layer's Vision and Future
01:36:34 Recap and Closing Remarks

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Show Notes and Links:

* Interview
  * Dexter Horthy (X) from HumanLayer
* Open Source LLMs
  * SmolLM2: the new, best, open 1B-parameter language model (X)
  * Meta released MobileLLM (125M, 350M, 600M, 1B) (HF)
  * Tencent Hunyuan Large - 389B x 52B (active) MoE (X, HF, Paper)
* Big CO LLMs + APIs
  * OpenAI buys and opens chat.com
  * Anthropic releases Claude Haiku 3.5 via API (X, Blog)
  * OpenAI drops o1 full - and pulls it back (but not before it got jailbroken)
  * X.ai now offers $25/mo free of Grok API credits (X, Platform)
  * Etched announces Sohu - first Transformer ASIC - 500K tok/s (etched)
  * PPXL is not valued at 9B lol
* This week's Buzz
  * Recap of SF Hackathon w/ AI Tinkerers (X)
  * Fester the Halloween Toy aka Project Halloweave, videos from trick-or-treating (X, Writeup)
* Voice & Audio
  * Hertz-dev - 8.5B conversational audio gen (X, Blog)
  * Fish Agent v0.1 3B - speech-to-speech model (HF, Demo)
* AI Art & Diffusion & 3D
  * FLUX 1.1 [pro] is now HD - 4x resolution (X, Blog)

Full transcription for convenience below. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
    1 hr and 38 mins
  • 📆 ThursdAI - Spooky Halloween edition with Video!
    Nov 1 2024
    Hey everyone, Happy Halloween! Alex here, coming to you live from my mad scientist lair! For the first-ever live video stream of ThursdAI, I dressed up as a mad scientist and had my co-host, Fester the AI-powered skeleton, join me (as well as my usual co-hosts haha) in a very energetic and hopefully entertaining video stream!

Since it's Halloween today, Fester (and I) have a very busy schedule, so no super-long ThursdAI newsletter today. As we're still not in the realm of Gemini being able to write a decent draft that takes everything we talked about and covers all the breaking news, I'm afraid I will have to wish you a Happy Halloween and ask that you watch/listen to the episode. The TL;DR and show links from today don't cover all the breaking news, but the major things we saw today (and caught live on the show as Breaking News) were: ChatGPT now has search, and Gemini has grounded search as well (seems like OpenAI's streak of releasing something right before Google announces it continues). Here's a quick trailer of the major things that happened:

This week's buzz - Halloween AI toy with Weave

In this week's buzz, my long-awaited Halloween project is finally live and operational! I've posted a public Weave dashboard here and the code (that you can run on your Mac!) here. Really looking forward to seeing all the amazing costumes the kiddos come up with and how Gemini will be able to respond to them - follow along!

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Ok, and finally my raw TL;DR notes and links for this week. Happy Halloween everyone, I'm running off to spook the kiddos (and of course record and post about it!)

ThursdAI - Oct 31 - TL;DR

TL;DR of all topics covered:

* Open Source LLMs:
  * Microsoft's OmniParser: SOTA UI parsing (MIT licensed) 𝕏
    * Groundbreaking model for web automation (MIT license).
    * State-of-the-art UI parsing and understanding.
    * Outperforms GPT-4V in parsing web UI.
    * Designed for web automation tasks.
    * Can be integrated into various development workflows.
  * ZhipuAI's GLM-4-Voice: end-to-end Chinese/English speech 𝕏
    * End-to-end voice model for Chinese and English speech.
    * Open-sourced and readily available.
    * Focuses on direct speech understanding and generation.
    * Potential applications in various speech-related tasks.
  * Meta releases LongVU: video LM for long videos 𝕏
    * Handles long videos with impressive performance.
    * Uses DINOv2 for downsampling, eliminating redundant scenes.
    * Fuses features from DINOv2 and SigLIP.
    * Selected tokens are passed to Qwen2/Llama-3.2-3B.
    * Demo and model are available on HuggingFace.
    * Potential for significant advancements in video understanding.
  * OpenAI's new factuality benchmark (Blog, Github)
    * Introducing SimpleQA: a new factuality benchmark.
    * Goal: high correctness, diversity, challenging for frontier models.
    * Question curation: written by AI trainers, verified by a second trainer.
    * Quality assurance: 3% inherent error rate.
    * Topic diversity: wide range of topics.
    * Grading methodology: "correct", "incorrect", "not attempted".
    * Model comparison: smaller models answer fewer questions correctly.
    * Calibration measurement: larger models are more calibrated.
    * Limitations: only for short, fact-seeking queries.
    * Conclusion: drive research on trustworthy AI.
* Big CO LLMs + APIs:
  * ChatGPT now has Search! (X)
    * Grounded search results from browsing the web.
    * Still hallucinates.
    * Reincarnation of SearchGPT inside ChatGPT.
  * Apple Intelligence launch: image features for iOS 18.2 [𝕏](link not provided in source material)
    * Officially launched for developers in iOS 18.2.
    * Includes Image Playground and Genmoji.
    * Aims to enhance image creation and manipulation on iPhones.
  * GitHub Universe AI news: Copilot expands, new Spark tool 𝕏
    * GitHub Copilot now supports Claude, Gemini, and OpenAI models.
    * GitHub Spark: create micro-apps using natural language.
    * Expanding the capabilities of AI-powered coding tools.
    * Copilot now supports multi-file edits in VS Code, similar to Cursor, and faster code reviews.
    * GitHub Copilot extensions are planned for release in 2025.
  * Grok Vision: image understanding now in Grok 𝕏
    * Finally has vision capabilities (currently via 𝕏, API coming soon).
    * Can now understand and explain images, even jokes.
    * Early version, with rapid improvements expected.
  * OpenAI advanced voice mode updates (X)
    * 70% cheaper in input tokens because of automatic caching (X).
    * Advanced voice mode is now on the desktop app.
  * Claude this morning - new Mac / PC app
* This week's Buzz:
  * My AI Halloween toy skeleton is greeting kids right now (and is reporting to a Weave dashboard).
* Vision & Video:
  * Meta's LongVU: video LM for long videos 𝕏 (see Open Source LLMs for details)
  * Grok Vision on 𝕏: 𝕏 (see Big CO LLMs + APIs for details)
* Voice & Audio:
  * MaskGCT: new SOTA text-to-speech 𝕏
    * New open-source state-of-the-art text-to-speech model.
    * Zero-shot voice cloning, emotional TTS, ...
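Since SimpleQA's three-way grading ("correct" / "incorrect" / "not attempted") comes up above: the natural way to aggregate such grades is to track both overall accuracy and accuracy among attempted answers, so models aren't rewarded for guessing. The metric names below are my own shorthand, not necessarily the exact definitions in OpenAI's paper:

```python
from collections import Counter

def factuality_scores(grades: list[str]) -> dict[str, float]:
    """Aggregate per-question grades ("correct" / "incorrect" / "not attempted")
    into two intuitive metrics: overall accuracy, and accuracy among attempted."""
    counts = Counter(grades)
    total = len(grades)
    attempted = counts["correct"] + counts["incorrect"]
    return {
        "overall_correct": counts["correct"] / total,
        "correct_given_attempted": counts["correct"] / attempted if attempted else 0.0,
    }

grades = ["correct", "incorrect", "not attempted", "correct"]
print(factuality_scores(grades))  # overall 0.5, ~0.667 among attempted
```

A well-calibrated model that declines to answer when unsure scores high on the second metric even with a modest first one, which is the behavior a factuality benchmark wants to encourage.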
    1 hr and 49 mins
  • 📅 ThursdAI - Oct 24 - Claude 3.5 controls your PC?! Talking AIs with 🦾, Multimodal Weave, Video Models mania + more AI news from this 🔥 week.
    Oct 25 2024
    Hey all, Alex here, coming to you from (surprisingly) sunny Seattle, with just a mind-boggling week of releases. Really, just on Tuesday there was so much news already! I had to post a recap thread, something I usually do after I finish ThursdAI! From Anthropic reclaiming the close-second, sometimes-first AI lab position and giving Claude the wheel in the form of computer use powers, to more than 3 AI video generation updates (including open-source ones), to Apple updating the Apple Intelligence beta, it's honestly been very hard to keep up, and again, this is literally part of my job! But once again I'm glad that we were able to cover this in ~2hrs, including multiple interviews with returning co-hosts (Simon Willison came back, Killian came back), so if you're only a reader at this point, definitely listen to the show!

Ok, as always (recently), the TL;DR and show notes are at the bottom (I'm trying to get you to scroll through, ha - is it working?), so grab a bucket of popcorn and let's dive in 👇

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Claude's Big Week: Computer Control, Code Wizardry, and the Mysterious Case of the Missing Opus

Anthropic dominated the headlines this week with a flurry of updates and announcements. Let's start with the new Claude Sonnet 3.5 (really, they didn't update the version number - it's still 3.5, though a different API model).

Claude Sonnet 3.5: Coding Prodigy or Benchmark Buster?

The new Sonnet model shows impressive results on coding benchmarks, surpassing even OpenAI's o1-preview on some. "It absolutely crushes coding benchmarks like Aider and SWE-bench Verified," I exclaimed on the show. But a closer look reveals a more nuanced picture. Mixed results on other benchmarks indicate that Sonnet 3.5 might not be the universal champion some anticipated. A friend of mine who runs held-back internal benchmarks was disappointed, highlighting weaknesses in scientific reasoning and certain writing tasks. Some folks are seeing it be lazier on some full code completions, while the maximum output is now doubled from 4K to 8K tokens! This goes to show again that benchmarks don't tell the full story, so we wait for LMArena (formerly LMSys Arena) and the vibe checks from across the community. However, it absolutely dominates in code tasks - that much is clear already. This is a screenshot of the new model on the Aider code-editing benchmark, a fairly reliable way to judge a model's code output; they also have a code-refactoring benchmark.

Haiku 3.5 and the Vanishing Opus: Anthropic's Cryptic Clues

Further adding to the intrigue, Anthropic announced Claude 3.5 Haiku! They usually provide immediate access, but Haiku remains elusive: they say it'll be available by the end of the month, which is very very soon. Making things even more curious, their highly anticipated Opus model has seemingly vanished from their website. "They've gone completely silent on 3.5 Opus," Simon Willison (𝕏) noted, mentioning conspiracy theories that this new Sonnet might simply be a rebranded Opus? 🕯️ 🕯️ We'll make a summoning circle for the new Opus and update you once it lands (maybe next year).

Claude Takes Control (Sort Of): Computer Use API and the Dawn of AI Agents (𝕏)

The biggest bombshell this week? Anthropic's Computer Use. This isn't just about executing code; it's about Claude interacting with computers: clicking buttons, browsing the web, and yes, even ordering pizza! Killian Lukas (𝕏), creator of Open Interpreter, returned to ThursdAI to discuss this groundbreaking development. "This stuff of computer use… it's the same argument for having humanoid robots: the web is human-shaped, and we need AIs to interact with computers and the web the way humans do," Killian explained, illuminating the potential for bridging the digital and physical worlds.

Simon, though enthusiastic, provided a dose of realism: "It's incredibly impressive… but also very much a V1, beta." Having tackled the setup myself, I agree; the current reliance on a local Docker container and virtual machine introduces some complexity and security considerations. However, seeing Claude fix its own Docker installation error was an unforgettably mind-blowing experience. The future of AI agents is upon us, even if it's still a bit rough around the edges. Here's an easy guide to set it up yourself: it takes 5 minutes, requires no coding skills, and it's safely tucked away in a container.

Big Tech's AI Moves: Apple Embraces ChatGPT, X.ai API (+Vision!?), and Cohere Multimodal Embeddings

The rest of the AI world wasn't standing still. Apple made a surprising integration, while X.ai and Cohere pushed their platforms forward.

Apple iOS 18.2 Beta: Siri Phones a Friend (ChatGPT)

Apple, always cautious, surprisingly integrated ChatGPT directly into iOS. While Siri remains… well, Siri, users can now effortlessly offload more demanding tasks to ChatGPT. "Siri ...
    1 hr and 56 mins
  • 📆 ThursdAI - Oct 17 - Robots, Rockets, and Multi Modal Mania with open source voice cloning, OpenAI new voice API and more AI news
    Oct 18 2024
    Hey folks, Alex here from Weights & Biases, and this week has been absolutely bonkers. From robots walking among us to rockets landing on chopsticks (well, almost), the future is feeling palpably closer. And if real-world robots and reusable spaceship boosters weren't enough, the open-source AI community has been cooking, dropping new models and techniques faster than a Starship launch. So buckle up, grab your space helmet and noise-canceling headphones (we'll get to why those are important!), and let's blast off into this week's AI adventures!

TL;DR and show notes + links at the end of the post 👇

Robots and Rockets: A Glimpse into the Future

I gotta start with the real-world stuff because, let's be honest, it's mind-blowing. We had Robert Scoble (yes, the Robert Scoble) join us after attending the Tesla We, Robot AI event, reporting on Optimus robots strolling through crowds, serving drinks, and generally being ridiculously futuristic. Autonomous robo-taxis were also cruising around, giving us a taste of a driverless future.

Robert's enthusiasm was infectious: "It was a vision of the future, and from that standpoint, it succeeded wonderfully." I couldn't agree more. While the market might have had a mini-meltdown (apparently investors aren't ready for robot butlers yet), the sheer audacity of Tesla's vision is exhilarating. These robots aren't just cool gadgets; they represent a fundamental shift in how we interact with technology and the world around us. And they're learning fast. Just days after the event, Tesla released a video of Optimus operating autonomously, showcasing the rapid progress they're making.

And speaking of audacious visions, SpaceX decided to one-up everyone (including themselves) by launching Starship and catching the booster with Mechazilla - their giant robotic chopsticks (okay, technically a launch tower, but you get the picture). Waking up early with my daughter to watch this live was pure magic. As Ryan Carson put it, "It was magical watching this… my kid who's 16… all of his friends are getting their imaginations lit by this experience." That's exactly what we need - more imagination and less doomerism! The future is coming whether we like it or not, and I, for one, am excited.

Open Source LLMs and Tools: The Community Delivers (Again!)

Okay, back to the virtual world (for now). This week's open-source scene was electric, with new model releases and tools that have everyone buzzing (and benchmarking like crazy!).

* Nemotron 70B: Hype vs. Reality: NVIDIA dropped their Nemotron 70B instruct model, claiming impressive scores on certain benchmarks (Arena Hard, AlpacaEval), even suggesting it outperforms GPT-4 and Claude 3.5. As always, we take these claims with a grain of salt (remember Reflection?), and our resident expert, Nisten, was quick to run his own tests. The verdict? Nemotron is good, "a pretty good model to use," but maybe not the giant-killer some hyped it up to be. Still, kudos to NVIDIA for pushing the open-source boundaries. (Hugging Face, Harrison Kingsley evals)

* Zamba 2: Hybrid Vigor: Zyphra, in collaboration with NVIDIA, released Zamba 2, a hybrid sparse mixture-of-experts (MoE) model. We had Paolo Glorioso, a researcher from Zyphra, join us to break down this unique architecture, which combines the strengths of transformers and state space models (SSMs). He highlighted the memory and latency advantages of SSMs, especially for on-device applications. Definitely worth checking out if you're interested in transformer alternatives and efficient inference.

* Zyda 2: Data is King (and Queen): Alongside Zamba 2, Zyphra also dropped Zyda 2, a massive 5 trillion token dataset, filtered, deduplicated, and ready for LLM training. This kind of open-source data release is a huge boon to the community, fueling the next generation of models. (X)

* Ministral: Pocket-Sized Power: On the one-year anniversary of the iconic Mistral 7B release, Mistral announced two new smaller models - Ministral 3B and 8B. Designed for on-device inference, these models are impressive, but as always, Qwen looms large. While Mistral didn't include Qwen in their comparisons, early tests suggest Qwen's smaller models still hold their own. One point of contention: these Ministrals aren't as open as the original 7B, which is a bit of a bummer, with the 3B not even being released anywhere besides their platform. (Mistral Blog)

* Entropix (aka Shrek Sampler): Thinking Outside the (Sample) Box: This one is intriguing! Entropix introduces a novel sampling technique aimed at boosting the reasoning capabilities of smaller LLMs. Nisten's yogurt analogy explains it best: it's about "marinating" the information and picking the best "flavor" (token) at the end. Early examples look promising, suggesting Entropix could help smaller models tackle problems that even trip up their larger counterparts. But, as with all shiny new AI toys, we're eagerly awaiting robust evals. Tim...
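To make the Entropix idea above a bit more concrete: the core intuition is entropy-aware sampling, where the decoding strategy adapts to how uncertain the model's next-token distribution is. This toy sketch is NOT the actual Entropix implementation (which also looks at attention statistics and "varentropy"); the thresholds and temperatures here are made-up illustration values:

```python
import math
import random

def entropy_adaptive_sample(logits: list[float], low: float = 0.5, high: float = 2.5) -> int:
    """Toy entropy-aware sampling: pick greedily when the model is confident
    (low entropy), sample more exploratively when it's uncertain (high entropy).
    `low`/`high` thresholds are arbitrary illustration values, not Entropix's."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]   # stable softmax
    z = sum(exps)
    probs = [e / z for e in exps]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    if entropy < low:
        return probs.index(max(probs))          # confident: greedy argmax
    temp = 1.0 if entropy < high else 1.5       # uncertain: hotter sampling
    hot = [p ** (1.0 / temp) for p in probs]
    hz = sum(hot)
    return random.choices(range(len(probs)), weights=[h / hz for h in hot])[0]

# A sharply peaked distribution always yields the dominant token.
print(entropy_adaptive_sample([10.0, 0.0, 0.0]))  # 0
```

The real project layers several such signals (and can inject "pause" tokens to let the model keep "marinating"), but this is the basic shape of making the sampler react to the model's own uncertainty.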
    1 hr and 35 mins
  • 📆 ThursdAI - Oct 10 - Two Nobel Prizes in AI!? Meta Movie Gen (and sounds) amazing, Pyramid Flow a 2B video model, 2 new VLMs & more AI news!
    Oct 10 2024
    Hey Folks, we are finally due for a "relaxing" week in AI, no more HUGE company announcements (if you don't consider Meta Movie Gen huge), no conferences or dev days, and some time for Open Source projects to shine. (while we all wait for Opus 3.5 to shake things up) This week was very multimodal on the show, we covered 2 new video models, one that's tiny and is open source, and one massive from Meta that is aiming for SORA's crown, and 2 new VLMs, one from our friends at REKA that understands videos and audio, while the other from Rhymes is apache 2 licensed and we had a chat with Kwindla Kramer about OpenAI RealTime API and it's shortcomings and voice AI's in general. ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.All right, let's TL;DR and show notes, and we'll start with the 2 Nobel prizes in AI 👇 * 2 AI nobel prizes* John Hopfield and Geoffrey Hinton have been awarded a Physics Nobel prize* Demis Hassabis, John Jumper & David Baker, have been awarded this year's #NobelPrize in Chemistry.* Open Source LLMs & VLMs* TxT360: a globally deduplicated dataset for LLM pre-training ( Blog, Dataset)* Rhymes Aria - 25.3B multimodal MoE model that can take image/video inputs Apache 2 (Blog, HF, Try It)* Maitrix and LLM360 launch a new decentralized arena (Leaderboard, Blog)* New Gradio 5 with server side rendering (X)* LLamaFile now comes with a chat interface and syntax highlighting (X)* Big CO LLMs + APIs* OpenAI releases MLEBench - new kaggle focused benchmarks for AI Agents (Paper, Github)* Inflection is still alive - going for enterprise lol (Blog)* new Reka Flash 21B - (X, Blog, Try It)* This weeks Buzz* We chatted about Cursor, it went viral, there are many tips* WandB releases HEMM - benchmarks of text-to-image generation models (X, Github, Leaderboard)* Vision & Video* Meta presents Movie Gen 30B - img and text to video models 
(blog, paper)* Pyramid Flow - open source img2video model MIT license (X, Blog, HF, Paper, Github)* Voice & Audio* Working with OpenAI RealTime Audio - Alex conversation with Kwindla from trydaily.com* Cartesia Sonic goes multilingual (X)* Voice hackathon in SF with 20K prizes (and a remote track) - sign up* Tools* LM Studio ships with MLX natively (X, Download)* UITHUB.com - turn any github repo into 1 long file for LLMsA Historic Week: TWO AI Nobel Prizes!This week wasn't just big; it was HISTORIC. As Yam put it, "two Nobel prizes for AI in a single week. It's historic." And he's absolutely spot on! Geoffrey Hinton, often called the "grandfather of modern AI," alongside John Hopfield, were awarded the Nobel Prize in Physics for their foundational work on neural networks - work that paved the way for everything we're seeing today. Think back propagation, Boltzmann machines – these are concepts that underpin much of modern deep learning. It’s about time they got the recognition they deserve!Yoshua Bengio posted about this in a very nice quote: @HopfieldJohn and @geoffreyhinton, along with collaborators, have created a beautiful and insightful bridge between physics and AI. They invented neural networks that were not only inspired by the brain, but also by central notions in physics such as energy, temperature, system dynamics, energy barriers, the role of randomness and noise, connecting the local properties, e.g., of atoms or neurons, to global ones like entropy and attractors. And they went beyond the physics to show how these ideas could give rise to memory, learning and generative models; concepts which are still at the forefront of modern AI researchAnd Hinton's post-Nobel quote? Pure gold: “I’m particularly proud of the fact that one of my students fired Sam Altman." He went on to explain his concerns about OpenAI's apparent shift in focus from safety to profits. Spicy take! 
It sparked quite a conversation about the ethical implications of AI development and who's responsible for ensuring its safe deployment. It's a discussion we need to be having more and more as the technology evolves. Can you guess which one of his students it was?
Then, not to be outdone, the AlphaFold team (Demis Hassabis, John Jumper, and David Baker) snagged the Nobel Prize in Chemistry for AlphaFold 2. This AI revolutionized protein folding, accelerating drug discovery and biomedical research in a way no one thought possible. These awards highlight the tangible, real-world applications of AI. It's not just theoretical anymore; it's transforming industries.
Congratulations to all the winners, and we gotta wonder: is this the start of a trend of AI taking over every Nobel prize going forward? 🤔
Open Source LLMs & VLMs: The Community is COOKING!
The open-source AI community consistently punches above its weight, and this week was no exception. We saw some truly impressive releases that deserve a standing ovation. First off, the TxT360 dataset (blog, ...
    1 hr and 30 mins
  • 📆 ThursdAI - Oct 3 - OpenAI RealTime API, ChatGPT Canvas & other DevDay news (how I met Sam Altman), Gemini 1.5 8B is basically free, BFL makes FLUX 1.1 6x faster, Rev breaks whisper records...
    Oct 4 2024
    Hey, it's Alex. Ok, so mind is officially blown. I was sure this week was going to be wild, but I didn't expect everyone else besides OpenAI to pile on, exactly on ThursdAI. Coming back from Dev Day (number 2), I am still processing, and wanted to actually do a recap by humans, not just the NotebookLM one I posted during the keynote itself (which was awesome and scary in a "will AI replace me as a podcaster" kind of way), and it was incredible to have Simon Willison, who was sitting just behind me most of Dev Day, join me for the recap! But then the news kept coming: OpenAI released Canvas, which is a whole new way of interacting with ChatGPT, BFL released a new Flux version that's 8x faster, Rev released a Whisper-killer ASR that does diarization, and Google released Gemini 1.5 Flash 8B, and said that with prompt caching (which OpenAI now also has, yay) this will cost a whopping 0.01 / Mtok. That's 1 cent per million tokens, for a multimodal model with a 1 million context window. 🤯
This whole week was crazy, as last ThursdAI after finishing the newsletter I went to meet tons of folks at the AI Tinkerers in Seattle, and did a little EvalForge demo (which you can see here) and wanted to share EvalForge with you as well; it's early but very promising, so feedback and PRs are welcome!
WHAT A WEEK. TL;DR for those who want the links, and let's dive in 👇
* OpenAI - Dev Day Recap (Alex, Simon Willison)
  * Recap of Dev Day
  * RealTime API launched
  * Prompt Caching launched
  * Model Distillation is the new finetune
  * Finetuning 4o with images (Skalski guide)
  * Fireside chat Q&A with Sam
* Open Source LLMs
  * NVIDIA finally releases NVLM (HF)
* This week's Buzz
  * Alex discussed his demo of EvalForge at the AI Tinkerers event in Seattle (Demo, EvalForge, AI Tinkerers)
* Big Companies & APIs
  * Google has released Gemini Flash 8B - 0.01 per million tokens cached (X, Blog)
* Voice & Audio
  * Rev breaks SOTA on ASR with Rev ASR and Rev Diarize (Blog, Github, HF)
* AI Art & Diffusion & 3D
  * BFL releases Flux 1.1[pro] - 3x-6x faster than 1.0 and higher quality (was 🫐) - (Blog, Try it)
The day I met Sam Altman / Dev Day recap
Last Dev Day (my coverage here) was a "singular" day in AI for me, given it also had the "keep AI open source" event with Nous Research and Grimes, and this Dev Day I was delighted to find out that the vibe was completely different, focused less on bombastic announcements or models and more on practical, dev-focused things. This meant that OpenAI cherry-picked folks who actively develop with their tools, and they didn't invite traditional media, only folks like yours truly, @swyx from Latent Space, Rowan from Rundown, Simon Willison and Dan Shipper - you know, newsletter and podcast folks who actually build! This also meant that many, many OpenAI employees who work on the products and APIs we get to use were there to receive feedback, help folks with prompting, and just generally interact with the devs and build that community. I want to shout out my friends Ilan (who was in the keynote as the strawberry salesman interacting with the RealTime API agent), Will DePue from the SORA team, with whom we had an incredible conversation about the ethics and legality of projects, Christine McLeavey who runs the Audio team, with whom I shared a video of my daughter crying when ChatGPT didn't understand her, Katia, Kevin and Romain on the incredible DevEx/DevRel team, and finally my new buddy Jason who does infra, and was fighting bugs all day and only joined the pub after shipping RealTime to all of us.
I've collected all these folks in a convenient and super high signal X list here, so definitely give that list a follow if you'd like to tap into their streams. For the actual announcements, I've already covered this in my Dev Day post here (which was paid subscribers only, but is now open to all), and Simon did an incredible summary on his Substack as well. The highlights were definitely the new RealTime API that lets developers build with Advanced Voice Mode, Prompt Caching that happens automatically and cuts the cost of all your long-context API calls by a whopping 50%, and finetuning of models, which they are rebranding as Distillation and adding new tools to make easier (including Vision Finetuning for the first time!)
Meeting Sam Altman
While I didn't get a "media" pass or anything like that, and didn't really get to sit down with OpenAI execs (see Swyx on Latent Space for those conversations), I did have a chance to ask Sam multiple things. First, at the closing fireside chat between Sam and Kevin Weil (CPO at OpenAI), Kevin asked Sam a bunch of questions, and then they gave out the microphones to folks, and I asked the only question that got Sam to smile.
Sam and Kevin went on for a while, and that Q&A was actually very interesting, so much so that I had to recruit my favorite NotebookLM podcast hosts to go through it and give you an overview, so here's that NotebookLM, with the transcript of the whole Q&A (maybe i'll ...
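To make the prompt-caching savings concrete, here's a minimal back-of-the-envelope sketch. The per-token price and the flat 50% discount on cached prefix tokens are illustrative assumptions taken from the figures quoted in this recap, not official pricing, and providers differ in how much of a prompt is eligible for caching:

```python
def call_cost(prompt_tokens: int, cached_tokens: int,
              price_per_mtok: float, cached_discount: float = 0.5) -> float:
    """Estimated dollar cost of one API call where `cached_tokens` of the
    prompt hit the provider's prompt cache and are billed at a discount.

    Assumptions (hypothetical, for illustration): a single flat input price
    `price_per_mtok` in $/million tokens, and a 50% discount on cached tokens.
    """
    uncached = prompt_tokens - cached_tokens
    billed = uncached * price_per_mtok + cached_tokens * price_per_mtok * cached_discount
    return billed / 1_000_000

# A 1M-token prompt where 900K tokens are a previously-seen (cached) prefix,
# at an assumed $0.15 per million input tokens:
print(round(call_cost(1_000_000, 900_000, price_per_mtok=0.15), 4))  # prints 0.0825
```

With no cache hits the same call would cost the full $0.15, so a mostly-cached long-context prompt lands close to half price, which is exactly why the 50% automatic discount matters for chat apps that resend the same long prefix on every turn.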
    1 hr and 45 mins