ThursdAI - The top AI news from the past week

By: From Weights & Biases Join AI Evangelist Alex Volkov and a panel of experts to cover everything important that happened in the world of AI from the past week
  • Summary

  • Every ThursdAI, Alex Volkov hosts a panel of experts, ai engineers, data scientists and prompt spellcasters on twitter spaces, as we discuss everything major and important that happened in the world of AI for the past week. Topics include LLMs, Open source, New capabilities, OpenAI, competitors in AI space, new LLM models, AI art and diffusion aspects and much more.

    sub.thursdai.news
    Alex Volkov
    Show More Show Less
Episodes
  • 📆 ThursdAI - Nov 21 - The fight for the LLM throne, OSS SOTA from AllenAI, Flux new tools, Deepseek R1 reasoning & more AI news
    Nov 22 2024
    Hey folks, Alex here, and oof what a 🔥🔥🔥 show we had today! I got to use my new breaking news button 3 times this show! And not only that, some of you may know that one of the absolutely biggest pleasures as a host, is to feature the folks who actually make the news on the show!And now that we're in video format, you actually get to see who they are! So this week I was honored to welcome back our friend and co-host Junyang Lin, a Dev Lead from the Alibaba Qwen team, who came back after launching the incredible Qwen Coder 2.5, and Qwen 2.5 Turbo with 1M context.We also had breaking news on the show that AI2 (Allen Institute for AI) has fully released SOTA LLama post-trained models, and I was very lucky to get the core contributor on the paper, Nathan Lambert to join us live and tell us all about this amazing open source effort! You don't want to miss this conversation!Lastly, we chatted with the CEO of StackBlitz, Eric Simons, about the absolutely incredible lightning in the bottle success of their latest bolt.new product, how it opens a new category of code generator related tools.00:00 Introduction and Welcome00:58 Meet the Hosts and Guests02:28 TLDR Overview03:21 Tl;DR04:10 Big Companies and APIs07:47 Agent News and Announcements08:05 Voice and Audio Updates08:48 AR, Art, and Diffusion11:02 Deep Dive into Mistral and Pixtral29:28 Interview with Nathan Lambert from AI230:23 Live Reaction to Tulu 3 Release30:50 Deep Dive into Tulu 3 Features32:45 Open Source Commitment and Community Impact33:13 Exploring the Released Artifacts33:55 Detailed Breakdown of Datasets and Models37:03 Motivation Behind Open Source38:02 Q&A Session with the Community38:52 Summarizing Key Insights and Future Directions40:15 Discussion on Long Context Understanding41:52 Closing Remarks and Acknowledgements44:38 Transition to Big Companies and APIs45:03 Weights & Biases: This Week's Buzz01:02:50 Mistral's New Features and Upgrades01:07:00 Introduction to DeepSeek and the Whale Giant01:07:44 DeepSeek's Technological Achievements01:08:02 Open Source Models and API Announcement01:09:32 DeepSeek's Reasoning Capabilities01:12:07 Scaling Laws and Future Predictions01:14:13 Interview with Eric from Bolt01:14:41 Breaking News: Gemini Experimental01:17:26 Interview with Eric Simons - CEO @ Stackblitz01:19:39 Live Demo of Bolt's Capabilities01:36:17 Black Forest Labs AI Art Tools01:40:45 Conclusion and Final ThoughtsAs always, the show notes and TL;DR with all the links I mentioned on the show and the full news roundup below the main new recap 👇Google & OpenAI fighting for the LMArena crown 👑I wanted to open with this, as last week I reported that Gemini Exp 1114 has taken over #1 in the LMArena, in less than a week, we saw a new ChatGPT release, called GPT-4o-2024-11-20 reclaim the arena #1 spot!Focusing specifically on creating writing, this new model, that's now deployed on chat.com and in the API, is definitely more creative according to many folks who've tried it, with OpenAI employees saying "expect qualitative improvements with more natural and engaging writing, thoroughness and readability" and indeed that's what my feed was reporting as well.I also wanted to mention here, that we've seen this happen once before, last time Gemini peaked at the LMArena, it took less than a week for OpenAI to release and test a model that beat it.But not this time, this time Google came prepared with an answer!Just as we were wrapping up the show (again, Logan apparently loves dropping things at the end of ThursdAI), we got breaking news that there is YET another experimental model from Google, called Gemini Exp 1121, and apparently, it reclaims the stolen #1 position, that chatGPT reclaimed from Gemini... yesterday! Or at least joins it at #1LMArena Fatigue?Many folks in my DMs are getting a bit frustrated with these marketing tactics, not only the fact that we're getting experimental models faster than we can test them, but also with the fact that if you think about it, this was probably a calculated move by Google. Release a very powerful checkpoint, knowing that this will trigger a response from OpenAI, but don't release your most powerful one. OpenAI predictably releases their own "ready to go" checkpoint to show they are ahead, then folks at Google wait and release what they wanted to release in the first place.The other frustration point is, the over-indexing of the major labs on the LMArena human metrics, as the closest approximation for "best". For example, here's some analysis from Artificial Analysis showing that the while the latest ChatGPT is indeed better at creative writing (and #1 in the Arena, where humans vote answers against each other), it's gotten actively worse at MATH and coding from the August version (which could be a result of being a distilled much smaller version) .In summary, maybe the LMArena is no longer 1 arena is all you need, but the competition at the TOP scores of the Arena has never been ...
    Show More Show Less
    1 hr and 45 mins
  • 📆 ThursdAI - Nov 14 - Qwen 2.5 Coder, No Walls, Gemini 1114 👑 LLM, ChatGPT OS integrations & more AI news
    Nov 15 2024
    This week is a very exciting one in the world of AI news, as we get 3 SOTA models, one in overall LLM rankings, on in OSS coding and one in OSS voice + a bunch of new breaking news during the show (which we reacted to live on the pod, and as we're now doing video, you can see us freak out in real time at 59:32)00:00 Welcome to ThursdAI00:25 Meet the Hosts02:38 Show Format and Community03:18 TLDR Overview04:01 Open Source Highlights13:31 Qwen Coder 2.5 Release14:00 Speculative Decoding and Model Performance22:18 Interactive Demos and Artifacts28:20 Training Insights and Future Prospects33:54 Breaking News: Nexus Flow36:23 Exploring Athene v2 Agent Capabilities36:48 Understanding ArenaHard and Benchmarking40:55 Scaling and Limitations in AI Models43:04 Nexus Flow and Scaling Debate49:00 Open Source LLMs and New Releases52:29 FrontierMath Benchmark and Quantization Challenges58:50 Gemini Experimental 1114 Release and Performance01:11:28 LLM Observability with Weave01:14:55 Introduction to Tracing and Evaluations01:15:50 Weave API Toolkit Overview01:16:08 Buzz Corner: Weights & Biases01:16:18 Nous Forge Reasoning API01:26:39 Breaking News: OpenAI's New MacOS Features01:27:41 Live Demo: ChatGPT Integration with VS Code01:34:28 Ultravox: Real-Time AI Conversations01:42:03 Tilde Research and Stargazer Tool01:46:12 Conclusion and Final ThoughtsThis week also, there was a debate online, whether deep learning (and scale is all you need) has hit a wall, with folks like Ilya Sutskever being cited by publications claiming it has, folks like Yann LeCoon calling "I told you so". TL;DR? multiple huge breakthroughs later, and both Oriol from DeepMind and Sam Altman are saying "what wall?" and Heiner from X.ai saying "skill issue", there is no walls in sight, despite some tech journalism love to pretend there is. Also, what happened to Yann? 😵‍💫Ok, back to our scheduled programming, here's the TL;DR, afterwhich, a breakdown of the most important things about today's update, and as always, I encourage you to watch / listen to the show, as we cover way more than I summarize here 🙂TL;DR and Show Notes:* Open Source LLMs* Qwen Coder 2.5 32B (+5 others) - Sonnet @ home (HF, Blog, Tech Report)* The End of Quantization? (X, Original Thread)* Epoch : FrontierMath new benchmark for advanced MATH reasoning in AI (Blog)* Common Corpus: Largest multilingual 2T token dataset (blog)* NexusFlow - Athena v2 - open model suite (X, Blog, HF)* Big CO LLMs + APIs* Gemini 1114 is new king LLM #1 LMArena (X)* Nous Forge Reasoning API - beta (Blog, X)* Reuters reports "AI is hitting a wall" and it's becoming a meme (Article)* Cursor acq. SuperMaven (X)* This Weeks Buzz* Weave JS/TS support is here 🙌* Voice & Audio* Fixie releases UltraVox SOTA (Demo, HF, API)* Suno v4 is coming and it's bonkers amazing (Alex Song, SOTA Jingle)* Tools demoed* Qwen artifacts - HF Demo* Tilde Galaxy - Interp Tool This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
    Show More Show Less
    1 hr and 49 mins
  • 📆 ThursdAI - Nov 7 - Video version, full o1 was given and taken away, Anthropic price hike-u, halloween 💀 recap & more AI news
    Nov 8 2024
    👋 Hey all, this is Alex, coming to you from the very Sunny California, as I'm in SF again, while there is a complete snow storm back home in Denver (brrr).I flew here for the Hackathon I kept telling you about, and it was glorious, we had over 400 registered, over 200 approved hackers, 21 teams submitted incredible projects 👏 You can follow some of these hereI then decided to stick around and record the show from SF, and finally pulled the plug and asked for some budget, and I present, the first ThursdAI, recorded from the newly minted W&B Podcast studio at our office in SF 🎉This isn't the only first, today also, for the first time, all of the regular co-hosts of ThursdAI, met on video for the first time, after over a year of hanging out weekly, we've finally made the switch to video, and you know what? Given how good AI podcasts are getting, we may have to stick around with this video thing! We played one such clip from a new model called hertz-dev, which is a <10B model for full duplex audio.Given that today's episode is a video podcast, I would love for you to see it, so here's the timestamps for the chapters, which will be followed by the TL;DR and show notes in raw format. I would love to hear from folks who read the longer form style newsletters, do you miss them? Should I bring them back? Please leave me a comment 🙏 (I may send you a survey)This was a generally slow week (for AI!! not for... ehrm other stuff) and it was a fun podcast! Leave me a comment about what you think about this new format.Chapter Timestamps00:00 Introduction and Agenda Overview00:15 Open Source LLMs: Small Models01:25 Open Source LLMs: Large Models02:22 Big Companies and LLM Announcements04:47 Hackathon Recap and Community Highlights18:46 Technical Deep Dive: HertzDev and FishSpeech33:11 Human in the Loop: AI Agents36:24 Augmented Reality Lab Assistant36:53 Hackathon Highlights and Community Vibes37:17 Chef Puppet and Meta Ray Bans Raffle37:46 Introducing Fester the Skeleton38:37 Fester's Performance and Community Reactions39:35 Technical Insights and Project Details42:42 Big Companies API Updates43:17 Haiku 3.5: Performance and Pricing43:44 Comparing Haiku and Sonnet Models51:32 XAI Grok: New Features and Pricing57:23 OpenAI's O1 Model: Leaks and Expectations01:08:42 Transformer ASIC: The Future of AI Hardware01:13:18 The Future of Training and Inference Chips01:13:52 Oasis Demo and Etched AI Controversy01:14:37 Nisten's Skepticism on Etched AI01:19:15 Human Layer Introduction with Dex01:19:24 Building and Managing AI Agents01:20:54 Challenges and Innovations in AI Agent Development01:21:28 Human Layer's Vision and Future01:36:34 Recap and Closing RemarksThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.Show Notes and Links:* Interview* Dexter Horthy (X) from HumanLayer* Open Source LLMs* SmolLM2: the new, best, and open 1B-parameter language mode (X)* Meta released MobileLLM (125M, 350M, 600M, 1B) (HF)* Tencent Hunyuan Large - 389B X 52B (Active) MoE (X, HF, Paper)* Big CO LLMs + APIs* OpenAI buys and opens chat.com* Anthropic releases Claude Haiku 3.5 via API (X, Blog)* OpenAI drops o1 full - and pulls it back (but not before it got Jailbroken)* X.ai now offers $25/mo free of Grok API credits (X, Platform)* Etched announces Sonu - first Transformer ASIC - 500K tok/s (etched)* PPXL is not valued at 9B lol* This weeks Buzz* Recap of SF Hackathon w/ AI Tinkerers (X)* Fester the Halloween Toy aka Project Halloweave videos from trick or treating (X, Writeup)* Voice & Audio* Hertz-dev - 8.5B conversation audio gen (X, Blog )* Fish Agent v0.1 3B - Speech to Speech model (HF, Demo)* AI Art & Diffusion & 3D* FLUX 1.1 [pro] is how HD - 4x resolution (X, blog)Full Transcription for convenience below: This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
    Show More Show Less
    1 hr and 38 mins

What listeners say about ThursdAI - The top AI news from the past week

Average Customer Ratings

Reviews - Please select the tabs below to change the source of reviews.

In the spirit of reconciliation, Audible acknowledges the Traditional Custodians of country throughout Australia and their connections to land, sea and community. We pay our respect to their elders past and present and extend that respect to all Aboriginal and Torres Strait Islander peoples today.