• 🧨 ThursdAI - July 25 - OpenSource GPT4 intelligence has arrived - Meta LLaMa 3.1 405B beats GPT4o! Mistral Large 2 also, Deepseek Code v2 ALSO - THIS WEEK

  • Jul 25 2024
  • Length: 1 hr and 38 mins
  • Podcast

🧨 ThursdAI - July 25 - OpenSource GPT4 intelligence has arrived - Meta LLaMa 3.1 405B beats GPT4o! Mistral Large 2 also, Deepseek Code v2 ALSO - THIS WEEK

  • Summary

  • Holy s**t, folks! I was off for two weeks, last week OpenAI released GPT-4o-mini and everyone was in my mentions saying, Alex, how are you missing this?? and I'm so glad I missed that last week and not this one, because while GPT-4o-mini is incredible (GPT-4o level distill with incredible speed and almost 99% cost reduction from 2 years ago?) it's not open source. So welcome back to ThursdAI, and buckle up because we're diving into what might just be the craziest week in open-source AI since... well, ever!This week, we saw Meta drop LLAMA 3.1 405B like it's hot (including updated 70B and 8B), Mistral joining the party with their Large V2, and DeepSeek quietly updating their coder V2 to blow our minds. Oh, and did I mention Google DeepMind casually solving math Olympiad problems at silver level medal 🥈? Yeah, it's been that kind of week.TL;DR of all topics covered: * Open Source* Meta LLama 3.1 updated models (405B, 70B, 8B) - Happy LLama Day! (X, Announcement, Zuck, Try It, Try it Faster, Evals, Provider evals)* Mistral Large V2 123B (X, HF, Blog, Try It)* DeepSeek-Coder-V2-0724 update (API only)* Big CO LLMs + APIs* 🥈 Google Deepmind wins silver medal at Math Olympiad - AlphaGeometry 2 (X)* OpenAI teases SearchGPT - their reimagined search experience (Blog)* OpenAI opens GPT-4o-mini finetunes + 2 month free (X)* This weeks Buzz* I compare 5 LLama API providers for speed and quantization using Weave (X)* Voice & Audio* Daily announces a new open standard for real time Voice and Video RTVI-AI (X, Try it, Github)Meta LLAMA 3.1: The 405B Open Weights Frontier Model Beating GPT-4 👑Let's start with the star of the show: Meta's LLAMA 3.1. This isn't just a 0.1 update; it's a whole new beast. We're talking about a 405 billion parameter model that's not just knocking on GPT-4's door – it's kicking it down.Here's the kicker: you can actually download this internet scale intelligence (if you have 820GB free). That's right, a state-of-the-art model beating GPT-4 on multiple benchmarks, and you can click a download button. As I said during the show, "This is not only refreshing, it's quite incredible."Some highlights:* 128K context window (finally!)* MMLU score of 88.6* Beats GPT-4 on several benchmarks like IFEval (88.6%), GSM8K (96.8%), and ARC Challenge (96.9%)* Has Tool Use capabilities (also beating GPT-4) and is Multilingual (ALSO BEATING GPT-4)But that's just scratching the surface. Let's dive deeper into what makes LLAMA 3.1 so special.The Power of Open WeightsMark Zuckerberg himself dropped an exclusive interview with our friend Rowan Cheng from Rundown AI. And let me tell you, Zuck's commitment to open-source AI is no joke. He talked about distillation, technical details, and even released a manifesto on why open AI (the concept, not the company) is "the way forward".As I mentioned during the show, "The fact that this dude, like my age, I think he's younger than me... knows what they released to this level of technical detail, while running a multi billion dollar company is just incredible to me."Evaluation ExtravaganzaThe evaluation results for LLAMA 3.1 are mind-blowing. We're not just talking about standard benchmarks here. The model is crushing it on multiple fronts:* MMLU (Massive Multitask Language Understanding): 88.6%* IFEval (Instruction Following): 88.6%* GSM8K (Grade School Math): 96.8%* ARC Challenge: 96.9%But it doesn't stop there. The fine folks at meta also for the first time added new categories like Tool Use (BFCL 88.5) and Multilinguality (Multilingual MGSM 91.6) (not to be confused with MultiModality which is not yet here, but soon) Now, these are official evaluations from Meta themselves, that we know, often don't really represent the quality of the model, so let's take a look at other, more vibey results shall we? On SEAL leaderboards from Scale (held back so can't be trained on) LLama 405B is beating ALL other models on Instruction Following, getting 4th at Coding and 2nd at Math tasks. On MixEval (the eval that approximates LMsys with 96% accuracy), my colleagues Ayush and Morgan got a whopping 66%, placing 405B just after Clause Sonnet 3.5 and above GPT-4oAnd there are more evals that all tell the same story, we have a winner here folks (see the rest of the evals in my thread roundup)The License Game-ChangerMeta didn't just release a powerful model; they also updated their license to allow for synthetic data creation and distillation. This is huge for the open-source community.LDJ highlighted its importance: "I think this is actually pretty important because even though, like you said, a lot of people still train on OpenAI outputs anyways, there's a lot of legal departments and a lot of small, medium, and large companies that they restrict the people building and fine-tuning AI models within that company from actually being able to build the best models that they can because of these restrictions."This update could lead to a boom in custom models and applications across...
    Show More Show Less
activate_samplebutton_t1

What listeners say about 🧨 ThursdAI - July 25 - OpenSource GPT4 intelligence has arrived - Meta LLaMa 3.1 405B beats GPT4o! Mistral Large 2 also, Deepseek Code v2 ALSO - THIS WEEK

Average Customer Ratings

Reviews - Please select the tabs below to change the source of reviews.

In the spirit of reconciliation, Audible acknowledges the Traditional Custodians of country throughout Australia and their connections to land, sea and community. We pay our respect to their elders past and present and extend that respect to all Aboriginal and Torres Strait Islander peoples today.