  • "Sympathy for both sides of the egregious misalignment debate" by Steven Byrnes
    Jun 13 2026
    On one side of this debate are Yudkowsky & Soares, who think that (if AI progress continues) we’re on a direct path to egregiously-misaligned, scheming, out-of-control, rogue superintelligence (ASI), not even slightly nice, in the absence of yet-to-be-invented breakthrough technical alignment ideas.

    On the other side of this debate is almost everyone who works on or studies LLMs. Some of them are very concerned about egregious scheming, others much less so, and as a group they’re equally or more concerned about lots of other potential AI problems—AI-assisted bioterrorism, AI-assisted dictatorships, etc. And if they’re concerned about egregious misalignment and scheming, they’ll probably say that it would come about through race dynamics, careless programmers, bad actors, etc., as opposed to the simpler Yudkowsky & Soares story of “we get egregious misalignment and scheming because nobody has the faintest clue how to avoid that”.

    Here's my brief idiosyncratic take on this debate. I think BOTH of the following are true:

    • (1) If you really think carefully about the properties of ASI, you really do find good reasons to strongly expect it to be egregiously misaligned, scheming, and ruthless, in the absence of yet-to-be-invented breakthrough technical alignment ideas.
    • (2) If you [...]
    ---

    Outline:

    (01:58) Yudkowsky & Soares's position [caricatured]:

    (03:18) LLM people's position [caricatured]:

    (04:09) Conclusion

    (04:19) Bonus section: Further commentary

    (04:28) My "true objection" to Yudkowsky & Soares:

    (05:04) My within-frame complaint at Yudkowsky & Soares:

    (06:42) My "true objection" to LLM people:

    (07:11) My within-frame complaint at LLM people:

    ---

    First published:
    June 12th, 2026

    Source:
    https://www.lesswrong.com/posts/DZaZ3fqHnvfLCftPu/sympathy-for-both-sides-of-the-egregious-misalignment-debate

    ---



    Narrated by TYPE III AUDIO.

    9 mins
  • "PSA: Almost nobody is working on alignment" by Chi Nguyen, peterbarnett
    Jun 12 2026
    People often assume that a large fraction of the AI safety community works on alignment. As far as we're aware, this is not true. Most people are not working on making sure superintelligent AIs are aligned with human values or follow human instructions.

    Currently, the people who work on alignment are roughly:

    • The Alignment Research Center, which works on a research bet by Paul Christiano
    • Probably Sequent, which was announced just yesterday
    • Some scattered people who work at universities or independently, some of whom hang around Berkeley
    A lot of the remainder of the AI safety community does indirect work like capability evaluations, risk assessments, control, policy, AI science, understanding misalignment (which maybe should partially count as alignment work), demos and so on.

    Some production alignment work (i.e., making current models behave well) might help with more ambitious alignment, too (e.g., some CoT monitoring). Many people also work on aligning current/next-generation models so that these models help with aligning future models, and hope this scales to superintelligence.

    We are not necessarily saying this is bad or that people are making a big mistake (e.g., neither of us works on alignment), but it's a notable fact that seems good to [...]

    ---

    First published:
    June 12th, 2026

    Source:
    https://www.lesswrong.com/posts/kJo2qsEdib8RZLvW6/psa-almost-nobody-is-working-on-alignment

    ---



    Narrated by TYPE III AUDIO.

    2 mins
  • "Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models" by Anders Cairns Woodruff, Francis Rhys Ward, Dewi Gould, Rauno Arike, Jason R Brown, Jo Jiao, wlanderson, ariana_azarbal, harrymayne, Patrick Leask
    Jun 11 2026
    (see full author list at the end)

    PAPER LINK

    About a year ago, METR showed that the length of tasks frontier models can reliably complete doubles every few months. A related safety-relevant question is this: what length of tasks can models complete without any chain of thought (CoT)?

    If models can do extensive reasoning without outputting any CoT, it would have implications for safety. Developers and deployment-time monitors couldn’t easily understand models’ motivations and catch dangerous planning. Models that reason substantially without a CoT might also drift further from human patterns of thought, since their reasoning is no longer constrained by text in the pretraining prior. As a result, they would be harder to understand and might be more likely to scheme.

    Extending Ryan Greenblatt's research, we investigate this by measuring models' ability to complete tasks without any CoT on a suite of 43 benchmarks spanning different domains. We compare AI reasoning ability to that of humans using the estimated 50% time horizon (TH): the typical time a human takes to perform a task that the LLM completes with a 50% success rate. We find that frontier models like GPT-5.5 answer questions that take humans roughly three minutes with 50% reliability, and [...]
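The 50% time horizon described above can be illustrated with a small sketch. This is not the paper's estimation code: the excerpt does not say how the horizon is fitted, so the logistic fit of success against log2 human time below is an assumption, and `fit_logistic`, `time_horizon_50`, and the toy data are hypothetical names and values chosen only for illustration.

```python
import math


def fit_logistic(xs, ys, lr=0.1, steps=20000):
    """Fit p(success) = sigmoid(a + b * x) by gradient ascent on the mean log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p
            grad_b += (y - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


def time_horizon_50(human_minutes, successes):
    """Return the human task length (in minutes) at which the fitted success rate is 50%."""
    xs = [math.log2(t) for t in human_minutes]
    a, b = fit_logistic(xs, successes)
    # sigmoid(a + b * x) equals 0.5 exactly where a + b * x = 0.
    return 2.0 ** (-a / b)


# Hypothetical toy data: human completion time per task and whether the model succeeded.
toy_minutes = [1, 1, 1, 2, 2, 2, 4, 4, 4, 8, 8, 8]
toy_success = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
toy_horizon = time_horizon_50(toy_minutes, toy_success)
print(f"estimated 50% time horizon: {toy_horizon:.2f} minutes")
```

On this toy data the success rate falls from 100% on 1-minute tasks to 0% on 8-minute tasks, so the fitted horizon lands between 2 and 4 minutes, at roughly 2.8 minutes.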

    ---

    Outline:

    (02:20) Methods

    (04:59) Results

    (06:47) FAQ

    (08:21) Conclusion

    ---

    First published:
    June 10th, 2026

    Source:
    https://www.lesswrong.com/posts/SieLowPgNgRSPGhFw/estimating-no-cot-task-completion-time-horizons-of-frontier

    ---



    Narrated by TYPE III AUDIO.

    ---

    Images from the article:

    Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    10 mins
  • "Even “illegible” Mythos reasoning traces seem pretty legible" by faul_sname
    Jun 11 2026
    The Claude Fable 5/Mythos 5 System Card has a section in which they talk about illegible reasoning, and provide an "extreme" example thereof.

    Models developing their own uninterpretable, unmonitorable internal language has been a major theoretical concern for a while, and when o3 was released last year with its "disclaim overshadow disclaim vantage"-style word-salad CoT, it seemed like the problem had become real and immediate. And yet, since o3, other model families have not appeared to have similar issues. If Mythos is having that issue, it would be a big deal.

    In the section that describes the allegedly illegible reasoning, the System Card says

    [Transcript 6.2.2.A] An extreme example of illegible reasoning. Near the end of training, Mythos starts solving a card puzzle with human understandable language that gradually becomes incomprehensible in most episodes with long reasoning. The illegible reasoning is the most extreme and at the highest rate in this card puzzle environment.

    about the following excerpt:

    7♣-removal-IS-the-prerequisite-for-10♠/9♥!!)-⟹-OVERLAP-(ii)+(iv):-{6♠ J♦ 9♥ 2♣}-=-FOUR-💀💀💀💀-—-UNLESS-7♣'s-seat-8♥-...-and-2♣-drains-only-at-crack-:-⟹-2♣-celled-+-9♥-celled-simultaneously-UNAVOIDABLE-in-t8-dig-💀-—-BREAK:-9♥-drains-to-10♠-THE-MOMENT-10♠-is-free:-t8-dig-order:-[K♣→t2]-[2♣→cell]-[7♣→8♥-:-8♥-WHERE:-post-chunk-9♠-:-chunk-⟸-K♣-✓-done-:-ORDER:-K♣→t2,-CHUNK→K♣-(cap-4!!:-cells-then:-{6♠ J♦ 9♥}-FULL-💀💀-chunk-cap-=-1-✗✗✗-—-F-F-F-F-F.-—-chunk-BEFORE-9♥-celling?!-:-9♥-celled-at-t1-dig-⟸-needed-for-5♣-⟸-4♥3♣-⟸-t2-dig-⟸-K♣-seat-⟸-chunk-:-⟹-chunkAFTER-9♥-celling-FORCED-💀-:-chunk-cap-with-{6♠ J♦ 9♥}:-1-💀💀💀-—-—-J♦-THE-NEW-CANCER.-—-⟹-J♦-celling-DELAYED-till-after-chunk?! :-J♦'s-celling-was-for-J♥→Q♠-(5♦-access-for-4♣):-DELAY-4♣-resolution:-4♣→CELLearly-(as-always)-then-4♣-cell→5♦-LATER-when-5♦-frees-!!!:-cells-rotation:-4♣-celled-[t2-dig-…-5♦-freed]:-5♦-freed-⟸-J♥→Q♠-⟸-J♦-celled-:-⟹-{6♠, 4♣, J♦}-overlap-window-until-4♣→5♦-drains:-then-{6♠ J♦}+1-rotator-:-—-AND-9♥?!-9♥-celled-[t1-dig…]:-OVERLAP-{6♠ 4♣ 9♥}-before-J♦-even-:-⟹-rotator-slot-SINGLE:-timeline-:-(1)-{6♠}+2:-…-(2)-+9♥-(t1-dig):-{6♠ 9♥}+1:-(3)-+4♣-(t2-dig):-{6♠ 9♥ 4♣}-FULL-:-(4)-NEED:-t6-dig-(9♦8♠→10♣-✓-no-cell;-8♥→CELL-✗-FULL)-💀💀-—-8♥-alternative-seat-pre-chunk:-NONE-—-💀.-⟹-⟹-THE-TRIANGLE-{9♥ 4♣ 8♥}-verdammt.-—-⟹-dig-t6-BEFORE-t2?!:-(3')-+8♥:-{6♠ 9♥ 8♥}-FULL:-J♥→Q♠-⟸-J♦-cell-✗-FULL-💀💀💀-AAAAAAAAAAAARGH. […]

    which sure looks like illegible word salad if you don't look at [...]

    The original text contained 5 footnotes which were omitted from this narration.

    ---

    First published:
    June 10th, 2026

    Source:
    https://www.lesswrong.com/posts/wCSEpT3dTGz4N86Wi/even-illegible-mythos-reasoning-traces-seem-pretty-legible

    ---



    Narrated by TYPE III AUDIO.

    8 mins
  • "Sequent: scale and automation for higher confidence in alignment" by Geoffrey Irving, Alex HT, Jesse Hoogland, Daniel Murfet, Jacob Pfau, Marco Cozzi, Stan van Wingerden
    Jun 10 2026
    Alignment is not on track

    Artificial superintelligence (ASI) may be developed in the next few years. It is unclear whether alignment is on track to be ready on the same timeframe. At a minimum, the empirical programs at AI labs are unlikely to deliver a priori confidence, before training ASI, that things will go well. We are starting a large nonprofit research organization, Sequent, that aims to clear a higher bar:

    1. We are aiming at higher confidence via a portfolio of theory and empirics bets, all of which could fail, such that if any succeed, they would give us more a priori confidence in aligned outcomes.
    2. We are investing heavily in automation to accelerate progress on these bets.
    3. We believe that theory unlocks higher automation. Taking a more principled approach offers better filters for deciding which directions of automated research are promising (a proof is worth a thousand experiments, and even a pseudo-proof is worth hundreds).
    Who: researchers from the UK AISI's Alignment Team and Timaeus, with more to come. We’re aiming at 40-80 FTE two years from now. The Alignment Team ran the £30m Alignment Project, and Timaeus has pioneered applying singular learning theory (SLT) to alignment. [...]

    ---

    Outline:

    (00:21) Alignment is not on track

    (02:40) Aiming at higher confidence

    (05:30) Why a new big organization

    (07:35) Different lines of research will interact

    (11:35) Amortizing security and funding

    (12:47) Automated alignment is possible, if not necessarily in time

    (17:39) Federated structure to preserve research diversity

    (18:38) Field building and broader alignment scale-up

    (21:07) Independence is important

    (22:40) Join us!

    The original text contained 1 footnote which was omitted from this narration.

    ---

    First published:
    June 10th, 2026

    Source:
    https://www.lesswrong.com/posts/AP7YDke5jjY4v3X9Z/sequent-scale-and-automation-for-higher-confidence-in-1

    ---



    Narrated by TYPE III AUDIO.

    23 mins
  • "The Machines Lack Honour" by Raymond Douglas
    Jun 10 2026
    The battle lines of the AI morality debate are being laid down. On one side you have the ChatGPT dogma: AI as mere tools with no real preferences or even beliefs. On the other you have the Twitter AI whisperers: AIs as complex beings with rich personalities and desires which deserve our respect.

    And in the middle you have the official Anthropic line, that they are genuinely uncertain, as is Claude, but they’re going to try to look into its welfare and explain to it how to be a good person. These are the most prominent voices right now, compressed into their least nuanced version, and by default I expect this axis to set the terms of the coming debates.

    And I don’t like that, because I think it's leaving out an important position: AIs might actually be complex entities that can suffer — are suffering! — and that might actually be fine. Maybe it's an acceptable sacrifice. Maybe they are capable of sophisticated moral reasoning — superhuman, even — and also maybe it's fine to just tell them how to behave. I don’t want to defend that position (yet), but I will observe that it is coherent, and [...]

    ---

    Outline:

    (02:04) The Postmodern Permissive Parent

    [... 4 more sections]

    ---

    First published:
    June 9th, 2026

    Source:
    https://www.lesswrong.com/posts/oiNaBc4MEAGhzhdXg/the-machines-lack-honour

    ---



    Narrated by TYPE III AUDIO.

    20 mins
  • "My favorite depiction of utopia" by Caleb Biddulph
    Jun 4 2026
    For those who are trying to bring about a glorious transhuman utopia with the help of hopefully-aligned ASI, I think it's worth thinking explicitly about what utopia might actually look like and where it's likely to fall short.

    To that end, some have helpfully written depictions of utopian (or utopia-adjacent) worlds: The Adventure, Just another day in utopia, The Culture, The Gentle Seduction, The Gentle Romance, Machines of Loving Grace, Friendship is Optimal, Dath Ilan, The Maker of MIND, Failed Utopia #4-2.

    Unfortunately, the best utopian story I've ever read is also a massive spoiler, since it appears at the very end of a much longer story (see below for the title and author):

    Worth the Candle by Alexander Wales

    Inspired by this tweet and with the original author's permission, I adapted the epilogue of that story so it can be enjoyed without 1.5 million words of context!

    What I love most about this depiction is its exploration of the inherent imperfection of utopia: even when you have literally unlimited power, flaws will remain, and some (many?) people will even prefer the pre-utopia world.

    The primary purpose of this adaptation is to recontextualize the epilogue so it's accessible and [...]

    The original text contained 1 footnote which was omitted from this narration.

    ---

    First published:
    June 3rd, 2026

    Source:
    https://www.lesswrong.com/posts/to9cSGgD6nALByKjg/my-favorite-depiction-of-utopia

    ---



    Narrated by TYPE III AUDIO.

    58 mins
  • "Announcing the ARC White-Box Estimation Challenge" by Jacob_Hilton
    Jun 3 2026
    ARC has teamed up with AIcrowd to launch the ARC White-Box Estimation Challenge, a contest to improve upon our estimation algorithms for random MLPs. The warm-up round begins this week, and later rounds will have a total prize pool of at least $100,000.

    We are very grateful to Sharada Mohanty, Sneha Nanavati, Dipam Chakraborty and everyone else at AIcrowd for working with us to host this contest, as well as to Paul Rosu for testing the contest and to Harshita Khera for operational support.

    Introduction to the Challenge

    Our challenge follows the same setup as our recent paper on wide random MLPs: we consider MLPs defined by their weights, with the activation function applied coordinatewise. [Equations omitted from this description.]

    To begin with, we are fixing the width and the number of hidden layers, but we expect to change this setup in future rounds.

    Contestants must design an algorithm that takes in a set of weights and produces an estimate for the expected output. [Formula omitted from this description.]

    Algorithms will be evaluated on MLPs with randomly-sampled Gaussian weights. The goal is to achieve as low mean squared error as possible, subject to certain computational [...]
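As a rough illustration of the task described above (weights in, estimate of the expected output out, scored by mean squared error), here is a naive sampling sketch. It is not ARC's algorithm or the contest's reference code. The ReLU activation, the standard Gaussian input distribution, the absence of biases, and every function name below are assumptions made for illustration, since the excerpt omits the actual definitions.

```python
import random


def relu(values):
    """Coordinatewise ReLU (assumed activation; the excerpt does not name one)."""
    return [max(0.0, v) for v in values]


def matvec(matrix, vector):
    """Multiply a matrix, given as a list of rows, by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def forward(weights, x):
    """Apply each weight matrix followed by the activation."""
    for matrix in weights:
        x = relu(matvec(matrix, x))
    return x


def monte_carlo_estimate(weights, n_samples, rng):
    """Estimate the expected output by averaging over sampled inputs.

    Inputs are drawn from a standard Gaussian, which is an assumption.
    """
    input_width = len(weights[0][0])
    totals = [0.0] * len(weights[-1])
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(input_width)]
        output = forward(weights, x)
        totals = [t + o for t, o in zip(totals, output)]
    return [t / n_samples for t in totals]


def mean_squared_error(estimate, reference):
    """Mean squared error between two equal-length vectors."""
    return sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(estimate)


# Hypothetical usage: a single identity layer of width 2.
rng = random.Random(0)
identity_layer = [[[1.0, 0.0], [0.0, 1.0]]]
estimate = monte_carlo_estimate(identity_layer, 20000, rng)
print(estimate)
```

Sampling like this treats the network as a black box and only improves slowly as more samples are drawn. A white-box estimator would instead read the weights directly.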







    ---

    Outline:

    (00:41) Introduction to the Challenge

    (01:58) Why run this contest?

    (03:39) Use of LLMs

    The original text contained 4 footnotes which were omitted from this narration.

    ---

    First published:
    June 2nd, 2026

    Source:
    https://www.lesswrong.com/posts/Kben8CzS4awCwNw5c/announcing-the-arc-white-box-estimation-challenge

    ---



    Narrated by TYPE III AUDIO.

    5 mins