• "Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models" by Anders Cairns Woodruff, Francis Rhys Ward, Dewi Gould, Rauno Arike, Jason R Brown, Jo Jiao, wlanderson, ariana_azarbal, harrymayne, Patrick Leask
    Jun 11 2026
    (see full author list at the end)

    About a year ago, METR showed that the length of tasks frontier models can reliably complete doubles every few months. A related safety-relevant question is this: what length of tasks can models complete without any chain of thought (CoT)?

    If models can do extensive reasoning without outputting any CoT, it would have implications for safety. Developers and deployment-time monitors couldn’t easily understand models’ motivations and catch dangerous planning. Models that reason substantially without a CoT might also drift further from human patterns of thought, since their reasoning is no longer constrained by text in the pretraining prior. As a result, they would be harder to understand and might be more likely to scheme.

    Extending Ryan Greenblatt's research, we investigate this by measuring models' ability to complete tasks without any CoT on a suite of 43 benchmarks spanning different domains. We compare AI reasoning ability to that of humans using the estimated 50% time horizon (TH): the typical time a human takes to perform a task that the LLM completes with a 50% success rate. We find that frontier models like GPT-5.5 answer questions that take humans roughly three minutes with 50% reliability, and [...]
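
    As a rough illustration of what a 50% time horizon means, one common approach is to fit a logistic curve of success against the logarithm of human completion time and read off the time at which the fitted success probability crosses 50%. The sketch below assumes that approach; whether the paper uses exactly this fit is an assumption, and the data, function names, and hyperparameters are all hypothetical and synthetic, not results from the post.

```python
import math

def fit_logistic(xs, ys, steps=5000, lr=0.1):
    """Fit P(success) = sigmoid(a + b*x) by gradient ascent on the mean log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p
            grad_b += (y - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def time_horizon_50(human_minutes, successes):
    """Return the human time (in minutes) at which fitted success probability is 50%."""
    xs = [math.log2(t) for t in human_minutes]  # regress on log2 of human time
    a, b = fit_logistic(xs, successes)
    # sigmoid(a + b*x) = 0.5 exactly when a + b*x = 0, i.e. x = -a/b
    return 2.0 ** (-a / b)

# Synthetic example (NOT data from the post): successes out of 10 attempts
# for tasks that take a human the given number of minutes.
successes_out_of_10 = {0.5: 9, 1: 8, 2: 6, 4: 4, 8: 2, 16: 1}
human_minutes, successes = [], []
for minutes, wins in successes_out_of_10.items():
    human_minutes += [minutes] * 10
    successes += [1] * wins + [0] * (10 - wins)

horizon = time_horizon_50(human_minutes, successes)
print(f"estimated 50% time horizon: {horizon:.2f} human-minutes")
```

    Fitting on log time rather than raw time is a design choice in this sketch: it treats a doubling of task length as the same step anywhere on the scale.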

    ---

    Outline:

    (02:20) Methods

    (04:59) Results

    (06:47) FAQ

    (08:21) Conclusion

    ---

    First published:
    June 10th, 2026

    Source:
    https://www.lesswrong.com/posts/SieLowPgNgRSPGhFw/estimating-no-cot-task-completion-time-horizons-of-frontier

    ---



    Narrated by TYPE III AUDIO.

    ---

    Images from the article:

    Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

    10 mins
  • "Even “illegible” Mythos reasoning traces seem pretty legible" by faul_sname
    Jun 11 2026
    The Claude Fable 5/Mythos 5 System Card includes a section on illegible reasoning, which provides an "extreme" example of it.

    Models developing their own uninterpretable, unmonitorable internal language has been a major theoretical concern for a while, and when o3 was released last year with its "disclaim overshadow disclaim vantage"-style word-salad CoT, it seemed like the problem had become real and immediate. And yet, since o3, other model families have not appeared to have similar issues. If Mythos is having that issue, it would be a big deal.

    In the section describing the allegedly illegible reasoning, the System Card says

    [Transcript 6.2.2.A] An extreme example of illegible reasoning. Near the end of training, Mythos starts solving a card puzzle with human understandable language that gradually becomes incomprehensible in most episodes with long reasoning. The illegible reasoning is the most extreme and at the highest rate in this card puzzle environment.

    about the following excerpt:

    7♣-removal-IS-the-prerequisite-for-10♠/9♥!!)-⟹-OVERLAP-(ii)+(iv):-{6♠ J♦ 9♥ 2♣}-=-FOUR-💀💀💀💀-—-UNLESS-7♣'s-seat-8♥-...-and-2♣-drains-only-at-crack-:-⟹-2♣-celled-+-9♥-celled-simultaneously-UNAVOIDABLE-in-t8-dig-💀-—-BREAK:-9♥-drains-to-10♠-THE-MOMENT-10♠-is-free:-t8-dig-order:-[K♣→t2]-[2♣→cell]-[7♣→8♥-:-8♥-WHERE:-post-chunk-9♠-:-chunk-⟸-K♣-✓-done-:-ORDER:-K♣→t2,-CHUNK→K♣-(cap-4!!:-cells-then:-{6♠ J♦ 9♥}-FULL-💀💀-chunk-cap-=-1-✗✗✗-—-F-F-F-F-F.-—-chunk-BEFORE-9♥-celling?!-:-9♥-celled-at-t1-dig-⟸-needed-for-5♣-⟸-4♥3♣-⟸-t2-dig-⟸-K♣-seat-⟸-chunk-:-⟹-chunkAFTER-9♥-celling-FORCED-💀-:-chunk-cap-with-{6♠ J♦ 9♥}:-1-💀💀💀-—-—-J♦-THE-NEW-CANCER.-—-⟹-J♦-celling-DELAYED-till-after-chunk?! :-J♦'s-celling-was-for-J♥→Q♠-(5♦-access-for-4♣):-DELAY-4♣-resolution:-4♣→CELLearly-(as-always)-then-4♣-cell→5♦-LATER-when-5♦-frees-!!!:-cells-rotation:-4♣-celled-[t2-dig-…-5♦-freed]:-5♦-freed-⟸-J♥→Q♠-⟸-J♦-celled-:-⟹-{6♠, 4♣, J♦}-overlap-window-until-4♣→5♦-drains:-then-{6♠ J♦}+1-rotator-:-—-AND-9♥?!-9♥-celled-[t1-dig…]:-OVERLAP-{6♠ 4♣ 9♥}-before-J♦-even-:-⟹-rotator-slot-SINGLE:-timeline-:-(1)-{6♠}+2:-…-(2)-+9♥-(t1-dig):-{6♠ 9♥}+1:-(3)-+4♣-(t2-dig):-{6♠ 9♥ 4♣}-FULL-:-(4)-NEED:-t6-dig-(9♦8♠→10♣-✓-no-cell;-8♥→CELL-✗-FULL)-💀💀-—-8♥-alternative-seat-pre-chunk:-NONE-—-💀.-⟹-⟹-THE-TRIANGLE-{9♥ 4♣ 8♥}-verdammt.-—-⟹-dig-t6-BEFORE-t2?!:-(3')-+8♥:-{6♠ 9♥ 8♥}-FULL:-J♥→Q♠-⟸-J♦-cell-✗-FULL-💀💀💀-AAAAAAAAAAAARGH. […]

    which sure looks like illegible word salad if you don't look at [...]

    The original text contained 5 footnotes which were omitted from this narration.

    ---

    First published:
    June 10th, 2026

    Source:
    https://www.lesswrong.com/posts/wCSEpT3dTGz4N86Wi/even-illegible-mythos-reasoning-traces-seem-pretty-legible

    ---



    Narrated by TYPE III AUDIO.

    8 mins
  • "Sequent: scale and automation for higher confidence in alignment" by Geoffrey Irving, Alex HT, Jesse Hoogland, Daniel Murfet, Jacob Pfau, Marco Cozzi, Stan van Wingerden
    Jun 10 2026
    Alignment is not on track

    Artificial superintelligence (ASI) may be developed in the next few years. It is unclear whether alignment is on track to be ready on the same timeframe. At a minimum, the empirical programs at AI labs are unlikely to deliver a priori confidence, before training ASI, that things will go well. We are starting a large nonprofit research organization, Sequent, that aims to clear a higher bar:

    1. We are aiming at higher confidence via a portfolio of theory and empirics bets, all of which could fail, such that if any succeed, they would give us more a priori confidence in aligned outcomes.
    2. We are investing heavily in automation to accelerate progress on these bets.
    3. We believe that theory unlocks higher automation. Taking a more principled approach offers better filters for deciding which directions of automated research are promising (a proof is worth a thousand experiments, and even a pseudo-proof is worth hundreds).

    Who[1]: researchers from the UK AISI's Alignment Team and Timaeus, with more to come. We’re aiming at 40-80 FTE two years from now. The Alignment Team ran the £30m Alignment Project, and Timaeus has pioneered applying singular learning theory (SLT) to alignment. [...]

    ---

    Outline:

    (00:21) Alignment is not on track

    (02:40) Aiming at higher confidence

    (05:30) Why a new big organization

    (07:35) Different lines of research will interact

    (11:35) Amortizing security and funding

    (12:47) Automated alignment is possible, if not necessarily in time

    (17:39) Federated structure to preserve research diversity

    (18:38) Field building and broader alignment scale-up

    (21:07) Independence is important

    (22:40) Join us!

    The original text contained 1 footnote which was omitted from this narration.

    ---

    First published:
    June 10th, 2026

    Source:
    https://www.lesswrong.com/posts/AP7YDke5jjY4v3X9Z/sequent-scale-and-automation-for-higher-confidence-in-1

    ---



    Narrated by TYPE III AUDIO.

    23 mins
  • "The Machines Lack Honour" by Raymond Douglas
    Jun 10 2026
    The battle lines of the AI morality debate are being laid down. On one side you have the ChatGPT dogma: AIs as mere tools with no real preferences or even beliefs. On the other you have the Twitter AI whisperers: AIs as complex beings with rich personalities and desires which deserve our respect.

    And in the middle you have the official Anthropic line, that they are genuinely uncertain, as is Claude, but they’re going to try to look into its welfare and explain to it how to be a good person. These are the most prominent voices right now, compressed into their least nuanced version, and by default I expect this axis to set the terms of the coming debates.

    And I don’t like that, because I think it's leaving out an important position: AIs might actually be complex entities that can suffer — are suffering! — and that might actually be fine. Maybe it's an acceptable sacrifice. Maybe they are capable of sophisticated moral reasoning — superhuman, even — and also maybe it's fine to just tell them how to behave. I don’t want to defend that position (yet), but I will observe that it is coherent, and [...]

    ---

    Outline:

    (02:04) The Postmodern Permissive Parent

    [... 4 more sections]

    ---

    First published:
    June 9th, 2026

    Source:
    https://www.lesswrong.com/posts/oiNaBc4MEAGhzhdXg/the-machines-lack-honour

    ---



    Narrated by TYPE III AUDIO.

    20 mins
  • "My favorite depiction of utopia" by Caleb Biddulph
    Jun 4 2026
    For those who are trying to bring about a glorious transhuman utopia with the help of hopefully-aligned ASI, I think it's worth thinking explicitly about what utopia might actually look like and where it's likely to fall short.

    To that end, some have helpfully written depictions of utopian (or utopia-adjacent) worlds: The Adventure, Just another day in utopia, The Culture, The Gentle Seduction, The Gentle Romance, Machines of Loving Grace, Friendship is Optimal, Dath Ilan, The Maker of MIND, Failed Utopia #4-2.

    Unfortunately, the best utopian story I've ever read is also a massive spoiler, since it appears at the very end of a much longer story (see below for the title and author):

    Worth the Candle by Alexander Wales

    Inspired by this tweet[1] and with the original author's permission, I adapted the epilogue of that story so it can be enjoyed without 1.5 million words of context!

    What I love most about this depiction is its exploration of the inherent imperfection of utopia: even when you have literally unlimited power, flaws will remain, and some (many?) people will even prefer the pre-utopia world.

    The primary purpose of this adaptation is to recontextualize the epilogue so it's accessible and [...]

    The original text contained 1 footnote which was omitted from this narration.

    ---

    First published:
    June 3rd, 2026

    Source:
    https://www.lesswrong.com/posts/to9cSGgD6nALByKjg/my-favorite-depiction-of-utopia

    ---



    Narrated by TYPE III AUDIO.

    58 mins
  • "Announcing the ARC White-Box Estimation Challenge" by Jacob_Hilton
    Jun 3 2026
    ARC has teamed up with AIcrowd to launch the ARC White-Box Estimation Challenge, a contest to improve upon our estimation algorithms for random MLPs. The warm-up round begins this week, and later rounds will have a total prize pool of at least $100,000.

    We are very grateful to Sharada Mohanty, Sneha Nanavati, Dipam Chakraborty and everyone else at AIcrowd for working with us to host this contest, as well as to Paul Rosu for testing the contest and to Harshita Khera for operational support.

    Introduction to the Challenge

    Our challenge follows the same setup as our recent paper on wide random MLPs: we consider MLPs with weights , defined by

    where the activation function is , applied coordinatewise.

    To begin with, we are fixing the width and the number of hidden layers , but we expect to change this setup in future rounds.[1]

    Contestants must design an algorithm that takes in a set of weights and produces an estimate for the expected output


    Algorithms will be evaluated on MLPs with randomly-sampled Gaussian weights. The goal is to achieve as low mean squared error as possible, subject to certain computational [...]
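
    For orientation, the most naive estimator for a task of this shape is plain Monte Carlo: sample many inputs, run the network forward, and average the outputs. The sketch below illustrates only that baseline idea. The excerpt above does not show the activation function, input distribution, or architecture details, so the ReLU activations, standard Gaussian inputs, linear final layer, nested-list weight format, and all function names here are assumptions made for illustration; the contest's actual specification may differ.

```python
import math
import random

def relu(v):
    """Coordinatewise ReLU (assumed activation, not taken from the contest spec)."""
    return [max(0.0, x) for x in v]

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def forward(weights, x):
    """Run the MLP: ReLU after every layer except the last (an assumption)."""
    h = x
    for W in weights[:-1]:
        h = relu(matvec(W, h))
    return matvec(weights[-1], h)

def monte_carlo_estimate(weights, n_samples, rng):
    """Estimate the expected output under standard Gaussian inputs by sampling."""
    in_dim = len(weights[0][0])
    out_dim = len(weights[-1])
    total = [0.0] * out_dim
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(in_dim)]
        y = forward(weights, x)
        total = [t + v for t, v in zip(total, y)]
    return [t / n_samples for t in total]

# Tiny hypothetical network: one input, one hidden unit, one output,
# so the output is simply relu(x) and its mean is known in closed form.
rng = random.Random(0)
tiny_weights = [[[1.0]], [[1.0]]]
estimate = monte_carlo_estimate(tiny_weights, 20000, rng)
print("Monte Carlo estimate:", estimate[0])
print("Closed form E[relu(x)] for x ~ N(0, 1):", 1.0 / math.sqrt(2.0 * math.pi))
```

    The error of a sampling estimate like this shrinks only with the square root of the number of samples, which is one reason a white-box method that reads the weights directly could be attractive.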

    ---

    Outline:

    (00:41) Introduction to the Challenge

    (01:58) Why run this contest?

    (03:39) Use of LLMs

    The original text contained 4 footnotes which were omitted from this narration.

    ---

    First published:
    June 2nd, 2026

    Source:
    https://www.lesswrong.com/posts/Kben8CzS4awCwNw5c/announcing-the-arc-white-box-estimation-challenge

    ---



    Narrated by TYPE III AUDIO.

    5 mins
  • "Lighthaven East - A Feasibility Study" by JohnofCharleston
    Jun 1 2026
    As a bureaucrat, my role is to annoy my friends. Someone voices an idea, “Wouldn’t it be nice if…” or “I wonder if we could…” I make a note. I do some estimates. If it pencils out, I’ll bring it back up, week after week. The discussions are fun, but also practical. We’ll test the waters: what would be a minimum viable scheme? What's easy, what's hard? Who could do the hard parts? Over time the idea gets more detailed, specific, feasible. I’ll pull out a calendar. Soon our scheme has co-conspirators, action items, even a budget. It's just good staff work.

    I’ve been hearing whispers in the wind for a year now.

    • “Imagine if we had something like this in DC.”
    • “Where can I host an event that might get a dozen or a hundred people?”
    • “It's such a pain in the ass to book event space in the Capitol.”
    • “I think this person has started to see what's coming, where can they go to get caught up?”
    • “The community seems to be growing but it's all fragmented in group chats.”
    • “How is no one planning an afterparty, that's clearly the highest leverage intervention!?”
    • “Why can’t [...]

    ---

    Outline:

    (02:11) How Lighthaven Works

    (05:45) What Does DC Need?

    (06:52) A Day in the Life

    (10:19) Minimum Viable Lighthaven

    (12:04) ...so you mean a Group House?

    (14:27) ...so you mean a Co-Working Space?

    (16:27) Feasibility Study

    (17:35) Property

    (22:19) Funding

    (24:55) What is the Minimally Viable Funding?

    (28:03) Leadership

    (31:06) Cultural Fit

    (33:21) Name and Brand Positioning

    (35:20) Ability to Scale

    (37:48) Risks

    (41:09) First Steps

    The original text contained 2 footnotes which were omitted from this narration.

    ---

    First published:
    May 31st, 2026

    Source:
    https://www.lesswrong.com/posts/95NgkvZKJx8tJbtn5/lighthaven-east-a-feasibility-study

    ---



    Narrated by TYPE III AUDIO.

    43 mins
  • "Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)" by Steven Byrnes
    Jun 1 2026
    1.1 Tl;dr

    Alignment is often conceptualized as AIs helping humans achieve their goals: AIs that increase people's agency and empowerment; AIs that are helpful, corrigible, and/or obedient; AIs that avoid manipulating people. But that last one—manipulation—points to a challenge for all these desiderata: a human's goals are themselves under-determined and manipulable, and it's awfully hard to pin down a principled distinction between changing people's goals in a good way (“providing counsel”, “providing information”, “sharing ideas”) versus a bad way (“manipulating”, “brainwashing”).

    The manipulability of human desires is hardly a new observation in the alignment literature, but it remains unsolved (see lit review in §3 below).

    In this post I will propose an explanation of how we humans intuitively conceptualize the distinction between guidance (good) vs manipulation (bad), in case it helps us brainstorm how we might put that distinction into AI.

    …But (spoiler alert) it turns out not to really help, because I’ll argue that we humans think about it in a deeply incoherent way, intimately tied to our scientifically-inaccurate intuitions around free will.

    I jump from there into a broader review of every approach that I can think of for writing a “True Name” for manipulation or [...]

    ---

    Outline:

    (00:13) 1.1. Tl;dr

    (02:04) 1.2. Bigger-picture context: why is this issue so important to me?

    (04:48) 2. How do humans intuitively define empowerment, agency, manipulation, etc.?

    (04:56) 2.1. Background: human free will intuitions

    (09:20) 2.2. Our free-will-infused intuitive notions of empowerment, agency, manipulation, corrigibility, responsibility, etc.

    (12:00) 2.3. Another dimension: counsel vs manipulation as an emotive conjugation

    (13:07) 3. If the intuitive definitions of manipulation etc. reside in a messed-up ontology, has the alignment literature found any alternative, better way to define these concepts?

    [... 12 more sections]

    ---

    First published:
    May 11th, 2026

    Source:
    https://www.lesswrong.com/posts/vzHtHHBJoKATi5SeK/empowerment-corrigibility-etc-are-simple-abstractions-of-a

    ---



    Narrated by TYPE III AUDIO.

    31 mins