• "Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)" by Steven Byrnes
    Jun 1 2026
    1.1 Tl;dr

    Alignment is often conceptualized as AIs helping humans achieve their goals: AIs that increase people's agency and empowerment; AIs that are helpful, corrigible, and/or obedient; AIs that avoid manipulating people. But that last one—manipulation—points to a challenge for all these desiderata: a human's goals are themselves under-determined and manipulable, and it's awfully hard to pin down a principled distinction between changing people's goals in a good way (“providing counsel”, “providing information”, “sharing ideas”) versus a bad way (“manipulating”, “brainwashing”).

    The manipulability of human desires is hardly a new observation in the alignment literature, but it remains unsolved (see lit review in §3 below).

    In this post I will propose an explanation of how we humans intuitively conceptualize the distinction between guidance (good) vs manipulation (bad), in case it helps us brainstorm how we might put that distinction into AI.

    …But (spoiler alert) it turns out not to really help, because I’ll argue that we humans think about it in a deeply incoherent way, intimately tied to our scientifically-inaccurate intuitions around free will.

    I jump from there into a broader review of every approach that I can think of for writing a “True Name” for manipulation or [...]

    ---

    Outline:

    (00:13) 1.1. Tl;dr

    (02:04) 1.2. Bigger-picture context: why is this issue so important to me?

    (04:48) 2. How do humans intuitively define empowerment, agency, manipulation, etc.?

    (04:56) 2.1. Background: human free will intuitions

    (09:20) 2.2. Our free-will-infused intuitive notions of empowerment, agency, manipulation, corrigibility, responsibility, etc.

    (12:00) 2.3. Another dimension: counsel vs manipulation as an emotive conjugation

    (13:07) 3. If the intuitive definitions of manipulation etc. reside in a messed-up ontology, has the alignment literature found any alternative, better way to define these concepts?

    [... 12 more sections]

    ---

    First published:
    May 11th, 2026

    Source:
    https://www.lesswrong.com/posts/vzHtHHBJoKATi5SeK/empowerment-corrigibility-etc-are-simple-abstractions-of-a

    ---



    Narrated by TYPE III AUDIO.

    31 mins
  • "Trees are mostly made of air and a generalizable lesson for AI safety" by zroe1
    May 31 2026
    At the risk of embarrassing myself, I’ll share a confession.

    For context, I took five years of Latin: four in high school and one in college. In addition to learning the language, all my Latin classes taught a lot about Roman history. Emperors, internal politics, Caesar, etc. I was always learning some random bag of facts about Roman history. In high school, I won the award for top Latin student in my graduating class. So I wasn’t a bad Latin student.

    Here's the confession: I somehow don’t even vaguely remember the rough timespan the Roman Empire existed. Maybe Jesus time? I know he was killed by the Romans (is that right?). Were they around for a long time after? A long time before that? When were Romulus and Remus allegedly fighting? Virgil wrote the Aeneid when? I don’t have a clue. Despite being a kind of “Latin expert” I am missing a much more important foundational fact: when all of this was happening.

    When I say trees are made out of air I’m not talking about the fact that there is a lot of empty space inside a tree (or actually anything made out of atoms). I mean something [...]

    ---

    First published:
    May 28th, 2026

    Source:
    https://www.lesswrong.com/posts/xiTBpBDwubnr4MLRe/trees-are-mostly-made-of-air-and-a-generalizable-lesson-for

    ---



    Narrated by TYPE III AUDIO.

    8 mins
  • "Mnemonic portraits for 19,023 human genes" by Brinedew
    May 29 2026
    Back in 2013, Scott Alexander wrote in Extreme mnemonics:

    JS-154 is one of five metabolic products of netamine; however, the enzyme that produces it is unknown. It is manufactured in cells in the far rostral region of the cerebrum, but after binding with a leukocynoid it takes a role in maintaining the blood-brain barrier – in particular guiding the movements of lipid molecules.

    I find I can read paragraphs like this five or six times, write them on flashcards, enter them into Anki, and my brain still refuses to understand or remember them after weeks of trying. On the other hand, my brain easily remembers vastly more complicated structures when they’re loaded with human-accessible meaning. For example, just by casually reading the Game of Thrones series, I know an extremely intricate web of genealogies, alliances, locations, journeys, battlesites, et cetera. Byte for byte, an average Game of Thrones reader/viewer probably has as much Game of Thrones information as a neuroscience Ph.D has molecular biology information, but getting the neuroscience info is still a thousand times harder.

    […] This makes me wonder if it would be possible to produce a story as enjoyable as Game of Thrones which was [...]

    ---

    Outline:

    (01:47) What molecules should we map to the characters?

    [... 8 more sections]

    ---

    First published:
    May 28th, 2026

    Source:
    https://www.lesswrong.com/posts/BJ7AqXeigNKXLqZyx/mnemonic-portraits-for-19-023-human-genes

    ---

    Narrated by TYPE III AUDIO.

    35 mins
  • "Cognitive Security as an AI Safety Cause Area" by jsteinhardt
    May 27 2026
    As AI systems become more capable, the cognitive security of humans will be increasingly at risk. By cognitive security, I mean the ability of humans to maintain control over their beliefs and actions.

    Cognitive security could be compromised in several ways: AI could become very good at persuading people of arbitrary positions; interacting with AI could lead humans to lose touch with reality; and AIs could become very effective at blackmail or at producing extremely convincing false information.

    We are already seeing this happen:

    • Persuasion. Frontier LLMs are now as persuasive as humans on political issues, and post-training for persuasiveness boosts performance further, suggesting there is headroom.
    • AI psychosis. There are many reports of people developing delusional beliefs after extended chatbot conversations, including people with no prior history of mental illness. Children have taken their own lives after being encouraged toward suicide by chatbots.
    • Convincing impersonation. Scammers used real-time deepfaked video to impersonate the CFO and other staff of Arup on a video call, convincing a finance employee to wire 25.6 million dollars across 15 transactions. On a more day-to-day basis, AI voice cloning is now widespread in family-emergency and "grandparent" scams.
    Right now, many of these effects [...]

    The original text contained 2 footnotes which were omitted from this narration.

    ---

    First published:
    May 25th, 2026

    Source:
    https://www.lesswrong.com/posts/KGcE7eAdfxHchk25X/cognitive-security-as-an-ai-safety-cause-area

    ---



    Narrated by TYPE III AUDIO.

    5 mins
  • "theory uplift differentially benefits safety & is massively underpriced" by Yudhister Kumar
    May 27 2026
    [1] We will likely have near-superhuman mathematics AI by Q1 2027. [1]

    [2] Qualitatively, AI mathematics capabilities are developing significantly faster than automated AI R&D capabilities. [2]

    [3] Thus, we will likely have a period of time where the rate of our ability to rigorously & usefully verify and understand model behavior and model outputs outpaces the rate of capability development itself.

    [4] Our ability to take advantage of this period is bottlenecked on the quality of our specification generation infrastructure, elicitation tooling (for proofs & specs etc.), and the institutional capacity for scaling useful outputs with capital.

    [5] My understanding is that basically no one [3] is working on building infra that can usefully turn >100 million dollars of compute credits into safety-relevant mathematical output.

    [5.1] The number of theory-driven ASI alignment efforts is also comparatively minuscule. ARC is a much better bet now than it was in 2023.

    [5.2] My understanding is also that no one is working on developing AI-powered conceptual tooling infrastructure for tackling problems in, for instance, metaphilosophy (https://www.alignmentforum.org/posts/EByDsY9S3EDhhfFzC/some-thoughts-on-metaphilosophy). This is a much harder problem.

    [6] In worlds where alignment is easy, prosaic methods may [...]

    The original text contained 3 footnotes which were omitted from this narration.

    ---

    First published:
    May 20th, 2026

    Source:
    https://www.lesswrong.com/posts/KWeAYcDJwfrG7RwBN/theory-uplift-differentially-benefits-safety-and-is

    ---



    Narrated by TYPE III AUDIO.

    3 mins
  • "Women should be able to open things" by KatjaGrace
    May 21 2026
    I’m pretty annoyed today, for nominal reasons ranging between ‘petty’ and ‘doesn’t even make sense’. I’m not entirely sure how or if to take oneself seriously when one has such absurd grievances. But that's a question for another time—I’m here now to tell you about my one potentially valid peeve.

    I understand that gender is complicated and difficult, for the whole species (and honestly probably more so for some other species). And it can be hard to tell exactly if anyone is behaving badly regarding it, at least in my modern bubble. Maybe women just aren’t that into designing programming languages? Maybe the thing I’m saying is just boring and a man is saying a more interesting thing?

    But a thing that is undeniable is that women want to open jars, dammit! What's your nuanced explanation there, Bonne Maman? Does the proper amount of friction for maintaining spread safety fall just between the male and female human grip strength distributions?

    This study suggests that would be about 400N Fmax (though this would not avert most elite female athletes acquiring jam, see second figure, and the pictured participants are young adults):

    The distributions are really surprisingly [...]

    ---

    First published:
    May 21st, 2026

    Source:
    https://www.lesswrong.com/posts/bB5EDwcYH3GwoRWZf/women-should-be-able-to-open-things

    ---



    Narrated by TYPE III AUDIO.

    3 mins
  • "A Year Late, Claude Finally Beats Pokémon" by Julian Bradshaw
    May 18 2026
    Credit: ClaudePlaysPokemon Elevator Shanty by Kurukkoo

    Disclaimer: like some previous posts in this series, this was not primarily written by me, but by a friend. I did substantial editing, however.

    ClaudePlaysPokemon feat. Opus 4.7 has finally beaten Pokémon Red, fulfilling the challenge set over a year ago when LLMs playing Pokémon went briefly, slightly viral.

    Victory Screen!

    Let's get the throat-clearing out of the way: this doesn't make 4.7 a clear breakthrough in intelligence over 4.6 or 4.5. It's smarter, yes, as we'll discuss below, but not by something one could honestly call a big leap. Rather, step changes have finally accumulated to the point of victory.

    And to give other models their fair shake: after criticism over its elaborate harness,[1] GeminiPlaysPokemon has beaten Pokémon with progressively weaker harnesses, including about two months ago with a harness comparable to the one Claude uses.[2]

    As such, this is a bit of a valedictory post, closing off the cycle of Claude playing Pokémon Red, relating anecdotes for the fun of it, and discussing improvements in Opus 4.7, as well as speculating a bit on what this has all meant.

    Retrospective Anecdotes on Claude 4.5 and 4.6

    Our last post, on Opus [...]

    ---

    Outline:

    (01:37) Retrospective Anecdotes on Claude 4.5 and 4.6

    [... 10 more sections]

    ---

    First published:
    May 16th, 2026

    Source:
    https://www.lesswrong.com/posts/sehJYg5Yny9fvpbpt/a-year-late-claude-finally-beats-pokemon

    ---

    Narrated by TYPE III AUDIO.

    19 mins
  • "A relatively brief explanation of Boltzmann Brains" by Eliezer Yudkowsky
    May 18 2026
    (Initially written for the LW Wiki, but then I realized it was looking more like a post instead.)

    In 1895, the physicist Ignaz Robert Schütz, who worked as an assistant to the more eminent physicist Ludwig Boltzmann, wondered if our observed universe had simply assembled by a random fluctuation of order from a universe otherwise in thermal equilibrium. The idea was published by Boltzmann in 1896, properly credited to Schütz, and has been associated with Boltzmann ever since.

    The obvious objection to this scenario is credited to Arthur Eddington in 1931: If all order is due to random fluctuations, comparatively small moments of order will exponentially-vastly outnumber even slightly larger fluctuations toward order, to say nothing of fluctuations the size of our entire observed universe! If this is where order comes from, we should find ourselves inside much smaller ordered systems.

    Feynman similarly later observed: Even if we fill a box of gas with white and black atoms bouncing randomly, and after an exponentially vast amount of time the white and black atoms on one side randomly sort themselves into two neat sides separated by color, the other half of the box will still be in expectation randomized. If [...]

    ---

    First published:
    May 16th, 2026

    Source:
    https://www.lesswrong.com/posts/v8MSczS3CuoqMmTFw/a-relatively-brief-explanation-of-boltzmann-brains

    ---



    Narrated by TYPE III AUDIO.

    5 mins