<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Can We Secure AI With Formal Methods?]]></title><description><![CDATA[Formal methods needs to know that AI security folks are a critical fountain of users. AI security folks need to know how to ask formal methodsititians for widgets. FKA Progress in Guaranteed Safe AI.
]]></description><link>https://newsletter.for-all.dev</link><image><url>https://substackcdn.com/image/fetch/$s_!ykg_!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083283b3-e660-4c7f-81e3-c40b1d1ebafd_1024x1024.png</url><title>Can We Secure AI With Formal Methods?</title><link>https://newsletter.for-all.dev</link></image><generator>Substack</generator><lastBuildDate>Sun, 12 Apr 2026 04:40:10 GMT</lastBuildDate><atom:link href="https://newsletter.for-all.dev/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Quinn Dougherty]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[gsai@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[gsai@substack.com]]></itunes:email><itunes:name><![CDATA[Quinn Dougherty]]></itunes:name></itunes:owner><itunes:author><![CDATA[Quinn Dougherty]]></itunes:author><googleplay:owner><![CDATA[gsai@substack.com]]></googleplay:owner><googleplay:email><![CDATA[gsai@substack.com]]></googleplay:email><googleplay:author><![CDATA[Quinn Dougherty]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Can We Secure AI With Formal Methods? January-March 2026]]></title><description><![CDATA[I'm overwhelmed, cut me some slack.]]></description><link>https://newsletter.for-all.dev/p/can-we-secure-ai-with-formal-methods-480</link><guid isPermaLink="false">https://newsletter.for-all.dev/p/can-we-secure-ai-with-formal-methods-480</guid><dc:creator><![CDATA[Quinn Dougherty]]></dc:creator><pubDate>Thu, 02 Apr 2026 18:41:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ctuV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c03da8-f543-4344-8f02-c32710a7d251_668x374.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the month or so around the previous new years, as 2024 became 2025, we were saying &#8220;2025: year of the agent&#8221;. MCP was taking off, the inspect-ai and pydantic-ai python packages were becoming the standards, products were branching out from chatbots to heavy and autonomous use of toolcalls. While much of the product engineering scene may have underdelivered (in the sense that &#8220;planning a vacation&#8221; isn&#8217;t entirely something most people do with agents yet), the field of FMxAI I think was right on target. Feels like there&#8217;s an agentic component to everything I read these days.</p><p>What is 2026 the year of? Besides &#8220;year of investors pressure all the math companies to pivot to program synthesis&#8221;? 
I&#8217;m declaring it now</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!ctuV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c03da8-f543-4344-8f02-c32710a7d251_668x374.png" width="668" height="374" alt=""></figure></div><p>The number of blogposts relating to secure program synthesis <a href="https://www.lesswrong.com/posts/SfhFh9Hfm6JYvzbby/the-scalable-formal-oversight-research-program">went</a> <a href="https://john.regehr.org/writing/zero_dof_programming.html">exponential</a> <a href="https://martin.kleppmann.com/2025/12/08/ai-formal-verification.html">since</a> <a href="https://leodemoura.github.io/blog/2026-3-16-who-watches-the-provers/">the</a> <a href="https://www.lesswrong.com/posts/8wtrLoDPyCfMLuHkt/how-to-solve-secure-program-synthesis">last</a> <a href="https://x.com/karpathy/status/2023476423055601903">issue</a> <a href="https://www.amplifypartners.com/blog-posts/the-agentic-mullet-code-in-the-front-proofs-in-the-back">of</a> the newsletter, such that the hyperlinks in this sentence are not even exhaustive. Like AI capabilities themselves, we simply do not know when it will go logistic. Which I think means I have to keep the newsletter manageable by not promising to cover the general secure program synthesis or FMxAI spaces, and really doubling down on AI security applications. I mean it, you guys. It&#8217;s overwhelming, the space is too big now, etc. But <strong>will SPS gains actually get applied to AI security opportunities?</strong> That&#8217;s the crucial question I think we should care about, now. Let&#8217;s assume by default no, so that everyone who wants to liberate my bayes points from me goes and makes it happen.</p><p>I moved to DC, and I started an AI Security reading group in meatspace. Reach out to be involved!</p><p>There will be hackathons and fellowships in secure program synthesis, FMxAI, AI security, etc. Be prepared.</p><p>Sorry again about the long gap between editions. I&#8217;ve been writing more than ever, but not a lot of it is public yet. Also coding a lot.</p><h1><a href="https://youtu.be/1sd26pWhfmg?si=Hwcm1jNBrYhoD5d2">Nicholas Carlini&#8217;s desperate and pleading talk</a></h1><p>Carlini: &#8220;Opus is better at finding 0-days than I am&#8221;. Carlini, famous for finding vulns in adversarial robustness papers back in the computer vision days, stopped a hair short of declaring a draft, but I&#8217;m not. This <em>is</em> a draft. If you&#8217;re reading this, you are hereby conscripted into the secure program synthesis army. Or maybe our vibe is more &#8220;militia&#8221; coded, haven&#8217;t made up my mind yet. But this <em>is</em> a draft.
The gameboard seems offense-dominated right now: we need to harden software, and we need to do it quickly.</p><p>According to Gemini, which cites <a href="https://www.forbes.com/1998/03/12/feat.html#:~:text=Search%20for%20the%20Y2K%20problem,range%2C%20that)%20to%20fix.">Forbes</a>: 20-40% of corporate IT budgets in 1999 were devoted to Y2K prep. $300-600B was spent (roughly $0.5-1T adjusted for inflation). Ordinary people, people with families, people who put sandwiches into little baggies, decided to do this. They had consensus that it was coming down the pipe, on a deadline.</p><p>I don&#8217;t think we have that consensus, and I&#8217;m not sure we can marshal those resources.</p><p>But here, look, this is all I&#8217;m saying:</p><pre><code>cat &lt;&lt; EOF &gt; CLAUDE.md
Make agentic software that hardens a given repo with a red-blue loop. Declare an agent that uses standard redteam tools like fuzzing harnesses and static analysis as tools to find vulns in repos, then declare another agent that patches those vulns. Use git submodules to manage sources.
EOF

while true; do cat CLAUDE.md | claude --dangerously-skip-permissions; sleep 1; done</code></pre><p>If everyone reading this ran that command, and tailored it to the specific things you happen to know about, and dropped in repos close to your specialization, and steered the project in directions that I wouldn&#8217;t think of, etc., we might have a shot. Do not DoS open source maintainers with slop, but consider forking their repos and DoSing your forks with slop, then finding a way to benchmark how much more secure your fork is, and then maybe you get a few PRs in. Target loadbearing repos. Etc. &#129761;</p>
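<p>If you want more structure than a bash loop, here&#8217;s roughly the same shape as a minimal Python harness. To be clear, this is a sketch and not a product: <code>TARGET_REPO</code> and both prompts are placeholder assumptions for you to tailor, and it leans on the <code>claude</code> CLI&#8217;s non-interactive <code>-p</code> (print) mode.</p><pre><code>import subprocess

# Sketch of the red-blue loop from the command above. Assumptions:
# TARGET_REPO and both prompts are placeholders to tailor; `claude -p`
# runs Claude Code non-interactively on a single prompt.
TARGET_REPO = "path/to/your/fork"  # fork something loadbearing; don't DoS upstream

RED_PROMPT = (
    "You are the red team. Use fuzzing harnesses and static analysis "
    "to find vulns in this repo, and write findings to findings.md."
)
BLUE_PROMPT = (
    "You are the blue team. Read findings.md, patch the vulns, "
    "and re-run the test suite to check nothing broke."
)

def run_agent(prompt: str) -&gt; None:
    # Runs one agent turn inside the target repo.
    subprocess.run(
        ["claude", "-p", prompt, "--dangerously-skip-permissions"],
        cwd=TARGET_REPO,
        check=False,  # keep looping even when an iteration fails
    )

if __name__ == "__main__":
    while True:  # same spirit as the `while true` above
        run_agent(RED_PROMPT)
        run_agent(BLUE_PROMPT)
</code></pre>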
<h1><a href="https://www.gov.uk/government/publications/secure-ai-infrastructure-call-for-information/secure-ai-infrastructure-call-for-information">UK AISI call for information on Secure AI Infrastructure</a></h1><p>Let&#8217;s do it. Doesn&#8217;t seem to have a deadline.</p><p>Gemini&#8217;s summary:</p><blockquote><p>The UK government has launched a &#8220;Call for Information&#8221; to gather expert insights on securing the computing infrastructure used to develop and deploy advanced AI models. Managed jointly by the Department for Science, Innovation and Technology (DSIT), the AI Security Institute (AISI), and the National Cyber Security Centre (NCSC), the initiative seeks to address growing threats such as the theft of model weights, data breaches, and system disruptions. The government is specifically soliciting feedback from the AI and cybersecurity sectors on current risks and emerging technologies&#8212;including confidential compute, advanced cryptography, and trusted hardware&#8212;to help shape future research priorities and technical pilots. Ultimately, the program aims to ensure the UK remains a secure and trusted global hub for frontier AI development by building robust, &#8220;defense-in-depth&#8221; protections for critical AI assets.</p></blockquote><h1><a href="https://sl5.org/projects">SL5 Taskforce has been productive!</a></h1><p>Excited about some of these outputs.</p><h1>Long Live Safeguarded AI</h1><p><a href="https://aria.org.uk/opportunity-spaces/mathematics-for-safe-ai/safeguarded-ai/">Official leadership update</a>, <a href="https://vimeo.com/1178834361">Kathleen Fisher (ex DARPA, responsible for HACMS) interviews Nora and Davidad on vimeo</a>, <a href="https://x.com/davidad/status/2039390998694891816">Davidad&#8217;s notes about what he&#8217;s doing next</a>.</p><p>Safeguarded AI, like GSAI, like Open Agency Architecture, was always controversial scifi that, to be clear, I&#8217;ve been rooting for since day one. Well, not literally day one. When I first read Open Agency Architecture, I recall rolling my eyes and wringing my hands: &#8220;you can&#8217;t just put a type signature on a wacky diagram and say that it means AI safety is formally verifiable&#8221;. But like a few days later I was into it.</p><p>With Nora Ammann in as programme director (who&#8217;s been behind the scenes on SGAI since before day one), we should expect aggressive AI security applications of formal methods.</p><h1><a href="https://www.darpa.mil/about/people/patrick-lincoln">Pat Lincoln I2O director at DARPA</a></h1><p>Totally should&#8217;ve been in the thanksgiving edition; I spaced. <a href="https://www.sri.com/">SRI</a> alum Pat Lincoln is pretty keen about AGI: he&#8217;s not, like, completely AGI pilled, but extremely sympathetic to the viewpoint.</p><h1><a href="https://blog.atlascomputing.org/p/cslib-leans-formal-software-foundation">Atlas blog on CSLib</a></h1><p><a href="https://www.cslib.io/">CSLib</a> is what it sounds like: the mathlib of CS. It is spearheaded by Clark Barrett and Swarat Chaudhuri and others. Atlas Computing, which has <a href="https://newsletter.for-all.dev/i/148170804/atlas-ships-their-big-google-doc-alluded-to-in-the-last-newsletter">for a while</a> been looking at how to close the gap between formal methods as we find them today and radical infrastructure hardening, writes about it (they&#8217;re directly involved).</p><p>CSLib is cool, but one reason I think it doesn&#8217;t register as the droid I&#8217;m looking for is that I don&#8217;t know if real world software really draws from the undergrad CS curriculum all that much. In most real world software jobs, having a problem shaped like an academic puzzle rather than a REST API is a rare but golden status marker.</p>
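<p>For flavor, here&#8217;s the kind of entry I imagine such a library collecting (a hypothetical CSLib-flavored lemma I wrote for illustration, not pulled from the actual library):</p><pre><code>-- Hypothetical example, not from the actual CSLib: the sort of
-- undergrad-CS fact (about the behavior of an algorithm rather than
-- abstract math) that a mathlib-of-CS would want on the shelf.
theorem reverse_involutive (l : List Nat) : l.reverse.reverse = l := by
  simp</code></pre>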
<h1><a href="https://utopia-group.github.io/VeriSoftBench/">VeriSoftBench</a></h1><p>Claim: idiosyncrasies of Mathlib in particular determine too much of what we think we know about Lean evals and language model performance on them.</p><blockquote><p>500 theorem-proving tasks from 23 real-world projects, spanning compilers, type systems, smart contracts, separation logic, and program semantics.</p></blockquote><p>Recognizing the crucial wisdom of our time (that math is made of tokens which we can use for something else), we get a Lean eval that isn&#8217;t focused on math.</p><h1><a href="https://medium.com/@rahatesamruddhi7/%CE%BCinference-building-an-sl5-weight-enclave-on-sel4-5d044f832ee6">muInference: a minimal inference stack on seL4</a></h1><p>In the draft of this newsletter, where I stubbed out writing about this as TODO, the stub consisted of one word:</p><blockquote><p>Dude.</p></blockquote><p>Which I think roughly describes how I felt seeing it and how I feel now. They basically boot a minimal (CPU) inference enclave in seL4. I used it as an enclave in a network simulation project I was doing, and it doesn&#8217;t even have a machine-readable TTY, <em>that</em>&#8217;s how secure it is.</p><h1>Progress in the mitigation of steganography</h1><p>Crucial applications for model weight exfiltration defense!</p><h2><a href="https://arxiv.org/abs/2511.20621">DiFR: Inference Verification Despite Nondeterminism</a></h2><h2><a href="https://arxiv.org/abs/2511.02620">Verifying LLM Inference to Detect Model Weight Exfiltration</a></h2><h1><a href="https://seldonlab.com/">Seldon Labs batch 2 applications</a> (closed by the time I&#8217;m posting this)</h1><p>Sorry I didn&#8217;t get this out fast enough. But keep an eye on Seldon re the general space of AI security startups.</p><h1>Also keep an eye on: <a href="https://www.heronsec.ai">Heron</a></h1><p>Met a bunch of these fine folks at EAG.</p><h1><a href="https://lucidcomputing.ai/labs">Lucid Labs: data center access program for hardware-related experiments</a></h1><p>Summary by Gemini:</p><blockquote><p>The <strong>Lab Access Program</strong> by Lucid Computing is a specialized initiative providing developers and enterprises with early, hands-on access to &#8220;sovereign&#8221; AI infrastructure built on zero-trust, hardware-rooted security. Designed for industries with strict regulatory requirements&#8212;such as defense, healthcare, and finance&#8212;the program offers a sandbox environment where users can deploy AI models and agents while maintaining cryptographic proof of compliance and data residency. By moving away from traditional &#8220;black box&#8221; cloud models, participants can verify that their computations are occurring on specific, untampered hardware, allowing them to benchmark and refine secure AI workflows before moving into full-scale production.</p></blockquote><h1><a href="https://www.lesswrong.com/posts/7aJwgbMEiKq5egQbd/ai-found-12-of-12-openssl-zero-days-while-curl-cancelled-its">AISLE finds 12 0-days in OpenSSL and doesn&#8217;t (appear to) use formal methods</a></h1><p>The hero the secure program synthesis community has been waiting for.</p><p>It&#8217;s extremely unlikely that they&#8217;re using Iris or some synthetic Lean-Iris under the hood.
A year ago at EAG 2025, I went around telling everybody about my budding separation logic agent and how I wanted to find bugs, synthesize patches, and prove the patches correct with it &#8211; and I used repos like OpenSSL as a hypothetical target. I didn&#8217;t continue working on that cuz I didn&#8217;t like the Iris DSL that it was based on, but I&#8217;m not sure if I should&#8217;ve tried to continue.</p><h1>Take a look at <a href="http://aisafety.com/">aisafety.com</a> for overall fieldmapping</h1><p>Thanks to Bryce and the folks at <a href="http://aisafety.com">aisafety.com</a>, a bunch of you have found this newsletter, where we discuss using formal methods to improve our AI security posture. Many of the rest of you, however, might have come in from the formal methods community and sometimes think &#8220;who are all you AI existential safety wackos&#8221;. <a href="http://aisafety.com">aisafety.com</a> wants to answer that question by making a map of everything that&#8217;s going on in the <em>very broad</em> umbrella of AI existential safety. Highly recommended if you&#8217;re generally scared about AI but don&#8217;t know exactly where to plug in yet.</p><h1><a href="https://ifp.org/rfp-launch/">Allies IFP have a new RFP</a></h1><p>With a great track record of supporting secure program synthesis, IFP wants to do more projects along the lines of their galvanization of <a href="https://ifp.org/the-great-refactor/">the great refactor</a>.</p><h1><a href="https://www.rand.org/pubs/commentary/2026/02/tipping-the-cyber-balance-how-ai-benchmarks-could-make.html">Kathleen Fisher and Gopal Sarma post op-ed</a></h1><p>Let&#8217;s ship evals and RL environments <em>now</em> so that the next gen of language models is differentially accelerated toward cyberhardening defensive technologies. 10/10, no notes.</p>]]></content:encoded></item><item><title><![CDATA[Can We Secure AI With Formal Methods? November-December 2025]]></title><description><![CDATA[Give thanks for benchmarks with &#8220;Veri-&#8221; in the title.]]></description><link>https://newsletter.for-all.dev/p/can-we-secure-ai-with-formal-methods</link><guid isPermaLink="false">https://newsletter.for-all.dev/p/can-we-secure-ai-with-formal-methods</guid><dc:creator><![CDATA[Quinn Dougherty]]></dc:creator><pubDate>Sat, 29 Nov 2025 14:04:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!q5Ef!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2906eac6-843f-40e4-8754-e231c5ce6d3f_860x853.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We did the rebrand! The previous thumbnail was a baseball metaphor, but it was very clearly someone getting <em>out</em>, not <em>safe</em>. I was testing all of you and each of you FAILED.</p><p>Here&#8217;s the prompt for the new thumbnail:</p>
<blockquote><p>i&#8217;m keeping AI in a box, doing AI Confinement (like in yampolskiy 2012), using formal verification / formal methods. That&#8217;s my whole thing. I need art for my newsletter on these topics. I like the percival story from troyes/wagner and i like tolkien, but if you take from those elements put it IN SPACE like scifi. Also use german expressionist painting styles. Ok now give me some DALLE art.</p></blockquote><p>So long, &#8220;Progress in GSAI&#8221;. I still like the <a href="https://arxiv.org/abs/2405.06624">position paper</a> that the old newsletter title was based on, but</p><ol><li><p>It&#8217;s very scifi, and I think there&#8217;s more alpha in obvious/relatively easy/uncontroversial (but not done by default) work.</p></li><li><p>The word &#8220;guarantee&#8221; doesn&#8217;t evoke &#8220;<a href="https://www.lesswrong.com/posts/CCT7Qc8rSeRs7r5GL/beliefs-about-formal-methods-and-ai-safety#Swiss_cheese__">swiss cheese</a>&#8221;.</p></li><li><p>It&#8217;s time to double down on relationships between AI security and formal methods, directly, more explicitly than you can do within the framing of GSAI.</p></li></ol><p>Also notice: <a href="http://gsai.substack.com">gsai.substack.com</a> is now a redirect to newsletter.for-all.dev. I&#8217;ll be hosting a bunch of my technical reports and comms/strategy outputs at that domain going forward (the subdomain newsletter will just point to substack). But don&#8217;t worry, the scope of the newsletter remains largely the same (excepting the pivot to be more directly and explicitly about AI security) and won&#8217;t devolve into being any more nakedly self promotional than it has been so far.</p><p>I received a grant from a Funder of Presently Undisclosed Provenance to do comms and strategy for AI security via formal methods, which means among other things that this newsletter will get a little more TLC.</p><p>Busy month. I expect things to be slow over Christmas, so after this edition I&#8217;ll see you all in 2026.</p><p>In the spirit of chivalry, I styletransferred most abstracts in this edition of the newsletter to Troyes/Cervantes style. I did not check to see if Gemini got anything wrong, but every headline is a link to arxiv or openreview, which you&#8217;ll click if you&#8217;re interested.</p><h1><a href="https://arxiv.org/abs/2511.10783">MIRI&#8217;s treaty team posts a paper!</a></h1><p>Excited about this. They use the word &#8220;verification&#8221; in a different context than we do: they mean it in the sense of verifying the absence of enriched uranium (GPUs) or verifying that the terms of a treaty are being abided by.</p><blockquote><p>Many experts argue that premature development of artificial superintelligence (ASI) poses catastrophic risks, including the risk of human extinction from misaligned ASI, geopolitical instability, and misuse by malicious actors. This report proposes an international agreement to prevent the premature development of ASI until AI development can proceed without these risks.
The agreement halts dangerous AI capabilities advancement while preserving access to current, safe AI applications.</p><p>The proposed framework centers on a coalition led by the United States and China that would restrict the scale of AI training and dangerous AI research. Due to the lack of trust between parties, verification is a key part of the agreement. Limits on the scale of AI training are operationalized by FLOP thresholds and verified through the tracking of AI chips and verification of chip use. Dangerous AI research--that which advances toward artificial superintelligence or endangers the agreement&#8217;s verifiability--is stopped via legal prohibitions and multifaceted verification.</p><p>We believe the proposal would be technically sufficient to forestall the development of ASI if implemented today, but advancements in AI capabilities or development methods could hurt its efficacy. Additionally, there does not yet exist the political will to put such an agreement in place. Despite these challenges, we hope this agreement can provide direction for AI governance research and policy.</p></blockquote><h1><a href="https://bluerocksec.gitlab.io/formal-methods/">BlueRock GPLs the specs and proofs of NOVA</a></h1><p><a href="https://bluerocksec.gitlab.io/formal-methods/blogs/2025-10-16-NOVA-specs-proofs-open/">Three</a>. <a href="https://bluerocksec.gitlab.io/formal-methods/blogs/2025-11-18-NOVA-proof-spec/">Great</a>. <a href="https://bluerocksec.gitlab.io/formal-methods/blogs/2025-11-18-NOVA-october-release/">Blog posts</a>. The third one is of interest for insight into the maintenance and repair of a spec-and-proof codebase.</p><p>NOVA is the legendary <a href="https://hypervisor.org/">hypervisor</a> that was specified and proven correct at BlueRock (FKA Bedrock).
I say &#8220;legendary&#8221; because, when I was a wee lad stalking Bedrock&#8217;s GitHub activity and hearing rumors about C++ verification, it was one of the few Ws of industrial verification at scale that I had heard about.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!q5Ef!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2906eac6-843f-40e4-8754-e231c5ce6d3f_860x853.png" width="860" height="853" alt=""></figure></div><p>Look at that <a href="https://gitlab.com/bluerocksec/NOVA/-/blob/proof/ver/spec/rocq-bluerock-lts-adequacy/theories/lts/x86_cpu.v">B-E-A-YOOT</a>.</p><p>A hypervisor is a part of the virtual machine stack. NOVA is a hardened one for critical systems, technically a microhypervisor.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!4YX9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb669c5ad-26ff-4b52-8b3f-f228658d6868_621x941.png" width="621" height="941" alt=""></figure></div><p>We should teach AIs to write this stuff, cuz that looks painful to type.</p><p>We don&#8217;t talk enough about separation logic here on the newsletter. Anyways,</p><h1>People are playing with <a href="https://aristotle.harmonic.fun/">Aristotle</a>, Harmonic is <a href="https://x.com/satnam6502/status/1994094137386344665">hiring</a></h1><p>$120M series C.</p><p>Hardware is an interesting product area! Looks like their business model has advanced past the &#8220;mumbling to investors about curryhoward&#8221; stage. 2025, the year of mumbling to investors about curryhoward, has come to a roar of a close.
I have also <a href="https://newsletter.for-all.dev/p/november-december-2024-progress-in">mumbled about curryhoward to my dearest yall</a>, which might mean I get bayes points for a math company starting to spin up a program synthesis product. I can&#8217;t tell how obvious that sort of claim was, or is, but I know one thing: <em>I love getting points</em>.</p><p>If you have Aristotle access, please test FVAPPS and report back. Be sure to append the unit tests; that&#8217;s like the hardest part of the benchmark.</p><h1>If I had a nickel for every benchmark prefixed &#8220;Veri-&#8221; it&#8217;d only be four nickels, but it&#8217;s still weird that it happened four times</h1><p>Some of these I had no good reason not to cover earlier. Abstracts styletransferred by Gemini.</p><h2><a href="https://arxiv.org/abs/2509.22908">Vericoding</a></h2><blockquote><p>We do hereby present and test the <strong>largest ledger of trials yet assembled</strong> for the craft known as <strong>Vericoding</strong>&#8212;the generation of a code whose certainty is sworn upon by the very stars&#8212;from the <strong>formal scrolls of specification.</strong> This, mind you, is in stark contrast to the common, wicked <strong>Vibe Coding,</strong> which spews forth a quick but bug-ridden script, born of a mere whisper of natural tongue.</p><p>Our grand ledger contains twelve thousand five hundred and four such scrolls of specification, with three thousand and twenty-nine written in the ancient runes of <strong>Dafny,</strong> two thousand three hundred and thirty-four in the sturdy tongue of <strong>Verus/Rust,</strong> and seven thousand one hundred and forty-one in the subtle logic of <strong>Lean.</strong> Of these, a full six thousand one hundred and seventy-four are entirely new, untarnished challenges.</p><p>We find that the success rate of this noble Vericoding, when performed by the <strong>Sorcerers of Language</strong> (our off-the-shelf LLMs), stands at a meager <strong>27% in Lean</strong>, rises to <strong>44% in Verus/Rust</strong>, and achieves a triumphant <strong>82% in Dafny.</strong> Alas, the addition of a common, flowery natural-tongue description does not notably sharpen their success. Furthermore, the light of these Sorcerers has illuminated the pure path of Dafny verification, raising its former success rate from a humble 68% to a glorious <strong>96%</strong> over the past twelve moons.</p></blockquote><h2><a href="https://openreview.net/forum?id=rWkGFmnSNl">Veribench</a></h2><blockquote><p><strong>The Formal Verification of Software</strong> doth stand as a promise most bright&#8212;a potential transformation wrought by the <strong>Generative Artifice of the Mind</strong> (AI).
For a <strong>Provably Correct Code</strong> would utterly banish entire legions of hidden vulnerabilities, staunch the fatal breaches of critical systems, and, perhaps, forever change the practice of software engineering through trustworthy methods of implementation.</p><p>To spur this sacred domain, we unveil <strong>VeriBench,</strong> a trial meticulously crafted for judging the strength of the Sorcerers&#8217; Models in the <strong>end-to-end verification of the Code.</strong> This task demands the generation of complete Lean 4 incantations&#8212;the working functions, the unit tests, the <strong>Theorems of Correctness,</strong> and the <strong>Formal Proofs themselves</strong>&#8212;all drawn from humble Python reference spells or their accompanying common-tongue docstrings.</p><p>Our scrutiny of this one hundred and thirteen-task suite (comprising the tasks of HumanEval, simple drills, classical algorithms, and security snares) reveals a woeful truth: the current Frontier Sorcerers compile but a small fraction of the programs. <strong>Claude 3.7 Sonnet</strong> achieves compilation on a mere <strong>12.5%,</strong> while the mighty <strong>LLaMA-70B</strong> cannot compel a single program to compile in the Lean 4 HumanEval subset, even after fifty attempts guided by feedback! Yet, observe the noble <strong>Self-Optimizing Trace Agent</strong> architecture, whose compilation rates approach a magnificent <strong>60%!</strong> VeriBench thus lays the unyielding stone for developing systems capable of synthesizing provably correct, bug-free code, thus advancing the journey toward a more secure and dependable digital kingdom.</p></blockquote><h2><a href="https://arxiv.org/abs/2505.19271">VerifyThisBench</a></h2><blockquote><p>While the <strong>Grand Language Models</strong> (LLMs) have shown marvelous cunning in the quick generation of code, many existing trials are now easily conquered, and offer little <strong>guarantee of trustworthiness</strong> for the generated programs. To gain greater insight into the Sorcerers&#8217; reasoning on matters of <strong>Formal Correctness,</strong> we present <strong>VerifyThisBench,</strong> a new, agonizing trial which assesses the end-to-end verification of programs from mere natural-tongue descriptions.</p><p>The models must complete a trifecta of chivalric deeds: (i) <strong>Extract the Formal Specifications,</strong> (ii) <strong>Implement the Code</strong> in a language that craves verification, and (iii) <strong>Construct the Machine-Checkable Proofs.</strong></p><p>Our evaluation reveals that even the most vaunted of the modern models, such as <strong>o3-mini,</strong> achieve a pass rate of less than <strong>4%,</strong> with many of their utterances failing to even compile! To divine the true source of this difficulty, we further propose <strong>VerifyThisBenchXS,</strong> a milder variant where partial implementations or proofs are benevolently supplied. Across nine distinct models and seven tools of verification, we observe a steady gain when refinement is driven by the whispers of feedback, yet the overall pass rates remain pitifully low, underscoring the vast chasms that yet divide the Sorcerers from true formal reasoning. 
We release this trial and its unified environment to spur on the verification powers of all future models.</p></blockquote><h2><a href="https://arxiv.org/abs/2510.06296">VeriEquivBench</a></h2><blockquote><p><strong>Formal Verification</strong> stands as the ultimate frontier for ensuring the veracity of the code spawned by the <strong>Grand Language Models</strong> (LLMs). Methods that co-generate the code and the formal specifications in austere formal languages, such as Dafny, can, in theory, swear upon the truth of their alignment with the user&#8217;s intent. Alas, the entire progress is stifled by the <strong>difficulty of judging the quality of the specifications themselves.</strong></p><p>Current trials rely upon the perilous task of matching the generated work against a ground-truth specification&#8212;a manual process requiring deep expertise, which has limited existing datasets to a mere few hundred simple problems, and moreover suffers from a profound lack of reliability.</p><p>To remedy this, we introduce <strong>VeriEquivBench,</strong> a new trial featuring two thousand three hundred and eighty-nine complex algorithmic puzzles designed to expose the frailty of current models in both the generation of code and the deep formal reasoning. Our evaluative framework replaces the perilous ground-truth matching with a <strong>formally grounded metric: the Equivalence Score,</strong> and rigorously verifies the quality of the generated specifications and code. Our findings declare that the generation of formally verifiable code remains a <strong>profound challenge</strong> for the state-of-the-art Sorcerers. This underscores both the sheer difficulty of the task and the desperate need for trials like VeriEquivBench to hasten the march toward scalable and trustworthy coding agents.</p></blockquote><h1>From Galois&#8217; blog</h1><h2><a href="https://www.galois.com/articles/specifications-dont-exist">Specifications don&#8217;t exist</a></h2><p>Should&#8217;ve been in the last newsletter but slipped through the cracks.</p><p>We need words for the different pessimisms about FMxAI. I often talk about the world-spec gap or the world-spec problem (that formal methods don&#8217;t rule out sidechannel attacks). This post is about a different pessimism: the elicitation problem, or the elicitation and validation problem. Someone should absolutely be funding an org to focus on elicitation and validation; it&#8217;s a turbo important part of the theory of change. Is anyone working on this?</p><h2><a href="https://www.galois.com/articles/claude-can-sometimes-prove-it">Lean and claude code</a></h2><p>Mike also has a technical post about vibecoding in Lean.</p><p>Pair it with these <a href="https://github.com/cameronfreer/lean4-skills">off-the-shelf &#8220;skills&#8221;</a> (a claude code feature that&#8217;s &#8220;just prompts with extra steps&#8221;).</p><h2><a href="https://www.galois.com/articles/generative-ai-for-rigorous-digital-engineering">Rigorous Digital Engineering</a></h2><p>What if proof engineering but too cheap to meter?</p><h1><a href="https://www.logicalintelligence.com/product">Oops I missed Logical Intelligence</a></h1><p>Should&#8217;ve covered these folks a while ago.
Yes, it appears their clientele is crypto/defi, but I have a generally positive attitude about life and I don&#8217;t want to set my &#8220;days since snark incident&#8221; counter back to zero, so we will ignore that and focus on the little we can ascertain about their tech and their claims.</p><p>There are two parts to this: the part about why/how exactly they believe what they believe about their Lean product, and the part about how their Noa agent (which is not paywalled; you can just install it on GitHub) fits into my strategic worldview.</p><h2>Primitive screwheads: text-to-text. <a href="https://youtu.be/zdkqagOUaPM?si=uJkFgd6LUwa55p2m">My boomstick</a>: structural synthesis</h2><p>Logical Intelligence is not bullish on autoregressive text-to-text as a program synthesis paradigm. Like <a href="https://centaur.stanford.edu/PyPantograph/intro.html">Leni Aniva</a>, they think tree search (starting with MCTS) will beat LLMs in the fullness of time. The interesting part, with a very paywalled model that I can&#8217;t test, is: if they&#8217;re right, why isn&#8217;t Harmonic (or Morph or a frontier company or anyone else) scooping them? It&#8217;s the same thing I say when I look at <a href="https://higherorderco.com/">HOC</a>: yes, text-to-text is an uncivilized approach to program synthesis, but we haven&#8217;t welded structural synthesis with the <a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html">bitter lesson</a> yet, and I don&#8217;t expect to see the gains until we do. If it could be any other way, then we&#8217;d be living in the GOFAI Extended Cinematic Universe instead of the Prompts Extended Cinematic Universe. I could write down some loose ideas of things you could try (to achieve the welding), but I will not, because I&#8217;m unconvinced the <a href="https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html">d/acc</a> case is actually the majority of the mass. I&#8217;m too concerned that Logical Intelligence, HOC, and to some extent Leni are <em>right</em> about the superpower unleashed by structure-aware program synthesis, and I don&#8217;t think we&#8217;re ready (as a QA/safety community, nor as a society).</p>
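<p>To make &#8220;structural synthesis&#8221; a little more concrete, here&#8217;s the skeleton of tree search over proof states. A toy UCT loop of my own, not anybody&#8217;s product: <code>legal_tactics</code>, <code>apply_tactic</code>, and <code>is_solved</code> are hypothetical oracles you&#8217;d have to back with something real (PyPantograph is one way to get them for Lean).</p><pre><code>import math
import random

# Toy UCT over proof states. Assumptions: `legal_tactics(state)` lists
# candidate tactics, `apply_tactic(state, tac)` returns the new state or
# None if the tactic fails, `is_solved(state)` checks for no open goals.

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def uct(node, c=1.4):
    # Unvisited children get explored first.
    if node.visits == 0:
        return float("inf")
    exploit = node.value / node.visits
    explore = c * math.sqrt(math.log(node.parent.visits + 1) / node.visits)
    return exploit + explore

def search(root_state, legal_tactics, apply_tactic, is_solved, iters=1000):
    root = Node(root_state)
    for _ in range(iters):
        # 1. selection: descend by UCT until we hit a leaf
        node = root
        while node.children:
            node = max(node.children, key=uct)
        # 2. expansion: try every legal tactic once
        for tac in legal_tactics(node.state):
            child = apply_tactic(node.state, tac)
            if child is not None:
                node.children.append(Node(child, parent=node))
        # 3. evaluation: reward leaves with no goals left
        leaf = random.choice(node.children) if node.children else node
        reward = 1.0 if is_solved(leaf.state) else 0.0
        # 4. backpropagation
        while leaf is not None:
            leaf.visits, leaf.value = leaf.visits + 1, leaf.value + reward
            leaf = leaf.parent
    return root  # best path: repeatedly take the most-visited child
</code></pre>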
<h2>Analyzing codebases for vulnerabilities</h2><p>From their product page:</p><blockquote><p>Ordering an external audit is both very expensive and very time-consuming. Our AI tool, Noa, delivers regular feedback on your code&#8212;minutes for smaller codebases and tens of minutes for larger ones. This lets you get near-real-time insight into the most critical potential security risks at a fraction of the cost. Noa integrates with GitHub: simply add the Noa bot to your repository, and after each pull request you can request a dashboard showing potential risks across the entire repository, along with their likelihood of exploitation and severity ratings.</p></blockquote><p>I have a post coming out about this, but I think the sort of thing they&#8217;re trying to do here is an important part of the strategic outlook. Audits, cryptanalysis, and cybersecurity consulting are an important area to automate if we&#8217;re going to know, with a finite proof synthesis budget, which components are the most critical to harden with proofs. To be clear, I have not used the product; I don&#8217;t have any codebases it&#8217;s a good fit for. But it&#8217;s a class of product I&#8217;m excited about, even (ugh) if it is (ew) for defi/crypto.</p><h1><a href="https://www.aria.org.uk/opportunity-spaces/mathematics-for-safe-ai#opportunityseeds">Announcements from the first round of Mathematics for Safe AI Opportunity Space at ARIA</a></h1><p>Spot ole q doc somewhere on this page! Other highlights are the <a href="https://www.aria.org.uk/opportunity-spaces/mathematics-for-safe-ai?cardId=hardware-level-ai-safety-verification#opportunityseeds">hardware verification team</a>, the <a href="https://www.aria.org.uk/opportunity-spaces/mathematics-for-safe-ai?cardId=gflownet-steered-probabilistic-program-synthesis#opportunityseeds">GFlowNet/SynthStats team</a>, and the <a href="https://www.aria.org.uk/opportunity-spaces/mathematics-for-safe-ai?cardId=sfbench#opportunityseeds">SFBench team</a>.</p><h1><a href="https://arxiv.org/abs/2508.15878">Scalable synthesis of theorem proving challenges in formal-informal pairs</a></h1><p>Apparently there was some twitter discourse about this paper, but one of the discoursers was using a hidden profile. It&#8217;d be great to be more like a <a href="https://thezvi.substack.com/">Zvi</a>-style newsletter full of twitter screenshots, but that would just require me to log onto twitter more, which, like, no.</p><blockquote><p><strong>The Grand Confluence of Lean and the Scholarly Arts of Computation: A Fount of Trials for the Sorcerer&#8217;s Mind&#8211; </strong>The noble art of <strong>Formal Theorem Proving (FTP)</strong> hath risen as a cornerstone for judging the deep reasoning capabilities of the <strong>Grand Language Models</strong> (LLMs), enabling the automated verification of mathematical oaths upon a massive scale. Yet, the progress of this quest has been hindered by a scarcity of suitable archives, due to the high toll of manual curation and the lack of truly challenging dilemmas paired with verified correspondences between <strong>Formal Scroll</strong> and <strong>Informal Chronicle. </strong>We propose to tap into the wellspring of <strong>Theoretical Computer Science (TCS)</strong> as a boundless source of rigorous proof problems. Within this scholarly domain, the definitions of algorithms permit the automatic synthesis of an arbitrary number of complex <strong>Theorem-Proof pairs. </strong>We demonstrate this potent approach upon two realms of TCS: the <strong>Busy Beaver problems,</strong> which demand the proof of bounds upon a Turing Machine&#8217;s cessation of movement, and the <strong>Mixed Boolean Arithmetic problems,</strong> which entwine the logic of the mind with the rigor of number. Our framework automatically weaves these challenges, providing parallel specifications: the <strong>Formal Code (Lean4)</strong> and the <strong>Informal Narrative (Markdown),</strong> thus creating a scalable conduit for generating verified trials of proof. Scrutiny of the frontier models reveals <strong>substantial chasms</strong> in their automated theorem-proving prowess: while the champion <strong>DeepSeekProver-V2-671B</strong> achieves a noble <strong>57.5%</strong> success rate on the Busy Beaver challenges, its strength wanes, managing only <strong>12%</strong> on the Mixed Boolean Arithmetic puzzles.
These findings illuminate the great difficulty of crafting long-form proofs, even for those problems whose computational verification is a mere trifle, thus showcasing the invaluable role of TCS realms in advancing the research of automated reasoning.</p></blockquote><h1><a href="https://airesilience.net/vision-cyberphysical">AI Resilience: cyberphysical systems</a></h1><p>Friend of the newsletter Nora Ammann published AI Resilience a little bit ago. The section on cyberphysical systems is relevant to us: it relies on secure (formally verified) program synthesis becoming cheap and accessible. Resilience is a flavor of <a href="https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html">defensive acceleration</a> that specifically targets the durable and structural resolution of vulnerabilities, vulnerabilities which get amplified by AI but which, if we&#8217;re diligent and hardworking, get ameliorated by AI as well.</p><h1><a href="https://www.arxiv.org/abs/2506.04592">Let&#8217;s formalize this step by step</a></h1><p>One time a friend asked me &#8220;why not just put the proof synthesis in the reasoning trace and the thing you&#8217;re writing the proof about (say, a program) in the final output&#8221;. And I was like, &#8220;...huh&#8221;. And I got as far as adding a few credits to my runpod account before getting pulled into other things. Little did I know, at exactly that moment, this team was hard at work!</p>
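<p>The shape of the idea, as I understood it back then (a sketch of mine, not the paper&#8217;s code: <code>formalize</code> and <code>prove</code> are hypothetical LLM calls, and the checker just shells out to a plain <code>lean</code> binary):</p><pre><code>import subprocess
import tempfile

# Sketch of step-aware verification of a reasoning trace. Assumptions:
# `formalize(step)` turns one natural-language step into a Lean theorem
# statement, and `prove(stmt)` proposes a proof script for it; both
# would be LLM calls in practice. A step counts as verified iff Lean
# accepts the resulting file.

def lean_accepts(source: str) -&gt; bool:
    with tempfile.NamedTemporaryFile(suffix=".lean", delete=False) as f:
        f.write(source.encode())
    proc = subprocess.run(["lean", f.name], capture_output=True)
    return proc.returncode == 0

def verify_trace(steps, formalize, prove):
    results = []
    for step in steps:
        stmt = formalize(step)   # hypothetical: step to Lean statement
        script = prove(stmt)     # hypothetical: statement to full proof
        results.append((step, lean_accepts(script)))
    return results
</code></pre>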
<h1><a href="https://ulyssean.com/">Ulyssean website mission status: totally sick</a></h1><p>There&#8217;s honestly no Ulyssean update in this issue, but I stumbled upon their website and loved the graphic design!</p><p class="cta-caption">There are no benefits for paid subscriptions. A Funder of Undisclosed Provenance is backing the newsletter for 6 months.</p>]]></content:encoded></item><item><title><![CDATA[July-October 2025 Progress in Guaranteed Safe AI]]></title><description><![CDATA[Hot FMxAI Summer grinds along. Many new founders on the scene. Thinking of rebranding the newsletter away from &#8220;guaranteed&#8221;.]]></description><link>https://newsletter.for-all.dev/p/july-october-2025-progress-in-guaranteed</link><guid isPermaLink="false">https://newsletter.for-all.dev/p/july-october-2025-progress-in-guaranteed</guid><dc:creator><![CDATA[Quinn Dougherty]]></dc:creator><pubDate>Wed, 22 Oct 2025 02:28:46 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/85ef96ac-848c-4d60-bcab-bb66955e4728_1080x1080.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Yall, I really do apologize for the radio silence. It has mostly to do with breaking my ankle in three places, but I&#8217;m walking again.</p><p>This edition of the newsletter looks a bit more like movement happenings and announcements, which isn&#8217;t to say there weren&#8217;t more papers or technical results I could&#8217;ve included, just that my mind wasn&#8217;t on them over the summer. I feel like I should be working on strategic clarity right now! Watch this space, etc.</p><h1><a href="https://luma.com/k075jrgl">Verilib launch/demo</a> on the 23rd (SOON)</h1><p>The flagship product of the <a href="https://www.beneficialaifoundation.org/">Beneficial AI Foundation</a> is publicly launching! More on this in the next edition of the newsletter.</p><h1><a href="https://theorem.dev/blog/anthropic-bug-test/">Theorem blog post</a></h1><p>I remain turbo impressed with Theorem&#8217;s tech. Formal methods is quality assurance, and as every QA technician learns on the first day of QA School, property-based tests are 80% of the value of a proof for 20% of the effort.</p><blockquote><p>Why combine proofs and PBTs? Standard PBTs are compute efficient but prone to gaps: developers miss edge cases, and rare bugs slip into production. Proofs solve this with modular steps that decompose and incorporate reasoning, revealing what cases we&#8217;re missing&#8212;but they&#8217;re computationally impractical.</p></blockquote>
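<p>For readers who haven&#8217;t met property-based testing: here&#8217;s a minimal sketch with the <code>hypothesis</code> library. It&#8217;s my toy example, nothing to do with Theorem&#8217;s actual stack; the point is that the stated property gets checked against hundreds of generated inputs rather than a handful of hand-picked ones.</p><pre><code>from hypothesis import given, strategies as st

def dedupe_sorted(xs: list[int]) -&gt; list[int]:
    """Sort a list and drop duplicates."""
    return sorted(set(xs))

@given(st.lists(st.integers()))
def test_dedupe_sorted(xs):
    out = dedupe_sorted(xs)
    # idempotent: running it again changes nothing
    assert dedupe_sorted(out) == out
    # strictly increasing, so no duplicates survive
    assert all(a &lt; b for a, b in zip(out, out[1:]))
</code></pre>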
<p>They&#8217;re a little coy with the details. Too coy, really; I thought of omitting this from the newsletter, but I&#8217;ve seen some demos from the team that aren&#8217;t public yet that make me bullish on the company, so I want to remind everyone to watch this space.</p><h1><a href="https://xinhuajian.wordpress.com/wp-content/uploads/2025/04/ape_bench_i-2.pdf">APE TOGETHER STRONG</a></h1><p>No reason this couldn&#8217;t have been covered in an earlier edition! I think it slipped through the cracks, or I had too many similar papers that month.</p><p>I&#8217;m pretty interested in the version control problem, in the long game. Every Lean synthesis laborer in the prompt mines, wiring up agents, MCP servers, and <code>lean-toolchain</code> files every day, knows that the high code velocity of Lean and Mathlib, relative to the fewness of Lean tokens in the pretraining data, is a major pain point, especially compared to the synthesis successes in better-resourced languages. What does it look like if language model technology eventually scales to something less hobbled by this?<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><p>To be clear, this question of version control is different from the interest in version control over in <a href="https://www.aria.org.uk/opportunity-spaces/mathematics-for-safe-ai/safeguarded-ai">Safeguarded/ARIA</a> land <a href="https://gsai.substack.com/i/155386915/structure-aware-version-control-via-observational-bridge-types-hiring">that was covered in a previous newsletter</a>. Over there (davidadia? davidadistan?), specifications and world models are a kind of <em>structured data</em>, and they want version control that (among other things) doesn&#8217;t just flatten critical specs to plaintext.</p><p>Moreover, I don&#8217;t think this is necessarily the reason ByteDance got interested in version control for proof synthesis. They may be thinking that a diff-aware, git-history-aware approach to Mathlib leads to more synthetic data and a better understanding (in the model) of how to add features, refactor, etc.</p><blockquote><p>Recent progress in large language models (LLMs) has shown promise in formal theorem proving, yet existing benchmarks remain limited to isolated, static proof tasks, failing to capture the iterative, engineering-intensive workflows of real-world formal mathematics libraries. Motivated by analogous advances in software engineering, we introduce the paradigm of Automated Proof Engineering (APE), which aims to automate proof engineering tasks such as feature addition, proof refactoring, and bug fixing using LLMs. To facilitate research in this direction, we present APE-Bench I, the first realistic benchmark built from real-world commit histories of Mathlib4, featuring diverse file-level tasks described in natural language and verified via a hybrid approach combining the Lean compiler and LLM-as-a-Judge. We further develop Eleanstic, a scalable parallel verification infrastructure optimized for proof checking across multiple versions of Mathlib. Empirical results on state-of-the-art LLMs demonstrate strong performance on localized edits but substantial degradation on handling complex proof engineering. This work lays the foundation for developing agentic workflows in proof engineering, with future benchmarks targeting multi-file coordination, project-scale verification, and autonomous agents capable of planning, editing, and repairing formal libraries.</p></blockquote>
<h1><a href="https://arxiv.org/abs/2410.19940">Cobblestone</a></h1><p>This was originally posted a whole year ago but had a revision over the summer. In any case, I had missed it until now.</p><p>I basically consider this a neurosymbolic architecture. The <code>hammer</code> (an SMT-driven proof search tool) is the symbolic part, and the regular LLM parts you know and love are the neuro part. Obviously you can do more than prompt the agent &#8220;try using the hammer tactic sometimes if you get stuck, little buddy :) I believe in you&#8221;. Their divide-and-conquer is one of those things.</p><p>Standard (by now) CoT tricks and agent stuff. It&#8217;s further ahead than a couple of papers I&#8217;ve seen in the Lean agent space that came out since then, so good for them.</p><blockquote><p>Formal verification using proof assistants, such as Coq, is an effective way of improving software quality, but requires significant effort and expertise. Machine learning can automatically synthesize proofs, but such tools are able to prove only a fraction of desired software properties. We introduce Cobblestone, a divide-and-conquer approach for proof synthesis. Cobblestone uses a large language model (LLM) to generate potential proofs, uses those proofs to break the problem into simpler parts, automatically identifies which of those parts were successfully proven, and iterates on the remaining parts to build a correct proof that is guaranteed to be sound, despite the reliance on unsound LLMs. We evaluate Cobblestone on four benchmarks of open-source Coq projects, controlling for training data leakage. Fully automatically, Cobblestone outperforms state-of-the-art non-LLM tools, and proves many theorems that other LLM-based tools cannot, and on many benchmarks, outperforms them. Each Cobblestone run costs only $1.25 and takes 14.7 minutes, on average. Cobblestone can also be used with external input, from a user or another tool, providing a proof structure or relevant lemmas. Evaluated with such an oracle, Cobblestone proves up to 58% of theorems. Overall, our research shows that tools can make use of partial progress and external input to more effectively automate formal verification.</p></blockquote>
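<p>Here&#8217;s the control flow as I read it, in a Python-flavored sketch. Every helper below is a hypothetical stand-in of mine, not Cobblestone&#8217;s actual API; the load-bearing point is that soundness lives entirely in the proof checker, so the unsound LLM can be retried freely.</p><pre><code># Divide-and-conquer proof synthesis, roughly in Cobblestone's style.
# All helpers (propose_proof, split_into_subgoals, ...) are hypothetical.
def synthesize(theorem, llm, checker, max_iters=10):
    candidate = llm.propose_proof(theorem)         # unsound generator
    subgoals = checker.split_into_subgoals(theorem, candidate)
    proven = {g: p for g, p in subgoals.items() if checker.accepts(g, p)}
    for _ in range(max_iters):
        remaining = [g for g in subgoals if g not in proven]
        if not remaining:
            break                                  # every piece checked
        for g in remaining:
            attempt = llm.propose_proof(g)         # retry failed parts only
            if checker.accepts(g, attempt):        # the sound gatekeeper
                proven[g] = attempt
    return checker.assemble(theorem, proven)       # fails if gaps remain
</code></pre>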
<h1>More math startups, but I&#8217;m not gonna single anyone out</h1><p>As always, I&#8217;m out in the applebees parking lot informing everyone that the VC influx into formal methods (especially Lean) doesn&#8217;t turn into AI security gains by default. We need you and me to make sure that formal verification agents are directed at problems in real-world AI security and safety. <em>Some</em>, or perhaps <em>most</em>, math companies are doing PR in 2025 so they can create software QA products in 2026. I talked to one, though, who is adamant they are not: they just want to solve math problems and aren&#8217;t terribly fussed with capturing more than 1% of the upside. I think these guys are the outlier; most can be expected to pivot to program synthesis in 2026. I&#8217;m happy to discuss operationalizations of this to take bets. See also <a href="https://www.lesswrong.com/posts/kq8CZzcPKQtCzbGxg/quinn-s-shortform?commentId=hKfKAbXgGChxBuXDG">this brief memo I just jotted down</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>.</p><p class="cta-caption">I feel like if you have friends who want to keep their ear to the ground about this math automation stuff, you should send this to them; that way I can corrupt them into AI security/safety slowly over time.</p><p><a class="button primary" href="https://newsletter.for-all.dev/p/july-october-2025-progress-in-guaranteed?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p><a href="https://www.smbc-comics.com/comic/signal-4">SMBC recently hit the nail on the head</a>. It reads as if I had ranted at Zach Weinersmith himself, except he went way harder and took it way further.</p><h1>I know a bunch of founders who are aiming to increase headcount soon</h1><p>And while they have not literally posted their job ads yet, reach out and I can let you know whether I think they&#8217;d like to talk to you.</p><p>I&#8217;d love to do some napkin math on the VC influx into FMxAI over the last year, including projections for how it might grow over the next couple of years. I&#8217;d also love for a reader to do it so I can just link to it here!</p><h1><a href="https://luma.com/8bg27vqg?tk=kfEggv">Clark Barrett&#8217;s appearance on Orpheus&#8217; seminar series</a> is finally scheduled</h1><p>Separately, <a href="https://manifund.org/projects/guaranteed-safe-ai-seminars-2026-5kxriozt23?tab=comments">Orpheus has a Manifund post up to support the seminars</a> and his other GSAI movement-building efforts. Early endorsements by both of the <a href="https://gsai.substack.com/i/158079474/fvapps">FVAPPS</a> coauthors.</p><h1>TA2 is dead over at ARIA</h1><p>I think we&#8217;ve finally had a newsletter published while ARIA does not have an active Safeguarded AI funding call. Achievement unlocked! The original TA2 plan, to fund one team to the tune of 8 figures, was canceled in the 9th inning.</p><blockquote><p>When we designed this programme, the world looked different. Today, the pace of progress in frontier AI models has fundamentally altered the path to our goals. We now expect that the intended technical objectives of TA2 will be attainable as a side effect of this progress, without requiring a dedicated R&amp;D organisation. Instead of investing in creating specialised AI systems that can use our tools, it will be more catalytic to broaden the scope and power of the TA1 toolkit itself, making it a foundational component for the next generation of AI.</p></blockquote>
<p>TA2, <a href="https://gsai.substack.com/i/161749219/ta-call">as you&#8217;ll recall</a>, was to be a large investment in founding a new org that would be kinda frontier-lab-like in some ways, but specialized in being superpowered at the new specification / world model / proof certificate language/stack that TA1 is shipping. I think it&#8217;s roughly true that advanced capability in this new language/stack can be accomplished as a side effect of what claude, gpt, grok, and gemini are going to do anyway. But I&#8217;m surprised that the extent to which that is now forecasted to be true, by davidad and his team, wasn&#8217;t priced in back when the Plan&#8482; was originally drawn up. Davidad just seems a little too bitter-lesson/scaling-pilled to have been surprised by capabilities alone into pivoting! I have a mild impulse to speculate that the team got spooked by the difficulty of aligning incentives for this kind of org to get a 7 or 8 figure injection at its founding, but again, I&#8217;d ultimately expect that to have been priced in when the Plan&#8482; was originally published.</p><p>I&#8217;ve heard of at least one turbo sick venture that was germinated by the prospect of pursuing this grant, and I don&#8217;t think they&#8217;re giving up just yet. Watch this space.</p><h1><a href="https://manifund.org/lisathiergart">Lisa from SL5 Taskforce</a> still up at Manifund</h1><p>Lisa from the <a href="https://securityandtechnology.org/sl5/">SL5 Taskforce</a> has a few more months to disburse up to $100k. The SL5 Taskforce is interested in some applications of FMxAI, especially to cloud infrastructure hardening. The center of the venn diagram (between Progress in GSAI readers and the interests of the SL5 Taskforce) is not massive, but I think it&#8217;s big enough for us to discuss here!</p><p><a href="https://docs.google.com/forms/d/e/1FAIpQLScpNZfNuF1ByTXL1zJzBEuuGrTYpaNscBbQZs3Gt6DVt3bITw/viewform">Apply here</a>; I think you also have to file a Manifund project as well.</p><h1>Cool NASA <a href="https://ntrs.nasa.gov/citations/20250006044">slidedeck from the summer about formal methods at Langley</a></h1><p>I think the &#8220;future&#8221; section, on cyberphysical systems and comparing AI to cyberphysical systems, is good. The AI4FV section is all the stuff we talk about, and the FV4AI sections are all the stuff I&#8217;m constantly saying we&#8217;re not talking about. The highlights are the earlier parts, the historical context, IMO.</p><h1>Rebranding the newsletter</h1><p>Yall, I&#8217;m interested in your thoughts here. My priorities are shaped a lot by &#8220;ensure AI security knows what to ask the formal methods community for at crunch time&#8221; and the converse, &#8220;ensure that formal methodsititians view the AI security community as their most important customer&#8221;. I think &#8220;GSAI&#8221; is not completely wrong as a title for the newsletter! But I think it could be even less wrong. The main problem I see with &#8220;guarantee&#8221; is that it doesn&#8217;t evoke swiss cheese, and I think formal methods are a source of swiss cheese!</p><p>I also have to disambiguate AI security as I mean it (boxing) from what e.g. <a href="https://palisaderesearch.org/">Palisade</a> means (offensive infosec capabilities). 
Watch this space etc.</p><p>Also, if you want to help me with the newsletter, we could be a team! I don&#8217;t think I could get back up to once per month without some friends to help.</p><p class="cta-caption">There are no benefits to being a paid subscriber. I&#8217;m also frequently late as heck to post, as evidenced by this issue being a 4-month roundup. The help is encouraging and meaningfully keeps me doing this. Anyway, subscribing without paying me just puts the issues in your inbox right when they come out.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Caveat: this problem was way worse 1-2 years ago. As the Lean3-&gt;Lean4 migration cooled down, the velocity apparent in pretraining data got lower, <em>plus</em> language models got much more powerful. Overall I&#8217;m curious whether this velocity/version-control problem will persist; I&#8217;m not turbo confident it will.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>I think this shortform is pretty important. I do regret being unnuanced about curryhoward in the past; separately, my worldview has gotten more nuanced and changed in other ways. Or: back when I said &#8220;something something curryhoward&#8221; more, being directionally correct was enough, but now that more people are saying it, it&#8217;s time to be not just directionally correct but precise. You get the idea.</p></div></div>]]></content:encoded></item><item><title><![CDATA[May-June 2025 Progress in Guaranteed Safe AI]]></title><description><![CDATA[Benchmarks! Get your benchmarks! A startup and a nonprofit lab each launch]]></description><link>https://newsletter.for-all.dev/p/may-june-2025-progress-in-guaranteed</link><guid isPermaLink="false">https://newsletter.for-all.dev/p/may-june-2025-progress-in-guaranteed</guid><dc:creator><![CDATA[Quinn Dougherty]]></dc:creator><pubDate>Thu, 26 Jun 2025 21:20:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Yb4J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dfd380-e6a1-465a-b32b-7cb8b480ba36_566x441.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There will be an AIxFM conference in the Bay Area in Q4, according to a little birdie.</p><h1><a href="https://github.com/morph-labs/lean-abc-true-almost-always">Morph ships big autoformalization result in 3599 lines of Lean</a></h1><p>They have human decomposition in the LaTeX/Lean blueprint into 67 lemmas, with human spot-checking. 
Still, I&#8217;m impressed with their system (called Trinity).</p><p>I&#8217;d like to know how expensive (in tokens, or some other compute metric) it was to do this!</p><h2>On <a href="https://www.morph.so/blog/verified-superintelligence">Verified Superintelligence</a></h2><p>I of course have opinions on their blog post Verified Superintelligence.</p><blockquote><p>Today's most advanced AI systems&#8212;reasoning LLMs trained with supervised RL&#8212;have hit a fundamental wall. They can only improve on problems where we can verify the (known) answers. Every math problem needs a known solution. Every coding challenge requires test cases. Every reasoning task demands ground truth.</p></blockquote><p>I think I know where they&#8217;re going with this: &#8220;verification&#8221; in the sense of supervised RL as a cognate with &#8220;verification&#8221; in the sense of typechecking.</p><blockquote><p>Perhaps most importantly, this approach offers a path to what we call "trustless alignment." By ensuring all outputs can be independently validated, we create AI systems whose safety doesn't depend on faith in their training process or architecture. The proof is in the output.</p></blockquote><p>This describes effectively a <a href="https://www.lesswrong.com/w/ai-boxing-containment">boxing</a> strategy, not alignment. Boxing is when you police the boundary between the AI and the world. Alignment is when you don&#8217;t need to. Is this an important distinction, or a narcissism of small differences you&#8217;d only notice if you have Lesswrong Disease (the disease they diagnose you with when you read or write on Lesswrong)? Being afflicted myself, I&#8217;m not sure how I would tell.</p><h1>Two benchmarks</h1><h2><a href="https://arxiv.org/abs/2505.23135">VERINA</a></h2><blockquote><p>Large language models (LLMs) are increasingly integrated in software development, but ensuring correctness in LLM-generated code remains challenging and often requires costly manual review. Verifiable code generation -- jointly generating code, specifications, and proofs of code-specification alignment -- offers a promising path to address this limitation and further unleash LLMs' benefits in coding. Yet, there exists a significant gap in evaluation: current benchmarks often lack support for end-to-end verifiable code generation. In this paper, we introduce Verina (Verifiable Code Generation Arena), a high-quality benchmark enabling a comprehensive and modular evaluation of code, specification, and proof generation as well as their compositions. Verina consists of 189 manually curated coding tasks in Lean, with detailed problem descriptions, reference implementations, formal specifications, and extensive test suites. Our extensive evaluation of state-of-the-art LLMs reveals significant challenges in verifiable code generation, especially in proof generation, underscoring the need for improving LLM-based theorem provers in verification domains. The best model, OpenAI o4-mini, generates only 61.4% correct code, 51.0% sound and complete specifications, and 3.6% successful proofs, with one trial per task. We hope Verina will catalyze progress in verifiable code generation by providing a rigorous and comprehensive benchmark. 
We release our dataset on <a href="https://huggingface.co/datasets/sunblaze-ucb/verina">this https URL</a> and our evaluation code on <a href="https://github.com/sunblaze-ucb/verina">this https URL</a>.</p></blockquote><h2><a href="https://arxiv.org/abs/2505.13938">CLEVER</a></h2><blockquote><p>We introduce CLEVER, a high-quality, curated benchmark of 161 problems for end-to-end verified code generation in Lean. Each problem consists of (1) the task of generating a specification that matches a held-out ground-truth specification, and (2) the task of generating a Lean implementation that provably satisfies this specification. Unlike prior benchmarks, CLEVER avoids test-case supervision, LLM-generated annotations, and specifications that leak implementation logic or allow vacuous solutions. All outputs are verified post-hoc using Lean's type checker to ensure machine-checkable correctness. We use CLEVER to evaluate several few-shot and agentic approaches based on state-of-the-art language models. These methods all struggle to achieve full verification, establishing it as a challenging frontier benchmark for program synthesis and formal reasoning. Our benchmark can be found on GitHub (<a href="https://github.com/trishullab/clever">this https URL</a>) as well as HuggingFace (<a href="https://huggingface.co/datasets/amitayusht/clever">this https URL</a>). All our evaluation code is also available online (<a href="https://github.com/trishullab/clever-prover">this https URL</a>).</p></blockquote><p>Welcome to the summer of benchmarks. VERINA and CLEVER each critically cite <a href="https://gsai.substack.com/i/158079474/fvapps">a paper we saw in a previous newsletter</a> to patch some of its issues, especially around completeness of specification synthesis.</p>
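<p>To make the task shape concrete: each benchmark item wants an implementation, a formal spec, and a machine-checked proof tying them together. Here&#8217;s a toy of mine (not an entry from either dataset), assuming a recent Lean 4 toolchain:</p><pre><code>-- Implementation
def myMax (a b : Nat) : Nat := if a ≤ b then b else a

-- Specification and proof: the result bounds both arguments.
theorem myMax_spec (a b : Nat) : a ≤ myMax a b ∧ b ≤ myMax a b := by
  unfold myMax
  by_cases h : a ≤ b &lt;;&gt; simp [h] &lt;;&gt; omega  -- case split, then arithmetic
</code></pre>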
<h1><a href="https://www.ycombinator.com/launches/NZA-theorem-ai-coding-that-is-trustworthy-by-default">The Theorem launch</a></h1><p>Friends of the newsletter Jason and Rajashree launched Theorem (YC W25).</p><blockquote><p>MISSION: Theorem is an AI and programming languages research lab. Our question: as the models get better at generating code, what is the bottleneck to safely deploying vastly more computation in the world?</p><ul><li><p>Code has bugs. AI-enabled attackers will increase the volume and velocity of attacks on software infrastructure. Human code review will not scale.</p></li><li><p>AIs are untrusted generators. In order to deploy AIs in critical systems without fully understanding their generalization behavior, we&#8217;ll need robust, scalable methods for overseeing their solutions.</p></li></ul></blockquote><p>I&#8217;ve seen some of the tech and it&#8217;s fantastic; when I think about projects I could do, I try to do only things that Theorem wouldn&#8217;t be able to do better.</p><h1><a href="https://web.cs.ucdavis.edu/~cdstanford/doc/2024/ASEW24b.pdf">Proof Carrying Code Completions</a></h1><p>It would&#8217;ve been great to cover this paper when it came out, but it&#8217;s relevant enough to discuss even though I&#8217;m late.</p><figure><img src="https://substackcdn.com/image/fetch/$s_!Yb4J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60dfd380-e6a1-465a-b32b-7cb8b480ba36_566x441.jpeg" width="566" height="441" alt=""/></figure>
<blockquote><p>Code completions produced by today&#8217;s large language models (LLMs) offer no formal guarantees. We propose proof-carrying code completions (&#119875;&#119862;^3). In this paradigm, a high-resourced entity (the LLM provided by the server) must provide a code completion together with a proof of a chosen safety property which can be independently checked by a low-resourced entity (the user). In order to provide safety proofs without requiring the user to write specifications in formal logic, we statically generate preconditions for all dangerous function calls (i.e., functions that may violate the safety property) which must be proved by the LLM.</p><p>To demonstrate the main ideas, we provide a prototype implementation in the program verification language Dafny, and a case study focusing on file system vulnerabilities. Unlike Python code generated by GPT-4, Dafny code generated by &#119875;&#119862;^3 provably avoids a common weakness related to path traversal (CWE-35), using a single generation attempt (&#119896; = 1) and a modest number of tokens (3,350). Our tool is available as an open source repository at <a href="https://github.com/DavisPL/PCCC">https://github.com/DavisPL/PCCC</a>.</p></blockquote><p>The setup in this paper will be familiar to people who&#8217;ve been thinking about guaranteed safe AI for a bit. 
We have trusted and untrusted components, where a core trusted component is a proof checker of some kind.</p><figure><img src="https://substackcdn.com/image/fetch/$s_!CT6P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed29f4cf-5729-4992-b45f-7678106e63e7_1090x600.png" width="1090" height="600" alt=""/></figure><p><a href="https://en.wikipedia.org/wiki/Proof-carrying_code">Proof-carrying code</a> is conceptually similar to a sigma type or dependent pair: you tag or decorate whatever code you deliver with a proof of its correctness.</p>
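<p>A minimal Lean 4 illustration of that dependent-pair framing (a toy of mine; the &#119875;&#119862;^3 prototype itself targets Dafny):</p><pre><code>-- A subtype packages a value with a proof about it.
abbrev EvenNat := { n : Nat // n % 2 = 0 }

-- The proof obligation `4 % 2 = 0` is discharged by computation.
def four : EvenNat := ⟨4, rfl⟩

-- Consumers get the property for free, with no runtime recheck.
def half (e : EvenNat) : Nat := e.val / 2
</code></pre>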
fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><a href="https://en.wikipedia.org/wiki/Proof-carrying_code">Proof carrying code</a> is conceptually similar to a sigma type or dependent pair. It just means you tag or decorate whatever code you deliver with a proof of its correctness.</p><h1><a href="https://lawzero.org/en">Law Zero</a> launch</h1><p>Yoshua Bengio&#8217;s new org, pursuing the <a href="https://yoshuabengio.org/2024/02/26/towards-a-cautious-scientist-ai-with-convergent-safety-bounds/">Scientist AI</a> agenda. Scientist AI is to an extent in the same conceptual family as Safeguarded AI. Both are heavily involved in probability theory, where Scientist AI has even more information theory. Both are trying to avoid informally specified agents running amok, both by locking AI behind some probabilistic assurances. Safeguarded AI, however, emphasizes a robust notion of <em>certificate</em>, while Scientist AI doesn&#8217;t emphasize this. People have criticized Scientist AI for falling into the pitfalls of &#8220;oracles&#8221; that Bostrom et al discussed over a decade ago. Specifically, agency can&#8217;t be blocked without banning <em>loops</em> and subprocess.run, since tool use can be easily recovered in this day and age.</p><p>Law Zero is <a href="https://job-boards.greenhouse.io/lawzero">hiring</a> for sr roles in Montreal.</p><h1><a href="https://aisecurity.forum/">AI Security Forum in Tel Aviv, DC, and Vegas this summer</a></h1><p>Following the success of Defcon satellite events and the Paris AI Action Summit satellite event, the security forum hits Tel Aviv and DC for the first time before returning to Vegas.</p><h1><a href="https://foresight.org/request-for-proposals/security-technologies-for-securing-ai-systems/">Foresight RFP</a> alludes to some GSAI topics</h1><blockquote><ul><li><p>AI tools that assist with increasing the tractability of formal verification for real-world systems, starting at security-critical infrastructure. </p></li><li><p>Designing or proving properties of secure system architectures.</p></li></ul></blockquote><p>They accept applications rolling year round, but evaluate and disperse on a quarterly basis. The next &#8220;deadline&#8221; is June 30th. 
Sorry I didn&#8217;t tell you sooner; I&#8217;m bad at my job.</p><h1><a href="https://www.aria.org.uk/opportunity-spaces/mathematics-for-safe-ai/safeguarded-ai/funding">Safeguarded AI Technical Area 2</a></h1><p>Time to fire up a machine learning lab to automate all the proof engineering in the new proof stack that TA1 cooks up. Deadline October 1st.</p>]]></content:encoded></item><item><title><![CDATA[March-April 2025 Progress in Guaranteed Safe AI]]></title><description><![CDATA[Who even pays for patching open source software? And ARIA news]]></description><link>https://newsletter.for-all.dev/p/march-april-2025-progress-in-guaranteed</link><guid isPermaLink="false">https://newsletter.for-all.dev/p/march-april-2025-progress-in-guaranteed</guid><dc:creator><![CDATA[Quinn Dougherty]]></dc:creator><pubDate>Sun, 20 Apr 2025 18:53:40 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe17018dd-2942-4205-a617-712a60556fbf_1513x1600.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Say hi at ICSE in Ottawa; I&#8217;ll be at the reception Thursday, <a href="https://lu.ma/umi3g2wc?tk=6vGnXV">this colocated event on Friday</a>, and the <a href="https://llm4code.github.io/">LLM4Code workshop on Saturday</a>.</p><figure><img src="https://substackcdn.com/image/fetch/$s_!DfMS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd314091c-6306-4482-b923-c640ea5323f4_1600x900.jpeg" width="1456" height="819" alt=""/></figure>
<p>As usual there are no benefits to the paid subscription.</p><p>Sorry for consolidating two months into one post again after I said I wouldn&#8217;t.</p>
</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.for-all.dev/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.for-all.dev/subscribe?"><span>Subscribe now</span></a></p><h1><a href="https://forum.effectivealtruism.org/posts/sh4zDoyQiwnAGCRd6/patching-all-security-relevant-open-source-software">Fermstimate of the cost of patching all security relevant open source software</a></h1><p>Niplav writes</p><blockquote><p>So, <em>a proposal</em>: Whenever someone claims that LLMs will d/acc us out of AI takeover by fixing our infrastructure, they will also have to specify who will pay the costs of setting up this project and running it.</p></blockquote><p>I&#8217;m almost centrally the guy claiming LLMs will d/acc us out of AI takeover by fixing infrastructure, technically I&#8217;m usually hedging more than that but it&#8217;s accurate in spirit.</p><blockquote><p>If transformative AI is developed soon, most open source projects (especially old ones relevant to internet infrastructure) are going to be maintained by humans with human response times. That will significantly increase the time for relevant security patches to be reviewed and merged into existing codebases, especially if at the time attackers will submit AI-generated or co-developed subtle exploits using AI systems six to nine months behind the leading capabilities, keeping maintainers especially vigilant.</p></blockquote><p>I usually say we prove the patches correct! But Niplav is correct: it&#8217;s a hard social problem, many critical systems maintainers are particularly slop-phobic and won&#8217;t want synthetic code checked in. That&#8217;s why I try to emphasize that <strong>the two trust points are the spec and the checker</strong>, and the rest is relinquished to a shoggoth. 
That&#8217;s the vision, anyway: we solve this social problem by involving the slop-phobic maintainers in writing the spec and conveying to them how trustworthy the deductive process is.</p><p><a href="https://squigglehub.org/models/niplav/automated_internet_hardening">Niplav&#8217;s squiggle model</a>:</p><figure><img src="https://substackcdn.com/image/fetch/$s_!7r8r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66f77b08-e2cd-44b2-a3a7-671d1199eb27_1600x629.png" width="1456" height="572" alt=""/></figure><p>Median ~$1b worth of tokens, plus all the &#8220;setting up the project, paying human supervisors and reviewers, costs for testing infrastructure &amp; compute, finding complicated vulnerabilities that arise from the interaction of different programs&#8230;&#8221; costs. I think a lot is in our action space to reduce those latter costs, but the token cost imposes a firm lower bound.</p>
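<p>If you want the token line item on one napkin: the sketch below is mine, with made-up placeholder numbers tuned to land near that median, not parameters from Niplav&#8217;s actual squiggle model.</p><pre><code># Fermi estimate of token costs for hardening security-relevant OSS.
# Every number is an illustrative placeholder, not Niplav's.
lines_of_code     = 500e6    # security-relevant open source, rough guess
tokens_per_line   = 200_000  # patches, specs, and many failed proof attempts
usd_per_1m_tokens = 10.0     # assumed frontier-model output pricing

cost = lines_of_code * tokens_per_line * usd_per_1m_tokens / 1e6
print(f"~${cost / 1e9:.1f}b in tokens")  # ~$1.0b
</code></pre><p>Under these placeholders, the bulk of the spend comes from proof-search retries per line rather than from the raw size of the codebase.</p>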
class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Median $~1b worth of tokens, plus all the &#8220;setting up the project, paying human supervisors and reviewers, costs for testing infrastructure &amp; compute, finding complicated vulnerabilities that arise from the interaction of different programs&#8230;&#8221; etc costs. I think a lot&#8217;s in our action space to reduce those latter costs, but the token cost imposes a firm lower bound.</p><p>But this is an EA Forum post, meaning the project is being evaluated as an EA cause area: is it cost effective? To be cost effective, the savings from alleviating some disvalue have to be worth the money you&#8217;ll spend. As a programming best practices chauvinist, one of my pastimes is picking on CrowdStrike, so let&#8217;s not pass up the opportunity. The <a href="https://en.wikipedia.org/wiki/2024_CrowdStrike-related_IT_outages#Impact">2024 outage is estimated to have cost</a> about $5b across the top 500 companies excluding microsoft. A public goods project may not have been able to avert CrowdStrike, but it&#8217;s instructive for getting a flavor of the damage, and this number suggests it could be easily worth spending around Niplav&#8217;s estimate. On cost effectiveness though, even I (who works on this &#8220;LLMs driving Hot FV Summer&#8221; thing full time) am skeptical, only because open source software is pretty hardened already. Curl/libcurl saw 23 CVEs in 2023 and 18 in 2024, which it&#8217;d be nice to prevent but really isn&#8217;t a catastrophic amount. Other projects are similar. I think a lot about the Tony Hoare quote &#8220;It has turned out that the world just does not suffer significantly from the kind of problem that our research was originally intended to solve.&#8221; Not every bug is even an exploit.</p><h1>Safeguarded AI</h1><h2><a href="https://www.aria.org.uk/opportunity-spaces/mathematics-for-safe-ai/safeguarded-ai/technical-areas">Creator profiles for TA1.1 and TA3</a></h2><p>The TA1.1 ones have been up for a while and I think I missed them, but TA3 was posted fairly recently.</p><p>TA3 is the applications domain. 
Yaknow, the real world, that thing from all those functional programming jokes.</p><blockquote><p>This work aims to address the challenge of AI deployment in safety-critical sectors by combining scientific world models and mathematical proofs to develop robust safety assurances.</p><p>Researchers from diverse companies and organisations such as AstraZeneca, Mind Foundry, the University of Oxford, the University of Birmingham, and the University of York, will aim to demonstrate how safeguarded AI systems can help them to make significant improvements in vital sectors.</p></blockquote><figure><img src="https://substackcdn.com/image/fetch/$s_!UJ8u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe17018dd-2942-4205-a617-712a60556fbf_1513x1600.png" width="1456" height="1540" alt=""/></figure>
<div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!UJ8u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe17018dd-2942-4205-a617-712a60556fbf_1513x1600.png" width="1456" height="1540" alt=""></figure></div><h2><a href="https://www.aria.org.uk/opportunity-spaces/mathematics-for-safe-ai/safeguarded-ai/funding">TA2 call</a></h2><p>This isn&#8217;t really open to individuals, but as always, reading the funding call gets you some insight into ARIA&#8217;s worldview and strategy.</p><blockquote><p>ARIA is launching a multi-phased solicitation for Technical Area 2 (TA2) to support the development of a general-purpose Safeguarded AI workflow. The programme aims to demonstrate that frontier AI techniques can be harnessed to create AI systems with verifiable safety guarantees. In TA2, we will award &#163;18m to a non-profit entity to develop critical machine learning capabilities, requiring strong organizational governance and security standards. Phase 1, backed by &#163;1M, will fund up to 5 teams to spend 3.5 months to develop full Phase 2 proposals. Phase 2 &#8212; which will open on 25 June 2025 &#8212; will fund a single group, for &#163;18M, to deliver the research agenda. TA2 will explore leveraging securely-boxed AI to train autonomous control systems that can be verified against mathematical models, improving performance and robustness. The workflow will involve forking and fine-tuning mainstream pre-trained frontier AI models to create verifiably safeguarded AI solutions.</p></blockquote><p>I&#8217;ve heard this described as something like a &#8220;new frontier lab&#8221;, which is &lt;sarcasm&gt;always music to my ears&lt;/sarcasm&gt;, but in truth I&#8217;m only worried about shops that push the envelope on scaling (chinchilla or otherwise). Boromir wanted to use the One Ring to defend Gondor, and we largely trust Tolkien/whomever when they say that defensive acceleration is folly and the <s>Ring&#8217;s corrupting influence</s> incentives are stronger than the will of man. Nevertheless, I feel pretty good about this because it&#8217;s narrow and targeted.
Eager to find out if I&#8217;ll eat my words.</p><h1>Conferences, events</h1><ul><li><p><a href="https://sos-vo.org/group/hcss_conference/cfp">High Confidence Software and Systems Conference</a> May 12, Maryland US</p></li><li><p><a href="https://conferences.i-cav.org/2025/">CAV</a> July 21, Croatia</p></li><li><p><a href="https://lu.ma/umi3g2wc?tk=6vGnXV">Formal methods for Rust</a> May 2, Ottawa CA</p></li></ul><h1><a href="https://arxiv.org/abs/2504.10374">New control paper</a></h1><p>GSAI and Control are really complementary forces, so I think I&#8217;ll mention news in Control, at least sometimes.</p><blockquote><p>Control evaluations measure whether monitoring and security protocols for AI systems prevent intentionally subversive AI models from causing harm. Our work presents the first control evaluation performed in an agent environment. We construct BashBench, a dataset of 257 challenging multi-step system administration tasks, and evaluate whether various safety measures can prevent an adversarially constructed AI agent from covertly downloading and executing malicious code in this environment. This multi-step setting introduces new attack and defense dynamics, which we investigate in order to design novel control protocols that prevent safety failures without hindering the ability of non-malicious agents to perform useful work. We introduce a class of control protocols called resample protocols that dynamically take additional samples of certain actions. We find these protocols significantly improve on existing techniques by selectively blocking the AI agent from executing suspicious code and incriminating the agent by generating additional examples of dangerous behavior. We measure the tradeoff between attack prevention and usefulness; our best protocol combines resampling with analysis of previous steps, reducing the success rate of attacks from 58% to 7% at a 5% cost to the performance of a non-malicious agent.</p></blockquote>]]></content:encoded></item><item><title><![CDATA[January-February 2025 Progress in Guaranteed Safe AI]]></title><description><![CDATA[Proving the Coding Interview, R1, a job, and yet another ARIA solicitation]]></description><link>https://newsletter.for-all.dev/p/january-february-2025-progress-in</link><guid isPermaLink="false">https://newsletter.for-all.dev/p/january-february-2025-progress-in</guid><dc:creator><![CDATA[Quinn Dougherty]]></dc:creator><pubDate>Fri, 28 Feb 2025 03:04:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Kogt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b437cd0-64ba-4e83-903b-d0a4e2f25f2b_1600x584.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Ok this one got too big; I&#8217;m done grouping two months together after this.</p><p>BAIF wants to do user interviews to prospect formal verification acceleration projects; reach out if you&#8217;re shipping proofs but have pain points!</p><p>This edition has a lot of my takes, so I should warn you that GSAI is a pretty diverse field and I would expect all my colleagues in it to have substantial disagreement with at least one take in each issue of the newsletter.
Prompt the language model to insert the standard boilerplate disclaimer that views are my own, not those of my colleagues, my employer, nor my mom.</p><p>If you're just joining us,<a href="https://arxiv.org/abs/2405.06624"> background on GSAI here</a>.</p><h1>Formal Verification is Overrated (lightning talk)</h1><p><a href="https://www.youtube.com/watch?v=bs5snugP1VA">Zac makes three sound points</a> which are not cruxes for me. Part of what <a href="https://www.lesswrong.com/posts/DZuBHHKao6jsDDreH/in-response-to-critiques-of-guaranteed-safe-ai">Nora&#8217;s post is doing</a> is responding to this with her take, which differs from mine.</p><h2>Point one: weights are intractable</h2><p>Formal verification of the learned component itself is gonna be a nonstarter on nontoy problems for computational cost reasons. <a href="https://gsai.substack.com/i/147012882/guarantees-based-mechanistic-interpretability-team-preprint">We saw last year</a> that a &#8220;proof&#8221; of the correctness of a function a transformer implements is subcubic in d_vocab, at least a little worse than quadratic. I agree with Zac, which is why I avoid <em>whitebox</em> strategies in my day-to-day life. GSAI (to me) is a set of <em>blackbox</em> tactics, closer to <a href="https://openreview.net/forum?id=KviM5k8pcP">control</a>, that doesn&#8217;t depend on interpretability going particularly well. The upcoming Formalise track of ICSE separates &#8220;AI4FV&#8221; and &#8220;FV4AI&#8221; as two separate topic areas, which is obviously the right thing to do; half the time when I introduce GSAI to someone, this is the first point of clarification. Speaking for myself, GSAI is AI4FV without being FV4AI.</p><h2>Point two: reality is complicated and unknown</h2><p>This <a href="https://www.lesswrong.com/posts/Egw5M6cnwgpRBXAQ4/speedrunning-4-mistakes-you-make-when-your-alignment#Mistake_1__The_gap_between_the_world_and_the_spec_hurts_you_more_than_the_gap_between_the_spec_and_the_implementation">recapitulates a point I&#8217;ve been making for years</a>, which is that the world-spec gap hurts you more than the spec-implementation or spec-component gap. I think if we get arbitrary wins on GSAI agendas but a superintelligence that unboxes itself too quickly, this principle is what kills us. This principle (<a href="https://www.lesswrong.com/posts/kq8CZzcPKQtCzbGxg/quinn-s-shortform?commentId=5YhyYC7oNFWSmLPWe">which needs a good name, btw</a>) still allows you to stack your <a href="https://en.wikipedia.org/wiki/Defense_in_depth_(computing)">swiss cheese</a> tower to the sky, and never says that your swiss cheese tower buys you <em>nothing</em>. A potential crux for me might be that if we leave our immediate successors with a tower of swiss cheese, they&#8217;ll fail to use it to escape the acute risk period for predictable reasons.</p><p>Agree that the DNA synthesis part of <a href="https://arxiv.org/abs/2309.01933">Tegmark and Omohundro</a> was a bit unhinged. Agree that insofar as Zac is responding to that paper, much &#8220;calming down&#8221; is needed cuz that paper (unlike, I think, the <a href="https://arxiv.org/abs/2405.06624">Toward GSAI</a> paper) is really overstated.</p><p>Formal verification, as we know it, exists to <strong>isolate error to the spec-world gap</strong>, which realistically isolates attack surfaces to a fraction of that. This doesn&#8217;t happen for free by default; it requires defensive acceleration. My job is <em>nines</em>; I don&#8217;t believe in 100%.
However, a significant part of Safeguarded AI is aimed at making the spec-world gap <em>smaller</em>, so it wouldn&#8217;t be accurate to say that GSAI <em>stops</em> at the spec-world gap. I think Safeguarded AI&#8217;s approach to this, which attacks the details of probabilistic world models and how they&#8217;re elicited and falsified, is more reasonable than Omohundro&#8217;s &#8220;<a href="https://www.lesswrong.com/posts/P8XcbnYi7ooB2KR2j/provably-safe-ai-worldview-and-projects#Some_Useful_Projects">formal specifications of physical systems</a>&#8221; approach, which is a little too bottom-up for my tastes.</p><p>I did talk to one security researcher who, after regaling me with fun tales of side channel attacks, said that Omohundro&#8217;s vision of eliminating side channel attacks via formal verification (modeling the component all the way down to physical laws like Maxwell&#8217;s equations) is not as unreasonably intractable as I thought, and is instead merely ambitious. So maybe I should learn basic physics.</p><h2>Point three: tool AI is unstable and uncompetitive</h2><p>Zac:</p><blockquote><p>The final category of proposals I see under this formal verification approach is that we should use AI to develop tools and software which is formally verified. And I am broadly a fan of this, but it&#8217;s not actually a substitute for working on AI safety.</p></blockquote><p>Also Zac:</p><blockquote><p>The addition of a trivial for loop is the only difference between a tool and an agent (in a general sense).</p></blockquote><p>I think Zac is referring to <a href="https://blog.biocomm.ai/2024/12/01/why-we-should-build-tool-ai-not-agi-max-tegmark-at-websummit-2024-future-of-life-institute/">the ask to restrain AI to tool level and </a><em><a href="https://blog.biocomm.ai/2024/12/01/why-we-should-build-tool-ai-not-agi-max-tegmark-at-websummit-2024-future-of-life-institute/">simply </a></em><a href="https://blog.biocomm.ai/2024/12/01/why-we-should-build-tool-ai-not-agi-max-tegmark-at-websummit-2024-future-of-life-institute/">not build agents</a>, coming mostly from Tegmark. (See <a href="https://futureoflife.org/wp-content/uploads/2025/02/AI-Action-Summit-Tool-AI-Explainer-V5.pdf">FLI&#8217;s tiers</a>. It&#8217;s possible that scaffolding more than a loop is all you need to get from tier 2 to tier 3.) I just don&#8217;t think this ask is centrally related to formal verification, and it&#8217;s not a crux for most of the GSAI space (indeed, much of GSAI wants to leverage primitive agents such as loops). But I do agree with Zac: I think the no-agency ship has sailed and there&#8217;s no satisfying way to restrict AI to the tool level, mostly for competitiveness reasons.</p><h2>Overrated by whom?</h2><p>But also, I&#8217;d like to ask Zac how it&#8217;s &#8220;overrated&#8221; when the reception from funders (besides ARIA and FLI&#8217;s vitalikbucks) is not even lukewarm. OpenPhil is super unplugged from GSAI and not interested; SFF <a href="https://survivalandflourishing.fund/sff-2024-recommendations">isn&#8217;t participating</a> at all in spite of having some of <a href="https://arxiv.org/abs/2405.06624">the big position paper</a> authors involved in the S-process. Frontier labs do a bunch of proof scaling projects, but they&#8217;re doing it just for fun / to show off / to innovate on the product, not because of any vision in particular (as far as I can tell / as far as I&#8217;ve heard). I think ARIA and the FLI Extended Cinematic Universe (i.e.
BAIF) is a great place to start; we could spend more, but we shouldn&#8217;t complain. Does Zac mean the current level of funding is already too high, or is he just worried about that number increasing? He does think it&#8217;s an important part of the portfolio, so I think he&#8217;s just responding to some of the over-promising (which I tend to agree with him about).</p><h1>FVAPPS</h1><p>4715 function signatures, each with 2-5 sorried-out theorems: a new benchmark for leetcode-style programming in Lean. Notice that I, Quinn, the newsletter author, am one of the authors.</p><blockquote><p><a href="https://arxiv.org/abs/2502.05714">We introduce</a> the Formally Verified Automated Programming Progress Standards, or FVAPPS, a benchmark of 4715 samples for writing programs and proving their correctness, the largest formal verification benchmark, including 1083 curated and quality controlled samples. Previously, APPS provided a benchmark and dataset for programming puzzles to be completed in Python and checked against unit tests, of the kind seen in technical assessments in the software engineering industry. Building upon recent approaches for benchmarks in interactive theorem proving, we generalize the unit tests to Lean 4 theorems given without proof (i.e., using Lean&#8217;s &#8220;sorry&#8221; keyword).</p></blockquote><p>Workshop reviewers said that the pipeline we used to generate the benchmark was also a research contribution, even though it&#8217;s the minimum viable scaffold: a loop in serial. For a flavor of what a sample looks like, see the sketch below.</p>
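<p>Here&#8217;s a made-up sample in the spirit of the benchmark (hypothetical function and theorem names, not an actual FVAPPS task): the model gets a signature plus sorried theorems, and has to fill in the definition and the proofs.</p><pre><code>-- Hypothetical FVAPPS-style sample, for illustration only.
def maxSubarraySum (xs : List Int) : Int := sorry

theorem maxSubarraySum_nil : maxSubarraySum [] = 0 := sorry

theorem maxSubarraySum_nonneg (xs : List Int) :
    0 ≤ maxSubarraySum xs := sorry

theorem maxSubarraySum_singleton (x : Int) (h : 0 ≤ x) :
    maxSubarraySum [x] = x := sorry</code></pre>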
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7b437cd0-64ba-4e83-903b-d0a4e2f25f2b_1600x584.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:531,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Kogt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b437cd0-64ba-4e83-903b-d0a4e2f25f2b_1600x584.png 424w, https://substackcdn.com/image/fetch/$s_!Kogt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b437cd0-64ba-4e83-903b-d0a4e2f25f2b_1600x584.png 848w, https://substackcdn.com/image/fetch/$s_!Kogt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b437cd0-64ba-4e83-903b-d0a4e2f25f2b_1600x584.png 1272w, https://substackcdn.com/image/fetch/$s_!Kogt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b437cd0-64ba-4e83-903b-d0a4e2f25f2b_1600x584.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Say hi In Ottawa on ICSE workshop day! Special shoutout to <a href="https://gasstationmanager.github.io/ai/2025/02/18/fvapps.html">Gas Station Manager for citing us</a> like a week after we went public in their hallucination work. </p><h2>Is this a safety paper?</h2><p>I'm working on making sure we get high quality critical systems software out of early AGI. 
Hardened infrastructure buys us a lot in the slightly crazy story of "self-exfiltrated model attacks the power grid", but buys us even more in less crazy stories about all the software modules adjacent to AGI having vulnerabilities rapidly patched at crunch time.</p><p>While I&#8217;m not super interested in measurement, I&#8217;m excited about this line of work as a <em>synthetic data pipeline</em>. It may take galaxy-brained prompting and galaxy-brained scaffolds to ship proofs now (if they&#8217;re elicitable at all), but by solving FVAPPS you generate synthetic data which you can put to use in finetune jobs, leading to models way better at proving, hopefully with minimal scaffolding and prompting skill. We shipped our paper to arxiv twitter the same week <a href="https://goedel-lm.github.io/">Goedel-LM</a> shipped, and what they do is a kind of &#8220;finetune job in-the-loop&#8221; along these lines (i.e. using Lean as a ground truth signal).</p><h1>R1 happened</h1><p>With finetuning this cheap, lots of opportunities are afoot. That&#8217;s it, that&#8217;s the post. Especially of interest is that API terms of service aren&#8217;t acceptable for some critical systems vendors for compliance reasons, so running a top-performing model on compute you control opens up many possibilities for them in particular.</p><h1>News in automated mathematics</h1><h2>AlphaGeometry finally dropped a paper</h2><p>For a while Google DeepMind&#8217;s results in mathematics just had a vague blog post. <a href="https://arxiv.org/abs/2502.03544">It&#8217;s great there are finally details in a paper</a>, but I haven&#8217;t read it yet.</p><h2>New SOTA on MiniF2F</h2><p><a href="https://goedel-lm.github.io/">https://goedel-lm.github.io/</a></p><h1>The topological debate framework</h1><p>I had a chance to sit down with the author at EAG. This is an exciting theory of the <em>incentives</em> for AIs to have good world models. These are the kind of protocols that could help us <strong>close the spec-world gap</strong>, which is going to be critical if GSAI does anything at all against arbitrary superintelligences.</p><blockquote><p>Let's say that you're working on a new airplane and someone hands you a potential design. The wings look flimsy to you and you're concerned that they might snap off in flight. You want to know whether the wings will hold up before you spend money building a prototype. You have access to some 3D mechanical modeling software that you trust. This software can simulate the whole airplane at any positive resolution, whether it be 1 meter or 1 centimeter or 1 nanometer.</p><p>Ideally you would like to run the simulation at a resolution of 0 meters. Unfortunately that's not possible. What can you do instead? Well, you can note that all sufficiently small resolutions should result in the same conclusion. If they didn't then the whole idea of the simulations approximating reality would break down. You declare that if all sufficiently small resolutions show the wings snapping then the real wings will snap and if all sufficiently small resolutions show the wings to be safe then the real wings will be safe.</p><p>How small is "sufficiently small?" A priori you don't know. You could pick a size that feels sufficient, run a few tests to make sure the answer seems reasonable, and be done. Alternatively, you could use the two computationally unbounded AI agents with known utility functions that you have access to.</p></blockquote>
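<p>To make the quoted decision rule concrete, here&#8217;s a toy Python sketch of &#8220;refine the resolution until the verdict stabilizes&#8221; (everything here is hypothetical, especially the fake simulator; the post linked below is the actual framework, including where the two unbounded agents come in):</p><pre><code>def wings_snap(resolution_m: float) -&gt; bool:
    """Hypothetical stand-in for the trusted 3D simulator: pretend
    coarse runs are noisy and everything below 1cm agrees ("snap")."""
    return resolution_m &lt; 0.01 or int(1 / resolution_m) % 2 == 0

def stabilized_verdict(start_m: float = 1.0, runs_to_agree: int = 5,
                       floor_m: float = 1e-6) -&gt; bool:
    """Halve the resolution until the last few simulations return
    the same verdict, then trust that shared verdict."""
    resolution, verdicts = start_m, []
    while resolution &gt;= floor_m:
        verdicts.append(wings_snap(resolution))
        if len(verdicts) &gt;= runs_to_agree and len(set(verdicts[-runs_to_agree:])) == 1:
            return verdicts[-1]
        resolution /= 2
    raise RuntimeError("no stable verdict above the resolution floor")

print(stabilized_verdict())  # True: the wings snap in this fake world</code></pre>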
<p>Debate is a little bit of a misnomer, but close enough.</p><p><a href="https://www.lesswrong.com/posts/jCeRXgog38zRCci4K/topological-debate-framework">https://www.lesswrong.com/posts/jCeRXgog38zRCci4K/topological-debate-framework</a></p><h1>Atlas hiring dev for spec engineering GUI product</h1><p>If we can kick the proof down to the AI and rely on the typechecker, we still want a human in the loop for writing the specification.</p><blockquote><p>We are developing an AI-assisted IDE for formal specification.</p><p>Autoformalization is approaching usefulness but there remains a gap for how humans establish confidence in autoformalized specs and discover issues. We're researching tools and methods to enable engineers to close this gap in real-world assurance applications. We have been prototyping and user-testing a Spec IDE and are looking to add a team member to take over primary responsibility for machine learning engineering.</p><p>This job involves developing a machine learning pipeline that powers mechanized spec engineering and review. ML plays multiple roles in the spec IDE: (1) aiding user understanding and navigation by labeling concepts within and across both mechanized and pen-and-paper spec documents, (2) detecting possible issues in the mechanization, and (3) powering a conversational assistant for users to navigate or edit specs.</p><p>Autoformalization is not itself the focus of this project, but there's an opportunity to explore autoformalization if desired. This job involves collaborating with a small remote team that brings a user-centered, product discovery mindset to this research; as such, this job also involves learning from user research, and has opportunities to run user studies if desired.</p><p>We're looking for a 20 hrs/wk contract through approximately September, but there's room for flexibility. Please let us know your contracting rate ($/hr) if you have a standard rate, else reach out to chat with us; cost will be one of our selection factors.</p><p>Contact join@atlascomputing.org with interest.</p></blockquote><p>They want someone with 2 years of surface area on proof assistants like Lean and Coq and experience with ML R&amp;D tooling.</p><h1>Safeguarded AI TA1.2 and TA1.3 funding call</h1><p><a href="https://vimeo.com/1054492081/8f3ceee33f">Video</a>, <a href="https://www.aria.org.uk/media/jxrhq22y/safeguarded-ai-ta12-13-programme-solicitation-call.pdf">PDF</a>. Davidad is back with a new funding call.</p><p>Technical Area 1.1 is underway and consists of category theorists and probabilistic semantics experts <em>designing</em> a new stack for world models, proof certificates, and specifications. It&#8217;s the conceptual problems at the foundation of Safeguarded AI. Upon successful solutions to those problems, it becomes a fairly normal software project: a <em>backend</em> and a <em>frontend</em>.</p><p>TA1.2 is a backend. This involves structure-aware version control (<a href="https://gsai.substack.com/i/155386915/structure-aware-version-control-via-observational-bridge-types-hiring">previous edition of the newsletter</a>) for eliciting world models and specs, other data structures topics for keeping track of probabilistic proof certs, and an appropriate theory of databases.</p><p>TA1.3 is a frontend. Many important stress points of Safeguarded AI require a human in the loop, so we want a killer user experience.
Imagine being a power plant domain expert in the near future, when AIs can do anything, but they need to <strong>elicit</strong> specs and world models from you. Hopefully it&#8217;s not too painful a process!</p><p>Both projects are investing 12 months into requirements engineering with minimal prototyping, and the new org that TA2 is planning to incubate around the end of this year will be partially shaped by those requirements.</p>
<div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!ckpU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27d1c1e8-e9b3-446e-97de-f157583d3b02_1600x896.png" width="1456" height="815" alt=""></figure></div><p>I think the microeconomics of this ambition level could be reasoned about (scrutinizing the reference class of software projects roughly this big, counting up roughly how much money was spent on them, fermstimating if this is realistic, etc.). But it&#8217;s out of scope for this newsletter.</p><p>The deadline is April 8th.
It&#8217;s secretly the 9th, but it&#8217;s a good habit to subtract one day when you put it in your calendar, especially since it doesn't close on an anywhere-on-earth midnight.</p>]]></content:encoded></item><item><title><![CDATA[November-December 2024 Progress in Guaranteed Safe AI]]></title><description><![CDATA[You missed my hot takes on FrontierMath, ho ho ho]]></description><link>https://newsletter.for-all.dev/p/november-december-2024-progress-in</link><guid isPermaLink="false">https://newsletter.for-all.dev/p/november-december-2024-progress-in</guid><dc:creator><![CDATA[Quinn Dougherty]]></dc:creator><pubDate>Wed, 22 Jan 2025 01:12:31 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!F2-B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6cb6627-4330-48aa-b15f-91385de9547f_751x499.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!F2-B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6cb6627-4330-48aa-b15f-91385de9547f_751x499.jpeg" width="751" height="499" alt=""></figure></div>
<p>Sorry for the radio silence last month. It was slow and I didn&#8217;t come across things I wanted to write about, which is to be expected with the holidays coming up.</p><p>There are no benefits of paying, except you get a cut of my hard-earned Shapley points, and apparently some disappointment when I miss a month.</p><p>If you're just joining us,<a href="https://arxiv.org/abs/2405.06624"> background on GSAI here</a>.</p><p>There should be some DeepSeek/r1 coverage in the next newsletter (I think tanking the cost of finetuning is, while dual use, good news for us in GSAI). I&#8217;d rather get this one out the door though.</p><h1>FrontierMath (hiring)</h1><p>As I&#8217;m always banging on about, progress in math automation is a leading indicator for progress in software engineering automation, by the Curry-Howard correspondence. That&#8217;s a little cheeky, but last time I said it in meatspace I got solemn nods as if it wasn&#8217;t even cheeky at all. Or maybe they were solemnly nodding at something else. To be explicit, mathematical reasoning abilities and software engineering skills (including formal proof engineering) appear to be at similar capability levels. Furthermore, through platforms like Lean, mathematical ability can enhance software development capabilities.
This connection should make it easier to implement rigorous quality assurance processes, such as obtaining formal proof certificates, which are typically costly.</p><blockquote><p>We introduce FrontierMath, a benchmark of hundreds of original, exceptionally challenging mathematics problems crafted and vetted by expert mathematicians. The questions cover most major branches of modern mathematics -- from computationally intensive problems in number theory and real analysis to abstract questions in algebraic geometry and category theory. Solving a typical problem requires multiple hours of effort from a researcher in the relevant branch of mathematics, and for the upper end questions, multiple days. FrontierMath uses new, unpublished problems and automated verification to reliably evaluate models while minimizing risk of data contamination. Current state-of-the-art AI models solve under 2% of problems, revealing a vast gap between AI capabilities and the prowess of the mathematical community. As AI systems advance toward expert-level mathematical abilities, FrontierMath offers a rigorous testbed that quantifies their progress.</p></blockquote><p>The story here of course is that by the time the newsletter covered it, <a href="http://12daysofopenai.com/#day12">OpenAI hit 25%</a>. &#8220;Mathematicians assemble a benchmark they can barely solve themselves at SOTA 2%, forecasters think it&#8217;ll be unsaturated for multiple years, a month later OpenAI hits 25%&#8221; is a fun little anecdote for Christmas with the relatives, for a funny sense of fun. But, <a href="https://x.com/CloutUnmatched/status/1881538415398891882?t=5pjvj8oINbzG9z8_pFoSIA&amp;s=19">points out an astute reader</a>, since we have typical OpenAI science communication (read: extremely disappointing science communication) here: what sort of inference compute was thrown at it? <em>At what cost?</em> A model that can win with the most naive prompt is more impressive than a model that only wins with galaxy-brained and/or expensive (in tokens) feedback schemes.</p><p>So anyways, the story here of course is that after I drafted the previous paragraph, people noticed that Epoch sneakily added their funding disclosure to a new draft of the paper a month after all the hoopla had started to die down. Spoiler alert: the funder was, lol, OpenAI. I&#8217;ll refer you to <a href="https://www.lesswrong.com/posts/8ZgLYwBmB3vLavjKE/some-lessons-from-the-openai-frontiermath-debacle">LessWrong user 7vik&#8217;s summary of events</a>, and not get too far into it.</p><p>But I have <em>feelings</em>. What if the externality of mathematical automation and tooling in the form of progress in guaranteed safe AI doesn&#8217;t pan out? Then OpenAI gets its &#8220;put Terry Tao out of a job&#8221; boyscout badge for all the employee laptops and the safety community gets nothing in return! Come on guys, why is this even in Epoch&#8217;s mandate to begin with?
Makes me sad.</p><h2>See also</h2><p><a href="https://arxiv.org/abs/2412.15184">Review paper &#8220;The suite of datasets commonly used to train and evaluate the mathematical capabilities of AI-based mathematical copilots (primarily large language models) exhibit several shortcomings&#8221;</a>, <a href="https://www.reddit.com/r/math/comments/1h6rwls/im_developing_frontiermath_an_advanced_math/">Elliot&#8217;s r/math AMA</a>, <a href="https://xenaproject.wordpress.com/2024/12/22/can-ai-do-maths-yet-thoughts-from-a-mathematician/">Kevin Buzzard on FrontierMath</a> (written after the OpenAI results).</p><h2>Opportunity</h2><p>FrontierMath begins in three tiers of difficulty (25% undergrad or IMO level, 50% grad student, 25% early professor-track researcher), but they&#8217;re currently expanding to a fourth tier of even harder problems, and they also want some formalization in Lean, which you can email elliot at epochai.org about if you&#8217;d like to partake.</p><h1>ARIA call: Mathematics for Safe AI (funding)</h1><p><a href="https://www.aria.org.uk/opportunity-spaces/mathematics-for-safe-ai/mathematics-for-safe-ai">This round</a> is affiliated with the Safeguarded AI programme, but is outside of the standard technical areas breakdown.</p><h2>See also</h2><p>Renaissance Philanthropy&#8217;s <a href="https://renaissancephilanthropy.org/initiatives/ai-for-math-fund/">AI for Math fund</a> (unclear how much this will help with GSAI efforts).</p><h1>Structure-aware version control via observational bridge types (hiring)</h1><p><a href="https://topos.institute/blog/2024-11-13-structure-aware-version-control-via-observational-bridge-types/#fn6">David Jaz Myers writes for the Topos blog</a> about structure-aware version control.</p><p>This is really exciting, because git diffs as we know them would lead to problems if you were scaling world models (in a davidad-style approach) or specifications (in any GSAI approach including davidad) that were being collaborated on with many humans and AIs.</p><blockquote><p>But suppose that you and I are editing a <a href="https://en.wikipedia.org/wiki/Comma-separated_values">csv</a>, and you add a row to the bottom and I add a column to the end. Git would see your change as a single line diff, whereas my change (adding a column) is a change to every line; these conflict in the new line you added. But from what we know about the structure of csv, your change should really be a one-row change, mine a one-column change, and the conflict should only occur in one cell.</p></blockquote><p>The author proposes something called <em>observational bridge types</em> (from the up-and-coming proof assistant Narya) to form the foundation of structure-aware version control. Using these, we can say that, for filetype F and files f1 f2 : F, a <strong>diffing algorithm</strong> is none other than any inhabitant of the type Diff F f1 f2 (where Diff : (A : Type) -&gt; A -&gt; A -&gt; Type is created using something called <em>logical relations</em>, or an inductive definition depending on a type). Then, <strong>conflicts</strong> are defined as a pair (d1, d2) : Diff F f1 f2 x Diff F f1 f3, so that <strong>merges</strong> may be defined as another pair (d3, d4) : Diff F f2 f4 x Diff F f3 f4. That much is roughly consistent with git, provided that you assume F (the filetype) is always &#8220;list[list[char]]&#8221; or some notion of raw text.</p>
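<p>Here&#8217;s a minimal Lean-flavored sketch of those signatures (hypothetical names; the real proposal builds Diff from logical relations over Narya&#8217;s observational bridge types, which Lean doesn&#8217;t have):</p><pre><code>-- Illustrative only: Diff is postulated rather than constructed.
axiom Diff : (A : Type) → A → A → Type

-- git's implicit special case: every filetype is raw text
def RawText := List (List Char)

-- a conflict: two diverging diffs out of a common ancestor f1
def Conflict (F : Type) (f1 f2 f3 : F) : Type :=
  Diff F f1 f2 × Diff F f1 f3

-- a merge: two diffs converging into a common descendant f4
def Merge (F : Type) (f2 f3 f4 : F) : Type :=
  Diff F f2 f4 × Diff F f3 f4</code></pre>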
<p>It&#8217;d be great to not have to assume that, even just for the csv example, to say nothing of more complicated structures in probabilistic semantics or complex/chaotic systems. A system that can also reason about diffs between the types themselves (since Diff Type sigma tau is perfectly reasonable here) would be desirable.</p><h2>See also</h2><p>More on this <a href="https://youtu.be/nN6p9u-_Tro?si=oNMs5Bspjl8GfxUz">from Owen at Topos at last year&#8217;s Safe By Design meeting at FAR Labs</a>. More on this <a href="https://groups.google.com/g/guaranteed-safe-ai/c/eqnQwwKsz0U/m/4I8gxRGLBgAJ">from the GSAI google group a while ago</a>. <a href="https://difftastic.wilfred.me.uk/tree_diffing.html">Roundup of tree diff tools</a>.</p><h2>Opportunity</h2><p>To work with David Jaz on this for the world modeling part of Safeguarded AI in Oxford, <a href="https://topos.institute/files/jobs/oxford-2024-masd-postdoc.pdf">apply here</a>.</p><h1>Squiggle AI</h1><p>In the davidad and somewhat Bengio regions of the GSAI space, there&#8217;s a huge emphasis on world models: computational descriptions of what is. Squiggle is a programming language where the terms are distributions, and all Monte Carlo boilerplate is hidden from the user. This makes Squiggle a candidate &#8220;standard&#8221; for belief specifications, <a href="https://www.lesswrong.com/posts/qg4yy4je3sFm7Z49L/what-is-estimational-programming-squiggle-in-context">as I&#8217;ve written about elsewhere</a>.</p><p>The first problem you&#8217;d run into is that the Users Will Not Just. The users will not just learn a programming language to specify beliefs when they barely want to specify their beliefs in the first place. That&#8217;s why it&#8217;s good news that QURI shipped an AI product, so the interface is closer to natural language while the substrate/backend is squiggle. The LLM Will Just. The LLM will just learn to program in squiggle, so you don&#8217;t have to.</p>
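<p>For intuition, here&#8217;s a Python sketch (not Squiggle syntax; numbers and names made up) of the semantics being described: terms that are distributions, with the Monte Carlo plumbing hidden behind ordinary-looking arithmetic.</p><pre><code>import random

N = 10_000  # sample count: the boilerplate a Squiggle user never sees

class Dist:
    def __init__(self, samples):
        self.samples = samples

    @staticmethod
    def lognormal(mu, sigma):
        return Dist([random.lognormvariate(mu, sigma) for _ in range(N)])

    def __mul__(self, other):
        # arithmetic on terms = pointwise arithmetic on samples
        return Dist([a * b for a, b in zip(self.samples, other.samples)])

    def mean(self):
        return sum(self.samples) / len(self.samples)

# a belief specification then reads like arithmetic on quantities
hours_saved_per_engineer = Dist.lognormal(2.0, 0.5)
engineers = Dist.lognormal(4.0, 1.0)
total_hours_saved = hours_saved_per_engineer * engineers
print(f"mean: {total_hours_saved.mean():,.0f} hours")</code></pre>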
<p>It&#8217;s ready for you to use at <a href="http://squigglehub.org/ai">squigglehub.org/ai</a> (if authenticated), and <a href="https://forum.effectivealtruism.org/posts/jJ4pn3qvBopkEvGXb/introducing-squiggle-ai">you can read QURI&#8217;s press release on the EA Forum and elsewhere</a>.</p>]]></content:encoded></item><item><title><![CDATA[October 2024 Progress in Guaranteed Safe AI]]></title><description><![CDATA[Safeguarded AI TA 1.4 funding call, a way of verifying quantitative properties of program pairs, and previews of the recent ICLR submissions.]]></description><link>https://newsletter.for-all.dev/p/october-2024-progress-in-guaranteed</link><guid isPermaLink="false">https://newsletter.for-all.dev/p/october-2024-progress-in-guaranteed</guid><dc:creator><![CDATA[Quinn Dougherty]]></dc:creator><pubDate>Mon, 28 Oct 2024 23:16:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e6f92c2-2ffc-4b0b-ad92-596137aa1263_576x576.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There are no benefits of subscribing, except you get a cut of my hard-earned Shapley points.</p><p><a href="https://newsletter.for-all.dev/subscribe?">Subscribe now</a></p><p>As always, do reach out to inform me how the newsletter could be better, unless you want less biased and wrong quick takes. The biased and wrong quick takes are <em>mandatory</em>. Do you like inlining abstracts, or should I just expect people to click when they want to read the abstract? This month we&#8217;re inlining all of them.</p><p>If you're just joining us, <a href="https://arxiv.org/abs/2405.06624">background on GSAI here</a>.</p><p>I attended an ARIA meeting this month, so most of the newsletter is papers I found there that happen to be recent-ish.</p><p>A little birdie told me there&#8217;s a meeting in Berkeley at the beginning of December; reach out for more info!</p><h1><a href="https://leanprover.zulipchat.com/#narrow/channel/219941-Machine-Learning-for-Theorem-Proving/topic/ICLR.202025.20Submissions/near/475186957">Big list of openreview links for ICLR 2025 papers related to scaled interactive theorem proving</a></h1><p>From the Lean Zulip. There&#8217;s probably more from ICLR I&#8217;d like to cover in the newsletter, but I haven&#8217;t parsed the openreview firehose personally. I&#8217;m freeloading off Lean Zulip users.</p><p>One thing that jumped out at me, in <a href="https://leanprover.zulipchat.com/#narrow/channel/219941-Machine-Learning-for-Theorem-Proving/topic/LeanAgent">a thread about a polarizing/unpopular paper</a>, is the discussion of what programming practices would lead to ML papers having an easier time making reasonable claims about the nontriviality of discovered proofs and broadly preventing contamination at scale. Terence Tao discusses placeholder design patterns, suggests a proof_wanted command to replace a theorem &#8230; : 0 = 1 := sorry, and discusses downsides with the practice of sorrying out 0=1 to do placeholding (spoiler: an AI can exploit the proof of a false proposition to misbehave). This is an important point: any GSAI stack will have collaboration between proof engineers of different power levels, and lots of collaboration in proof engineering is driven by very careful bookkeeping of placeholders (saying what we need, what we have, and what would be nice to have).</p>
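<p>A minimal Lean illustration of the failure mode (the placeholder name is made up):</p><pre><code>-- The risky placeholder pattern: a stand-in statement that happens
-- to be false.
theorem placeholder : 0 = 1 := sorry

-- Once `placeholder` is in the environment, any goal follows from it,
-- so an AI prover can "discover" a proof of anything:
example (P : Prop) : P := absurd placeholder (by decide)

-- With the Batteries library imported, `proof_wanted` records the
-- intended statement without exposing a usable lemma:
-- proof_wanted real_goal : ∀ n : Nat, 0 + n = n</code></pre>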
<h1><a href="https://arxiv.org/abs/2407.17127">A quantitative probabilistic relational hoare logic</a></h1><p><a href="https://calendar.app.google/2a11YNXUFwzHbT3TA">November&#8217;s GSAI paper club</a>.</p><p><strong>Hoare logic</strong> is when you have precondition <em>P : Memory -&gt; Prop</em> and postcondition <em>Q : Memory -&gt; Prop</em>, and some imperative program <em>c</em> such that forall memories <em>m</em> and <em>n</em>, if <em>P m</em> and executing <em>c</em> sends <em>m</em> to <em>n</em>, then you can prove <em>Q n</em>, written {P}c{Q}. Sometimes, you want to reason about relations between programs. <strong>Relational hoare logic (RHL)</strong> is when preconditions and postconditions are relations on memory states and you have two programs, so {P} c1 ~ c2 {Q} means that forall memories <em>m1 m2 n1 n2</em>, if <em>P m1 m2</em> and <em>exec c1 m1 = n1</em> and <em>exec c2 m2 = n2</em>, then <em>Q n1 n2</em>.</p><p>You might like to write a proof of an RHL quadruple for a cryptographic primitive, requiring you to reason about independence of two PRNGs. Independence is one of the things you don&#8217;t get in RHL, because RHL assumes deterministic programs. <strong>Probabilistic relational hoare logic (pRHL)</strong> looks again like {P} c1 ~ c2 {Q}, but this time Q is lifted to a relation between distributions over memory, reflecting the possible outcomes of programs involving probabilistic choice or random sampling. So precondition P is of type Memory -&gt; Memory -&gt; Prop, postcondition Q is of type &#916;Memory -&gt; &#916;Memory -&gt; Prop, and the pRHL quadruple unpacks to: forall <em>m1 m2 : Memory</em>, forall <em>N1 N2 : &#916;Memory</em>, if <em>P m1 m2</em> and <em>exec c1 m1 ~ N1</em> and <em>exec c2 m2 ~ N2</em>, then <em>Q N1 N2</em>.</p>
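<p>A Lean-flavored sketch of the three judgments, with memories, distributions, and execution left abstract (hypothetical names; exec is simplified to a function rather than a relation):</p><pre><code>variable (Memory DMem Prog : Type)      -- DMem: distributions over Memory
variable (exec : Prog → Memory → Memory)   -- deterministic semantics
variable (pexec : Prog → Memory → DMem)    -- probabilistic semantics

-- {P} c {Q}
def hoare (P Q : Memory → Prop) (c : Prog) : Prop :=
  ∀ m, P m → Q (exec c m)

-- {P} c1 ~ c2 {Q}: pre- and postconditions relate two memories
def rhl (P Q : Memory → Memory → Prop) (c1 c2 : Prog) : Prop :=
  ∀ m1 m2, P m1 m2 → Q (exec c1 m1) (exec c2 m2)

-- pRHL: the postcondition is lifted to distributions over memory
def prhl (P : Memory → Memory → Prop) (Q : DMem → DMem → Prop)
    (c1 c2 : Prog) : Prop :=
  ∀ m1 m2, P m1 m2 → Q (pexec c1 m1) (pexec c2 m2)</code></pre>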
<blockquote><p>We introduce eRHL, a program logic for reasoning about relational expectation properties of pairs of probabilistic programs. eRHL is quantitative, i.e., its pre- and post-conditions take values in the extended non-negative reals. Thanks to its quantitative assertions, eRHL overcomes randomness alignment restrictions from prior logics, including PRHL, a popular relational program logic used to reason about security of cryptographic constructions, and apRHL, a variant of PRHL for differential privacy. As a result, eRHL is the first relational probabilistic program logic to be supported by non-trivial soundness and completeness results for all almost surely terminating programs. We show that eRHL is sound and complete with respect to program equivalence, statistical distance, and differential privacy. We also show that every PRHL judgment is valid iff it is provable in eRHL. We showcase the practical benefits of eRHL with examples that are beyond reach of PRHL and apRHL.</p></blockquote><p>pRHL is not <em>expectational</em>. Since preconditions and postconditions land in Prop, we can&#8217;t <em>value</em> preconditions and postconditions as quantities. Last July, the authors introduced a generalization of pRHL called eRHL (e for expectation), where now preexpectations and postexpectations are of type Memory -&gt; Memory -&gt; NNReal, and the rest follows. eRHL is special because they can prove completeness theorems for almost surely terminating (Pr(halt) = 1) programs that pRHL can&#8217;t, along with some fancy desiderata like a statistical distance &#8660; eRHL quadruple correspondence and differential privacy.</p><p>If you&#8217;re wondering why you just read all that, here&#8217;s the juice: often in GSAI position papers there&#8217;ll be some reference to expectations that capture &#8220;harm&#8221; or &#8220;safety&#8221;. Preexpectations and postexpectations with respect to particular pairs of programs could be a great way to cash this out, cuz we could look at programs as interventions and simulate RCTs (labeling one program control and one treatment) in our world modeling stack. When it comes to harm and safety, Prop and bool are definitely not rich enough.</p><h1><a href="https://lu.ma/4ylbvs75">Bengio talk coming up</a></h1><p>November 14, 18:00-19:00 UTC</p><blockquote><p>Description: Could there be safety advantages to the training of a Bayesian oracle that is trained to only do that job, i.e., estimate P(answer | question, data)? What are the scenarios in which such an AI could cause catastrophic harm? Can we even use such an oracle as the intelligence engine of an agent, e.g., by sampling actions that help to achieve goals? What can go wrong even if we assume that we have a perfect prediction of the Bayesian posterior, e.g., if the true explanatory theory is a minority voice in the Bayesian posterior regarding harm prediction? If such an oracle is estimated by a neural network with amortized inference, what could go wrong? Could the implicit optimization used to train the estimated posterior create loopholes with an optimistic bias regarding harm? Could we also use such a Bayesian oracle to obtain conservative risk estimates, i.e., bounds on the probability of harm, that can mitigate the imperfections in such an agent?</p></blockquote><h1><a href="https://arxiv.org/abs/2409.13082">AutoVerus</a></h1><p>Verus is a framework for <em>annotating</em> Rust code with preconditions (&#8220;requires&#8221;) and postconditions (&#8220;ensures&#8221;), kicking static verification down to SMT. Here&#8217;s an example program for multiplying a number by 8:</p><pre><code>fn octuple(x1: i8) -&gt; (x8: i8)
    requires
        -16 &lt;= x1,
        x1 &lt; 16,
    ensures
        x8 == 8 * x1,
{
    let x2 = x1 + x1;
    let x4 = x2 + x2;
    x4 + x4
}</code></pre><p>Without the &#8220;requires&#8221;, we may overflow the 8-bit integer. The &#8220;ensures&#8221; is a functional specification of what it means for the function to be correct. The Verus paper is <a href="https://arxiv.org/abs/2303.05491">here</a> and the Verus docs are <a href="https://verus-lang.github.io/verus/guide/overview.html">here</a>.</p><blockquote><p>Generative AI has shown its values for many software engineering tasks.
Still in its infancy, large language model (LLM)-based proof generation lags behind LLM-based code generation. In this paper, we present AutoVerus. AutoVerus uses LLM to automatically generate correctness proof for Rust code. AutoVerus is designed to match the unique features of Verus, a verification tool that can prove the correctness of Rust code using proofs and specifications also written in Rust. AutoVerus consists of a network of LLM agents that are crafted and orchestrated to mimic human experts' three phases of proof construction: preliminary proof generation, proof refinement guided by generic tips, and proof debugging guided by verification errors. To thoroughly evaluate AutoVerus and help foster future research in this direction, we have built a benchmark suite of 150 non-trivial proof tasks, based on existing code-generation benchmarks and verification benchmarks. Our evaluation shows that AutoVerus can automatically generate correct proof for more than 90% of them, with more than half of them tackled in less than 30 seconds or 3 LLM calls.&nbsp;</p></blockquote><p>The AutoVerus paper is an example of the genre that uses verification tools as the ground truth in a machine learning process. A proof system is a great data labeler. GSAI is happy whenever this basic viewpoint makes data quality more scalable than other forms of ground truth across machine learning.&nbsp;</p>
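<p>Schematically, the abstract&#8217;s three-phase agent network amounts to an LLM-in-the-loop around the verifier. A toy sketch of my own (hypothetical names and interfaces, not the AutoVerus codebase):</p><pre><code>def autoverus_style_loop(code, llm, verus, generic_tips, max_rounds=10):
    """Toy three-phase proof search in the spirit of the AutoVerus abstract."""
    proof = llm.generate_proof(code)            # phase 1: preliminary proof
    for tip in generic_tips:                    # phase 2: refine with generic tips
        proof = llm.refine(code, proof, tip)
    for _ in range(max_rounds):                 # phase 3: error-guided debugging
        errors = verus.check(code, proof)       # the verifier is the ground truth
        if not errors:
            return proof                        # verified
        proof = llm.debug(code, proof, errors)
    return None                                 # give up; a human takes over</code></pre>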
<h1><a href="https://arxiv.org/abs/2408.15572">Sufficient and Necessary Barrier-like Conditions for Safety and Reach-avoid Verification of Stochastic Discrete-time Systems</a></h1><p>A reach-avoid specification is something of the form &#8220;eventually reach the target P, and avoid the unsafe set Q until you do&#8221;. Barrier certs are a way of proving that a system starting in the safe region stays in the safe region without directly solving the whole system of differential equations.&nbsp;</p><blockquote><p>In this paper, we examine sufficient and necessary barrier-like conditions for the safety verification and reach-avoid verification of stochastic discrete-time systems. Safety verification aims to certify the satisfaction of the safety property, which stipulates that the probability of the system, starting from a specified initial state, remaining within a safe set is greater than or equal to a specified lower bound. A sufficient and necessary barrier-like condition is formulated for safety verification. In contrast, reach-avoid verification extends beyond safety to include reachability, seeking to certify the satisfaction of the reach-avoid property. It requires that the probability of the system, starting from a specified initial state, reaching a target set eventually while remaining within a safe set until the first hit of the target, is greater than or equal to a specified lower bound. Two sufficient and necessary barrier-like conditions are formulated under certain assumptions. These conditions are derived via relaxing Bellman equations.&nbsp;</p></blockquote><h1><a href="https://arxiv.org/abs/2409.10018">Compositional Design of Safety Controllers for Large-scale Stochastic Hybrid Systems</a></h1><p>What is the problem? When is a solution &#8220;compositional&#8221;?&nbsp;</p><p>The problem is stochastic hybrid systems (SHS), which are very heterogeneous: dynamics (i.e. of the kind described by differential equations) that might be discrete time in some parts and continuous time in others, and discrete state in some parts but continuous state in others. Specifically, the problem is synthesizing guaranteed controllers for these systems (the component that observes the state and drives the actuators so that the closed-loop system meets its specification). A solution to a problem is compositional when it allows you to piece together solutions to bigger problems out of solutions to smaller problems.&nbsp;</p><blockquote><p>In this work, we propose a compositional scheme based on small-gain reasoning for the safety controller synthesis of interconnected stochastic hybrid systems with both continuous evolutions and instantaneous jumps. In our proposed setting, we first offer an augmented scheme to represent each stochastic hybrid subsystem with continuous and discrete evolutions in a unified framework, ensuring that the state trajectories match those of the original hybrid systems. We then introduce the concept of augmented control sub-barrier certificates (A-CSBC) for each subsystem, which allows the construction of augmented control barrier certificates (A-CBC) for interconnected systems and their safety controllers under small-gain compositional conditions. We eventually leverage the constructed A-CBC and quantify a guaranteed probabilistic bound across the safety of the interconnected system. While the computational complexity of designing a barrier certificate and its safety controller grows polynomially with network dimension using sum-of-squares (SOS) optimization program, our compositional approach significantly reduces it to a linear scale with respect to the number of subsystems. We verify the efficacy of our proposed approach over an interconnected stochastic hybrid system composed of 1000 nonlinear subsystems.</p></blockquote><h1><a href="https://arxiv.org/abs/2406.08467">DafnyBench (back in June)</a></h1><p>The Beneficial AI Foundation and others shipped a benchmark for Dafny coding.&nbsp;</p><blockquote><p>We introduce DafnyBench, the largest benchmark of its kind for training and evaluating machine learning systems for formal software verification. We test the ability of LLMs such as GPT-4 and Claude 3 to auto-generate enough hints for the Dafny formal verification engine to successfully verify over 750 programs with about 53,000 lines of code. The best model and prompting scheme achieved 68% success rate, and we quantify how this rate improves when retrying with error message feedback and how it deteriorates with the amount of required code and hints. We hope that DafnyBench will enable rapid improvements from this baseline as LLMs and verification techniques grow in quality.&nbsp;</p></blockquote><p>There&#8217;s also <a href="https://github.com/Beneficial-AI-Foundation/dafny-autopilot">this VSCode extension</a> complementing the paper.&nbsp;</p><h1><a href="https://t.co/JgGHjmduvJ">Safeguarded AI TA1.4 funding call</a></h1><p>Economists and voting theorists are being summoned to help us reason about specification generating processes under multiple stakeholders. <a href="https://t.co/U04v2lH2d0">In video form!</a></p><p>We already knew that ARIA is being wildly ambitious. I&#8217;d be happy if we could get <em>any specs at all</em> on the new spec/modeling/cert stack that ARIA is building to be checked against prod code; it seems a lot harder to make them also socially/democratically desirable. 
Worried a little about scope creep with these guys, but what else is new.&nbsp;</p>]]></content:encoded></item><item><title><![CDATA[September 2024 Progress in Guaranteed Safe AI]]></title><description><![CDATA[This month we have elicitation of GPT-o1&#8217;s Lean4 skills, a worked example in self driving cars, and an interactive proof approach to correctness of learned components.]]></description><link>https://newsletter.for-all.dev/p/september-2024-progress-in-guaranteed</link><guid isPermaLink="false">https://newsletter.for-all.dev/p/september-2024-progress-in-guaranteed</guid><dc:creator><![CDATA[Quinn Dougherty]]></dc:creator><pubDate>Thu, 26 Sep 2024 20:13:25 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/608d616c-3efe-4c05-a2a2-ded698a186d0_508x284.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There are no benefits to the paid subscription, except you get a cut of my hard-earned Shapley points.&nbsp;</p><p>As always, do reach out to inform me how the newsletter could be better, unless you want less biased and wrong quick takes. The biased and wrong quick takes are <em>mandatory</em>.</p><p>If you're just joining us, <a href="https://arxiv.org/abs/2405.06624">background on GSAI here</a>.&nbsp;&nbsp;</p><h1>Terry Tao&#8217;s mathstodon comment</h1><p><a href="https://mathstodon.xyz/@tao/113132502735585408">&#8220;I have played a little bit with OpenAI&#8217;s new iteration of GPT&#8221;, he begins</a>.&nbsp;</p><p>Let's get everyone up to speed on one of Tao's key insights: the bottlenecks to casually superb Lean4 performance are embarrassingly non-profound: chat models get tripped up by import paths changing across Lean versions and so on. You can only choke a model with too much code velocity in the training data for so long; eventually that bottleneck will fall.&nbsp;</p><p>That may not be a great crux for Tao's use case. I think he wants to focus on the reasoning bottlenecks, and is less concerned about coding bottlenecks given reasoning. Perhaps his interest in the Lean parts is mostly Voevodsky style: what do we want? Less homework. How do we want it? With greater assurance than more homework.&nbsp;</p><p>You should sit and think about whether attaining a reasoning level is easier or harder than turning that reasoning level into Lean code once attained. I sat and thought about it for 5 minutes, and I'm still not ready to make any bets. 
Yes, the lesson about the embarrassingly non-profound bottlenecks suggests that attaining a reasoning level is harder than translating that reasoning level into code, but on the other hand, dependent types are a really finicky way to program.&nbsp;</p><p>Tao reports that we've leapt from incompetent grad student to mediocre grad student, or openai has anyway. I do not know the conversion factor between units of pure maths grad student and units of GSAI-grade software developer. I'll take 1 to 1 as my prior. How about yours? Is one pure maths grad student worth of software engineering enough to accomplish your goals, if you can make copies and replace pizza/coffee with electricity?&nbsp;</p><h1>David Manheim&#8217;s post</h1><p>I liked <a href="https://www.lesswrong.com/posts/3P8WBwLyfPBEkbG3c/proveably-safe-self-driving-cars">David Manheim&#8217;s post on self driving cars as a worked example</a>. Like we <a href="https://gsai.substack.com/i/148170804/a-wager-you-say">saw last month</a>, worked examples lead to identifying cruxes more clearly and quickly. In this case, the commenters mostly objected to unstated assumptions, and Manheim would go on to edit in more assumptions (even in the title of the post!).&nbsp;</p><p>Tdietterich:&nbsp;</p><blockquote><p>Perhaps we should rename the "provable safety" area as "provable safety modulo assumptions" area and be very explicit about our assumptions. We can then measure progress by the extent to which we can shrink those assumptions.</p></blockquote><p>Jacobjacob:</p><blockquote><p>Very much agree. I gave some feedback along those lines as the term was coined; and am sad it didn't catch on. But of course "provable safety modulo assumptions" isn't very short and catchy...</p><p>I do like the word "guarantee" as a substitute. We can talk of formal guarantees, but also of a store guaranteeing that an item you buy will meet a certain standard. So it's connotations are nicely in the direction of proof but without, as it were, "proving too much" :)</p></blockquote><p>I agree with the sentiment in the comments that provability in the sense of system safety is provability up to <em>which</em> assumptions, and hope to see people be more clear about this.&nbsp;</p><h1>Models that prove their own correctness&nbsp;</h1><p><a href="https://arxiv.org/abs/2405.15722">This was June but I just found it</a>. We&#8217;re reading it for next month&#8217;s <a href="https://calendar.app.google/ZpELWosURvz1dYgn6">GSAI paper club, which is convening 2024-10-17 at 10a pacific</a>. Here, the word &#8220;proof&#8221; is in the context of &#8220;interactive proof&#8221;: a multi-step game between a prover and a verifier, popular in computational complexity theory and cryptography.&nbsp;</p><blockquote><p>How can we trust the correctness of a learned model on a particular input of interest? Model accuracy is typically measured *on average* over a distribution of inputs, giving no guarantee for any fixed input. This paper proposes a theoretically-founded solution to this problem: to train *Self-Proving models* that prove the correctness of their output to a verification algorithm V via an Interactive Proof. Self-Proving models satisfy that, with high probability over a random input, the model generates a correct output *and* successfully proves its correctness to V. The *soundness* property of V guarantees that, for *every* input, no model can convince V of the correctness of an incorrect output. 
Thus, a Self-Proving model proves correctness of most of its outputs, while *all* incorrect outputs (of any model) are detected by V. We devise a generic method for learning Self-Proving models, and we prove convergence bounds under certain assumptions. The theoretical framework and results are complemented by experiments on an arithmetic capability: computing the greatest common divisor (GCD) of two integers. Our learning method is used to train a Self-Proving transformer that computes the GCD *and* proves the correctness of its answer.</p></blockquote><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!dkPX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa119b208-6638-4738-88be-805a77436c1e_508x284.png" width="508" height="284" alt=""></figure></div>
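<p>To see why GCD is such a natural first experiment, note that it admits a cheap certificate. A toy sketch of my own (a Bezout-style check; hypothetical code, not necessarily the paper&#8217;s exact protocol, which is interactive):</p><pre><code>def verify_gcd(x: int, y: int, g: int, a: int, b: int) -&gt; bool:
    """Sound verifier V: accepts only if g really is gcd(x, y).

    If g divides x and y, then g divides gcd(x, y); if a*x + b*y == g, then
    gcd(x, y) divides g. Together (for positive g) those force g == gcd(x, y).
    """
    common_divisor = g &gt; 0 and x % g == 0 and y % g == 0
    bezout = a * x + b * y == g
    return common_divisor and bezout

# The model outputs an answer *and* a certificate (a, b):
assert verify_gcd(252, 105, 21, -2, 5)      # -2*252 + 5*105 == 21
# Soundness: no certificate makes V accept a wrong answer.
assert not verify_gcd(252, 105, 7, 0, 0)</code></pre>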
<p>Some people in GSAI only want to gain assurance about code that AIs are writing, others want to gain assurance about the learned components themselves. If you&#8217;re in the latter camp, this proposal looks promising, but there&#8217;s a long road ahead to scale it to non-toy problems. </p><h1>Types Are Not Safe</h1><p>I came across <a href="https://youtu.be/QLXYL-Ln2PM?si=mFwKc9qjUkhU0FtU">this lambda conf talk</a> solely because I used to work for the speaker&#8211; I did not check whether other lambdaconf talks are worthy of endorsement.&nbsp;</p><p>I'm including it in the newsletter to make sure we get on the same page about what type systems can't do. Type systems check for a pretty narrow kind of correctness. No one has lifted color-blind approval of UIs into types. No one has made a PDE solver where the typechecker knows if the initial conditions you loaded in send the plane into a hillside. Lattice-based access control is kinda like lifting security to a type checker, but it doesn&#8217;t know which permission sets are more or less susceptible to insider, phishing, or bribery risks. This matters, because if you cash out &#8220;formal verification&#8221; at Lean, then you've just staked it all on types, and types are not safe. I think we need a broad portfolio and several different meanings of the word &#8220;certificate&#8221;, but I keep running into people who let their guard down and hail the coming age of formally verified code generation; if they were careful they wouldn't be so optimistic.&nbsp;</p><p>I tend to say that type systems are for increasing the jurisdiction of compile-time knowledge. The runtime is where your adversary has home team advantage. But many correctness notions that are in principle liftable to a sufficiently expressive typechecker are not in practice worth the trouble. 
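To make &#8220;lifting&#8221; concrete, here&#8217;s the classic small-scale example in Lean 4 (my own illustration; the length invariant lives in the type, and the cost is a proof obligation at each use):</p><pre><code>abbrev Vec (A : Type) (n : Nat) := { xs : List A // xs.length = n }

-- The typechecker enforces the length bookkeeping at compile time.
def append {A : Type} {n m : Nat} (v : Vec A n) (w : Vec A m) :
    Vec A (n + m) :=
  Subtype.mk (v.val ++ w.val) (by simp [v.property, w.property])</code></pre><p>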
Now, that impracticality is due to ergonomic bottlenecks that an LLM could steamroll over, but then you're just shifting your error/attack surface from the spec-implementation gap to &#8220;where the h!ck did this spec come from?&#8221; and of course the ever-present world-spec gap.</p><p>Another title for the talk could've been &#8220;Types Lie&#8221;, but that section of the talk relies a little on advanced language extensions specific to Haskell, and it isn't important that you understand them. Some of them, like the untypeability of the type of types, are way less bad in dependently typed languages. Getting confused because you wrote some type algebra that accidentally involves uninhabited types happened to me in Lean a couple weeks ago, though.&nbsp;</p><h1>Opportunities</h1><h2>Topos jobs&nbsp;</h2><p>Both of the <a href="https://topos.institute/contact">Oxford postdocs</a> are concerned with world modeling and the foundations of what eventually will be specification engineering for ARIA-style Safeguarded AI.&nbsp;</p><h2>UK&#8217;s AISI cites GSAI as an area they want to branch more into, are hiring</h2><p>The <a href="https://www.aisi.gov.uk/work/safety-cases-at-aisi">UK&#8217;s AI Safety Institute is building a team</a> for articulating positive &#8220;safety cases&#8221;, stories in which deploying models of the future is not a bad idea. They want technical talent to write these stories not just so that they&#8217;ll be calibrated but also so the team has the capacity in house to deliver on some of the empirical needs of those stories. They do plan to open an SF office over the next year, but right now you need to be onsite in London.&nbsp;</p>]]></content:encoded></item><item><title><![CDATA[August 2024 Progress in Guaranteed Safe AI]]></title><description><![CDATA[$10k on the line, RL from Proof Assistant Feedback, Aria&#8217;s TA3]]></description><link>https://newsletter.for-all.dev/p/august-2024-progress-in-guaranteed</link><guid isPermaLink="false">https://newsletter.for-all.dev/p/august-2024-progress-in-guaranteed</guid><dc:creator><![CDATA[Quinn Dougherty]]></dc:creator><pubDate>Tue, 27 Aug 2024 02:42:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!m7Qr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff172a678-a574-442a-8514-6b4ed43f9469_1381x553.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>A wager you say</h1><p>One proof of concept for the GSAI stack would be a well-understood mechanical engineering domain automated to the next level and certified to boot. How about locks? 
Needs a model of basic physics, terms in some logic for all the parts and how they compose, and some test harnesses that simulate an adversary. Can you design and manufacture a <em>provably unpickable</em> lock?&nbsp;</p><p><a href="https://www.lesswrong.com/posts/P8XcbnYi7ooB2KR2j/provably-safe-ai-worldview-and-projects?commentId=Ku3X4QDBSyZhrtxkM">Zac Hatfield-Dodds (of hypothesis/pytest and Anthropic, was offered and declined authorship on the GSAI position paper) challenged Ben Goldhaber to a bet after Ben coauthored a post with Steve Omohundro</a>. It seems to resolve in 2026 or 2027; the comment thread should get cleared up once Ben gets back from Burning Man. The arbiter is Raemon from LessWrong.</p><p>Zac says you can&#8217;t get a provably unpickable lock on this timeline. Zac gave (up to) 10:1 odds, so recall that the bet can be a positive expected value for Ben even if he thinks the event is most likely not going to happen.</p><p>For funsies, let&#8217;s map out one path of what has to happen for Zac to pay Ben $10k. This is not <em>the </em>canonical path, but it is <em>a</em> path:</p><ol><li><p>Physics to the relevant granularity (question: can human lockpicks leverage sub-Newtonian issues?) is conceptually placed into type theory or some calculus. I tried a Riemann integral in Coq once (<em>way</em> once), so it occurs to me that you need to decide if you want just the functional models (perhaps without computation / with proof irrelevance) in your proof stack or if you want the actual numerical analysis support in there as well.</p></li><li><p>Good tooling, library support, etc. around that conceptual work (call it <em>mechlib</em>) to provide mechanical engineering primitives</p></li><li><p>A lock designing toolkit, depending on <em>mechlib</em>, is developed</p></li><li><p>Someone (e.g. a large language model) is really good at programming in the lock designing toolkit. They come up with a spec L.&nbsp;</p></li><li><p>You state the problem &#8220;forall t : trajectories through our physics simulation, if L(t) == open(L) then t == key(L)&#8221;</p></li><li><p><em>Then you get to write a nasty gazillion line Lean proof</em></p></li><li><p>Manufacture a lock (did I mention that the design toolkit has links to actual manufacturing stacks?)</p></li><li><p>Bring a bunch to DefCon 2027 and send another to the <a href="https://en.wikipedia.org/wiki/LockPickingLawyer">lockpicking lawyer</a></p></li><li><p>Everyone fails. Except Ben and the army of postdocs that $9,999 can buy.</p></li></ol><p>Looks like after the magnificent research engineering in steps 1 and 2, the rest is just showing off and justifying those two steps. 
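In Lean-flavored pseudocode, the step 5 statement might come out something like this (all names hypothetical; the real content hides in the physics model behind <em>Trajectory</em>):</p><pre><code>variable (Trajectory Lock Key : Type)
variable (opens : Trajectory -&gt; Lock -&gt; Prop)
variable (usesKey : Trajectory -&gt; Key -&gt; Prop)

-- Unpickable: any trajectory of the simulated world that ends with the
-- lock open must have used the key.
def unpickable (L : Lock) (k : Key) : Prop :=
  &#8704; t : Trajectory, opens t L -&gt; usesKey t k</code></pre><p>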
Of course, in a world where we have steps 1 and 2 we have a great deal of transformative applications of formal modeling and verification just in reach, and we&#8217;ll need a PoC like locks to practice and concretize the workflow.&nbsp;</p><p>Cryptography applications tend to have a curse of requiring a lot of work after the security context, permission set, and other requirements are frozen in stone, which means that when the requirements change you have to start over and throw out a bunch of work (epistemic status: why do you think so many defi projects have more whitepapers than users?). The provably unpickable lock has 2 to 10x that problem: get the granularity wrong in step one, and most of your <em>mechlib</em> implementation won&#8217;t be salvageable. As the language model iterates on the spec L in step 5, the other language model has to iterate on the proof in step 6, because the new spec will break most of the proof.</p><p>Sorry I don&#8217;t know any mechanical engineering, Ben, otherwise I&#8217;d take some cracks at it. The idea of a logic such that its denotation is a bunch of mechanical engineering primitives seems interesting enough that my &#8220;if it was easy to do in less than a year someone would&#8217;ve, therefore there must be a moat&#8221; heuristic is tingling. Perhaps oddly, the quantum semantics <a href="https://arxiv.org/abs/quant-ph/0402130">folks</a> (<a href="https://www.researchgate.net/publication/380071047_Quantum_HoTT-Wiring_Weaving_Homotopy_Type_Theory_into_Quantum_Wave_Mechanics">or with HoTT!</a>) seem to have been productive, but I don&#8217;t know how much of that is translatable to mechanical engineering.</p><h1>Reinforcement learning from proof assistant feedback, and yet more Monte Carlo tree search</h1><p><a href="https://arxiv.org/abs/2408.08152">DeepSeek&#8217;s paper</a>&nbsp;</p><p>The steps are pretraining, supervised finetuning, RLPAF (reinforcement learning from proof assistant feedback), and MCTS (Monte Carlo tree search). RLPAF is not very rich: it&#8217;s a zero reward for any bug at all and a one for a happy typechecker. Glad they got that far with just that.&nbsp;</p><p><a href="http://deepseek.com">You can use the model at deepseek.com</a>.</p><h1>Harmonic ships their migration of miniF2F to Lean 4, gets 90% on it, is hiring</h1><p><a href="https://www.harmonic.fun/news">From their &#8220;one month in&#8221; newsletter</a>. &#8220;Aristotle&#8221;, which has a mysterious methodology since I&#8217;ve only seen their marketing copy rather than an arxiv paper, gets 90% on the Lean 4 miniF2F <em>when prompted with natural language proofs</em>. It doesn&#8217;t look to me like the deepseek or LEGO papers do that? I could be wrong. It&#8217;s impressive just to autoformalize natural language proofs; I guess I&#8217;m still wrapping my head around how much harder it is (for an LLM) to also come up with the proof.&nbsp;</p><p><a href="https://jobs.lever.co/harmonic">Jobs</a>: research engineer and software engineer</p><h1>Atlas ships their big google doc alluded to in the last newsletter</h1><p><a href="https://github.com/atlas-computing-org/awesome-AIxFV/blob/main/toolchain-for-ai-assisted-code-specification-synthesis-verification.pdf">Worth a read!</a> The GSAI stack is large and varied, and this maps out the different sub-sub-disciplines. 
From the executive summary:</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!m7Qr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff172a678-a574-442a-8514-6b4ed43f9469_1381x553.png" width="1381" height="553" alt=""></figure></div><p>You could start whole organizations for every row in this table, and I wouldn&#8217;t be a part of any org that targets more than a few at once for fear of being unfocused. See the doc for more <em>navigation</em> (see what I did there? Navigating like with an <em>atlas</em>, perhaps? Get it?) of the field&#8217;s opportunities.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><h1>Efficient shield synthesis via state-space transformation</h1><p>Shielding is an area of reactive systems and reinforcement learning that marks states as unsafe and synthesizes a kind of guarding layer between the agent and the environment that prevents unsafe actions from being executed in the environment. So in the rejection sampling flavored version, it literally intercepts the unsafe action and tells the agent &#8220;we&#8217;re not running that, try another action&#8221;. One of the limitations in this literature is computational cost: shields are, like environments, state machines plus some frills, and there may simply be too many states. This is the limitation that this paper focuses on.&nbsp;</p><blockquote><p><em><a href="https://arxiv.org/abs/2407.19911">We consider the problem of synthesizing safety strategies for control systems, also known as shields. Since the state space is infinite, shields are typically computed over a finite-state abstraction, with the most common abstraction being a rectangular grid. However, for many systems, such a grid does not align well with the safety property or the system dynamics. That is why a coarse grid is rarely sufficient, but a fine grid is typically computationally infeasible to obtain. In this paper, we show that appropriate state-space transformations can still allow to use a coarse grid at almost no computational overhead. We demonstrate in three case studies that our transformation-based synthesis outperforms a standard synthesis by several orders of magnitude. In the first two case studies, we use domain knowledge to select a suitable transformation. In the third case study, we instead report on results in engineering a transformation without domain knowledge.</a></em></p></blockquote>
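<p>In code, the rejection-sampling flavor is just a wrapper around the environment step. A toy sketch of my own (hypothetical interfaces; real shields are synthesized offline over a state-space abstraction rather than improvised like this):</p><pre><code>def shielded_step(env, agent, state, is_unsafe, max_tries=100):
    """Intercept unsafe actions and ask the agent for another one."""
    for _ in range(max_tries):
        action = agent.propose(state)
        if is_unsafe(env.predict_successor(state, action)):
            agent.reject(action)            # "we're not running that"
            continue
        return env.step(state, action)      # safe, so actually execute it
    raise RuntimeError("shield found no safe action")</code></pre>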
<p>Besides cost, demanding a lot of domain knowledge is another limitation of shields, so this is an especially welcome development.</p><h1>Funding opportunities</h1><p><a href="https://www.aria.org.uk/programme-safeguarded-ai/">ARIA jumped right to technical area three (TA3)</a>, prototyping the gatekeeper. Deadline October 2nd. Seems geared toward cyber-physical systems folks. In the document:</p><blockquote><p><em>Note that verified software systems is an area which is highly suitable for a simplified gatekeeper workflow, in which the world-model is implicit in the specification logic. However, in the context of ARIA&#8217;s mission to &#8220;change the perception of what&#8217;s possible or valuable,&#8221; we consider that this application pathway is already perceived to be possible and valuable by the AI community. As such, this programme focuses on building capabilities to construct guaranteed-safe AI systems in cyber-physical domains. That being said, if you are an organisation which specialises in verified software, we would love to hear from you outside of this solicitation about the cyber-physical challenges that are just at the edge of the possible for your current techniques.</em></p></blockquote><p>This is really cool stuff; I hope they find brave and adventurous teams. I had thought gatekeeper prototypes would be in minecraft or mujoco (and asked a funder if they&#8217;d support me in doing that), so it&#8217;s wild to see them going for actual cyberphysical systems so quickly.&nbsp;</p><h1>Paper club</h1><p><a href="https://calendar.app.google/f7vR3f76z8g5UdPq5">Add to your calendar</a>. On September 19th we will read <a href="https://arxiv.org/abs/1810.08303">a paper about assume-guarantee contracts with learned components</a>. I&#8217;m liable to have made a summary slide deck to kick us off, but if I don&#8217;t, we&#8217;ll quietly read together for the first 20-30 minutes then discuss. We&#8217;ll use the google meet room in the gcal event by default.&nbsp;&nbsp;</p><h1>Andrew Dickson&#8217;s excellent post</h1><p>See <a href="https://www.lesswrong.com/posts/B2bg677TaS4cmDPzL/limitations-on-formal-verification-for-ai-safety">Limitations on Formal Verification for AI Safety</a> over on LessWrong. I have a lot of agreements, and my disagreements are more a matter of what deserves emphasis than the fundamentals. Overall, I think the Tegmark/Omohundro paper failed to convey a <a href="https://en.wikipedia.org/wiki/Swiss_cheese_model">swisscheesey</a> worldview, and sounded too much like &#8220;why not just capture alignment properties in &#8216;specs&#8217; and prove the software &#8216;correct&#8217;?&#8221; (i.e. the vibe I was responding to in my <a href="https://www.lesswrong.com/posts/Egw5M6cnwgpRBXAQ4/speedrunning-4-mistakes-you-make-when-your-alignment">very pithy post</a>). 
However, the main reason I&#8217;m not taking Dickson&#8217;s post as cause to pivot my whole worldview and resulting research is captured in one of Steve&#8217;s comments:</p><blockquote><p>I'm focused on making sure our <em>infrastructure</em> is safe against AI attacks.</p></blockquote><p>Like, a very strong version I almost endorse is &#8220;GSAI isn&#8217;t about AI at all, it&#8217;s about systems coded by extremely powerful developers (which happen to be AIs)&#8221;, and ensuring safety, security, and reliability capabilities scale at similar speeds to other kinds of capabilities.</p><p>It looks like one can satisfy Dickson just by assuring him that GSAI is a part of a swiss cheese stack, and that no one is messianically promoting One Weird Trick To Solve Alignment. Of course, I do hope that no one is messianically promoting One Weird Trick&#8230;</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>One problem off the top of my head regarding the InterFramework section: Coq and Lean seem the most conceptually straightforward since they have the same underlying calculus, but even there just a little impredicativity or coinduction could lead to extreme headaches. Now you can have a model at some point in the future that steamrolls over these headaches, but then you have a social problem of the broader Lean community not wanting to upstream those changes&#8211; various forks diverging fundamentally seems problematic to me, and would lead to a lot of duplicated work and missed opportunities for collaboration. I plan to prompt Opus 3.5 with &#8220;replicate <a href="https://gitlab.inria.fr/flocq/flocq">flocq</a> in lean4&#8221; as soon as I get access to the model, but how much more prompting effort will it be to ensure compliance with preexisting abstractions and design patterns, so that it can not only serve my purposes but be accepted by the community? 
At least there&#8217;s no coinduction in flocq, though some of the proofs may rely on set impredicativity for all I know (I haven&#8217;t looked at it in a while).</p></div></div>]]></content:encoded></item><item><title><![CDATA[July 2024 Welcome Edition]]></title><description><![CDATA[A new newsletter about guaranteed safe AI]]></description><link>https://newsletter.for-all.dev/p/july-2024-welcome-edition</link><guid isPermaLink="false">https://newsletter.for-all.dev/p/july-2024-welcome-edition</guid><dc:creator><![CDATA[Quinn Dougherty]]></dc:creator><pubDate>Thu, 25 Jul 2024 20:35:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e6f92c2-2ffc-4b0b-ad92-596137aa1263_576x576.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Welcome to the newsletter!&nbsp;</h1><p>Here we&#8217;ll be doing a monthly news roundup inclusive of my biased and wrong takes, plus maybe a remark about a non-current paper I happened to have read this month.&nbsp;</p><p>In this welcome edition, I won&#8217;t attempt to summarize all the news I missed before the newsletter existed, just the most recent notes.</p><p>There are no benefits to subscribing; the free tier has it all. But I enabled paid subscriptions just in case people want to signal support, to check how it affects my motivation to write these, and to see what happens.&nbsp;&nbsp;</p><p>Let me know changes I can make to maximize the impact of the newsletter by commenting here or emailing quinndougherty92 &lt;at&gt; gmail.</p><p>Thanks to Ben Goldhaber for the baseball metaphor idea and reviewing this post, and thanks to Steve Omohundro for suggesting the newsletter&#8217;s name, but all demerits go to me (the points go to me too btw).</p><h2>Where are we? What is guaranteed safe AI?&nbsp;</h2><p>Guaranteed-safe AI is a collection of research agendas to make the technological transition go well. There&#8217;s a video explainer <a href="https://www.youtube.com/watch?v=mwUfAvzjd8o">here</a>, but the main resource is the preprint position paper <a href="https://arxiv.org/abs/2405.06624">Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems</a>.&nbsp;</p><p>Personally, I had <a href="https://forum.effectivealtruism.org/posts/4rMxiyPTPdzaFMyGm/high-impact-careers-in-formal-verification-artificial">a few years of telling people that formal verification is not the droid we&#8217;re looking for</a> before the latest push for GSAI, so you might be wondering if my worldview changed. 
In my earlier writing, I was mostly responding to a very strong &#8220;just capture alignment properties in &#8216;specs&#8217; and prove the implementation &#8216;correct&#8217;&#8221;; however, I&#8217;m attracted to GSAI proposals because they have a <a href="https://en.wikipedia.org/wiki/Swiss_cheese_model">swiss cheese</a> matter-of-degree mindset that I don&#8217;t usually see when people get optimistic about the stronger version. Separately, proof assistants have played a big role in my career, so I&#8217;m biased toward theories of change that might leverage my experience.&nbsp;</p><h1>Atlas Computing talk</h1><p><a href="https://youtu.be/RPiLCy31d1E?si=Rf1r7wT8Xx8z2PqG">The monthly seminar featured Evan Miyazono of Atlas Computing, which is available on youtube</a>. The GSAI position papers are fine, but they need to be injected with a <em>roadmap, blueprint</em>, or <em>next steps</em>. Ideally, such a roadmap would go all the way down to estimates of headcount needed for different tactics. Wouldn&#8217;t it be great if someone was doing this? Anyways, major thanks to Atlas for doing this! Expect soon a document from Atlas that&#8217;s been incubating as a google doc all month.&nbsp;</p><h1>Podcast episode&nbsp;</h1><p><a href="https://www.cognitiverevolution.ai/guaranteed-safe-ai-world-models-safety-specs-verifiers-with-nora-ammann-ben-goldhaber/">Nora Ammann and Ben Goldhaber appeared on the Cognitive Revolution podcast</a> to discuss GSAI. The junction of world models, specifications, and assured implementations is so lofty and hairy that good science communication isn&#8217;t easy, so we need to see more of this.&nbsp;</p><h1>QAISI launch with job description</h1><p>The <a href="https://qaisi.ai/">Quantitative Safety AI Initiative</a> has a website up with a job posting for an AIS Research Engineer.</p><p>The <a href="https://docs.google.com/forms/d/1irecYvgyrSR-5r8hN8qAXumheEuz02TtRgFp-QxYheY/edit">job</a> says it will mostly be focusing on <a href="https://arxiv.org/abs/2309.01933">the Tegmark &amp; Omohundro preprint</a>, even though two additional authors of <a href="https://arxiv.org/abs/2405.06624">the position paper with the mammoth author list</a> are research leads for the orgs. <a href="https://www2.eecs.berkeley.edu/Faculty/Homepages/song.html">Dawn Song</a>, the one QAISI research lead <em>not</em> on that position paper, does a lot of security and cryptography work, but is no stranger to AI safety (she co-advised <a href="https://www.aisafetybook.com/">Dan Hendrycks</a>, off the top of my head). Excited to see what the new org gets up to!&nbsp;</p><h1>Guarantees-based mechanistic interpretability team preprint</h1><p>This paper isn&#8217;t 100% relevant, since GSAI tends more toward formally verifying artifacts of models than formally verifying models. However, it represents a natural thought you&#8217;d have when you try smashing formal verification into machine learning, so I&#8217;m including it. This is not a commitment to make the newsletter a big tent for <em>any</em> FV/ML crossover episode overall, but it can be sometimes. <a href="https://arxiv.org/abs/2406.11779">Gross et al.</a> lay the groundwork for formal proofs of model performance (lower bounds of accuracy). 
They accomplish this with mechinterp tricks like ablations, but since we&#8217;re dealing in proof we need <em>pessimal ablations</em>: ablating with worst cases rather than with mean or random ablations.&nbsp;</p><p>I say &#8220;groundwork&#8221; because while the proofs in the current paper are &#8220;formal&#8221; by <em>many </em>standards, they are not yet formal in the sense of <em>machine checked</em>. But one step at a time: the current work seems like a big leap forward to me, and I&#8217;m skeptical of the marginal value add of putting those proofs in Coq.&nbsp;</p><p>See this <a href="https://www.alignmentforum.org/posts/bRsKimQcPTX3tNNJZ/compact-proofs-of-model-performance-via-mechanistic">splainer / companion piece by the authors</a> plus <a href="https://www.alignmentforum.org/posts/SyeQjjBoEC48MvnQC/formal-verification-heuristic-explanations-and-surprise">this supplemental by Jacob Hilton of ARC</a> (which funded the work).</p><h1>Progress in automating mathematics</h1><p>This is a <a href="https://arxiv.org/abs/2407.10040">very</a> <a href="https://arxiv.org/abs/2407.03203">active</a> <a href="https://arxiv.org/abs/2403.18120">area</a> with either tangential or moonshot safety stories, so I&#8217;ll never be exhaustive in covering the latest happenings, but how about a tasting.&nbsp;</p><h2>AlphaProof</h2><p>Deepmind: <a href="https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/">AI achieves silver-medal standard solving International Mathematical Olympiad problems</a>. Coupling a language model specializing in Lean with the AlphaZero RL algorithm, AlphaProof is competitive with IMO silver medalists, and <em>very</em> close to gold. The blog doesn&#8217;t link to a preprint, omits details about how the IMO data was Leanified, and omits the proportion of synthetic and self-play data vs other data. So it remains to be seen how significant this will be&#8211; I don&#8217;t have a strong inside view about <a href="https://arxiv.org/abs/2407.12220">benchmark hacking, but I know enough</a> about the problem area to be skeptical of everything.&nbsp;</p><p><a href="https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspondence">Remember that proofs are programs</a>. We should expect <em>some</em> framing of the automated mathematics problem to do the conceptual heavy lifting for the automated software engineering problem. One reason you&#8217;d hesitate to <a href="https://manifold.markets/CarsonGale/will-the-us-software-engineering-pr">buy up all the yes right now</a> is that IMO-in-Lean is not that framing. <a href="https://proofassistants.stackexchange.com/questions/1115/how-usable-is-lean-for-constructive-mathematics">Lean (especially Mathlib) is, contra your assumptions when you walk into a dependent type theory, optimized for classical mathematics</a>. Without an analysis of the constructive capacity of AlphaProof&#8217;s proofs, you shouldn&#8217;t assume that proofs are programs; machine learning may not walk down the halls of Curry-Howard just yet.&nbsp;</p><h1>GSAI paper club?&nbsp;</h1><p>I&#8217;m considering a monthly meetup to eat through papers. 
Comment here or email quinndougherty92 &lt;at&gt; gmail to express interest.&nbsp;</p><p>Shortlist of papers we could cover (not all of them timely):</p><ul><li><p><a href="https://arxiv.org/abs/2407.03203">TheoremLlama</a></p></li><li><p><a href="https://arxiv.org/abs/2306.04031">Certified Deductive Reasoning with LLMs</a></p></li><li><p><a href="https://arxiv.org/abs/2303.03226">RL with shields, but on ProbLog</a></p></li><li><p><a href="https://arxiv.org/abs/1810.08303">Assume-guarantee contracts with learned components</a></p></li></ul><p>Feel free to suggest others. Much of me would like to do a centralized summarizer (often me) format, but otherwise we could do a longer meetup where we read together on the call before discussing.&nbsp;</p><h1>Funding opportunities (jk)&nbsp;</h1><p>ARIA is currently closed, and the SFF deadline (it has some GSAI folks as panelists this year) was a few weeks ago.&nbsp;</p><p>Neel Nanda, <a href="https://manifund.org/NeelNanda">leading a manifund budget</a>, informed me over discord that he&#8217;s pessimistic about GSAI and is unlikely to make grants in the area. I don&#8217;t know what other Manifund regrantors think, but there&#8217;s still a million or so in play for AI on manifund, so it could be worth a shot.&nbsp;</p><h1>Coming up</h1><p>You can <a href="https://lu.ma/xpf046sa">sign up for August&#8217;s GSAI seminar here</a>, from the authors of <a href="https://www.lesswrong.com/posts/y9tnz27oLmtLxcrEF/constructability-plainly-coded-agis-may-be-feasible-in-the">this post</a>. We&#8217;ll have to see what progress has been made in any prototyping that&#8217;s been done&#8211; I found the post to be pretty pie-in-the-sky.&nbsp;</p><h1>See you next month</h1><p>Like and subscribe, etc. Also, I am not of the twitterati so I don&#8217;t know how to hang out in specific places to make this newsletter better, but it&#8217;d be cool to cite other people&#8217;s quick takes from time to time. To do this, I need your twitter account recommendations, so shoot those over.</p>
]]></content:encoded></item><item><title><![CDATA[Coming soon]]></title><description><![CDATA[This is Can We Secure AI With Formal Methods?.]]></description><link>https://newsletter.for-all.dev/p/coming-soon</link><guid isPermaLink="false">https://newsletter.for-all.dev/p/coming-soon</guid><dc:creator><![CDATA[Quinn Dougherty]]></dc:creator><pubDate>Wed, 17 Jul 2024 21:28:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ykg_!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083283b3-e660-4c7f-81e3-c40b1d1ebafd_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is Can We Secure AI With Formal Methods?.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.for-all.dev/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.for-all.dev/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item></channel></rss>