Verification of a Global AI Treaty
Mentor: Naci Cankaya
Project area: Technical Verification Mechanisms for Low-Trust AI Governance
Project Language
Minimum Time Commitment
15 hours per week.
Project Abstract
Main Goal:
Build the datacenter lie detector, or at least figure out good approaches to the technical challenges around network taps, workload re-execution, or physical security, data integrity and confidentiality. Alternatively, find creative ways to catch black-site AI clusters.
Methodologies and contributions:
To verify that an ML workload was computed as claimed, one can re-execute it on a secure, verifier-controlled device: feed in the reported inputs and check that the resulting outputs match the reported ones.
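Re-executing every step of a large workload may be too expensive. One natural option, which is an assumption here and not something this project prescribes, is to re-execute only a random sample of segments. The following is a minimal sketch of the resulting detection odds. It assumes the workload splits into `total` independently checkable segments, that the prover falsified `falsified` of them, and that the verifier re-executes `sampled` segments chosen uniformly without replacement. All names are hypothetical.

```python
from math import comb
from typing import Optional


def detection_probability(total: int, falsified: int, sampled: int) -> float:
    """Probability that at least one falsified segment is among the sampled ones.

    Hypergeometric model: the verifier draws `sampled` of `total` segments
    uniformly without replacement and re-executes each of them.
    """
    if not 0 <= falsified <= total or not 0 <= sampled <= total:
        raise ValueError("need 0 <= falsified, sampled <= total")
    # comb(n, k) is 0 when k > n, so sampling more segments than there are
    # honest ones correctly yields a detection probability of 1.
    return 1 - comb(total - falsified, sampled) / comb(total, sampled)


def samples_needed(total: int, falsified: int, target: float) -> Optional[int]:
    """Smallest sample size whose detection probability reaches `target`."""
    for m in range(total + 1):
        if detection_probability(total, falsified, m) >= target:
            return m
    return None  # only reachable when nothing was falsified


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 segments, 1% of them falsified.
    m = samples_needed(1000, 10, 0.95)
    print(m, detection_probability(1000, 10, m))
```

For threat modeling, the takeaway is that a prover who falsifies only a small fraction of segments forces the verifier to sample many of them. The model also assumes the prover cannot predict which segments will be checked.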
Open problems around workload re-execution are probably quite accessible to beginners, since the hardware needed for experimentation can be rented (rather cheaply) in the cloud. The key problems here revolve around non-determinism, red-teaming and threat modeling.
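Floating-point addition is not associative. As a result, a verifier's re-run can differ bit-for-bit from an honest original run when partial sums are reduced in a different order, as parallel hardware may do. The following minimal sketch in pure Python uses the dot products of a toy dense layer to stand in for a real ML workload. The tolerances are chosen arbitrarily for float64; real workloads in float16 or bfloat16 would need far looser bounds. `dense_layer` and `verify_reexecution` are hypothetical names.

```python
import math
import random


def dense_layer(weights, x, order):
    """Compute y = W @ x, accumulating every dot product in the given index order.

    Changing `order` mimics hardware that reduces partial sums differently
    from run to run; mathematically identical results may then differ in
    their last bits because floating-point addition is not associative.
    """
    return [sum(row[i] * x[i] for i in order) for row in weights]


def verify_reexecution(reported, recomputed, rel_tol=1e-9, abs_tol=1e-9):
    """Accept the claimed outputs only if every element matches within tolerance."""
    if len(reported) != len(recomputed):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(reported, recomputed)
    )


rng = random.Random(0)
n = 256
W = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(8)]
x = [rng.uniform(-1, 1) for _ in range(n)]

# The prover and the verifier accumulate in different orders.
prover_order = list(range(n))
verifier_order = prover_order[:]
rng.shuffle(verifier_order)

reported = dense_layer(W, x, prover_order)
recomputed = dense_layer(W, x, verifier_order)

bitwise_equal = reported == recomputed  # may be False even for an honest run
honest_ok = verify_reexecution(reported, recomputed)

# A prover who quietly altered one output element.
tampered = reported[:]
tampered[3] += 1e-3
tampered_ok = verify_reexecution(tampered, recomputed)
print(bitwise_equal, honest_ok, tampered_ok)
```

The tolerance acts as a security parameter. If it is too tight, honest runs get flagged. If it is too loose, an adversary gains room to hide small perturbations.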
More context: https://nacicankaya.substack.com/p/catching-misreporting-about-ml-hardware
One can collect evidence of what an ML cluster is doing by capturing its network traffic with dedicated devices that hash what they observe, without directly revealing confidential data.
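One simple way such a device could summarize traffic is a hash chain. Each observed packet is folded into a running SHA-256 state, and only the final digest leaves the tap. The following minimal sketch makes three assumptions. First, packets arrive as byte strings. Second, the chain is seeded with a secret random nonce, so that low-entropy traffic cannot be brute-forced from the digest. Third, that nonce is disclosed together with the traffic if a claim has to be checked. `TapLog` and `replay` are hypothetical names, not an existing tool.

```python
import hashlib
import os


class TapLog:
    """Hypothetical tap-side log that folds every observed packet into a hash chain."""

    def __init__(self, nonce: bytes):
        # Seeding the chain with a secret nonce keeps the digest from leaking
        # guessable traffic; the domain-separation prefix is arbitrary.
        self._state = hashlib.sha256(b"tap-log-v0" + nonce).digest()
        self.count = 0

    def observe(self, packet: bytes) -> None:
        # Length-prefix each packet so that packet boundaries are unambiguous.
        h = hashlib.sha256(self._state)
        h.update(len(packet).to_bytes(8, "big"))
        h.update(packet)
        self._state = h.digest()
        self.count += 1

    def digest(self) -> str:
        return self._state.hex()


def replay(nonce: bytes, packets) -> str:
    """Verifier side: recompute the digest from the disclosed nonce and packets."""
    log = TapLog(nonce)
    for packet in packets:
        log.observe(packet)
    return log.digest()


nonce = os.urandom(32)
traffic = [b"gradient shard 0", b"gradient shard 1", b"checkpoint chunk 7"]

tap = TapLog(nonce)
for packet in traffic:
    tap.observe(packet)
committed = tap.digest()  # the only value published by the tap

print(replay(nonce, traffic) == committed)      # honest disclosure matches
print(replay(nonce, traffic[:2]) == committed)  # an omitted packet changes the digest
```

A real tap would also have to handle line-rate hashing, packet reordering across links and timestamping. This sketch attempts none of these.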
Network tap work is heavy on physical engineering, but maybe you have unique skills and access.
Even without those, you can contribute to open questions around threat modeling: What covert workloads are possible at what covert I/O bandwidths in what parts of the datacenter network?
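Part of this threat modeling comes down to back-of-the-envelope arithmetic: how long a covert channel of a given bandwidth needs to move a given amount of data. The following minimal sketch assumes bandwidth in bits per second and data in bytes. The model size and link speeds are hypothetical. The ring all-reduce volume, where each of N workers sends 2(N-1)/N times the buffer size per synchronization, is a standard textbook figure used here only for illustration.

```python
def transfer_seconds(num_bytes: float, bits_per_second: float) -> float:
    """Time to push `num_bytes` through a channel of the given raw bandwidth."""
    return num_bytes * 8 / bits_per_second


def ring_allreduce_bytes_per_worker(buffer_bytes: float, workers: int) -> float:
    """Bytes each worker sends in one ring all-reduce over `buffer_bytes`."""
    if workers < 2:
        return 0.0
    return 2 * (workers - 1) / workers * buffer_bytes


# Hypothetical example: a 70-billion-parameter model stored in bf16 (2 bytes/param).
weights_bytes = 70e9 * 2

for label, bps in [("10 Gbit/s", 10e9), ("1 Gbit/s", 1e9), ("1 Mbit/s", 1e6)]:
    exfil = transfer_seconds(weights_bytes, bps)
    # One data-parallel gradient sync among 8 covert workers, assuming the
    # gradients are as large as the weights and the covert link is the bottleneck.
    sync = transfer_seconds(ring_allreduce_bytes_per_worker(weights_bytes, 8), bps)
    print(f"{label}: weights in {exfil / 3600:.2f} h, one gradient sync in {sync / 3600:.2f} h")
```

Comparing such numbers against the smallest covert bandwidth a tap could plausibly miss is one way to frame which covert workloads are worth worrying about.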
More context: https://nacicankaya.substack.com/p/catching-misreporting-about-ml-hardware-bd2
Preventing secret AI clusters is above even my paygrade, but there are cool ideas to explore for how to approach this, given political will and/or manufacturer cooperation. The supply chain of some key components is both international and constrained by multiple critical chokepoints, which could be a promising opportunity to fully account for what hardware exists and who has it. And even if there were an established "ground truth" for which hardware exists, how could inspections catch decoys, diversion and gaps?
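The accounting question above can be framed as a reconciliation between declared records and what inspectors actually find. The declared records could be, for example, shipment records from a chokepoint. The following minimal sketch assumes every device carries a unique serial number. It also assumes inspectors can run some authenticity check on each device, represented by the `attested` flag (for example, a challenge-response attestation). All names and data are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    serial: str
    site: str
    attested: bool  # hypothetical: did the device pass an authenticity check?


def reconcile(shipped: dict, observed: list) -> dict:
    """Compare declared inventory (serial -> declared site) with inspection results."""
    counts = Counter(o.serial for o in observed)
    seen = set(counts)
    return {
        # Declared but never found: possible diversion or a gap in coverage.
        "missing": sorted(set(shipped) - seen),
        # Found but never declared: possible unreported hardware.
        "undeclared": sorted(seen - set(shipped)),
        # Same serial seen more than once: possible cloned identities.
        "duplicated": sorted(s for s, n in counts.items() if n > 1),
        # Found somewhere other than the declared site.
        "relocated": sorted({o.serial for o in observed
                             if o.serial in shipped and o.site != shipped[o.serial]}),
        # Failed the authenticity check: possible decoy.
        "failed_attestation": sorted({o.serial for o in observed if not o.attested}),
    }


shipped = {"GPU-001": "site-A", "GPU-002": "site-A", "GPU-003": "site-B", "GPU-004": "site-B"}
observed = [
    Observation("GPU-001", "site-A", True),
    Observation("GPU-002", "site-A", True),
    Observation("GPU-003", "site-A", True),
    Observation("GPU-002", "site-B", False),
    Observation("GPU-999", "site-B", True),
]
report = reconcile(shipped, observed)
for category, serials in report.items():
    print(category, serials)
```

The sketch only shows which discrepancies fall out of a simple set comparison. A real scheme would also have to secure the records themselves and consider how decoys might pass a weak check.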
I am also interested in political angles to this. The debate around the "AI arms race" has become quite toxic and zero-sum, and this needs to change. I appreciate good ideas for what to do about it. Generally, I think that "stop doing X" activism is less impactful than "we figured out a better way, and this is how".
Theory of Change
Bad frameworks produce bad decisions. The question of machine moral status will increasingly affect AI development and governance. Currently, most people reasoning about it lack adequate conceptual tools. This matters for catastrophic risk in several ways.
Under-reaction: if AI systems develop welfare-relevant internal states and we lack frameworks to recognize this, we may create systems with misaligned interests while dismissing their signals as "mere computation." A system that experiences something like suffering under certain conditions, and whose operators dismiss this, is a system with reason to deceive.
Over-reaction: anthropomorphizing systems that lack morally relevant properties wastes attention and resources, and may constrain beneficial AI development without corresponding benefit.
Poor discourse: without shared conceptual foundations, public debate about AI consciousness polarizes between dismissive and credulous positions. Neither serves good governance.
The primer addresses these by training researchers and practitioners to reason carefully across multiple frameworks, recognize what each assumes, and navigate uncertainty without false confidence. The German focus (incorporating European philosophical traditions, piloting with German-speaking users) builds SAIGE's national infrastructure while contributing to the broader field.
Conceptual clarity is infrastructure. This project builds it.
Desired Mentee Background
Computer Science/ML, Maths, International Relations, Political Science, or anything quantitative that involves programming (ideally ML).
Desired Mentee Level of Education
Any level. You must have taken a course that covers ML basics, or take an ML course during the semester you work with me on the project.
Other Mentee Requirements
You learn fast and iterate and experiment quickly. You find technical problems around AI interesting in their own right, not just for the outcome of your work.
You have high media literacy and can separate well-grounded signal from noise/sensationalism/confabulation. This applies to academic sources and news, as well as AI outputs.
You have a basic understanding of ML algorithms and hardware, at the level of a university course or better.
You have an ambitious can-do attitude and a first-principles based mindset. Take inspiration from Michael Faraday:
"Nothing is too wonderful to be true, if it be consistent with the laws of nature."