A Meta-Analysis of the AI Safety Research Landscape
Mentor: Ihor Kendiukhov
Project area: Technical AI Safety
Project Language
Minimum Time Commitment
10 hours per week.
Project Abstract
This proposal outlines a meta-research project designed to systematically analyze the field of AI safety. As investment in AI capabilities accelerates, it is critical to understand whether safety research is keeping pace, addressing the most pressing risks, and allocating resources effectively. The project will build a data-driven map of the AI safety research landscape, empirically evaluate the validity of current safety benchmarks, and analyze the strategic ecosystem of funding and talent that shapes the field. The ultimate goal is to produce actionable recommendations that help researchers, funders, and policymakers identify and fill critical gaps, fostering a more robust and effective safety ecosystem.
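As a rough illustration of what the "data-driven map" step could look like in practice, here is a minimal sketch (not the project's actual methodology) that pulls paper abstracts from the public arXiv API and tallies them against a few hypothetical subfield keyword buckets; the bucket names and keywords are placeholder assumptions, and a real analysis would rely on a curated taxonomy and richer data sources.

```python
# Illustrative sketch only: counting AI-safety-related arXiv abstracts by
# assumed subfield buckets. The buckets below are hypothetical placeholders.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from collections import Counter

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"

# Placeholder keyword buckets standing in for a real taxonomy of the field.
BUCKETS = {
    "interpretability": ["interpretability", "mechanistic"],
    "evaluation": ["benchmark", "evaluation", "red team"],
    "alignment": ["alignment", "rlhf", "reward hacking"],
    "governance": ["governance", "policy", "regulation"],
}

def fetch_abstracts(query: str, max_results: int = 100) -> list[str]:
    """Fetch abstracts of papers matching `query` from the arXiv Atom feed."""
    params = urllib.parse.urlencode(
        {"search_query": query, "start": 0, "max_results": max_results}
    )
    with urllib.request.urlopen(f"{ARXIV_API}?{params}") as resp:
        root = ET.fromstring(resp.read())
    return [
        entry.findtext(f"{ATOM}summary", default="")
        for entry in root.findall(f"{ATOM}entry")
    ]

def bucket_counts(abstracts: list[str]) -> Counter:
    """Count how many abstracts mention keywords from each assumed bucket."""
    counts = Counter()
    for text in abstracts:
        lower = text.lower()
        for bucket, keywords in BUCKETS.items():
            if any(k in lower for k in keywords):
                counts[bucket] += 1
    return counts

if __name__ == "__main__":
    abstracts = fetch_abstracts('all:"AI safety"', max_results=200)
    for bucket, n in bucket_counts(abstracts).most_common():
        print(f"{bucket}: {n}")
```

Even this toy version shows the shape of the exercise: define a taxonomy, gather publication metadata, and quantify where research effort is concentrated so that gaps become visible.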
Theory of Change
Bad frameworks produce bad decisions. The question of machine moral status will increasingly affect AI development and governance. Currently, most people reasoning about it lack adequate conceptual tools. This matters for catastrophic risk in several ways.
Under-reaction: if AI systems develop welfare-relevant internal states and we lack frameworks to recognize this, we may create systems with misaligned interests while dismissing their signals as "mere computation." A system that experiences something like suffering under certain conditions, and whose operators dismiss this, is a system with reason to deceive.
Over-reaction: anthropomorphizing systems that lack morally relevant properties wastes attention and resources, and may constrain beneficial AI development without corresponding benefit.
Poor discourse: without shared conceptual foundations, public debate about AI consciousness polarizes between dismissive and credulous positions. Neither serves good governance.
The primer addresses these by training researchers and practitioners to reason carefully across multiple frameworks, recognize what each assumes, and navigate uncertainty without false confidence. The German focus (incorporating European philosophical traditions, piloting with German-speaking users) builds SAIGE's national infrastructure while contributing to the broader field.
Conceptual clarity is infrastructure. This project builds it.
Desired Mentee Background
Computer Science/ML, Maths, Cognitive Science.
Desired Mentee Level of Education
Any level.
Other Mentee Requirements
Mentees must know the basics of alignment theory and understand why alignment is hard (i.e., be familiar with Yudkowsky's arguments).