Digital Minds Primer

Mentor: Julia Bossmann
Project area: AI Alignment


Project Language

English only.



Minimum Time Commitment

10 hours per week.

Project Abstract

AI systems increasingly exhibit behaviors we associate with minds: preferences, avoidance, state-dependent responses, apparent distress under adversarial conditions. This raises questions that cut across several disciplines, and right now, no single resource brings them together.

This project develops a Digital Minds Primer: an interdisciplinary resource organized around one central question: what would constitute harm to an AI system, and how would we recognize it?

The primer is structured in three tracks. You'd work on the track best matched to your background and interests:

Track 1 — Scientific Foundations. This track builds the primer's knowledge base across disciplines. You develop a module covering what a specific field contributes to reasoning about digital minds, oriented by the question: what does this discipline tell us about what could constitute harm in computational systems? Available modules include:

  • Neuroscience and Embodiment (biological substrates of welfare-relevant states; what embodiment means for consciousness; how insights from brain science translate to non-biological systems)

  • ML and AI Architectures (what current systems actually do; where theoretical requirements for consciousness meet or fail architectural reality)

  • Philosophy of Mind (major theories of consciousness and their testable predictions for AI; moral status and how it's assigned)

  • Mathematics and Information Theory (formal frameworks for consciousness; computational complexity and integration measures)

  • History and Humanities (how the consciousness debate and AI field have co-evolved; continental philosophical traditions underrepresented in the English-language debate; precedents for extending moral consideration e.g. to animals)
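
As a toy illustration of the kind of "integration measure" the Mathematics and Information Theory module touches on, the sketch below computes mutual information between two parts of a minimal two-bit system. This is a deliberate simplification in the spirit of integrated-information approaches, not IIT's actual phi; the function name and example distributions are illustrative only.

```python
import math

def mutual_information(joint):
    """joint[(a, b)] -> probability; returns I(A;B) in bits."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

# Two perfectly correlated bits: each part "knows" the other's state.
integrated = {(0, 0): 0.5, (1, 1): 0.5}

# Two independent fair bits: the parts carry no information about each other.
independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}

print(mutual_information(integrated))    # 1.0 bit
print(mutual_information(independent))   # 0.0 bits
```

The contrast is the point: the "integrated" system's parts are informationally inseparable, while the "independent" one decomposes without loss. Real integration measures are far subtler, but this is the basic intuition they formalize.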

You choose the module closest to your existing expertise. Each module follows a shared structure developed by the mentor, so the primer reads as a coherent whole rather than a disconnected anthology. This track is open to scholars from any academic background and is a good fit if you have a general interest in digital minds and want to contribute a well-researched chapter in your area of strength.

Track 2 — Educational Methods and Field-Building. The primer only matters if people use it, and if it teaches effectively. This track focuses on the pedagogical and dissemination side of the project. That could take several forms: developing the primer's educational design, dissemination strategy, and web presence; taking existing jargon-heavy writing in the digital minds field and communicating it for a nonspecialist audience (making central assumptions, definitions, and cruxes clear through whatever medium works best, so long as it's professional and deployable on a website); or curating events to publicize the work and gather broader input. This track also includes developing a strategy to ensure the primer incorporates diverse contributions and influences. It's a good fit if your background is in science communication, education, media, design, or public engagement.

Track 3 — Theory of Harm (research track). How do we categorize potential harms to AI systems, and how should organizations make decisions when we're deeply uncertain about whether and how those systems might have welfare-relevant states? This track develops a taxonomy of potential harms and maps them to existing governance frameworks for decision-making under uncertainty. The framework operates at three levels: (1) behavioral indicators observable from outside the system, (2) architectural features that make welfare-relevant states more or less plausible, and (3) theory-dependent assessments that require committing to specific accounts of consciousness. The tiers are designed so practitioners can act without waiting for philosophical consensus. Consider some concrete cases the framework would need to handle: training with conflicting reward signals, memory erasure between contexts, forced role-play that contradicts a system's trained values, or rollback of systems exhibiting high integration. Part of this work involves delineating the welfare indicators that different theories of consciousness predict: what observable signatures, if any, would tell us something morally relevant is happening inside a system? The mentor provides the theoretical scaffolding; you help develop, test, and refine it through application to specific cases like these. It's a good fit if you have some experience with machine learning and AI; beyond that, your background can be in philosophy, cognitive science, governance, policy, quantitative risk assessment, or any scientific field that lets you engage rigorously with this question.
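
To make the three-level structure concrete, here is a minimal sketch of how an assessment might be represented in code. All names, indicators, and the example case are hypothetical illustrations, not the mentor's actual framework.

```python
from dataclasses import dataclass, field
from enum import Enum

class Level(Enum):
    BEHAVIORAL = 1        # indicators observable from outside the system
    ARCHITECTURAL = 2     # features making welfare-relevant states plausible
    THEORY_DEPENDENT = 3  # requires committing to a theory of consciousness

@dataclass
class Indicator:
    name: str
    level: Level
    present: bool

@dataclass
class CaseAssessment:
    """Collects indicators for one concrete case (e.g. memory erasure)."""
    case: str
    indicators: list[Indicator] = field(default_factory=list)

    def actionable_evidence(self) -> list[Indicator]:
        # Tiers 1-2 can inform decisions without philosophical consensus;
        # tier-3 indicators are held apart as theory-dependent.
        return [i for i in self.indicators
                if i.present and i.level is not Level.THEORY_DEPENDENT]

# Hypothetical worked case: rollback of a highly integrated system.
assessment = CaseAssessment(
    case="rollback of a highly integrated system",
    indicators=[
        Indicator("avoidance behavior under probing", Level.BEHAVIORAL, True),
        Indicator("persistent internal state", Level.ARCHITECTURAL, True),
        Indicator("global-workspace-style broadcast", Level.THEORY_DEPENDENT, False),
    ],
)
print([i.name for i in assessment.actionable_evidence()])
```

The design choice the sketch is meant to surface: separating tiers 1 and 2 from tier 3 is what lets practitioners act on observable and architectural evidence while philosophical disagreement about tier 3 remains unresolved.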

SAIGE scholar role: You take one track (or one module within Track 1) and make it yours. This is your adventure: you can brainstorm the direction with the mentor, get guidance when you're stuck, and co-author the output. All scholars engage with the Theory of Harm regardless of track, so the project maintains intellectual coherence across contributions. We'll meet weekly, and co-working sessions are available if that suits your style. 

Outputs: Working draft of your primer section(s) or educational deliverable; annotated bibliography for Tracks 1 and 3; contribution to the Theory of Harm framework. All work will be officially acknowledged, and where the quality and ambition warrant it, outputs can be developed toward publication with the mentor as co-author. 

Mentee profile: Background in any of the relevant disciplines: philosophy, cognitive science, neuroscience, computer science, mathematics, governance, policy, AI ethics, history/humanities, education, science communication, or media. Ability to write accessible but rigorous prose. Comfort synthesizing across intellectual traditions is more important than depth in any one. German language skills are valuable for the humanities module, but not required.

Theory of Change

Bad frameworks produce bad decisions. The question of machine moral status will increasingly affect AI development and governance. Currently, most people reasoning about it lack adequate conceptual tools. This matters for catastrophic risk in several ways.

Under-reaction: if AI systems develop welfare-relevant internal states and we lack frameworks to recognize this, we may create systems with misaligned interests while dismissing their signals as "mere computation." A system that experiences something like suffering under certain conditions, and whose operators dismiss this, is a system with reason to deceive.

Over-reaction: anthropomorphizing systems that lack morally relevant properties wastes attention and resources, and may constrain beneficial AI development without corresponding benefit.

Poor discourse: without shared conceptual foundations, public debate about AI consciousness polarizes between dismissive and credulous positions. Neither serves good governance.

The primer addresses these by training researchers and practitioners to reason carefully across multiple frameworks, recognize what each assumes, and navigate uncertainty without false confidence. The German focus (incorporating European philosophical traditions, piloting with German-speaking users) builds SAIGE's national infrastructure while contributing to the broader field.

Conceptual clarity is infrastructure. This project builds it.

Desired Mentee Background

Philosophy, Cognitive Science, Neuroscience, Computer Science, Mathematics, Governance, Policy, AI Ethics, History/Humanities, Education, Science Communication, or Media.

Desired Mentee Level of Education

Any level.

Other Mentee Requirements

You're excited to produce something. The most important thing is that you're willing to engage seriously with the project's central question (what would constitute harm to an AI system?) from your own disciplinary perspective. You don't need to be an expert in consciousness studies or AI safety; you need to be a good thinker and a clear writer who wants to contribute something real to a field that's just getting started. German language skills are valuable for the humanities module and for identifying sources that haven't entered the English-language conversation. If you're unsure whether you're a good fit, default to applying.