I’m Johannes Taraz; welcome to my website! I work on various projects related to AI safety.
At the moment I’m working on:
- Understanding Reasoning with Thought Anchors and Probes: ARENA capstone project
- Linear Probes (Mech. Interp.): New datasets to detect deception with linear probes in Llama-3.1-8B (see here for the writeup)
- Activation Steering (Mech. Interp.): A fun 20-hour research project: Are truthful models more dangerous?
- DeepONets: We’ve submitted a paper based on my master’s thesis!
- Debate between LLMs: Can you create a large enough incentive for an LLM to win a debate even if it conflicts with its other goals? Try it out! (with your OpenAI API key)
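The linear-probe idea in the list above can be sketched roughly as follows. This is a minimal toy version, assuming the standard setup of fitting a logistic-regression classifier on model activations; the synthetic vectors below stand in for real Llama-3.1-8B residual-stream activations, and all dimensions and data are illustrative, not from the actual project:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model = 64  # toy hidden size; Llama-3.1-8B's is much larger

# Synthetic "activations": honest vs. deceptive examples are offset
# along one hypothetical direction in activation space.
direction = rng.normal(size=d_model)
honest = rng.normal(size=(200, d_model)) + direction
deceptive = rng.normal(size=(200, d_model)) - direction

X = np.vstack([honest, deceptive])
y = np.array([0] * 200 + [1] * 200)  # 0 = honest, 1 = deceptive

# A linear probe is just a linear classifier trained on the activations.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print(f"train accuracy: {probe.score(X, y):.2f}")
```

In the real project the inputs would be activations extracted from a specific layer on labeled honest/deceptive prompts, and the probe would be evaluated on held-out data rather than the training set.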
📫 Contact/Other pages
Short CV
| Year | Position |
|------|----------|
| 2026- | AI safety PhD student with Sahar Abdelnabi |
| 2026 | Participant in OOD Propensity Generalization @ SPAR |
| 2026 | Facilitator of BlueDot Impact technical AI safety courses |
| 2026 | ARENA - Alignment Research Engineer Accelerator |
| 2025 | ML4Good - AI Safety Bootcamp |
| 2023-2025 | M.Sc. Applied Mathematics @ Delft University of Technology (cum laude, the highest distinction in NL); see thesis and paper |
| 2023-2025 | M.Sc. Scientific Computing @ Technische Universität Berlin |
| 2024 | Student Research Assistant: Computational Chemistry @ German Aerospace Center (DLR) & Helmholtz Institute Ulm |
| 2023-2024 | Student Research Assistant: Numerical Mathematics @ Weierstrass Institute for Applied Analysis and Stochastics |
| 2021-2023 | Teaching Assistant: Physics Lab @ Technische Universität Berlin |
| 2019-2023 | B.Sc. Physics @ Technische Universität Berlin; see thesis |