I’m Johannes Taraz; welcome to my website! I work on various projects related to AI safety.
At the moment I’m working on:
- Understanding Reasoning with Thought Anchors and Probes: ARENA capstone project
- Linear Probes (Mech. Interp.): New datasets to detect deception with linear probes in Llama-3.1-8B (see here for the writeup)
- Activation Steering (Mech. Interp.): A fun 20-hour research project: Are truthful models more dangerous?
- DeepONets: We’ve submitted a paper based on my master’s thesis!
- Debate between LLMs: Can you create a large enough incentive for an LLM to win a debate even if it conflicts with its other goals? Try it out! (with your OpenAI API key)
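The linear-probe idea in the list above can be sketched roughly as follows. This is a minimal toy version, assuming the standard setup of fitting a logistic-regression classifier on model activations; the synthetic vectors below stand in for real Llama-3.1-8B residual-stream activations, and all dimensions and data are illustrative, not from the actual project:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model = 64  # toy hidden size; Llama-3.1-8B's is much larger

# Synthetic "activations": honest vs. deceptive examples are offset
# along one hypothetical direction in activation space.
direction = rng.normal(size=d_model)
honest = rng.normal(size=(200, d_model)) + direction
deceptive = rng.normal(size=(200, d_model)) - direction

X = np.vstack([honest, deceptive])
y = np.array([0] * 200 + [1] * 200)  # 0 = honest, 1 = deceptive

# A linear probe is just a linear classifier trained on the activations.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print(f"train accuracy: {probe.score(X, y):.2f}")
```

In the real project the inputs would be activations extracted from a specific layer on labeled honest/deceptive prompts, and the probe would be evaluated on held-out data rather than the training set.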
📫 Contact/Other pages
Short CV
| Year | Position |
|------|----------|
| 2026- | AI safety PhD student with Sahar Abdelnabi |
| 2026 | Participant in OOD Propensity Generalization @ SPAR |
| 2026 | Facilitator of BlueDot Impact technical AI safety courses |
| 2026 | ARENA - Alignment Research Engineer Accelerator |
| 2025 | ML4Good - AI Safety Bootcamp |
| 2023-2025 | M.Sc. Applied Mathematics @ Delft University of Technology (cum laude, the highest distinction in NL); see thesis and paper |
| 2023-2025 | M.Sc. Scientific Computing @ Technische Universität Berlin |
| 2024 | Student Research Assistant: Computational Chemistry @ German Aerospace Center (DLR) & Helmholtz Institute Ulm |
| 2023-2024 | Student Research Assistant: Numerical Mathematics @ Weierstrass Institute for Applied Analysis and Stochastics |
| 2021-2023 | Teaching Assistant: Physics Lab @ Technische Universität Berlin |
| 2019-2023 | B.Sc. Physics @ Technische Universität Berlin; see thesis |