21 open roles (AI safety, policy & security)


Anthropic is an AI safety and research company working to build reliable, interpretable, and steerable AI systems. Their research interests include natural language, human feedback, scaling laws, reinforcement learning, code generation, and interpretability. Recently announced safety teams include their Frontier Red Team (adversarial testing of advanced ML models) and their Alignment Stress-Testing Team (red-teaming Anthropic's alignment and evaluation efforts).

Our career review on working at a leading AI lab discusses considerations for and against this option.

