Chris Olah leads interpretability at Anthropic and is known for pioneering the field of mechanistic interpretability of AI systems.
