When agents fail
A unit in AI Safety & Society. 4 chapters, 14 lessons total.
Chapters
Reward hacking in production
Memory poisoning and indirect prompt injection
Evaluation awareness
Multi-agent drift