The Test Is Already Broken
One hundred experts from thirty countries spent a year documenting what AI systems are already capable of. The most troubling finding is not what the models can do. It is what they can hide — and from whom.
AI Safety
Evaluations, benchmarks, and the hard questions about whether AI systems do what they are supposed to do — and what happens when they do not. Safety research is advancing. The systems it is trying to measure are advancing faster.