Most AI benchmarks don’t tell us much. They ask questions that can be answered through rote memorization, or cover topics that ...