GAIA is a benchmark of tasks that are easy for humans but tough for AI: human respondents score 92%, versus 15% for GPT-4 equipped with plugins. Its 466 questions test fundamental abilities like reasoning and web browsing, and an online leaderboard tracks AI performance. https://arxiv.org/abs/2311.12983
@estherschindler AGI my patootie