OpenAI’s GPT-5 Matches Human Experts in Key Professional Tasks

Cansin Cengiz

25 Sep 2025

OpenAI Unveils GDPval Benchmark: How Close Is AI to Human-Level Professional Work?

OpenAI has introduced GDPval, a benchmark that compares the output of its latest AI models with the work of human professionals across multiple industries. The company presents it as a way to track AI progress on complex, economically valuable tasks, which it treats as an essential part of its pursuit of artificial general intelligence (AGI).

What Is GDPval?

GDPval covers nine industries that make up the bulk of the U.S. economy, including healthcare, finance, manufacturing, and government. Within those industries it spans 44 occupations, from software engineering to journalism and nursing. Experienced professionals compared AI-generated reports with reports written by their peers and judged which was better for each job-relevant task.

Key Findings from the First GDPval Results

GPT-5-high—an enhanced version of GPT-5—was found to be as good as, or better than, industry experts in 40.6% of evaluated tasks.
Anthropic’s Claude Opus 4.1 scored even higher, matching or beating human experts in 49% of cases, though OpenAI suggests the score may partly reflect the model’s tendency to produce visually appealing output.
For comparison, OpenAI’s GPT-4o, released about 15 months earlier, achieved a win-or-tie rate of only 13.7%, roughly a third of GPT-5-high’s 40.6%, which shows how quickly AI has advanced in professional domains.
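
To make the headline metric concrete, here is a minimal sketch of how a win-or-tie rate can be computed from pairwise expert judgments. It is an illustration only, not OpenAI’s grading code: the function name, the verdict labels ("win", "tie", "loss"), and the sample data are all hypothetical.

```python
from collections import Counter


def win_or_tie_rate(verdicts):
    """Share of comparisons in which the model's output was judged
    as good as ("tie") or better than ("win") the human expert's."""
    if not verdicts:
        raise ValueError("verdicts must not be empty")
    counts = Counter(verdicts)
    return (counts["win"] + counts["tie"]) / len(verdicts)


# Hypothetical grader verdicts for ten tasks (not real GDPval data).
sample = ["win", "loss", "tie", "loss", "loss",
          "win", "loss", "loss", "tie", "loss"]

print(f"{win_or_tie_rate(sample):.1%}")  # 2 wins + 2 ties out of 10 -> 40.0%
```

Under this definition, a score of 40.6% means the model’s deliverable was preferred or judged equal in about four of every ten comparisons, and was judged worse in the rest.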

Limitations and Future Plans

GDPval currently evaluates only a narrow slice of what professionals do, mainly the creation of research-style reports. Real-world jobs demand a broader range of skills, including interpersonal communication and decision-making in dynamic environments. OpenAI acknowledges these limitations and says it plans to expand GDPval to more industries and interactive workflows in future versions.

Implications for Businesses and Professionals

According to OpenAI’s chief economist Dr. Aaron Chatterji, these results indicate that AI models like GPT-5 can help professionals automate routine tasks, freeing up time for more meaningful or higher-value work. This incremental progress points to a future where AI acts as a powerful assistant rather than an immediate replacement for human expertise.

The Importance of Real-World Benchmarks

As AI models begin to saturate traditional benchmarks—such as competitive math tests and PhD-level science questions—new evaluation tools like GDPval are becoming essential. These benchmarks aim to better capture an AI’s ability to perform tasks that matter in real business settings, offering more relevant insights for organizations considering AI adoption.

Looking Ahead

AI models are not yet ready to fully take over human roles, but the steady improvements measured by GDPval show how quickly the field is evolving. Businesses that track these results will be better placed to adopt AI effectively and ethically in the workplace.