WPS AI's spreadsheet agent has achieved the top position on a leading global benchmark for spreadsheet automation, surpassing products from major international technology firms and exceeding the human expert baseline for the first time.
SpreadsheetBench, the global authority for evaluating practical AI capabilities in spreadsheets, has released its latest rankings. The WPS AI Spreadsheet Agent (Seed 2.0) scored 73.46% to claim first place on the comprehensive Full 912 leaderboard, ahead of products from companies including Alphabet (Google's parent), Microsoft, OpenAI, and Anthropic. The score is 2.13 percentage points above the benchmark's human expert baseline of 71.33%. This is the first time an AI system has surpassed that baseline, indicating that WPS AI's ability to handle complex spreadsheet tasks has crossed a critical threshold.
Understanding the Benchmark
SpreadsheetBench is an industry-standard benchmark derived from a 2024 research paper published at NeurIPS, a top-tier AI conference. It comprises 912 real-world problems sourced from actual Excel forums. The benchmark is designed to be challenging: 42.7% of the spreadsheets contain non-standard structures, 35.7% involve multiple tables, and the tasks include complex operations such as color formatting and cross-worksheet references. Its primary goal is to test not whether an AI can merely "read" a spreadsheet, but whether it can comprehend the intent of a task and produce results comparable to those of a human.
The human expert baseline for Excel proficiency, established when the paper was published, was set at 71.33%. This figure has since served as a crucial reference point for measuring AI spreadsheet capabilities.
Technical Foundation and Achievement
The top-performing WPS AI Spreadsheet Agent (Seed 2.0) is a business agent built on the Qingqiu Agent, Kingsoft Office's self-developed foundational AI platform for spreadsheets. In May of this year, the Qingqiu Agent itself topped the SpreadsheetBench Verified 400 (expert-validated) leaderboard with a score of 94.75%. The product's first-place finish on the comprehensive Full 912 leaderboard, above the 71.33% human baseline, demonstrates a complete transition from foundational AI capability to a fully realized product.