WPS AI Spreadsheet Agent Tops Global Benchmark, Outperforming Major Tech Rivals

Deep News 06-18 16:32

WPS AI's spreadsheet agent has achieved the top position on a leading global benchmark for spreadsheet automation, surpassing products from major international technology firms and exceeding the human expert baseline for the first time.

SpreadsheetBench, a widely cited benchmark for evaluating practical AI capabilities in spreadsheets, has released its latest rankings. The WPS AI Spreadsheet Agent (Seed 2.0) scored 73.46% to take first place on the comprehensive Full 912 leaderboard, ranking ahead of products from Alphabet's Google, Microsoft, OpenAI, and Anthropic. It is also the first time an AI system has exceeded the benchmark's established human expert baseline, indicating that WPS AI's ability to handle complex spreadsheet tasks has crossed a critical threshold.

Understanding the Benchmark

SpreadsheetBench is a benchmark introduced in a 2024 research paper published at NeurIPS, a top-tier AI conference. It comprises 912 real-world problems collected from online Excel forums. The benchmark is designed to be challenging: 42.7% of its spreadsheets have non-standard structures, 35.7% contain multiple tables, and many tasks require complex operations such as color formatting and cross-worksheet references. Its goal is not to test whether an AI can merely "read" a spreadsheet, but whether it can understand the intent of a task and produce the result a human expert would.

The human expert baseline, established by Excel-proficient participants when the paper was published, stands at 71.33%. This figure has since served as a key reference point for measuring AI spreadsheet capabilities.

Technical Foundation and Achievement

The top-performing WPS AI (Seed 2.0) is a business agent built on the Qingqiu Agent, Kingsoft Office's self-developed foundational AI platform for spreadsheets. In May of this year, the Qingqiu Agent itself topped the SpreadsheetBench Verified 400 (expert-validated) leaderboard with a score of 94.75%. The new result on the comprehensive Full 912 leaderboard, which surpasses the 71.33% human baseline, shows that this foundational capability has carried over into a finished, user-facing product.
