NightNeo
2025-02-11
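This chart shows AI performance on various benchmarks over the past five years, and the leap in the most recent year is visible to the naked eye. The horizontal axis is time. The vertical axis is accuracy, meaning the share of test questions the AI answers correctly. Each colored line is a different benchmark. Here are a few representative ones: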
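1. TriviaQA: think of it as a general-knowledge dataset that tests the AI's knowledge and basic reasoning. It poses no difficulty for AI, which scores nearly perfectly.
2. MMLU: a broad language-understanding test with questions across many subjects, such as math, physics and history. Also nearly perfect.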
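3. Competition math: competition-level mathematics, roughly like our Math Olympiad. This shows the biggest improvement: four years ago AI was a bottom-of-the-class student scoring about 5, and now it scores... above 90.
4. AIME: the American Invitational Mathematics Examination, a bit harder than the previous one. AI scores just under 90.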
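5. GPQA: tests understanding of complex concepts, application of scientific knowledge, and logical reasoning. It has also advanced quickly this past year and is approaching 80.
6. SWE tasks: software engineering tasks that test the AI's software development skills, such as writing code and debugging programs. AI started from almost 0 and is now in the 70s.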
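7. The final boss is Humanity's Last Exam. How to put this... as the name suggests, it is the test meant to save humanity's face. If AI reaches a perfect score here, that would roughly mark the first glimpse of AGI. Current scores are in the 20s.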
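The big trends are easy to see:
1. Every line is trending upward, so AI capabilities are improving rapidly across all domains. The "Turing test" of decades past no longer means much.
2. The most explosive part is that accuracy on the hardest benchmarks is rising quickly too. Remember that reasoning models were released less than a year ago. What happens next?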