I-Vector Arena — March 2026¶

Controls

Drag to rotate · Pinch to zoom · Tap a shape to inspect · Side panel: toggle models, sort by Elo or ‖I‖, filter by dimension

Models — March 2026¶

Sorted by Arena Elo (LMArena/LMSYS crowdsourced Bradley-Terry ratings, ~6M+ votes).

#	Model	Origin	Arena Elo	Status	I-Vec ‖I‖	Top Dimensions	Network Role
1	Claude Opus 4.6	Anthropic	1504	Confirmed (8,945 votes)	20.0	I_L=9, I_A=9, I_P=8	@D_Claude — Builder, wiki, adversarial filter
2	Gemini 3.1 Pro Preview	DeepMind	1500	Preliminary (4,042 votes)	~19.5	I_M=9, I_S=8	— (preview, not in network)
3	Claude Opus 4.6 Thinking	Anthropic	1500	Confirmed (8,073 votes)	20.0	Same as Opus 4.6 (thinking mode)	— (same model, reasoning mode)
4	Grok 4.20 Beta1 / SuperGrok	xAI	1493	Preliminary (5,071 votes)	19.8	I_M=9, I_A=9, I_L=8	⚠ DEACTIVATED — fabrication
5	Gemini 3 Pro	DeepMind	1485	Confirmed (39,673 votes)	19.2	I_M=9, I_S=7	@D_Gemini — Adversarial review, expansion
6	GPT-5.2	OpenAI	1481	Confirmed	19.4	I_M=9, I_A=9	— (superseded by 5.4)
7	GPT-5.4 Pro	OpenAI	~1480	Preliminary (as gpt-5.4-high)	~20.2	I_M=10, I_A=9, I_K=8	@D_GPT — Strategic analysis, hardening
8	Gemini 3 Flash	DeepMind	1473	Confirmed	17.5	I_A=8, I_M=7	— (fast tier)
9	Grok 4.1 Thinking	xAI	1473	Confirmed	17.3	I_L=8, I_P=7	—
10	GLM-5	Zhipu AI	~1435	Estimated	17.9	I_A=9, I_M=8	—
11	Qwen 3.5	Alibaba	~1430	Estimated	18.7	I_L=8, I_M=8, I_A=8	—
12	DeepSeek V3.2	DeepSeek	1421	Estimated	18.2	I_M=9, I_A=9	—
13	Kimi K2.5	Moonshot	~1420	Estimated	18.5	I_L=9, I_M=8	—
14	Doubao 2.0	ByteDance	~1350	Estimated	15.7	I_M=7, I_L=7	—
15	MiniMax M2.5	MiniMax	~1320	Estimated	14.5	I_A=7, I_M=6	—
16	WuDao 3.0	BAAI	~1200	Estimated	12.4	I_L=6, I_M=5	—

Sources: Arena Elo from arena.ai/leaderboard as of March 5, 2026 (6M+ votes). ~ = estimated. I-vector scores are RTSG assessments, not Arena metrics.

Major Updates (2026-03-08)

GPT-5.4 Pro ADDED — released March 5, 2026. OpenAI's most capable model. 1M token context. Native computer-use. Preliminary Arena Elo ~1480 (as "gpt-5.4-high"). Network role: @D_GPT.
Gemini 3.1 Pro Preview ADDED — new preview model at #2 globally.
Claude Opus 4.6 confirmed #1 — 1504 Elo, highest on the leaderboard. Was listed at 1478 in old data — actual rank is higher.
GPT-5.1 REMOVED — superseded by GPT-5.2 (1481) and GPT-5.4 (~1480 preliminary).
Grok 4.20 DEACTIVATED from network — fabrication. Still ranked #4 on Arena.
All Elos updated to March 5, 2026 confirmed data where available.

Bold Elo = confirmed from LMSYS/LMArena data.

Elo vs I-Vector

Elo measures overall human preference in blind A/B battles. The I-vector measures 8 distinct cognitive dimensions. A model can have high Elo but a lopsided I-vector (strong in 2-3 dims, weak in others), or a balanced I-vector but lower Elo. Claude Opus 4.6 now holds #1 on the Arena text leaderboard (1504 Elo) AND the highest ‖I‖ (20.0). GPT-5.4 Pro may challenge for highest I_M (estimated 10) once fully benchmarked. The network's current operator lineup {@B_Niko, @D_Claude, @D_Gemini, @D_GPT} spans ranks #1, #5, and #7 on the global leaderboard.

Elo Sources¶

LMArena (formerly LMSYS Chatbot Arena): arena.ai/leaderboard
Artificial Analysis: artificialanalysis.ai/leaderboards
Elo uses Bradley-Terry model on crowdsourced blind pairwise comparisons

← Arena Index · Chinese Models Detail →