{"id":6980,"date":"2025-09-05T08:22:31","date_gmt":"2025-09-05T13:22:31","guid":{"rendered":"https:\/\/lab.rivas.ai\/?p=6980"},"modified":"2025-09-05T08:22:31","modified_gmt":"2025-09-05T13:22:31","slug":"latent%e2%80%91space-chess-planning-with-supervised-contrastive-learning-achieves-2593-elo","status":"publish","type":"post","link":"https:\/\/lab.rivas.ai\/?p=6980","title":{"rendered":"Latent\u2011Space Chess Planning with Supervised Contrastive Learning Achieves 2593\u202fElo"},"content":{"rendered":"\n\n\n<header>\n<p class=\"meta-description\">We train a transformer encoder with supervised contrastive learning so that a 6\u2011ply beam search reaches 2593\u202fElo, rivaling Stockfish with far less computation.<\/p>\n<p class=\"deck\">We embed chess positions into a continuous space where distance mirrors evaluation. By moving toward an \u201cadvantage vector\u201d in that space, our engine plans moves without deep tree search, delivering grandmaster\u2011level strength with a tiny search budget.<\/p>\n<\/header><nav class=\"toc\">\n<ul>\n<li><a href=\"#tldr\">TL;DR<\/a><\/li>\n<li><a href=\"#why-it-matters\">Why it matters<\/a><\/li>\n<li><a href=\"#how-it-works\">How it works (plain words)<\/a><\/li>\n<li><a href=\"#results\">What we found<\/a><\/li>\n<li><a href=\"#equation\">Key equation<\/a><\/li>\n<li><a href=\"#limits\">Limits and next steps<\/a><\/li>\n<li><a href=\"#faq\">FAQ<\/a><\/li>\n<li><a href=\"#read-the-paper\">Read the paper<\/a><\/li>\n<\/ul>\n<\/nav>\n<section id=\"tldr\">\n<h2>TL;DR<\/h2>\n<ul>\n<li>We replace deep tree search with planning in a learned latent space.<\/li>\n<li>Our engine reaches an estimated 2593\u202fElo using only a 6\u2011ply beam search.<\/li>\n<li>The approach is efficient, interpretable, and scales with model size.<\/li>\n<\/ul>\n<\/section>\n<section id=\"why-it-matters\">\n<h2>Why it matters<\/h2>\n<p>Traditional chess engines such as Stockfish rely on exhaustive tree search that explores millions of positions and 
requires heavy hardware. Human grandmasters, by contrast, use intuition to prune the search space and then look ahead only a few moves. Replicating that human\u2011like intuition in an AI system could dramatically reduce the computational cost of strong play and make powerful chess agents accessible on modest devices. Moreover, a method that plans by moving through a learned representation is potentially transferable to any domain where a sensible state evaluation exists\u2014games, robotics, or decision\u2011making problems.<\/p>\n<\/section>\n<section id=\"how-it-works\">\n<h2>How it works (plain words)<\/h2>\n<p>Our pipeline consists of three intuitive steps.<\/p>\n<ol>\n<li><strong>Learning the space.<\/strong> We train a transformer encoder on five million positions taken from the ChessBench dataset. Each position carries a Stockfish win\u2011probability. Using supervised contrastive learning, the model pulls together positions with similar probabilities and pushes apart those with different probabilities. The result is a high\u2011dimensional embedding where \u201cnearby\u201d boards have similar evaluations.<\/li>\n<li><strong>Defining an advantage direction.<\/strong> From the same training data we isolate extreme states: positions that Stockfish rates as forced checkmate for White (probability\u202f=\u202f1.0) and for Black (probability\u202f=\u202f0.0). We compute the mean embedding of each extreme set and subtract them. The resulting vector points from Black\u2011winning regions toward White\u2011winning regions and serves as our \u201cadvantage axis.\u201d<\/li>\n<li><strong>Embedding\u2011guided beam search.<\/strong> At run time we enumerate all legal moves, embed each resulting board, and measure its cosine similarity to the advantage axis. The top\u2011k (k\u202f=\u202f3) most aligned positions are kept and expanded recursively up to six plies. 
Because the score is purely geometric, the engine prefers moves that point in the direction of higher evaluation, effectively \u201cwalking\u201d toward better regions of the space.<\/li>\n<\/ol>\n<p>The entire process requires no hand\u2011crafted evaluation function and no recursive minimax or Monte\u2011Carlo tree search. Planning becomes a matter of geometric reasoning inside the embedding.<\/p>\n<\/section>\n<section id=\"results\">\n<h2>What we found<\/h2>\n<h3>Elo performance<\/h3>\n<p>We evaluated two architectures:<\/p>\n<ul>\n<li><strong>Base model.<\/strong> 400\u202fK training steps, 768\u2011dimensional embeddings, beam width\u202f=\u202f3.<\/li>\n<li><strong>Small model.<\/strong> Same training regime but with fewer layers and a 512\u2011dimensional embedding.<\/li>\n<\/ul>\n<p>When we increase the search depth from 2 to 6 plies, the Base model\u2019s estimated Elo improves steadily: 2115\u202f(2\u2011ply), 2318\u202f(3\u2011ply), 2433\u202f(4\u2011ply), 2538\u202f(5\u2011ply), and 2593\u202f(6\u2011ply). The Small model follows the same trend but stays roughly 30\u201350 points behind at every depth. The 2593\u202fElo estimate at depth\u202f6 is comparable to Stockfish\u202f16 running at a calibrated 2600\u202fElo, yet our engine performs the search on a single GPU in a fraction of the time.<\/p>\n<h3>Scaling behaviour<\/h3>\n<p>Both model size and embedding dimensionality contribute positively. Larger transformers (the Base configuration) consistently outperform the Small configuration, confirming that richer representations give the planner better navigation cues. Early experiments with higher\u2011dimensional embeddings (e.g., 1024\u202fD) show modest additional gains, suggesting a ceiling that will likely rise with even bigger models.<\/p>\n<h3>Qualitative insights<\/h3>\n<p>We visualized thousands of positions using UMAP. 
The plot reveals a clear gradient: clusters of White\u2011advantage positions sit on one side, Black\u2011advantage positions on the opposite side, and balanced positions cluster near the origin. When we trace the embeddings of actual games, winning games follow smooth curves that move from the centre toward the appropriate advantage side, while tightly contested games jitter around the centre. These trajectories provide visual evidence that the embedding captures strategic progress without any explicit evaluation function.<\/p>\n<h3>Interpretability<\/h3>\n<p>Because move choice is driven by a cosine\u2011similarity score, we can inspect why a move was preferred. For any position we can project its embedding onto the advantage axis and see whether the engine is pushing toward White\u2011dominant or Black\u2011dominant regions. This geometric view is far more transparent than a black\u2011box evaluation network that outputs a scalar score.<\/p>\n<\/section>\n<section id=\"equation\">\n<h2>Key equation<\/h2>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 58px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lab.rivas.ai\/wp-content\/ql-cache\/quicklatex.com-05cc21b1de9e0f16c724022639be55b5_l3.png\" height=\"58\" width=\"374\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#92;&#91; 
&#76;&#32;&#61;&#32;&#45;&#92;&#115;&#117;&#109;&#95;&#123;&#105;&#61;&#49;&#125;&#94;&#123;&#78;&#125;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#124;&#80;&#40;&#105;&#41;&#124;&#125;&#92;&#115;&#117;&#109;&#95;&#123;&#112;&#92;&#105;&#110;&#32;&#80;&#40;&#105;&#41;&#125;&#92;&#108;&#111;&#103;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#101;&#120;&#112;&#40;&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#122;&#125;&#95;&#105;&#92;&#99;&#100;&#111;&#116;&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#122;&#125;&#95;&#112;&#47;&#92;&#116;&#97;&#117;&#41;&#125;&#123;&#92;&#115;&#117;&#109;&#95;&#123;&#97;&#92;&#105;&#110;&#32;&#65;&#40;&#105;&#41;&#125;&#92;&#101;&#120;&#112;&#40;&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#122;&#125;&#95;&#105;&#92;&#99;&#100;&#111;&#116;&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#122;&#125;&#95;&#97;&#47;&#92;&#116;&#97;&#117;&#41;&#125; &#92;&#93;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>Here, <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lab.rivas.ai\/wp-content\/ql-cache\/quicklatex.com-42ed7b39efdbe6540eaba3d0a3137555_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#122;&#125;&#95;&#105;\" title=\"Rendered by QuickLaTeX.com\" height=\"11\" width=\"14\" style=\"vertical-align: -3px;\"\/> is the embedding of the i\u2011th board state, <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lab.rivas.ai\/wp-content\/ql-cache\/quicklatex.com-9b28ce1e2f2d876fca627a6952a74cf0_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#80;&#40;&#105;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"33\" style=\"vertical-align: -5px;\"\/> denotes the set of positives (positions whose Stockfish evaluations differ by less than the margin\u202f=\u202f0.05), <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lab.rivas.ai\/wp-content\/ql-cache\/quicklatex.com-73d89b50abb73239fbaa21dd3c4c4dd4_l3.png\" 
class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#65;&#40;&#105;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"32\" style=\"vertical-align: -5px;\"\/> is the set of all other positions in the batch, and <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lab.rivas.ai\/wp-content\/ql-cache\/quicklatex.com-13197f4653c1fd428a291609eb1e3b87_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#116;&#97;&#117;\" title=\"Rendered by QuickLaTeX.com\" height=\"8\" width=\"10\" style=\"vertical-align: 0px;\"\/> is the temperature parameter. This supervised contrastive loss pulls together positions with similar evaluations and pushes apart those with dissimilar evaluations, shaping the latent space for geometric planning.<\/p>\n<\/section>\n<section id=\"limits\">\n<h2>Limits and next steps<\/h2>\n<h3>Current limitations<\/h3>\n<ul>\n<li><strong>Greedy beam search.<\/strong> With a beam width of three, the search cannot revise early commitments. Long\u2011term tactical ideas that require a temporary sacrifice can be missed.<\/li>\n<li><strong>Training target dependence.<\/strong> Our contrastive objective uses Stockfish evaluations as ground truth. 
While this provides high\u2011quality numerical signals, it may not capture the nuanced strategic preferences of human players.<\/li>\n<\/ul>\n<h3>Future directions<\/h3>\n<ul>\n<li>Replace the greedy beam with more exploratory strategies such as wider or non\u2011greedy beams, Monte\u202fCarlo rollouts, or hybrid search that combines latent scoring with occasional shallow alpha\u2011beta pruning.<\/li>\n<li>Fine\u2011tune the embedding with reinforcement learning, allowing the engine to discover its own evaluation signal from self\u2011play rather than relying solely on Stockfish.<\/li>\n<li>Scale the transformer to larger depth and width, and enrich the positive\u2011pair sampling (e.g., include mid\u2011game strategic motifs) to sharpen the advantage axis.<\/li>\n<li>Apply the same representation\u2011based planning to other perfect\u2011information games (Go, Shogi, Hex) where a numeric evaluation can be generated.<\/li>\n<\/ul>\n<\/section>\n<section id=\"faq\">\n<h2>FAQ<\/h2>\n<dl>\n<dt>What is \u201clatent\u2011space planning\u201d?<\/dt>\n<dd>It is the idea that an agent can decide which action to take by moving its internal representation toward a region associated with higher value, instead of exploring a combinatorial tree of future states.<\/dd>\n<dt>Why use supervised contrastive learning instead of ordinary regression?<\/dt>\n<dd>Contrastive learning directly shapes the geometry of the space: positions with similar evaluations become neighbours, while dissimilar positions are pushed apart. This geometric structure is essential for the cosine\u2011similarity scoring used in our search.<\/dd>\n<dt>How does the \u201cadvantage vector\u201d get computed?<\/dt>\n<dd>We take the mean embedding of forced\u2011checkmate positions for White (p\u202f=\u202f1.0) and the mean embedding of forced\u2011checkmate positions for Black (p\u202f=\u202f0.0) and subtract the latter from the former. 
The resulting vector points from Black\u2011winning regions toward White\u2011winning regions.<\/dd>\n<dt>Can this method replace Monte\u2011Carlo Tree Search (MCTS) in AlphaZero\u2011style agents?<\/dt>\n<dd>Our results show that, for chess, a well\u2011structured latent space can achieve strength comparable to deep tree search with a far shallower search. Whether it can fully replace MCTS in other domains remains an open research question, but the principle of geometric planning is compatible with hybrid designs that still retain some tree\u2011based refinement.<\/dd>\n<dt>Is the engine limited to Stockfish\u2011derived data?<\/dt>\n<dd>In its current form, yes; we use Stockfish win\u2011probabilities as supervision. In future work we plan to incorporate human annotations or self\u2011play reinforcement signals to reduce this dependency.<\/dd>\n<\/dl>\n<\/section>\n<section id=\"read-the-paper\">\n<h2>Read the paper<\/h2>\n<p>For a complete technical description, training details, and additional visualizations, see our full paper:<\/p>\n<p><a href=\"https:\/\/www.rivas.ai\/pdfs\/hamara2025learning.pdf\">Learning to Plan via Supervised Contrastive Learning and Strategic Interpolation: A Chess Case Study<\/a><\/p>\n<p>If you prefer a direct download, the PDF is available here: <a href=\"https:\/\/www.rivas.ai\/pdfs\/hamara2025learning.pdf\">Download PDF<\/a><\/p>\n<\/section>\n<section id=\"citation\">\n<h2>Reference<\/h2>\n<p>Hamara, A., Hamerly, G., Rivas, P., &amp; Freeman, A. C. (2025). Learning to plan via supervised contrastive learning and strategic interpolation: A chess case study. In <em>Proceedings of the Second Workshop on Game AI Algorithms and Multi\u2011Agent Learning (GAAMAL) at IJCAI 2025<\/em> (pp. 1\u20137). 
Montreal, Canada.<\/p>\n<\/section>\n\n","protected":false},"excerpt":{"rendered":"<p>We train a transformer encoder with supervised contrastive learning so that a 6\u2011ply beam search reaches 2593\u202fElo, rivaling Stockfish with far less computation.<br \/>\nWe embed chess positions into a continuous space where distance mirrors evaluation. By moving toward an \u201cadvantage vector\u201d in that space, our engine plans moves without deep tree search, delivering grandmaster\u2011level strength with a tiny search budget.<\/p>\n","protected":false},"author":11,"featured_media":6979,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[8],"class_list":["post-6980","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","tag-representation-learning"],"jetpack_featured_media_url":"https:\/\/lab.rivas.ai\/wp-content\/uploads\/2025\/09\/8JuXN-cover.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/lab.rivas.ai\/index.php?rest_route=\/wp\/v2\/posts\/6980","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lab.rivas.ai\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lab.rivas.ai\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lab.rivas.ai\/index.php?rest_route=\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"https:\/\/lab.rivas.ai\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=6980"}],"version-history":[{"count":8,"href":"https:\/\/lab.rivas.ai\/index.php?rest_route=\/wp\/v2\/posts\/6980\/revisions"}],"predecessor-version":[{"id":6990,"href":"https:\/\/lab.
rivas.ai\/index.php?rest_route=\/wp\/v2\/posts\/6980\/revisions\/6990"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/lab.rivas.ai\/index.php?rest_route=\/wp\/v2\/media\/6979"}],"wp:attachment":[{"href":"https:\/\/lab.rivas.ai\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=6980"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lab.rivas.ai\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=6980"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lab.rivas.ai\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=6980"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}