Policy or Value? Loss Function and Playing Strength in AlphaZero-like Self-play
By a mysterious writer
Last updated 16 April 2025

Recently, AlphaZero has achieved outstanding performance in playing Go, Chess, and Shogi. Players in AlphaZero consist of a combination of Monte Carlo Tree Search and a Deep Q-network that is trained using self-play. The unified Deep Q-network has a policy head and a value head. During training, AlphaZero minimizes the sum of the policy loss and the value loss. However, it is not clear whether, and under which circumstances, other formulations of the objective function are better. Therefore, in this paper, we perform experiments with combinations of these two optimization targets.

Self-play is computationally intensive. By using small games, we are able to run multiple test cases. We use a lightweight open-source reimplementation of AlphaZero on two different games. We investigate optimizing the two targets independently, and also try different combinations (sum and product).

Our results indicate that, at least for relatively simple games such as 6x6 Othello and Connect Four, optimizing the sum, as AlphaZero does, performs consistently worse than other objectives, in particular optimizing only the value loss. Moreover, we find that care must be taken in computing the playing strength. Tournament Elo ratings differ from training Elo ratings: training Elo ratings, though cheap to compute and frequently reported, can be misleading and may lead to bias. It is currently not clear how these results transfer to more complex games, or whether there is a phase transition between our setting and the AlphaZero application to Go, where the sum is seemingly the better choice.
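The four training objectives compared in the abstract (policy loss only, value loss only, their sum, and their product) can be sketched as follows. This is a minimal illustration, assuming the usual AlphaZero-style terms: a cross-entropy between the MCTS visit distribution and the policy-head output, and a squared error between the game outcome and the value-head output. The function names, the `mode` parameter, and the exact form of each term are assumptions made for illustration and are not taken from the paper's implementation.

```python
import math
from typing import Sequence


def policy_loss(pi: Sequence[float], p: Sequence[float], eps: float = 1e-12) -> float:
    """Cross-entropy between the search distribution pi and the network policy p.

    Assumed form; eps guards against log(0).
    """
    return -sum(target * math.log(pred + eps) for target, pred in zip(pi, p))


def value_loss(z: float, v: float) -> float:
    """Squared error between the game outcome z and the predicted value v (assumed form)."""
    return (z - v) ** 2


def combined_loss(pi: Sequence[float], p: Sequence[float],
                  z: float, v: float, mode: str = "sum") -> float:
    """Combine the two loss terms according to a hypothetical `mode` switch."""
    l_p = policy_loss(pi, p)
    l_v = value_loss(z, v)
    if mode == "policy":
        return l_p            # optimize the policy head only
    if mode == "value":
        return l_v            # optimize the value head only
    if mode == "sum":
        return l_p + l_v      # the combination the abstract attributes to AlphaZero
    if mode == "product":
        return l_p * l_v      # the alternative combination named in the abstract
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # Toy position with two legal moves: search and network both split 50/50,
    # the game was won (z = 1) and the network predicted v = 0.5.
    pi, p, z, v = [0.5, 0.5], [0.5, 0.5], 1.0, 0.5
    for mode in ("policy", "value", "sum", "product"):
        print(mode, combined_loss(pi, p, z, v, mode))
```

In a real training loop these terms would be computed on batches by an autodiff framework; the scalar version here only shows how the four objectives differ.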

Representation Matters: The Game of Chess Poses a Challenge to Vision Transformers – arXiv Vanity

Acquisition of Chess Knowledge in AlphaZero – arXiv Vanity

RankNet for evaluation functions of the game of Go - IOS Press
Lecture 13: Reinforcement learning

AlphaZero, Vladimir Kramnik and reinventing chess

AlphaZe∗∗: AlphaZero-like baselines for imperfect information games are surprisingly strong – Frontiers

Value targets in off-policy AlphaZero: a new greedy backup

AlphaZero: A General Reinforcement Learning Algorithm that Masters Chess, Shogi and Go through Self-Play

Reimagining Chess with AlphaZero, February 2022

Multiplayer AlphaZero – arXiv Vanity

AlphaZero from scratch in PyTorch for the game of Chain Reaction — Part 3, by Bentou

reference request - How do neural networks play chess? - Artificial Intelligence Stack Exchange

🔵 AlphaZero Plays Connect 4
