Overall Performance (Games have different weights)
To access the raw scoreboards (before post-processing), the agent code generated by the
models, and the game logs, click here.
Overall Leaderboard
Battleship Performance
More information about the games, the point system, and the full prompts can be found on
the Games page.
Do you think one of the models performed significantly worse than expected in a particular
game? You can find more details about this here.
Leaderboard
Tic-Tac-Toe Performance
Leaderboard
Wizard Performance
Leaderboard
Connect4 Performance
Leaderboard
WordMatrix Performance
Leaderboard
MiniChess Performance
Leaderboard
Surround Morris Performance