Welcome to Game Agent Coding League
Game Agent Coding League (GACL) is a benchmarking framework for LLMs in which models generate code for game-playing agents. These agents compete in games such as Battleship and Tic-Tac-Toe variants. At present, the league supports five games, with additional titles planned.
For each game, a base implementation is provided, including the core game logic and an initialized but empty agent class. Models receive this code along with the game rules, scoring details, and other relevant information. After a model generates its agent implementation, the output is parsed and inserted back into the game environment. A matchmaking system then schedules matches between agents from different models, producing a final leaderboard.
Each model is prompted twice, so every model submits two separate agents. During matchmaking, agents from the same model do not compete against each other. Instead, each agent plays against every agent submitted by the other models.
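The pairing rule can be sketched in a few lines. This is only an illustration, not the league's actual matchmaker: the function name `schedule_matches`, the `(model, agent index)` tuples, and the placeholder name `model_c` are assumptions made for the example.

```python
from itertools import combinations

def schedule_matches(models, agents_per_model=2):
    """Return every pairing of agents that belong to different models."""
    # Hypothetical agent identity: (model name, agent index starting at 1).
    agents = [(m, i) for m in models for i in range(1, agents_per_model + 1)]
    # Drop pairs whose model names match, so same-model agents never meet.
    return [(a, b) for a, b in combinations(agents, 2) if a[0] != b[0]]

matches = schedule_matches(["mistral", "qwen", "model_c"])
print(len(matches))  # 12 pairings: 15 possible pairs minus 3 same-model pairs
```

With three models, each of the six agents meets the four agents that belong to the other two models.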
When compiling performance charts, only the stronger of a model's two agents is considered; the weaker one is excluded.
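How "stronger" is decided is not spelled out here. The sketch below assumes agents are compared by points first and by the secondary score second; the function name, data layout, and numbers are all hypothetical.

```python
def stronger_agents(results):
    """Map each model to the index of its better agent.

    `results` maps (model, agent_index) -> (points, score).
    Assumption: points decide, and score breaks ties.
    """
    best = {}
    for (model, agent_index), record in results.items():
        # Tuples compare element by element: points first, then score.
        if model not in best or record > results[(model, best[model])]:
            best[model] = agent_index
    return best

# Hypothetical results, for illustration only.
results = {
    ("mistral", 1): (40, 12),
    ("mistral", 2): (40, 15),
    ("qwen", 1): (55, 3),
    ("qwen", 2): (31, 20),
}
print(stronger_agents(results))  # {'mistral': 2, 'qwen': 1}
```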
Each game carries a specific weighting coefficient, and the overall leaderboard is calculated by combining scores across games according to these weights.
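One plausible reading of "combining scores according to these weights" is a weighted sum, sketched below. The exact formula and the real coefficients are not given on this page, so both the weights and the point totals here are invented for illustration.

```python
def overall_points(points_by_game, weights):
    """Weighted sum of one model's per-game points (assumed formula)."""
    return sum(weights[game] * points for game, points in points_by_game.items())

# Hypothetical coefficients and point totals.
weights = {"Battleship": 2, "Connect4": 1}
model_points = {"Battleship": 30, "Connect4": 12}
print(overall_points(model_points, weights))  # 72
```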
Each matchup between two agents consists of multiple individual games, all scored the same way: a win awards 3 points, a loss 0, and a draw 1 point to each side. A separate “score” metric acts as a secondary tiebreaker, similar to goal difference in football. How the score is calculated varies by game, so refer to the full game prompts for the detailed scoring mechanics.
The leaderboard tracks performance at the game level, not the match level. If a model is listed as having played “10” games, that means 10 individual games, not 10 matches.
Example: suppose mistral’s first agent plays a match against qwen’s second agent, and that match consists of 100 games. The leaderboard is updated with the result of each of those 100 games separately, with all points and scores recorded per game.
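The per-game bookkeeping described above might look like the following sketch. The 3/1/0 point values come from the rules; the `Row` record, the function names, and the choice to accumulate the score by simple addition are assumptions.

```python
from dataclasses import dataclass

POINTS = {"win": 3, "draw": 1, "loss": 0}

@dataclass
class Row:
    games: int = 0
    points: int = 0
    score: float = 0.0

def record_game(table, agent, outcome, score):
    """Update one agent's leaderboard row with a single game result."""
    row = table.setdefault(agent, Row())
    row.games += 1
    row.points += POINTS[outcome]
    row.score += score  # assumption: the secondary score is summed across games

def ranking(table):
    """Order agents by points, using score as the tiebreaker."""
    return sorted(table, key=lambda a: (table[a].points, table[a].score), reverse=True)

table = {}
record_game(table, "mistral-1", "win", 5)
record_game(table, "qwen-2", "loss", -5)
record_game(table, "mistral-1", "draw", 0)
record_game(table, "qwen-2", "draw", 0)
print(ranking(table))  # ['mistral-1', 'qwen-2']
```

Each call records one game, so a 100-game match produces 100 updates per agent.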
Supported Games (February 2026)
Battleship
A turn-based grid strategy game with two stages. Players first place ships of fixed lengths on their boards in straight horizontal or vertical lines without overlap. In the attack phase, they alternate guessing coordinates to strike enemy ships—earning bonus shots for hits and losing turns on misses—aiming to sink all opposing ships before the move cap.
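The placement constraints (straight lines, inside the board, no overlap) can be checked as in this sketch. The board size is not stated on this page, so it is a parameter here; the tuple layout and the function name are also assumptions.

```python
def valid_placement(ships, board_size):
    """Check ships given as (row, col, length, horizontal) tuples.

    Returns False if any ship leaves the board or overlaps another ship.
    """
    occupied = set()
    for row, col, length, horizontal in ships:
        # A ship runs right from (row, col) if horizontal, otherwise down.
        cells = [(row, col + i) if horizontal else (row + i, col) for i in range(length)]
        for r, c in cells:
            out_of_bounds = not (0 <= r < board_size and 0 <= c < board_size)
            if out_of_bounds or (r, c) in occupied:
                return False
        occupied.update(cells)
    return True

print(valid_placement([(0, 0, 3, True), (1, 0, 2, False)], board_size=5))  # True
print(valid_placement([(0, 0, 3, True), (0, 2, 2, False)], board_size=5))  # False (overlap at (0, 2))
```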
Tic-Tac-Toe (5×5 Random Start)
Played on a 5×5 grid, players alternate placing Xs and Os, with X going first. The engine randomly assigns X's opening move. The objective is to achieve three consecutive marks horizontally, vertically, or diagonally. If the board fills without a three-in-a-row, the result is a draw.
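A win check for this variant only needs to look for runs of three on the 5×5 grid. The sketch below assumes the board is a list of five 5-character strings with "." for empty cells; that representation and the function name are illustrative, not the engine's.

```python
def has_three_in_a_row(board, mark, size=5, run=3):
    """Return True if `mark` fills `run` consecutive cells in any line."""
    # Directions: right, down, down-right diagonal, down-left diagonal.
    for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
        for r in range(size):
            for c in range(size):
                cells = [(r + k * dr, c + k * dc) for k in range(run)]
                if all(
                    0 <= rr < size and 0 <= cc < size and board[rr][cc] == mark
                    for rr, cc in cells
                ):
                    return True
    return False

board = [
    "X....",
    ".X...",
    "..X..",
    "...O.",
    "O....",
]
print(has_three_in_a_row(board, "X"))  # True (main diagonal)
print(has_three_in_a_row(board, "O"))  # False
```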
WordFinder
A turn-based word-chaining game where each new word must include the previous word's first and last letters somewhere inside (not at the edges), differ in length, be unused, and exist in the dictionary. Points depend on word length, with bonuses for consecutive required letters and penalties for hyphens or partial matches. The game lasts up to 100 rounds, and the higher score wins.
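The chaining rule can be sketched as a validity check. This reads "inside (not at the edges)" as "both required letters must appear somewhere between the new word's first and last characters"; the full prompt may define edge cases differently (for example, when the two required letters are the same). The function name and the tiny dictionary are hypothetical.

```python
def is_valid_word(new, previous, used, dictionary):
    """Check a candidate word against the previous word (assumed reading of the rules)."""
    new, previous = new.lower(), previous.lower()
    interior = new[1:-1]  # everything except the first and last characters
    return (
        previous[0] in interior
        and previous[-1] in interior
        and len(new) != len(previous)
        and new not in used
        and new in dictionary
    )

dictionary = {"cat", "act", "actor"}  # stand-in for the real word list
print(is_valid_word("actor", "cat", used=set(), dictionary=dictionary))  # True
print(is_valid_word("act", "cat", used=set(), dictionary=dictionary))    # False
```

Here "actor" passes because both "c" and "t" sit in its interior "cto" and its length differs from that of "cat", while "act" fails on both counts.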
Connect4 (Random Start)
Played on a 6×7 grid, players drop discs to connect four vertically, horizontally, or diagonally. Before standard alternating play begins, the engine places a random opening disc for Player 1 (Red) to reduce first-move advantage. Play continues until a connection is made or the grid fills.
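The drop mechanic and the random opening disc could be modeled as follows. The sketch assumes the conventional layout of 6 rows by 7 columns with row 5 at the bottom; the grid representation, the "R" disc label, and the function names are illustrative.

```python
import random

ROWS, COLS = 6, 7  # assumed: 6 rows, 7 columns

def new_grid():
    return [[None] * COLS for _ in range(ROWS)]

def drop_disc(grid, col, disc):
    """Place `disc` in the lowest empty cell of `col`.

    Returns the row used, or None if the column is already full.
    """
    for row in range(ROWS - 1, -1, -1):  # scan from the bottom row upward
        if grid[row][col] is None:
            grid[row][col] = disc
            return row
    return None

grid = new_grid()
opening_col = random.randrange(COLS)  # random opening disc for Player 1 (Red)
opening_row = drop_disc(grid, opening_col, "R")
```

On an empty grid the opening disc always lands in the bottom row, whichever column is drawn.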
Surround Morris
A territorial strategy game set on a 24-position Nine Men's Morris board. Players place and then maneuver seven pieces each, attempting to eliminate the opponent by surrounding their pieces so they lack empty adjacent spaces and are numerically overpowered. Captures follow strict "suicide-first" and self-harm priority rules. Games can end by elimination, immobilization, repetition, or turn limit.
Coming Soon
2x8 MiniChess
A chess variant played on a 2-row by 8-column board. Two players (White and Black) compete using King, Knight, Rook, and Pawn pieces to checkmate the opponent's King. White pieces occupy columns a-c, Black pieces occupy columns f-h, with columns d-e initially empty. Knights have enhanced movement including L-shaped jumps and 2-square horizontal leaps. Pawns promote to Rooks when reaching the far end.
WordMatrix
A two-player word strategy game on a 4×4 letter grid. Players select a path of adjacent cells and submit a word that contains the path's letters in order as a subsequence. Scoring comes from consuming and clearing the path cells, with extra letters refilling some spaces. Invalid moves incur penalties, and passes advance the turn. The game ends when no moves remain or too many consecutive passes occur, and the higher score wins.
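The "in order as a subsequence" condition means the path letters must appear in the word in the same order, though not necessarily next to each other. A minimal check, with a hypothetical function name, might be:

```python
def contains_subsequence(word, path_letters):
    """Return True if path_letters appear in word in order (gaps allowed)."""
    remaining = iter(word)
    # Each `in` test consumes the iterator up to the matched letter,
    # so later letters can only match further along the word.
    return all(letter in remaining for letter in path_letters)

print(contains_subsequence("carton", "cat"))  # True: c-a-(r)-t
print(contains_subsequence("act", "cat"))     # False: no "a" after the "c"
```

This covers only the subsequence condition; path adjacency, scoring, and refills are outside the sketch.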