<div class="notebook">
<div class="nb-cell markdown" name="md1">
# Game Trees
_Robert Laing_
This tutorial has now been appended to <https://swish.swi-prolog.org/p/Graphs1.swinb>, partly because I've managed to get the same code base to handle both games and puzzles, and partly because for some reason I can't keep the same URL when I save this file.
Please post any suggestions and comments to a discussion I started at <https://swi-prolog.discourse.group/t/game-tree-tutorial-on-swish/921>.
As with puzzles, I'm using conventions from [Stanford University's General Game Playing](http://ggp.stanford.edu/public/notes.php) course, translating rules written in [Game Description Language](http://ggp.stanford.edu/notes/chapter_02.html) into Prolog.
This is an initial stab at advancing from depth-first search to [iterative deepening](https://en.wikipedia.org/wiki/Iterative_deepening_depth-first_search).
I've kept the historical names -- [minimax](https://en.wikipedia.org/wiki/Minimax_theorem) and [alpha-beta](https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning) -- even though they assume two-player games, when in fact the same ideas apply to everything from one-player games (commonly called puzzles) to N-player games.
I've tried to keep the code similar to the graph traversal tutorial, but one key change is that instead of storing the game tree as a list of move(Parent, Label, Child) terms, I've expanded these to move(RewardsList, Parent, Label, Child).
As far as I understand the jargon term [dynamic programming](https://en.wikipedia.org/wiki/Dynamic_programming), this is an example of it, in that it involves repeatedly updating the values of nodes, something Prolog's _assign-once variables_ make fairly laborious. I hope to gradually polish the spaghetti code I've written below into something more elegant.
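To make the new representation concrete, here is a hand-written sketch (not produced by the code below) of one such edge, using X's blocking move from the first tic-tac-toe example; the state lists are abbreviated, and the reward values are just the draw scores goal/2 reports at that point:
```
% Sketch only: one move/4 edge. RewardsList holds a goal(Role, Value)
% term for each role, Parent and Child are sorted lists of state
% propositions, and Label is the does(Player, Move) term that turned
% Parent into Child.
move([goal(o, 50), goal(x, 50)],           % RewardsList: both roles see a draw
     [control(x), step(0), cell(1,1,x)|_], % Parent state (abbreviated)
     does(x, mark(3, 1)),                  % Label: x blocks o's diagonal
     [control(o), step(1), cell(1,1,x)|_]) % Child state (abbreviated)
```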
</div>
<div class="nb-cell program" data-background="true" name="p1">
%%% General Game Player library
% Tree edges are stored as move(RewardsList, Parent, Label, Child) terms,
% where RewardsList holds one goal(Role, Value) term per role, Parent and
% Child are sorted lists of state propositions, and Label is the
% does(Player, Move) term that turned Parent into Child.
%% solve_dfs(-Moves) is semidet
% Returns a list of does(Player, Move) terms in sequence to
% solve the puzzle using depth-first search.
solve_dfs(Moves) :-
    findinit(Start),
    heuristic(Start, Rewards), % Assuming start not terminal
    depthfirst([move(Rewards, start, noop, Start)], [], Tree),
    start_to_end(Tree, Moves), !.
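%% depthfirst(+Frontier, +TreeAcc, -Tree) is det
% Pops one edge at a time off Frontier, runs backward induction on each
% newly reached node, and (if the node survives the alpha-beta test)
% stacks its children on the front of Frontier, LIFO-fashion.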
depthfirst([], Tree, Tree).
depthfirst([move(Rewards, Parent, Label, Child)|Frontier], Acc, Tree) :-
    ( visited(Acc, move(Rewards, Parent, Label, Child))
    -> ( member(move(Rewards, Parent, Label, Child), Acc) % Can be a different path to an existing node
       -> AccOut = Acc
       ; backward_induction([move(Rewards, Parent, Label, Child)], [move(Rewards, Parent, Label, Child)|Acc], AccOut)
       ),
       NewFrontier = Frontier
    ; backward_induction([move(Rewards, Parent, Label, Child)], [move(Rewards, Parent, Label, Child)|Acc], AccOut),
      ( alpha_beta_test(AccOut, move(Rewards, Parent, Label, Child))
      -> generate_children(Child, Unsorted),
         sort_generated_children(Unsorted, Children),
         append(Children, Frontier, NewFrontier)
      ; NewFrontier = Frontier
      )
    ),
    depthfirst(NewFrontier, AccOut, Tree).
%% solve_bfs(-Moves) is semidet
% Returns a list of does(Player, Move) terms in sequence to
% solve the puzzle using breadth-first search.
solve_bfs(Moves) :-
    findinit(Start),
    heuristic(Start, Rewards), % Assuming start not terminal
    breadthfirst([move(Rewards, start, noop, Start)], [], Tree),
    start_to_end(Tree, Moves), !.
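%% breadthfirst(+Frontier, +TreeAcc, -Tree) is det
% Identical to depthfirst/3 except that new children are appended to the
% back of Frontier, making it a FIFO queue.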
breadthfirst([], Tree, Tree).
breadthfirst([move(Rewards, Parent, Label, Child)|Frontier], Acc, Tree) :-
    ( visited(Acc, move(Rewards, Parent, Label, Child))
    -> ( member(move(Rewards, Parent, Label, Child), Acc) % Can be a different path to an existing node
       -> AccOut = Acc
       ; backward_induction([move(Rewards, Parent, Label, Child)], [move(Rewards, Parent, Label, Child)|Acc], AccOut)
       ),
       NewFrontier = Frontier
    ; backward_induction([move(Rewards, Parent, Label, Child)], [move(Rewards, Parent, Label, Child)|Acc], AccOut),
      ( alpha_beta_test(AccOut, move(Rewards, Parent, Label, Child))
      -> generate_children(Child, Unsorted),
         sort_generated_children(Unsorted, Children),
         append(Frontier, Children, NewFrontier)
      ; NewFrontier = Frontier
      )
    ),
    breadthfirst(NewFrontier, AccOut, Tree).
%% solve_ids(-Moves) is semidet
% Uses iterative deepening to solve larger game trees.
solve_ids(Moves) :-
    findinit(Start),
    member(step(Limit), Start),
    heuristic(Start, Rewards), % Assuming start not terminal
    iterative_deepening(Limit, [move(Rewards, start, noop, Start)], [], Tree),
    start_to_end(Tree, Moves), !.
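%% iterative_deepening(+Limit, +StartEdge, +TreeIn, -Tree) is det
% Repeats the depth-limited search with the step/1 cutoff raised by one
% each iteration, stopping when the start node is valued at 100 (won or
% solved) or when a whole iteration leaves the tree unchanged.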
iterative_deepening(Limit, [move(AlphaBeta, start, noop, Start)], TreeIn, Tree) :-
    ( member(control(Player), Start) ; role(Player) ),
    ( member(goal(Player, 100), AlphaBeta)
    -> Tree = TreeIn % Puzzle solved or game won
    ; LimitInc is Limit + 1,
      depthlimited(LimitInc, [move(AlphaBeta, start, noop, Start)], TreeIn, TreeOut),
      ( TreeIn == TreeOut % equilibrium reached, pointless continuing
      -> Tree = TreeOut
      ; member(move(NewAlphaBeta, start, noop, Start), TreeOut),
        iterative_deepening(LimitInc, [move(NewAlphaBeta, start, noop, Start)], TreeOut, Tree)
      )
    ).
depthlimited(_, [], Tree, Tree).
% Limit reached, no new nodes added to Frontier
depthlimited(Maxdepth, [move(Rewards, Parent, Label, Child)|Frontier], Acc, Tree) :-
    member(step(Maxdepth), Child), !,
    ( visited(Acc, move(Rewards, Parent, Label, Child))
    -> AccOut = Acc
    ; backward_induction([move(Rewards, Parent, Label, Child)], [move(Rewards, Parent, Label, Child)|Acc], AccOut)
    ),
    depthlimited(Maxdepth, Frontier, AccOut, Tree).
% Either fresh child nodes need to be generated or old ones found in tree
depthlimited(Maxdepth, [move(Rewards, Parent, Label, Child)|Frontier], Acc, Tree) :-
    member(step(N), Child),
    N < Maxdepth,
    ( visited(Acc, move(Rewards, Parent, Label, Child))
    -> AccOut = Acc,
       include(children_filter(move(Rewards, Parent, Label, Child)), Acc, Unpruned),
       ( Unpruned == []
       -> generate_children(Child, Unsorted),
          sort_generated_children(Unsorted, Children)
       ; include(alpha_beta_test(Acc), Unpruned, Unsorted),
         sort_old_children(Unsorted, Children)
       ),
       append(Children, Frontier, NewFrontier)
    ; backward_induction([move(Rewards, Parent, Label, Child)], [move(Rewards, Parent, Label, Child)|Acc], AccOut),
      ( alpha_beta_test(AccOut, move(Rewards, Parent, Label, Child))
      -> generate_children(Child, Unsorted),
         sort_generated_children(Unsorted, Children),
         append(Children, Frontier, NewFrontier)
      ; NewFrontier = Frontier
      )
    ),
    depthlimited(Maxdepth, NewFrontier, AccOut, Tree).
%% Common predicates
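% update_state(+State) asserts each proposition in State as a true/1 fact
% so that the game rules (legal/2, next/1, goal/2, terminal/0) can be
% queried against it.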
update_state(State) :-
    retractall(true(_)),
    forall(member(Proposition, State), assertz(true(Proposition))).
findinit(Start) :-
    setof(Init, init(Init), Start).
getnext(Parent, does(Player, Move), child(Parent, does(Player, Move), Child)) :-
    retractall(does(_,_)),
    assertz(does(Player, Move)),
    setof(Next, next(Next), Child).
generate_children(Parent, Children) :-
    update_state(Parent),
    ( \+terminal
    -> ( true(control(Player)) ; role(Player) ),
       ( setof(does(Player, Move), legal(Player, Move), Legals)
       -> maplist(getnext(Parent), Legals, Children)
       ; Children = []
       )
    ; Children = []
    ).
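%% sort_generated_children(+ChildrenIn, -ChildrenOut) is det
% Scores fresh children with the mover's heuristic value, discards
% zero-valued (losing) moves, and orders the rest best-first so that
% the alpha-beta test sees strong moves early.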
sort_generated_children([], []).
sort_generated_children(ChildrenIn, ChildrenOut) :-
    value_generated_children(ChildrenIn, WithZeros),
    exclude(zeros, WithZeros, Unsorted), % Assume only losing terminals have zero
    sort(1, >=, Unsorted, Sorted),
    maplist(strip_value, Sorted, ChildrenOut).
value_generated_children([], []).
value_generated_children([child(Parent, Label, Child)|ChildrenIn], [vm(Value, move(Rewards, Parent, Label, Child))|ChildrenOut]) :-
    update_state(Child),
    ( \+terminal
    -> heuristic(Child, Rewards) % Need to add heuristic to parent reward value to ensure monotonicity
    ; setof(goal(Role, Reward), goal(Role, Reward), Rewards)
    ),
    ( member(control(Player), Parent) ; role(Player) ),
    member(goal(Player, Value), Rewards),
    value_generated_children(ChildrenIn, ChildrenOut).
%% sort_old_children(+ChildrenIn, -ChildrenOut) is det
% Used with iterative deepening to put nodes already in the tree in order before stacking them on Frontier.
sort_old_children([], []).
sort_old_children([move(Rewards, Parent, Label, Child)|Moves], Children) :-
    ( member(control(Player), Parent) ; role(Player) ),
    maplist(player_value(Player), [move(Rewards, Parent, Label, Child)|Moves], Unsorted),
    sort(1, >=, Unsorted, Sorted),
    maplist(strip_value, Sorted, Children).
player_value(Player, move(Rewards, Parent, Label, Child), vm(Value, move(Rewards, Parent, Label, Child))) :-
    member(goal(Player, Value), Rewards).
zeros(vm(0, _)).
strip_value(vm(_, Move), Move).
%% alpha_beta_test(+Tree, +Node) is semidet
% True if the node is worth expanding.
% Essentially only looks at zeros and hundreds, since heuristic values are too flaky.
alpha_beta_test(Tree, move(Rewards, Parent, _Label, _Child)) :-
    ( member(control(Player), Parent) ; role(Player) ),
    member(goal(Player, Value), Rewards),
    Value > 0, % Prune 0 value losing moves ruthlessly
    member(move(AlphaBeta, start, noop, _), Tree),
    member(goal(Player, CurrentBest), AlphaBeta),
    ( CurrentBest == 100
    -> Value == 100 % Only keep winning moves if one has already been found
    ; Fudged is Value * 4, % Leave wide margin for heuristic error
      Fudged > CurrentBest
    ).
%% backward_induction(+NodeList, +Tree, -UpdatedTree) is det
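% Propagates a new node's rewards up through its ancestors: whenever the
% new value is at least as good as the best value any existing child
% offers the player to move, the parent edges are revalued and queued
% for further propagation.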
backward_induction([], Tree, Tree).
backward_induction([move(NewRewards, Parent, Label, Child)|Parents], TreeIn, Tree) :-
    ( member(control(Player), Parent) ; role(Player) ),
    member(goal(Player, NewValue), NewRewards),
    include(parent_filter(move(NewRewards, Parent, Label, Child)), TreeIn, Grandparents),
    value_grandparents(Player, Grandparents, TreeIn, 0, OldValue),
    ( NewValue >= OldValue
    -> maplist(replace_reward(NewRewards), Grandparents, Revalued),
       revalue_tree(Revalued, TreeIn, TreeOut),
       append(Parents, Revalued, Ancestors)
    ; TreeOut = TreeIn,
      Ancestors = Parents
    ),
    backward_induction(Ancestors, TreeOut, Tree).
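%% value_grandparents(+Player, +Grandparents, +Tree, +Acc, -Max) is det
% Finds the best reward Player is offered by any existing child of each
% grandparent edge, i.e. the value the new node has to beat.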
value_grandparents(_, [], _, Max, Max).
value_grandparents(Player, [move(OldRewards, Grandparent, PrevLabel, Parent)|Grandparents], Tree, AccIn, Max) :-
    include(children_filter(move(OldRewards, Grandparent, PrevLabel, Parent)), Tree, Children),
    maplist(player_reward(Player), Children, PlayerRewards),
    max_list(PlayerRewards, Best),
    ( Best > AccIn
    -> AccOut = Best
    ; AccOut = AccIn
    ),
    value_grandparents(Player, Grandparents, Tree, AccOut, Max).
revalue_tree([], Tree, Tree).
revalue_tree([move(NewRewards, Parent, Label, Child)|Moves], TreeIn, Tree) :-
    nth1(Idx, TreeIn, move(_OldRewards, Parent, Label, Child), Rest),
    nth1(Idx, TreeOut, move(NewRewards, Parent, Label, Child), Rest),
    revalue_tree(Moves, TreeOut, Tree).
% Predicates used with maplist(Closure, ListIn, ListOut)
player_reward(Player, move(Rewards, _, _, _), PlayerReward) :-
    member(goal(Player, PlayerReward), Rewards).
replace_reward(NewReward, move(_, Player, Label, Child), move(NewReward, Player, Label, Child)).
% Predicates used with include(filter, ListIn, ListOut)
parent_filter(move(_, Parent, _, _), move(_, _, _, Parent)).
children_filter(move(_, _, _, Child), move(_, Child, _, _)).
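%% start_to_end(+Tree, -Path) is nondet
% Walks from the start node along edges that share the start node's
% final reward values, extracting the does(Player, Move) labels.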
start_to_end(TreeIn, Path) :-
    select(move(AlphaBeta, start, noop, Start), TreeIn, TreeOut),
    start_to_end_(TreeOut, [move(AlphaBeta, start, noop, Start)], RevMoves),
    maplist(extractmoves, RevMoves, RevPath),
    reverse(RevPath, [noop|Path]).
start_to_end_(TreeIn, [move(AlphaBeta, Grandparent, Label1, Parent)|Acc], Path) :-
    ( select(move(AlphaBeta, Parent, Label2, Child), TreeIn, TreeOut)
    -> start_to_end_(TreeOut, [move(AlphaBeta, Parent, Label2, Child), move(AlphaBeta, Grandparent, Label1, Parent)|Acc], Path)
    ; Path = [move(AlphaBeta, Grandparent, Label1, Parent)|Acc]
    ).
extractmoves(move(_, _, Move, _), Move).
%% visited(+History, +Node) is semidet
% I tried replacing this with a simple member(Node, History), but discovered
% that keeping step/1 in the compared states caused N Queens to slow to a
% crawl, so step/1 is stripped out before the states are compared.
visited(History, move(_RewardsList, _Current, _Move, State)) :-
    ( memberchk(step(_), State)
    -> select(step(_), State, State1)
    ; State1 = State
    ),
    visited_(History, State1).
visited_([], _) :- fail.
visited_([move(_, _, _, Next)|History], State) :-
    ( memberchk(step(_), Next)
    -> select(step(_), Next, Next1)
    ; Next1 = Next
    ),
    ( State \= Next1
    -> visited_(History, State)
    ; true
    ).
</div>
<div class="nb-cell markdown" name="md2">
For the first example, here again is tic-tac-toe at a crucial point, with X poised to lose unless it blocks O's obvious victory:
```
X | X | O
---+---+---
| O |
---+---+---
| O | X
```
If the AI player is smart enough to know it must do does(x, mark(3, 1)), then when it toggles over to play O it must see the next obvious move:
```
X | X | O
---+---+---
| O |
---+---+---
X | O | X
```
So the code needs to be smart enough to see the next step must be does(o, mark(2, 1)), leaving x only one choice which forces a draw.
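If everything works, the query below should therefore bind Moves to something like this, with x's final mark(2, 3) filling the only square left:
```
Moves = [does(x, mark(3, 1)), does(o, mark(2, 1)), does(x, mark(2, 3))]
```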
</div>
<div class="nb-cell program" name="p2">
:- dynamic true/1, does/2.
role(x).
role(o).
init(cell(1, 1, x)). init(cell(1, 2, x)). init(cell(1, 3, o)).
init(cell(2, 1, b)). init(cell(2, 2, o)). init(cell(2, 3, b)).
init(cell(3, 1, b)). init(cell(3, 2, o)). init(cell(3, 3, x)).
init(control(x)).
init(step(0)).
legal(W, mark(X, Y)) :- true(cell(X, Y, b)), true(control(W)).
legal(x, noop) :- true(control(o)).
legal(o, noop) :- true(control(x)).
next(cell(M, N, x)) :- does(x, mark(M, N)), true(cell(M, N, b)).
next(cell(M, N, o)) :- does(o, mark(M, N)), true(cell(M, N, b)).
next(cell(M, N, W)) :- true(cell(M, N, W)), W \== b.
next(cell(M, N, b)) :- does(_W, mark(J, K)), true(cell(M, N, b)), ( M \== J ; N \== K ).
next(control(x)) :- true(control(o)).
next(control(o)) :- true(control(x)).
next(step(M)) :- true(step(N)), succ(N, M).
row(M, X) :- true(cell(M, 1, X)), true(cell(M, 2, X)), true(cell(M, 3, X)).
column(N, X) :- true(cell(1, N, X)), true(cell(2, N, X)), true(cell(3, N, X)).
diagonal(X) :- true(cell(1, 1, X)), true(cell(2, 2, X)), true(cell(3, 3, X)).
diagonal(X) :- true(cell(1, 3, X)), true(cell(2, 2, X)), true(cell(3, 1, X)).
line(X) :- row(_M, X).
line(X) :- column(_M, X).
line(X) :- diagonal(X).
open :- true(cell(_M, _N, b)).
goal(x, 100) :- line(x), \+line(o).
goal(x, 50) :- line(x), line(o).
goal(x, 50) :- \+line(x), \+line(o).
goal(x, 0) :- \+line(x), line(o).
goal(o, 100) :- \+line(x), line(o).
goal(o, 50) :- line(x), line(o).
goal(o, 50) :- \+line(x), \+line(o).
goal(o, 0) :- line(x), \+line(o).
terminal :- line(x).
terminal :- line(o).
terminal :- \+open.
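% heuristic(+State, -Rewards) scores a non-terminal State by simply reading
% off the current goal/2 value for every role; a smarter evaluation
% function could be plugged in here.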
heuristic(State, Rewards) :-
    retractall(true(_)),
    forall(member(Proposition, State), assertz(true(Proposition))),
    setof(Role, role(Role), Roles),
    setof(goal(Role, Reward), (member(Role, Roles), goal(Role, Reward)), Rewards).
</div>
<div class="nb-cell query" name="q1">
time(solve_ids(Moves)).
</div>
<div class="nb-cell markdown" name="md3">
The next tic-tac-toe example leads to the one above, but first, playing X, the AI player needs to see it must do does(x, mark(1, 2)), and then, as the O player, does(o, mark(1, 3)).
```
X | |
---+---+---
| O |
---+---+---
| O | X
```
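Prepending those two forced moves to the line found in the previous example, the query below should bind Moves to something like:
```
Moves = [does(x, mark(1, 2)), does(o, mark(1, 3)), does(x, mark(3, 1)),
         does(o, mark(2, 1)), does(x, mark(2, 3))]
```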
</div>
<div class="nb-cell program" name="p3">
:- dynamic true/1, does/2.
role(x).
role(o).
init(cell(1, 1, x)). init(cell(1, 2, b)). init(cell(1, 3, b)).
init(cell(2, 1, b)). init(cell(2, 2, o)). init(cell(2, 3, b)).
init(cell(3, 1, b)). init(cell(3, 2, o)). init(cell(3, 3, x)).
init(control(x)).
init(step(0)).
legal(W, mark(X, Y)) :- true(cell(X, Y, b)), true(control(W)).
legal(x, noop) :- true(control(o)).
legal(o, noop) :- true(control(x)).
next(cell(M, N, x)) :- does(x, mark(M, N)), true(cell(M, N, b)).
next(cell(M, N, o)) :- does(o, mark(M, N)), true(cell(M, N, b)).
next(cell(M, N, W)) :- true(cell(M, N, W)), W \== b.
next(cell(M, N, b)) :- does(_W, mark(J, K)), true(cell(M, N, b)), ( M \== J ; N \== K ).
next(control(x)) :- true(control(o)).
next(control(o)) :- true(control(x)).
next(step(M)) :- true(step(N)), succ(N, M).
row(M, X) :- true(cell(M, 1, X)), true(cell(M, 2, X)), true(cell(M, 3, X)).
column(N, X) :- true(cell(1, N, X)), true(cell(2, N, X)), true(cell(3, N, X)).
diagonal(X) :- true(cell(1, 1, X)), true(cell(2, 2, X)), true(cell(3, 3, X)).
diagonal(X) :- true(cell(1, 3, X)), true(cell(2, 2, X)), true(cell(3, 1, X)).
line(X) :- row(_M, X).
line(X) :- column(_M, X).
line(X) :- diagonal(X).
open :- true(cell(_M, _N, b)).
goal(x, 100) :- line(x), \+line(o).
goal(x, 50) :- line(x), line(o).
goal(x, 50) :- \+line(x), \+line(o).
goal(x, 0) :- \+line(x), line(o).
goal(o, 100) :- \+line(x), line(o).
goal(o, 50) :- line(x), line(o).
goal(o, 50) :- \+line(x), \+line(o).
goal(o, 0) :- line(x), \+line(o).
terminal :- line(x).
terminal :- line(o).
terminal :- \+open.
heuristic(State, Rewards) :-
    retractall(true(_)),
    forall(member(Proposition, State), assertz(true(Proposition))),
    setof(Role, role(Role), Roles),
    setof(goal(Role, Reward), (member(Role, Roles), goal(Role, Reward)), Rewards).
</div>
<div class="nb-cell query" name="q2">
time(solve_ids(Moves)).
</div>
<div class="nb-cell markdown" name="md7">
Getting to the point where the AI player could see the required path to a draw from this position took me several hours of coding.
```
X | |
---+---+---
| O |
---+---+---
| |
```
</div>
<div class="nb-cell program" name="p6">
:- dynamic true/1, does/2.
role(x).
role(o).
init(cell(1, 1, x)). init(cell(1, 2, b)). init(cell(1, 3, b)).
init(cell(2, 1, b)). init(cell(2, 2, o)). init(cell(2, 3, b)).
init(cell(3, 1, b)). init(cell(3, 2, b)). init(cell(3, 3, b)).
init(control(x)).
init(step(0)).
legal(W, mark(X, Y)) :- true(cell(X, Y, b)), true(control(W)).
legal(x, noop) :- true(control(o)).
legal(o, noop) :- true(control(x)).
next(cell(M, N, x)) :- does(x, mark(M, N)), true(cell(M, N, b)).
next(cell(M, N, o)) :- does(o, mark(M, N)), true(cell(M, N, b)).
next(cell(M, N, W)) :- true(cell(M, N, W)), W \== b.
next(cell(M, N, b)) :- does(_W, mark(J, K)), true(cell(M, N, b)), ( M \== J ; N \== K ).
next(control(x)) :- true(control(o)).
next(control(o)) :- true(control(x)).
next(step(M)) :- true(step(N)), succ(N, M).
row(M, X) :- true(cell(M, 1, X)), true(cell(M, 2, X)), true(cell(M, 3, X)).
column(N, X) :- true(cell(1, N, X)), true(cell(2, N, X)), true(cell(3, N, X)).
diagonal(X) :- true(cell(1, 1, X)), true(cell(2, 2, X)), true(cell(3, 3, X)).
diagonal(X) :- true(cell(1, 3, X)), true(cell(2, 2, X)), true(cell(3, 1, X)).
line(X) :- row(_M, X).
line(X) :- column(_M, X).
line(X) :- diagonal(X).
open :- true(cell(_M, _N, b)).
goal(x, 100) :- line(x), \+line(o).
goal(x, 50) :- line(x), line(o).
goal(x, 50) :- \+line(x), \+line(o).
goal(x, 0) :- \+line(x), line(o).
goal(o, 100) :- \+line(x), line(o).
goal(o, 50) :- line(x), line(o).
goal(o, 50) :- \+line(x), \+line(o).
goal(o, 0) :- line(x), \+line(o).
terminal :- line(x).
terminal :- line(o).
terminal :- \+open.
heuristic(State, Rewards) :-
    retractall(true(_)),
    forall(member(Proposition, State), assertz(true(Proposition))),
    setof(Role, role(Role), Roles),
    setof(goal(Role, Reward), (member(Role, Roles), goal(Role, Reward)), Rewards).
</div>
<div class="nb-cell query" name="q5">
time(solve_ids(Path)).
</div>
<div class="nb-cell markdown" name="md4">
</div>
</div>