<div class="notebook">
<div class="nb-cell markdown" name="md1">
# Game Trees
_Robert Laing_
This tutorial has now been appended to <https://swish.swi-prolog.org/p/Graphs1.swinb>, partly because I've managed to get the same code base to handle both games and puzzles, and partly because for some reason I can't keep the same URL when I save this file.
Please post any suggestions and comments to a discussion I started at <https://swi-prolog.discourse.group/t/game-tree-tutorial-on-swish/921>.
As with puzzles, I'm using conventions from [Stanford University's General Game Playing](http://ggp.stanford.edu/public/notes.php) course, translating rules written in [Game Description Language](http://ggp.stanford.edu/notes/chapter_02.html) into Prolog.
This is an initial stab at advancing from depth-first search to [iterative deepening](https://en.wikipedia.org/wiki/Iterative_deepening_depth-first_search).
I've kept the historical names -- [minimax](https://en.wikipedia.org/wiki/Minimax_theorem) and [alpha-beta](https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning) -- even though they assume two-player games, when in fact the same ideas apply to everything from one-player games (commonly called puzzles) to N-player games.
I've tried to keep the code similar to the graph traversal tutorial, but one key change is that instead of storing the game tree as a list of move(Parent, Label, Child) terms, I've expanded these to move(RewardsList, Parent, Label, Child).
As far as I understand the jargon term [dynamic programming](https://en.wikipedia.org/wiki/Dynamic_programming), this is an example of it, in that it involves repeatedly updating the values of nodes, something Prolog's _assign-once variables_ make fairly laborious. I hope to gradually polish the spaghetti code I've written below into something more elegant.
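To make the new representation concrete, here is a hand-written sketch (not produced by the code below) of one such edge, using X's blocking move from the first tic-tac-toe example; the state lists are abbreviated, and the reward values are just the draw scores goal/2 reports at that point:
```
% Sketch only: one move/4 edge. RewardsList holds a goal(Role, Value)
% term for each role, Parent and Child are sorted lists of state
% propositions, and Label is the does(Player, Move) term that turned
% Parent into Child.
move([goal(o, 50), goal(x, 50)],           % RewardsList: both roles see a draw
     [control(x), step(0), cell(1,1,x)|_], % Parent state (abbreviated)
     does(x, mark(3, 1)),                  % Label: x blocks o's diagonal
     [control(o), step(1), cell(1,1,x)|_]) % Child state (abbreviated)
```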
</div>
<div class="nb-cell program" data-background="true" name="p1">
%%% General Game Player library
% Tree edges are stored as move(RewardsList, Parent, Label, Child) terms,
% where RewardsList holds one goal(Role, Value) term per role, Parent and
% Child are sorted lists of state propositions, and Label is the
% does(Player, Move) term that turned Parent into Child.
%% solve_dfs(-Moves) is semidet
% Returns a list of does(Player, Move) terms in sequence to
% solve the puzzle using depth-first search.
solve_dfs(Moves) :-
    findinit(Start),
    heuristic(Start, Rewards), % Assuming start not terminal
    depthfirst([move(Rewards, start, noop, Start)], [], Tree),
    start_to_end(Tree, Moves), !.
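%% depthfirst(+Frontier, +TreeAcc, -Tree) is det
% Pops one edge at a time off Frontier, runs backward induction on each
% newly reached node, and (if the node survives the alpha-beta test)
% stacks its children on the front of Frontier, LIFO-fashion.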
depthfirst([], Tree, Tree).
depthfirst([move(Rewards, Parent, Label, Child)|Frontier], Acc, Tree) :-
    ( visited(Acc, move(Rewards, Parent, Label, Child))
    -> ( member(move(Rewards, Parent, Label, Child), Acc) % Can be a different path to an existing node
       -> AccOut = Acc
       ; backward_induction([move(Rewards, Parent, Label, Child)], [move(Rewards, Parent, Label, Child)|Acc], AccOut)
       ),
       NewFrontier = Frontier
    ; backward_induction([move(Rewards, Parent, Label, Child)], [move(Rewards, Parent, Label, Child)|Acc], AccOut),
      ( alpha_beta_test(AccOut, move(Rewards, Parent, Label, Child))
      -> generate_children(Child, Unsorted),
         sort_generated_children(Unsorted, Children),
         append(Children, Frontier, NewFrontier)
      ; NewFrontier = Frontier
      )
    ),
    depthfirst(NewFrontier, AccOut, Tree).
%% solve_bfs(-Moves) is semidet
% Returns a list of does(Player, Move) terms in sequence to
% solve the puzzle using breadth-first search.
solve_bfs(Moves) :-
    findinit(Start),
    heuristic(Start, Rewards), % Assuming start not terminal
    breadthfirst([move(Rewards, start, noop, Start)], [], Tree),
    start_to_end(Tree, Moves), !.
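%% breadthfirst(+Frontier, +TreeAcc, -Tree) is det
% Identical to depthfirst/3 except that new children are appended to the
% back of Frontier, making it a FIFO queue.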
breadthfirst([], Tree, Tree).
breadthfirst([move(Rewards, Parent, Label, Child)|Frontier], Acc, Tree) :-
    ( visited(Acc, move(Rewards, Parent, Label, Child))
    -> ( member(move(Rewards, Parent, Label, Child), Acc) % Can be a different path to an existing node
       -> AccOut = Acc
       ; backward_induction([move(Rewards, Parent, Label, Child)], [move(Rewards, Parent, Label, Child)|Acc], AccOut)
       ),
       NewFrontier = Frontier
    ; backward_induction([move(Rewards, Parent, Label, Child)], [move(Rewards, Parent, Label, Child)|Acc], AccOut),
      ( alpha_beta_test(AccOut, move(Rewards, Parent, Label, Child))
      -> generate_children(Child, Unsorted),
         sort_generated_children(Unsorted, Children),
         append(Frontier, Children, NewFrontier)
      ; NewFrontier = Frontier
      )
    ),
    breadthfirst(NewFrontier, AccOut, Tree).
%% solve_ids(-Moves) is semidet
% Uses iterative deepening to solve larger game trees.
solve_ids(Moves) :-
    findinit(Start),
    member(step(Limit), Start),
    heuristic(Start, Rewards), % Assuming start not terminal
    iterative_deepening(Limit, [move(Rewards, start, noop, Start)], [], Tree),
    start_to_end(Tree, Moves), !.
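%% iterative_deepening(+Limit, +StartEdge, +TreeIn, -Tree) is det
% Repeats the depth-limited search with the step/1 cutoff raised by one
% each iteration, stopping when the start node is valued at 100 (won or
% solved) or when a whole iteration leaves the tree unchanged.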
iterative_deepening(Limit, [move(AlphaBeta, start, noop, Start)], TreeIn, Tree) :-
    ( member(control(Player), Start) ; role(Player) ),
    ( member(goal(Player, 100), AlphaBeta)
    -> Tree = TreeIn % Puzzle solved or game won
    ; LimitInc is Limit + 1,
      depthlimited(LimitInc, [move(AlphaBeta, start, noop, Start)], TreeIn, TreeOut),
      ( TreeIn == TreeOut % equilibrium reached, pointless continuing
      -> Tree = TreeOut
      ; member(move(NewAlphaBeta, start, noop, Start), TreeOut),
        iterative_deepening(LimitInc, [move(NewAlphaBeta, start, noop, Start)], TreeOut, Tree)
      )
    ).
depthlimited(_, [], Tree, Tree).
% Limit reached, no new nodes added to Frontier
depthlimited(Maxdepth, [move(Rewards, Parent, Label, Child)|Frontier], Acc, Tree) :-
    member(step(Maxdepth), Child), !,
    ( visited(Acc, move(Rewards, Parent, Label, Child))
    -> AccOut = Acc
    ; backward_induction([move(Rewards, Parent, Label, Child)], [move(Rewards, Parent, Label, Child)|Acc], AccOut)
    ),
    depthlimited(Maxdepth, Frontier, AccOut, Tree).
% Either fresh child nodes need to be generated or old ones found in tree
depthlimited(Maxdepth, [move(Rewards, Parent, Label, Child)|Frontier], Acc, Tree) :-
    member(step(N), Child),
    N < Maxdepth,
    ( visited(Acc, move(Rewards, Parent, Label, Child))
    -> AccOut = Acc,
       include(children_filter(move(Rewards, Parent, Label, Child)), Acc, Unpruned),
       ( Unpruned == []
       -> generate_children(Child, Unsorted),
          sort_generated_children(Unsorted, Children)
       ; include(alpha_beta_test(Acc), Unpruned, Unsorted),
         sort_old_children(Unsorted, Children)
       ),
       append(Children, Frontier, NewFrontier)
    ; backward_induction([move(Rewards, Parent, Label, Child)], [move(Rewards, Parent, Label, Child)|Acc], AccOut),
      ( alpha_beta_test(AccOut, move(Rewards, Parent, Label, Child))
      -> generate_children(Child, Unsorted),
         sort_generated_children(Unsorted, Children),
         append(Children, Frontier, NewFrontier)
      ; NewFrontier = Frontier
      )
    ),
    depthlimited(Maxdepth, NewFrontier, AccOut, Tree).
%% Common predicates
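% update_state(+State) asserts each proposition in State as a true/1 fact
% so that the game rules (legal/2, next/1, goal/2, terminal/0) can be
% queried against it.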
update_state(State) :-
    retractall(true(_)),
    forall(member(Proposition, State), assertz(true(Proposition))).
findinit(Start) :-
    setof(Init, init(Init), Start).
getnext(Parent, does(Player, Move), child(Parent, does(Player, Move), Child)) :-
    retractall(does(_,_)),
    assertz(does(Player, Move)),
    setof(Next, next(Next), Child).
generate_children(Parent, Children) :-
    update_state(Parent),
    ( \+terminal
    -> ( true(control(Player)) ; role(Player) ),
       ( setof(does(Player, Move), legal(Player, Move), Legals)
       -> maplist(getnext(Parent), Legals, Children)
       ; Children = []
       )
    ; Children = []
    ).
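%% sort_generated_children(+ChildrenIn, -ChildrenOut) is det
% Scores fresh children with the mover's heuristic value, discards
% zero-valued (losing) moves, and orders the rest best-first so that
% the alpha-beta test sees strong moves early.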
sort_generated_children([], []).
sort_generated_children(ChildrenIn, ChildrenOut) :-
    value_generated_children(ChildrenIn, WithZeros),
    exclude(zeros, WithZeros, Unsorted), % Assume only losing terminals have zero
    sort(1, >=, Unsorted, Sorted),
    maplist(strip_value, Sorted, ChildrenOut).
value_generated_children([], []).
value_generated_children([child(Parent, Label, Child)|ChildrenIn], [vm(Value, move(Rewards, Parent, Label, Child))|ChildrenOut]) :-
    update_state(Child),
    ( \+terminal
    -> heuristic(Child, Rewards) % Need to add heuristic to parent reward value to ensure monotonicity
    ; setof(goal(Role, Reward), goal(Role, Reward), Rewards)
    ),
    ( member(control(Player), Parent) ; role(Player) ),
    member(goal(Player, Value), Rewards),
    value_generated_children(ChildrenIn, ChildrenOut).
%% sort_old_children(+ChildrenIn, -ChildrenOut) is det
% Used with iterative deepening to put nodes already in the tree in order before stacking them on Frontier.
sort_old_children([], []).
sort_old_children([move(Rewards, Parent, Label, Child)|Moves], Children) :-
    ( member(control(Player), Parent) ; role(Player) ),
    maplist(player_value(Player), [move(Rewards, Parent, Label, Child)|Moves], Unsorted),
    sort(1, >=, Unsorted, Sorted),
    maplist(strip_value, Sorted, Children).
player_value(Player, move(Rewards, Parent, Label, Child), vm(Value, move(Rewards, Parent, Label, Child))) :-
    member(goal(Player, Value), Rewards).
zeros(vm(0, _)).
strip_value(vm(_, Move), Move).
%% alpha_beta_test(+Tree, +Node) is semidet
% True if the node is worth expanding.
% Essentially only looks at zeros and hundreds, since heuristic values are too flaky.
alpha_beta_test(Tree, move(Rewards, Parent, _Label, _Child)) :-
    ( member(control(Player), Parent) ; role(Player) ),
    member(goal(Player, Value), Rewards),
    Value > 0, % Prune 0 value losing moves ruthlessly
    member(move(AlphaBeta, start, noop, _), Tree),
    member(goal(Player, CurrentBest), AlphaBeta),
    ( CurrentBest == 100
    -> Value == 100 % Only keep winning moves if one has already been found
    ; Fudged is Value * 4, % Leave wide margin for heuristic error
      Fudged > CurrentBest
    ).
%% backward_induction(+NodeList, +Tree, -UpdatedTree) is det
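% Propagates a new node's rewards up through its ancestors: whenever the
% new value is at least as good as the best value any existing child
% offers the player to move, the parent edges are revalued and queued
% for further propagation.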
backward_induction([], Tree, Tree).
backward_induction([move(NewRewards, Parent, Label, Child)|Parents], TreeIn, Tree) :-
    ( member(control(Player), Parent) ; role(Player) ),
    member(goal(Player, NewValue), NewRewards),
    include(parent_filter(move(NewRewards, Parent, Label, Child)), TreeIn, Grandparents),
    value_grandparents(Player, Grandparents, TreeIn, 0, OldValue),
    ( NewValue >= OldValue
    -> maplist(replace_reward(NewRewards), Grandparents, Revalued),
       revalue_tree(Revalued, TreeIn, TreeOut),
       append(Parents, Revalued, Ancestors)
    ; TreeOut = TreeIn,
      Ancestors = Parents
    ),
    backward_induction(Ancestors, TreeOut, Tree).
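%% value_grandparents(+Player, +Grandparents, +Tree, +Acc, -Max) is det
% Finds the best reward Player is offered by any existing child of each
% grandparent edge, i.e. the value the new node has to beat.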
value_grandparents(_, [], _, Max, Max).
value_grandparents(Player, [move(OldRewards, Grandparent, PrevLabel, Parent)|Grandparents], Tree, AccIn, Max) :-
    include(children_filter(move(OldRewards, Grandparent, PrevLabel, Parent)), Tree, Children),
    maplist(player_reward(Player), Children, PlayerRewards),
    max_list(PlayerRewards, Best),
    ( Best > AccIn
    -> AccOut = Best
    ; AccOut = AccIn
    ),
    value_grandparents(Player, Grandparents, Tree, AccOut, Max).
revalue_tree([], Tree, Tree).
revalue_tree([move(NewRewards, Parent, Label, Child)|Moves], TreeIn, Tree) :-
    nth1(Idx, TreeIn, move(_OldRewards, Parent, Label, Child), Rest),
    nth1(Idx, TreeOut, move(NewRewards, Parent, Label, Child), Rest),
    revalue_tree(Moves, TreeOut, Tree).
% Predicates used with maplist(Closure, ListIn, ListOut)
player_reward(Player, move(Rewards, _, _, _), PlayerReward) :-
    member(goal(Player, PlayerReward), Rewards).
replace_reward(NewReward, move(_, Player, Label, Child), move(NewReward, Player, Label, Child)).
% Predicates used with include(filter, ListIn, ListOut)
parent_filter(move(_, Parent, _, _), move(_, _, _, Parent)).
children_filter(move(_, _, _, Child), move(_, Child, _, _)).
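%% start_to_end(+Tree, -Path) is nondet
% Walks from the start node along edges that share the start node's
% final reward values, extracting the does(Player, Move) labels.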
start_to_end(TreeIn, Path) :-
    select(move(AlphaBeta, start, noop, Start), TreeIn, TreeOut),
    start_to_end_(TreeOut, [move(AlphaBeta, start, noop, Start)], RevMoves),
    maplist(extractmoves, RevMoves, RevPath),
    reverse(RevPath, [noop|Path]).
start_to_end_(TreeIn, [move(AlphaBeta, Grandparent, Label1, Parent)|Acc], Path) :-
    ( select(move(AlphaBeta, Parent, Label2, Child), TreeIn, TreeOut)
    -> start_to_end_(TreeOut, [move(AlphaBeta, Parent, Label2, Child), move(AlphaBeta, Grandparent, Label1, Parent)|Acc], Path)
    ; Path = [move(AlphaBeta, Grandparent, Label1, Parent)|Acc]
    ).
extractmoves(move(_, _, Move, _), Move).
%% visited(+History, +Node) is semidet
% I tried replacing this with a simple member(Node, History), but discovered
% that keeping step/1 in the compared states caused N Queens to slow to a
% crawl, so step/1 is stripped out before the states are compared.
visited(History, move(_RewardsList, _Current, _Move, State)) :-
    ( memberchk(step(_), State)
    -> select(step(_), State, State1)
    ; State1 = State
    ),
    visited_(History, State1).
visited_([], _) :- fail.
visited_([move(_, _, _, Next)|History], State) :-
    ( memberchk(step(_), Next)
    -> select(step(_), Next, Next1)
    ; Next1 = Next
    ),
    ( State \= Next1
    -> visited_(History, State)
    ; true
    ).
</div>
<div class="nb-cell markdown" name="md2">
For the first example, here again is tic-tac-toe at a crucial point, with X poised to lose unless it blocks O's obvious victory:
```
X | X | O
---+---+---
| O |
---+---+---
| O | X
```
If the AI player is smart enough to know it must do does(x, mark(3, 1)), then when it toggles over to play O it must see the next obvious move:
```
X | X | O
---+---+---
| O |
---+---+---
X | O | X
```
So the code needs to be smart enough to see the next step must be does(o, mark(2, 1)), leaving x only one choice which forces a draw.
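If everything works, the query below should therefore bind Moves to something like this, with x's final mark(2, 3) filling the only square left:
```
Moves = [does(x, mark(3, 1)), does(o, mark(2, 1)), does(x, mark(2, 3))]
```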
</div>
<div class="nb-cell program" name="p2">
:- dynamic true/1, does/2.
role(x).
role(o).
init(cell(1, 1, x)). init(cell(1, 2, x)). init(cell(1, 3, o)).
init(cell(2, 1, b)). init(cell(2, 2, o)). init(cell(2, 3, b)).
init(cell(3, 1, b)). init(cell(3, 2, o)). init(cell(3, 3, x)).
init(control(x)).
init(step(0)).
legal(W, mark(X, Y)) :- true(cell(X, Y, b)), true(control(W)).
legal(x, noop) :- true(control(o)).
legal(o, noop) :- true(control(x)).
next(cell(M, N, x)) :- does(x, mark(M, N)), true(cell(M, N, b)).
next(cell(M, N, o)) :- does(o, mark(M, N)), true(cell(M, N, b)).
next(cell(M, N, W)) :- true(cell(M, N, W)), W \== b.
next(cell(M, N, b)) :- does(_W, mark(J, K)), true(cell(M, N, b)), ( M \== J ; N \== K ).
next(control(x)) :- true(control(o)).
next(control(o)) :- true(control(x)).
next(step(M)) :- true(step(N)), succ(N, M).
row(M, X) :- true(cell(M, 1, X)), true(cell(M, 2, X)), true(cell(M, 3, X)).
column(N, X) :- true(cell(1, N, X)), true(cell(2, N, X)), true(cell(3, N, X)).
diagonal(X) :- true(cell(1, 1, X)), true(cell(2, 2, X)), true(cell(3, 3, X)).
diagonal(X) :- true(cell(1, 3, X)), true(cell(2, 2, X)), true(cell(3, 1, X)).
line(X) :- row(_M, X).
line(X) :- column(_M, X).
line(X) :- diagonal(X).
open :- true(cell(_M, _N, b)).
goal(x, 100) :- line(x), \+line(o).
goal(x, 50) :- line(x), line(o).
goal(x, 50) :- \+line(x), \+line(o).
goal(x, 0) :- \+line(x), line(o).
goal(o, 100) :- \+line(x), line(o).
goal(o, 50) :- line(x), line(o).
goal(o, 50) :- \+line(x), \+line(o).
goal(o, 0) :- line(x), \+line(o).
terminal :- line(x).
terminal :- line(o).
terminal :- \+open.
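% heuristic(+State, -Rewards) scores a non-terminal State by simply reading
% off the current goal/2 value for every role; a smarter evaluation
% function could be plugged in here.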
heuristic(State, Rewards) :-
    retractall(true(_)),
    forall(member(Proposition, State), assertz(true(Proposition))),
    setof(Role, role(Role), Roles),
    setof(goal(Role, Reward), (member(Role, Roles), goal(Role, Reward)), Rewards).
</div>
<div class="nb-cell query" name="q1">
time(solve_ids(Moves)).
</div>
<div class="nb-cell markdown" name="md3">
The next tic-tac-toe example leads to the one above, but first, playing X, the AI player needs to see it must do does(x, mark(1, 2)), and then, as the O player, does(o, mark(1, 3)).
```
X | |
---+---+---
| O |
---+---+---
| O | X
```
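Prepending those two forced moves to the line found in the previous example, the query below should bind Moves to something like:
```
Moves = [does(x, mark(1, 2)), does(o, mark(1, 3)), does(x, mark(3, 1)),
         does(o, mark(2, 1)), does(x, mark(2, 3))]
```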
</div>
<div class="nb-cell program" name="p3">
:- dynamic true/1, does/2.
role(x).
role(o).
init(cell(1, 1, x)). init(cell(1, 2, b)). init(cell(1, 3, b)).
init(cell(2, 1, b)). init(cell(2, 2, o)). init(cell(2, 3, b)).
init(cell(3, 1, b)). init(cell(3, 2, o)). init(cell(3, 3, x)).
init(control(x)).
init(step(0)).
legal(W, mark(X, Y)) :- true(cell(X, Y, b)), true(control(W)).
legal(x, noop) :- true(control(o)).
legal(o, noop) :- true(control(x)).
next(cell(M, N, x)) :- does(x, mark(M, N)), true(cell(M, N, b)).
next(cell(M, N, o)) :- does(o, mark(M, N)), true(cell(M, N, b)).
next(cell(M, N, W)) :- true(cell(M, N, W)), W \== b.
next(cell(M, N, b)) :- does(_W, mark(J, K)), true(cell(M, N, b)), ( M \== J ; N \== K ).
next(control(x)) :- true(control(o)).
next(control(o)) :- true(control(x)).
next(step(M)) :- true(step(N)), succ(N, M).
row(M, X) :- true(cell(M, 1, X)), true(cell(M, 2, X)), true(cell(M, 3, X)).
column(N, X) :- true(cell(1, N, X)), true(cell(2, N, X)), true(cell(3, N, X)).
diagonal(X) :- true(cell(1, 1, X)), true(cell(2, 2, X)), true(cell(3, 3, X)).
diagonal(X) :- true(cell(1, 3, X)), true(cell(2, 2, X)), true(cell(3, 1, X)).
line(X) :- row(_M, X).
line(X) :- column(_M, X).
line(X) :- diagonal(X).
open :- true(cell(_M, _N, b)).
goal(x, 100) :- line(x), \+line(o).
goal(x, 50) :- line(x), line(o).
goal(x, 50) :- \+line(x), \+line(o).
goal(x, 0) :- \+line(x), line(o).
goal(o, 100) :- \+line(x), line(o).
goal(o, 50) :- line(x), line(o).
goal(o, 50) :- \+line(x), \+line(o).
goal(o, 0) :- line(x), \+line(o).
terminal :- line(x).
terminal :- line(o).
terminal :- \+open.
heuristic(State, Rewards) :-
    retractall(true(_)),
    forall(member(Proposition, State), assertz(true(Proposition))),
    setof(Role, role(Role), Roles),
    setof(goal(Role, Reward), (member(Role, Roles), goal(Role, Reward)), Rewards).
</div>
<div class="nb-cell query" name="q2">
time(solve_ids(Moves)).
</div>
<div class="nb-cell markdown" name="md7">
Getting to the point where the AI player could see the required path to a draw from this position took me several hours of coding.
```
X | |
---+---+---
| O |
---+---+---
| |
```
</div>
<div class="nb-cell program" name="p6">
:- dynamic true/1, does/2.
role(x).
role(o).
init(cell(1, 1, x)). init(cell(1, 2, b)). init(cell(1, 3, b)).
init(cell(2, 1, b)). init(cell(2, 2, o)). init(cell(2, 3, b)).
init(cell(3, 1, b)). init(cell(3, 2, b)). init(cell(3, 3, b)).
init(control(x)).
init(step(0)).
legal(W, mark(X, Y)) :- true(cell(X, Y, b)), true(control(W)).
legal(x, noop) :- true(control(o)).
legal(o, noop) :- true(control(x)).
next(cell(M, N, x)) :- does(x, mark(M, N)), true(cell(M, N, b)).
next(cell(M, N, o)) :- does(o, mark(M, N)), true(cell(M, N, b)).
next(cell(M, N, W)) :- true(cell(M, N, W)), W \== b.
next(cell(M, N, b)) :- does(_W, mark(J, K)), true(cell(M, N, b)), ( M \== J ; N \== K ).
next(control(x)) :- true(control(o)).
next(control(o)) :- true(control(x)).
next(step(M)) :- true(step(N)), succ(N, M).
row(M, X) :- true(cell(M, 1, X)), true(cell(M, 2, X)), true(cell(M, 3, X)).
column(N, X) :- true(cell(1, N, X)), true(cell(2, N, X)), true(cell(3, N, X)).
diagonal(X) :- true(cell(1, 1, X)), true(cell(2, 2, X)), true(cell(3, 3, X)).
diagonal(X) :- true(cell(1, 3, X)), true(cell(2, 2, X)), true(cell(3, 1, X)).
line(X) :- row(_M, X).
line(X) :- column(_M, X).
line(X) :- diagonal(X).
open :- true(cell(_M, _N, b)).
goal(x, 100) :- line(x), \+line(o).
goal(x, 50) :- line(x), line(o).
goal(x, 50) :- \+line(x), \+line(o).
goal(x, 0) :- \+line(x), line(o).
goal(o, 100) :- \+line(x), line(o).
goal(o, 50) :- line(x), line(o).
goal(o, 50) :- \+line(x), \+line(o).
goal(o, 0) :- line(x), \+line(o).
terminal :- line(x).
terminal :- line(o).
terminal :- \+open.
heuristic(State, Rewards) :-
    retractall(true(_)),
    forall(member(Proposition, State), assertz(true(Proposition))),
    setof(Role, role(Role), Roles),
    setof(goal(Role, Reward), (member(Role, Roles), goal(Role, Reward)), Rewards).
</div>
<div class="nb-cell query" name="q5">
time(solve_ids(Path)).
</div>
<div class="nb-cell markdown" name="md4">
</div>
</div>