<div class="notebook">

<div class="nb-cell markdown" name="md1">
# Game Trees
_Robert Laing_

This tutorial has now been appended to <https://swish.swi-prolog.org/p/Graphs1.swinb>, partly because I've managed to get the same code base to handle both games and puzzles, and partly because, for some reason, I can't keep the same URL when I save this file. Please post any suggestions and comments to a discussion I started at <https://swi-prolog.discourse.group/t/game-tree-tutorial-on-swish/921>.

As with puzzles, I'm using conventions from [Stanford University's General Game Playing](http://ggp.stanford.edu/public/notes.php) course, translating rules written in [Game Description Language](http://ggp.stanford.edu/notes/chapter_02.html) into Prolog.

This is an initial stab at advancing from depth-first search to [iterative deepening](https://en.wikipedia.org/wiki/Iterative_deepening_depth-first_search). I've kept the historical names -- [minimax](https://en.wikipedia.org/wiki/Minimax_theorem) and [alpha-beta](https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning) -- though they assume two-player games, when the same ideas can be applied from one-player games (commonly called puzzles) to N-player games.

I've tried to keep the code similar to the graph traversal tutorial, but one key change I've made is that instead of storing the game tree as a list of move(Parent, Label, Child) clauses, I've expanded this to move(RewardsList, Parent, Label, Child). As far as I understand the jargon term [dynamic programming](https://en.wikipedia.org/wiki/Dynamic_programming), this is an example of it, in that it involves repeatedly updating the values of nodes, something Prolog's _assign-once variables_ make fairly laborious. I hope to gradually polish the spaghetti code I've written below into something more elegant.
</div>

<div class="nb-cell program" data-background="true" name="p1">
%%% General Game Player library
% move(Rewards, Parent, Label, Child).

%% solve_dfs(-Moves) is semidet
% Returns a list of does(Player, Move) terms in sequence to
% solve the puzzle using depth-first search
solve_dfs(Moves) :-
    findinit(Start),
    heuristic(Start, Rewards),   % Assuming start not terminal
    depthfirst([move(Rewards, start, noop, Start)], [], Tree),
    start_to_end(Tree, Moves), !.

depthfirst([], Tree, Tree).
depthfirst([move(Rewards, Parent, Label, Child)|Frontier], Acc, Tree) :-
    (   visited(Acc, move(Rewards, Parent, Label, Child))
    ->  (   member(move(Rewards, Parent, Label, Child), Acc)  % Could be a different path to an existing node
        ->  AccOut = Acc
        ;   backward_induction([move(Rewards, Parent, Label, Child)],
                               [move(Rewards, Parent, Label, Child)|Acc], AccOut)
        ),
        NewFrontier = Frontier
    ;   backward_induction([move(Rewards, Parent, Label, Child)],
                           [move(Rewards, Parent, Label, Child)|Acc], AccOut),
        (   alpha_beta_test(AccOut, move(Rewards, Parent, Label, Child))
        ->  generate_children(Child, Unsorted),
            sort_generated_children(Unsorted, Children),
            append(Children, Frontier, NewFrontier)
        ;   NewFrontier = Frontier
        )
    ),
    depthfirst(NewFrontier, AccOut, Tree).

%% solve_bfs(-Moves) is semidet
% Returns a list of does(Player, Move) terms in sequence to
% solve the puzzle using breadth-first search
solve_bfs(Moves) :-
    findinit(Start),
    heuristic(Start, Rewards),   % Assuming start not terminal
    breadthfirst([move(Rewards, start, noop, Start)], [], Tree),
    start_to_end(Tree, Moves), !.
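% The node structure used throughout this library, shown as a hedged,
% hypothetical example: every node carries the reward list for all
% roles, the parent state (or the atom start at the root), the move
% label that produced the child, and the child state itself. The state
% propositions below are made up purely for illustration.
example_node(move([goal(o, 50), goal(x, 50)],   % rewards per role
                  start,                        % parent (here the root marker)
                  noop,                         % move label
                  [cell(1, 1, x), cell(2, 2, o), control(o), step(1)])).  % child state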
breadthfirst([], Tree, Tree).
breadthfirst([move(Rewards, Parent, Label, Child)|Frontier], Acc, Tree) :-
    (   visited(Acc, move(Rewards, Parent, Label, Child))
    ->  (   member(move(Rewards, Parent, Label, Child), Acc)  % Could be a different path to an existing node
        ->  AccOut = Acc
        ;   backward_induction([move(Rewards, Parent, Label, Child)],
                               [move(Rewards, Parent, Label, Child)|Acc], AccOut)
        ),
        NewFrontier = Frontier
    ;   backward_induction([move(Rewards, Parent, Label, Child)],
                           [move(Rewards, Parent, Label, Child)|Acc], AccOut),
        (   alpha_beta_test(AccOut, move(Rewards, Parent, Label, Child))
        ->  generate_children(Child, Unsorted),
            sort_generated_children(Unsorted, Children),
            append(Frontier, Children, NewFrontier)
        ;   NewFrontier = Frontier
        )
    ),
    breadthfirst(NewFrontier, AccOut, Tree).

%% solve_ids(-Moves) is semidet
% Uses iterative deepening to solve larger game trees
solve_ids(Moves) :-
    findinit(Start),
    member(step(Limit), Start),
    heuristic(Start, Rewards),   % Assuming start not terminal
    iterative_deepening(Limit, [move(Rewards, start, noop, Start)], [], Tree),
    start_to_end(Tree, Moves), !.

iterative_deepening(Limit, [move(AlphaBeta, start, noop, Start)], TreeIn, Tree) :-
    ( member(control(Player), Start) ; role(Player) ),
    (   member(goal(Player, 100), AlphaBeta)
    ->  Tree = TreeIn   % Puzzle solved or game won
    ;   LimitInc is Limit + 1,
        depthlimited(LimitInc, [move(AlphaBeta, start, noop, Start)], TreeIn, TreeOut),
        (   TreeIn == TreeOut   % equilibrium reached, pointless continuing
        ->  Tree = TreeOut
        ;   member(move(NewAlphaBeta, start, noop, Start), TreeOut),
            iterative_deepening(LimitInc, [move(NewAlphaBeta, start, noop, Start)], TreeOut, Tree)
        )
    ).

depthlimited(_, [], Tree, Tree).   % Frontier exhausted; the depth limit stopped new nodes being added
% Depth limit reached: record the node but do not expand it
depthlimited(Maxdepth, [move(Rewards, Parent, Label, Child)|Frontier], Acc, Tree) :-
    member(step(Maxdepth), Child), !,
    (   visited(Acc, move(Rewards, Parent, Label, Child))
    ->  AccOut = Acc
    ;   backward_induction([move(Rewards, Parent, Label, Child)],
                           [move(Rewards, Parent, Label, Child)|Acc], AccOut)
    ),
    depthlimited(Maxdepth, Frontier, AccOut, Tree).
% Either fresh child nodes need to be generated or old ones found in the tree
depthlimited(Maxdepth, [move(Rewards, Parent, Label, Child)|Frontier], Acc, Tree) :-
    member(step(N), Child), N < Maxdepth,
    (   visited(Acc, move(Rewards, Parent, Label, Child))
    ->  AccOut = Acc,
        include(children_filter(move(Rewards, Parent, Label, Child)), Acc, Unpruned),
        (   length(Unpruned, 0)
        ->  generate_children(Child, Unsorted),
            sort_generated_children(Unsorted, Children)
        ;   include(alpha_beta_test(Acc), Unpruned, Unsorted),
            sort_old_children(Unsorted, Children)
        ),
        append(Children, Frontier, NewFrontier)
    ;   backward_induction([move(Rewards, Parent, Label, Child)],
                           [move(Rewards, Parent, Label, Child)|Acc], AccOut),
        (   alpha_beta_test(AccOut, move(Rewards, Parent, Label, Child))
        ->  generate_children(Child, Unsorted),
            sort_generated_children(Unsorted, Children),
            append(Children, Frontier, NewFrontier)
        ;   NewFrontier = Frontier
        )
    ),
    depthlimited(Maxdepth, NewFrontier, AccOut, Tree).

%% Common predicates

update_state(State) :-
    retractall(true(_)),
    forall(member(Proposition, State), assertz(true(Proposition))).

findinit(Start) :-
    setof(Init, init(Init), Start).

getnext(Parent, does(Player, Move), child(Parent, does(Player, Move), Child)) :-
    retractall(does(_, _)),
    assertz(does(Player, Move)),
    setof(Next, next(Next), Child).
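% A hedged usage sketch, not part of the solver: expand the initial
% position by one ply using the predicates above and generate_children/2
% below. It assumes one of the game cells later in this notebook has
% been loaded, so that init/1, legal/2 and next/1 are defined; the name
% demo_first_ply/1 is made up for this illustration.
demo_first_ply(Children) :-
    findinit(Start),
    generate_children(Start, Children).
% ?- demo_first_ply(Children), length(Children, N).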
generate_children(Parent, Children) :-
    update_state(Parent),
    (   \+terminal
    ->  ( true(control(Player)) ; role(Player) ),
        (   setof(does(Player, Move), legal(Player, Move), Legals)
        ->  maplist(getnext(Parent), Legals, Children)
        ;   Children = []
        )
    ;   Children = []
    ).

sort_generated_children([], []).
sort_generated_children(ChildrenIn, ChildrenOut) :-
    value_generated_children(ChildrenIn, WithZeros),
    exclude(zeros, WithZeros, Unsorted),   % Assume only losing terminals have zero
    sort(1, >=, Unsorted, Sorted),
    maplist(strip_value, Sorted, ChildrenOut).

value_generated_children([], []).
value_generated_children([child(Parent, Label, Child)|ChildrenIn],
                         [vm(Value, move(Rewards, Parent, Label, Child))|ChildrenOut]) :-
    update_state(Child),
    (   \+terminal
    ->  heuristic(Child, Rewards)   % Need to add heuristic to parent reward value to ensure monotonicity
    ;   setof(goal(Role, Reward), goal(Role, Reward), Rewards)
    ),
    ( member(control(Player), Parent) ; role(Player) ),
    member(goal(Player, Value), Rewards),
    value_generated_children(ChildrenIn, ChildrenOut).

%% sort_old_children(+ChildrenIn, -ChildrenOut) is det
% Used with iterative deepening to put nodes from the tree in order before stacking on the Frontier
sort_old_children([], []).
sort_old_children([move(Rewards, Parent, Label, Child)|Moves], Children) :-
    ( member(control(Player), Parent) ; role(Player) ),
    maplist(player_value(Player), [move(Rewards, Parent, Label, Child)|Moves], Unsorted),
    sort(1, >=, Unsorted, Sorted),
    maplist(strip_value, Sorted, Children).

player_value(Player, move(Rewards, Parent, Label, Child), vm(Value, move(Rewards, Parent, Label, Child))) :-
    member(goal(Player, Value), Rewards).

zeros(vm(0, _)).

strip_value(vm(_, Move), Move).

%% alpha_beta_test(+Tree, +NewNode) is semidet
% True if the node is worth expanding
% Essentially only looks at zeros and hundreds since heuristic values are too flaky
alpha_beta_test(Tree, move(Rewards, Parent, _Label, _Child)) :-
    ( member(control(Player), Parent) ; role(Player) ),
    member(goal(Player, Value), Rewards),
    Value > 0,   % Prune 0 value losing moves ruthlessly
    member(move(AlphaBeta, start, noop, _), Tree),
    member(goal(Player, CurrentBest), AlphaBeta),
    (   CurrentBest == 100
    ->  Value == 100   % Only keep winning moves if one has already been found
    ;   Fudged is Value * 4,   % Leave a wide margin for heuristic error
        Fudged > CurrentBest
    ).

%% backward_induction(+NodeList, +Tree, -UpdatedTree) is det
backward_induction([], Tree, Tree).
backward_induction([move(NewRewards, Parent, Label, Child)|Parents], TreeIn, Tree) :-
    ( member(control(Player), Parent) ; role(Player) ),
    member(goal(Player, NewValue), NewRewards),
    include(parent_filter(move(NewRewards, Parent, Label, Child)), TreeIn, Grandparents),
    value_grandparents(Player, Grandparents, TreeIn, 0, OldValue),
    (   NewValue >= OldValue
    ->  maplist(replace_reward(NewRewards), Grandparents, Revalued),
        revalue_tree(Revalued, TreeIn, TreeOut),
        append(Parents, Revalued, Ancestors)
    ;   TreeOut = TreeIn,
        Ancestors = Parents
    ),
    backward_induction(Ancestors, TreeOut, Tree).

value_grandparents(_, [], _, Max, Max).
value_grandparents(Player, [move(OldRewards, Grandparent, PrevLabel, Parent)|Grandparents], Tree, AccIn, Max) :-
    include(children_filter(move(OldRewards, Grandparent, PrevLabel, Parent)), Tree, Children),
    maplist(player_reward(Player), Children, PlayerRewards),
    max_list(PlayerRewards, Best),
    (   Best > AccIn
    ->  AccOut = Best
    ;   AccOut = AccIn
    ),
    value_grandparents(Player, Grandparents, Tree, AccOut, Max).
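% A hedged, self-contained check of revalue_tree/3 below, using the
% made-up states s1 and s2 and move label m: the root node's reward
% list is replaced in place while the rest of the tree keeps its order.
revalue_tree_demo(Tree) :-
    revalue_tree([move([goal(x, 100)], start, noop, s1)],
                 [move([goal(x, 50)], start, noop, s1),
                  move([goal(x, 50)], s1, m, s2)],
                 Tree).
% ?- revalue_tree_demo(Tree).
% Tree = [move([goal(x, 100)], start, noop, s1), move([goal(x, 50)], s1, m, s2)].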
revalue_tree([], Tree, Tree).
revalue_tree([move(NewRewards, Parent, Label, Child)|Moves], TreeIn, Tree) :-
    nth1(Idx, TreeIn, move(_OldRewards, Parent, Label, Child), Rest),
    nth1(Idx, TreeOut, move(NewRewards, Parent, Label, Child), Rest),
    revalue_tree(Moves, TreeOut, Tree).

% Predicates used with maplist(Closure, ListIn, ListOut)

player_reward(Player, move(Rewards, _, _, _), PlayerReward) :-
    member(goal(Player, PlayerReward), Rewards).

replace_reward(NewReward, move(_, Parent, Label, Child), move(NewReward, Parent, Label, Child)).

% Predicates used with include(Filter, ListIn, ListOut)

parent_filter(move(_, Parent, _, _), move(_, _, _, Parent)).

children_filter(move(_, _, _, Child), move(_, Child, _, _)).

%% start_to_end(+Tree, -Path) is semidet
% Follows the chain of nodes sharing the root's backed-up reward list
% (the principal variation) from the start node to the end of the game
start_to_end(TreeIn, Path) :-
    select(move(AlphaBeta, start, noop, Start), TreeIn, TreeOut),
    start_to_end_(TreeOut, [move(AlphaBeta, start, noop, Start)], RevMoves),
    maplist(extractmoves, RevMoves, RevPath),
    reverse(RevPath, [noop|Path]).

start_to_end_(TreeIn, [move(AlphaBeta, Grandparent, Label1, Parent)|Acc], Path) :-
    (   select(move(AlphaBeta, Parent, Label2, Child), TreeIn, TreeOut)
    ->  start_to_end_(TreeOut, [move(AlphaBeta, Parent, Label2, Child),
                                move(AlphaBeta, Grandparent, Label1, Parent)|Acc], Path)
    ;   Path = [move(AlphaBeta, Grandparent, Label1, Parent)|Acc]
    ).

extractmoves(move(_, _, Move, _), Move).

%% visited(+History, +Node) is semidet
% A plain member(Node, History) check is not enough here: the step(_)
% counter makes otherwise identical states compare as different, which
% slowed N Queens to a crawl, so step/1 is stripped before comparing
visited(History, move(_RewardsList, _Current, _Move, State)) :-
    (   memberchk(step(_), State)
    ->  select(step(_), State, State1)
    ;   State1 = State
    ),
    visited_(History, State1).

visited_([], _) :- fail.   % history exhausted without finding a match
visited_([move(_, _, _, Next)|History], State) :-
    (   memberchk(step(_), Next)
    ->  select(step(_), Next, Next1)
    ;   Next1 = Next
    ),
    (   State \= Next1
    ->  visited_(History, State)
    ;   true
    ).
</div>

<div class="nb-cell markdown" name="md2">
For the first example, here again is the tic-tac-toe example at a crucial point with X poised to lose unless it blocks O's obvious victory:

```
 X | X | O
---+---+---
   | O |
---+---+---
   | O | X
```

If the AI player is smart enough to know it must do does(x, mark(3, 1)), then when it toggles over to play O it must see the next obvious move:

```
 X | X | O
---+---+---
   | O |
---+---+---
 X | O | X
```

So the code needs to be smart enough to see the next step must be does(o, mark(2, 1)), leaving x only one choice, which forces a draw.
</div>

<div class="nb-cell program" name="p2">
:- dynamic true/1, does/2.

role(x).
role(o).

init(cell(1, 1, x)).
init(cell(1, 2, x)).
init(cell(1, 3, o)).
init(cell(2, 1, b)).
init(cell(2, 2, o)).
init(cell(2, 3, b)).
init(cell(3, 1, b)).
init(cell(3, 2, o)).
init(cell(3, 3, x)).
init(control(x)).
init(step(0)).

legal(W, mark(X, Y)) :- true(cell(X, Y, b)), true(control(W)).
legal(x, noop) :- true(control(o)).
legal(o, noop) :- true(control(x)).

next(cell(M, N, x)) :- does(x, mark(M, N)), true(cell(M, N, b)).
next(cell(M, N, o)) :- does(o, mark(M, N)), true(cell(M, N, b)).
next(cell(M, N, W)) :- true(cell(M, N, W)), W \== b.
next(cell(M, N, b)) :- does(_W, mark(J, K)), true(cell(M, N, b)), ( M \== J ; N \== K ).
next(control(x)) :- true(control(o)).
next(control(o)) :- true(control(x)).
next(step(M)) :- true(step(N)), succ(N, M).
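% A hedged sanity check of the rules above, using findinit/1 and
% update_state/1 from the background library: with the initial position
% asserted, x (who has control) may mark any blank cell while o can
% only noop.
% ?- findinit(S), update_state(S), setof(does(P, M), legal(P, M), Legals).
% Legals = [does(o, noop), does(x, mark(2, 1)), does(x, mark(2, 3)), does(x, mark(3, 1))].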
row(M, X) :- true(cell(M, 1, X)), true(cell(M, 2, X)), true(cell(M, 3, X)).
column(N, X) :- true(cell(1, N, X)), true(cell(2, N, X)), true(cell(3, N, X)).
diagonal(X) :- true(cell(1, 1, X)), true(cell(2, 2, X)), true(cell(3, 3, X)).
diagonal(X) :- true(cell(1, 3, X)), true(cell(2, 2, X)), true(cell(3, 1, X)).

line(X) :- row(_M, X).
line(X) :- column(_M, X).
line(X) :- diagonal(X).

open :- true(cell(_M, _N, b)).

goal(x, 100) :- line(x), \+line(o).
goal(x, 50) :- line(x), line(o).
goal(x, 50) :- \+line(x), \+line(o).
goal(x, 0) :- \+line(x), line(o).
goal(o, 100) :- \+line(x), line(o).
goal(o, 50) :- line(x), line(o).
goal(o, 50) :- \+line(x), \+line(o).
goal(o, 0) :- line(x), \+line(o).

terminal :- line(x).
terminal :- line(o).
terminal :- \+open.

heuristic(State, Rewards) :-
    retractall(true(_)),
    forall(member(Proposition, State), assertz(true(Proposition))),
    setof(Role, role(Role), Roles),
    setof(goal(Role, Reward), (member(Role, Roles), goal(Role, Reward)), Rewards).
</div>

<div class="nb-cell query" name="q1">
time(solve_ids(Moves)).
</div>

<div class="nb-cell markdown" name="md3">
The next tic-tac-toe example leads to the one above: playing X, the AI player first needs to see it must do does(x, mark(1, 2)), and then, as the O player, does(o, mark(1, 3)).

```
 X |   |
---+---+---
   | O |
---+---+---
   | O | X
```
</div>

<div class="nb-cell program" name="p3">
:- dynamic true/1, does/2.

role(x).
role(o).

init(cell(1, 1, x)).
init(cell(1, 2, b)).
init(cell(1, 3, b)).
init(cell(2, 1, b)).
init(cell(2, 2, o)).
init(cell(2, 3, b)).
init(cell(3, 1, b)).
init(cell(3, 2, o)).
init(cell(3, 3, x)).
init(control(x)).
init(step(0)).

legal(W, mark(X, Y)) :- true(cell(X, Y, b)), true(control(W)).
legal(x, noop) :- true(control(o)).
legal(o, noop) :- true(control(x)).

next(cell(M, N, x)) :- does(x, mark(M, N)), true(cell(M, N, b)).
next(cell(M, N, o)) :- does(o, mark(M, N)), true(cell(M, N, b)).
next(cell(M, N, W)) :- true(cell(M, N, W)), W \== b.
next(cell(M, N, b)) :- does(_W, mark(J, K)), true(cell(M, N, b)), ( M \== J ; N \== K ).
next(control(x)) :- true(control(o)).
next(control(o)) :- true(control(x)).
next(step(M)) :- true(step(N)), succ(N, M).

row(M, X) :- true(cell(M, 1, X)), true(cell(M, 2, X)), true(cell(M, 3, X)).
column(N, X) :- true(cell(1, N, X)), true(cell(2, N, X)), true(cell(3, N, X)).
diagonal(X) :- true(cell(1, 1, X)), true(cell(2, 2, X)), true(cell(3, 3, X)).
diagonal(X) :- true(cell(1, 3, X)), true(cell(2, 2, X)), true(cell(3, 1, X)).

line(X) :- row(_M, X).
line(X) :- column(_M, X).
line(X) :- diagonal(X).

open :- true(cell(_M, _N, b)).

goal(x, 100) :- line(x), \+line(o).
goal(x, 50) :- line(x), line(o).
goal(x, 50) :- \+line(x), \+line(o).
goal(x, 0) :- \+line(x), line(o).
goal(o, 100) :- \+line(x), line(o).
goal(o, 50) :- line(x), line(o).
goal(o, 50) :- \+line(x), \+line(o).
goal(o, 0) :- line(x), \+line(o).

terminal :- line(x).
terminal :- line(o).
terminal :- \+open.

heuristic(State, Rewards) :-
    retractall(true(_)),
    forall(member(Proposition, State), assertz(true(Proposition))),
    setof(Role, role(Role), Roles),
    setof(goal(Role, Reward), (member(Role, Roles), goal(Role, Reward)), Rewards).
</div>

<div class="nb-cell query" name="q2">
time(solve_ids(Moves)).
</div>

<div class="nb-cell markdown" name="md7">
Getting to the point where the AI player could see the required path to a draw from this position took me several hours of coding.

```
 X |   |
---+---+---
   | O |
---+---+---
   |   |
```
</div>

<div class="nb-cell program" name="p6">
:- dynamic true/1, does/2.

role(x).
role(o).

init(cell(1, 1, x)).
init(cell(1, 2, b)).
init(cell(1, 3, b)).
init(cell(2, 1, b)).
init(cell(2, 2, o)).
init(cell(2, 3, b)).
init(cell(3, 1, b)).
init(cell(3, 2, b)).
init(cell(3, 3, b)).
init(control(x)).
init(step(0)).
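% A hedged illustration of heuristic/2 (defined at the end of this
% cell) on the opening position above: neither player has a line yet,
% so both goals evaluate to the drawish 50.
% ?- findinit(S), heuristic(S, Rewards).
% Rewards = [goal(o, 50), goal(x, 50)].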
legal(W, mark(X, Y)) :- true(cell(X, Y, b)), true(control(W)).
legal(x, noop) :- true(control(o)).
legal(o, noop) :- true(control(x)).

next(cell(M, N, x)) :- does(x, mark(M, N)), true(cell(M, N, b)).
next(cell(M, N, o)) :- does(o, mark(M, N)), true(cell(M, N, b)).
next(cell(M, N, W)) :- true(cell(M, N, W)), W \== b.
next(cell(M, N, b)) :- does(_W, mark(J, K)), true(cell(M, N, b)), ( M \== J ; N \== K ).
next(control(x)) :- true(control(o)).
next(control(o)) :- true(control(x)).
next(step(M)) :- true(step(N)), succ(N, M).

row(M, X) :- true(cell(M, 1, X)), true(cell(M, 2, X)), true(cell(M, 3, X)).
column(N, X) :- true(cell(1, N, X)), true(cell(2, N, X)), true(cell(3, N, X)).
diagonal(X) :- true(cell(1, 1, X)), true(cell(2, 2, X)), true(cell(3, 3, X)).
diagonal(X) :- true(cell(1, 3, X)), true(cell(2, 2, X)), true(cell(3, 1, X)).

line(X) :- row(_M, X).
line(X) :- column(_M, X).
line(X) :- diagonal(X).

open :- true(cell(_M, _N, b)).

goal(x, 100) :- line(x), \+line(o).
goal(x, 50) :- line(x), line(o).
goal(x, 50) :- \+line(x), \+line(o).
goal(x, 0) :- \+line(x), line(o).
goal(o, 100) :- \+line(x), line(o).
goal(o, 50) :- line(x), line(o).
goal(o, 50) :- \+line(x), \+line(o).
goal(o, 0) :- line(x), \+line(o).

terminal :- line(x).
terminal :- line(o).
terminal :- \+open.

heuristic(State, Rewards) :-
    retractall(true(_)),
    forall(member(Proposition, State), assertz(true(Proposition))),
    setof(Role, role(Role), Roles),
    setof(goal(Role, Reward), (member(Role, Roles), goal(Role, Reward)), Rewards).
</div>

<div class="nb-cell query" name="q5">
time(solve_ids(Path)).
</div>

</div>