<div class="notebook"> <div class="nb-cell markdown" name="md1"> # An introduction to Prolog for SQL programers *By Robert Laing* I've translated the introductory example in [Databases: Relational Databases and SQL](https://www.edx.org/course/databases-5-sql?utm_source=sailthru&utm_medium=email&utm_campaign=triggered_shareit) into Prolog since I've personally found thinking of Prolog as a relational database query language a handy way to get started. I think I'm using Prolog synonymously with Datalog in these examples, but don't know Datalog that well. The course used the following three tables to illustrate [the relational data model](http://i.stanford.edu/~ullman/focs/ch08.pdf) which written in Prolog look as follows: </div> <div class="nb-cell program" data-background="true" name="p1"> :- dynamic college/3, student/4, apply/4. %! college(?CName:text, ?State:text, ?Enrollment:integer) is nondet college('Stanford', 'CA', 15000). college('Berkeley', 'CA', 36000). college('MIT', 'MA', 10000). college('Cornell', 'NY', 21000). %! student(?SID:text, ?SName:text, ?GPA:float, ?SizeHS:integer) is nondet student(123, 'Amy', 3.9, 1000). student(234, 'Bob', 3.6, 1500). student(345, 'Craig', 3.5, 500). student(456, 'Doris', 3.9, 1000). student(567, 'Edward', 2.9, 2000). student(678, 'Fay', 3.8, 200). student(789, 'Gary', 3.4, 800). student(987, 'Helen', 3.7, 800). student(876, 'Irene', 3.9, 400). student(765, 'Jay', 2.9, 1500). student(654, 'Amy', 3.9, 1000). student(543, 'Craig', 3.4, 2000). %! apply(?SID:integer, ?CName:text, ?Major:text, ?Decision:text) is nondet apply(123, 'Stanford', 'CS', 'Y'). apply(123, 'Stanford', 'EE', 'N'). apply(123, 'Berkeley', 'CS', 'Y'). apply(123, 'Cornell', 'EE', 'Y'). apply(234, 'Berkeley', 'biology', 'N'). apply(345, 'MIT', 'bioengineering', 'Y'). apply(345, 'Cornell', 'bioengineering', 'N'). apply(345, 'Cornell', 'CS', 'Y'). apply(345, 'Cornell', 'EE', 'N'). apply(678, 'Stanford', 'history', 'Y'). apply(987, 'Stanford', 'CS', 'Y'). apply(987, 'Berkeley', 'CS', 'Y'). apply(876, 'Stanford', 'CS', 'N'). apply(876, 'MIT', 'biology', 'Y'). apply(876, 'MIT', 'marine biology', 'N'). apply(765, 'Stanford', 'history', 'Y'). apply(765, 'Cornell', 'history', 'N'). apply(765, 'Cornell', 'psychology', 'Y'). apply(543, 'MIT', 'CS', 'N'). </div> <div class="nb-cell markdown" name="md46"> With Swish, the above data is loaded and available to all following queries (including ones after subsequent program cells) because I selected the *global icon*. If you are doing these examples in the swipl repl, you could use [consult(:File)](https://www.swi-prolog.org/pldoc/doc_for?object=consult/1) to load the data. I discovered that by clicking on the spanner icon and selecting table, Swish gives me a nice "relational" result. In the swipl repl, you would press `;` (In prolog commas are *and* and semicolons are *or*). </div> <div class="nb-cell query" data-chunk="4" data-tabled="true" name="q42"> college(CName, State, Enrollment). </div> <div class="nb-cell query" data-chunk="12" data-tabled="true" name="q44"> student(SID, SName, GPA, SizeHS). </div> <div class="nb-cell query" data-chunk="19" data-tabled="true" name="q43"> apply(SID, CName, Major, Decision). </div> <div class="nb-cell markdown" name="md2"> ### Basic selection > IDs, names, and GPAs of students with GPA > 3.6 ```sql select sID, sName, GPA from Student where GPA > 3.6; ``` </div> <div class="nb-cell query" data-chunk="7" data-tabled="true" name="q1"> student(SID, SName, GPA, _SizeHS), GPA > 3.6. </div> <div class="nb-cell markdown" name="md3"> > Same query without GPA ```sql select sID, sName from Student where GPA > 3.6; ``` The second example only differs from the first in that the GPA attribute isn't wanted in the output, but is still needed as a query filter. A trick here is using an underscore followed by a variable name (as opposed to just an underscore for "don't care variables"). Even though `_GPA` isn't output because of its leading underscore, it still links the third variable in the `student/4` clause to greater-than condition. </div> <div class="nb-cell query" data-chunk="10" data-tabled="true" name="q3"> student(SID, SName, _GPA, _SizeHS), _GPA > 3.6. </div> <div class="nb-cell markdown" name="md4"> ### Table joins > Student names and majors for which they've applied ```sql select sName, major from Student, Apply where Student.sID = Apply.sID; ``` A nifty trick in Prolog is equality can be checked as in `Student.sID = Apply.sID` by simply placing the same variable name in both positions (`_ID` in the example below). Again if I don't want the "joining" variable in the output, I provide a leading underscore. </div> <div class="nb-cell query" data-chunk="20" data-tabled="true" name="q2"> student(_ID, SName, _GPA, _SizeHS), apply(_ID, _CName, Major, _Decision). </div> <div class="nb-cell markdown" name="md5"> A common gripe of maths purists is the poor support for [the set data model](http://i.stanford.edu/~ullman/focs/ch07.pdf) in programing languages. One of the basics of sets is no duplicate elements, which is surprisingly tricky as translating the following SQL example shows: > Same query with Distinct ```sql select distinct sName, major from Student, Apply where Student.sID = Apply.sID; ``` SWI Prolog has a [distinct(?Witness, :Goal)](https://www.swi-prolog.org/pldoc/doc_for?object=distinct/2) predicate, something I only became aware of after the first version of this tutorial. It is part of the [solution sequences](https://www.swi-prolog.org/pldoc/man?section=solutionsequences) library. </div> <div class="nb-cell query" data-chunk="20" data-tabled="true" name="q4"> distinct([SName, Major], ( student(_ID, SName, _GPA, _SizeHS), apply(_ID, _CName, Major, _Decision) ) ). </div> <div class="nb-cell markdown" name="md6"> The following SQL example uses more convoluted `where...` lines, which is where Prolog, as a language niched at predicate logic, shines. > Names and GPAs of students with sizeHS < 1000 applying to CS at Stanford, and the application decision ```sql select sname, GPA, decision from Student, Apply where Student.sID = Apply.sID and sizeHS < 1000 and major = 'CS' and cname = 'Stanford'; ``` </div> <div class="nb-cell query" data-chunk="7" data-tabled="true" name="q5"> student(_ID, SName, GPA, _SizeHS), apply(_ID, 'Stanford', 'CS', Decision), _SizeHS < 1000. </div> <div class="nb-cell markdown" name="md7"> > All large campuses with CS applicants ```sql select distinct College.cName from College, Apply where College.cName = Apply.cName and enrollment > 20000 and major = 'CS'; ``` </div> <div class="nb-cell query" data-chunk="5" data-tabled="true" name="q6"> distinct(CName, ( college(CName, _State, _Enrollment), apply(_SID, CName, 'CS', _Decision), _Enrollment > 20000 )). </div> <div class="nb-cell markdown" name="md8"> ### order by Here we us [order_by(+Spec, :Goal)](https://www.swi-prolog.org/pldoc/doc_for?object=order_by/2), which I only became aware of after using [sort(+Key, +Order, +List, -Sorted)](https://www.swi-prolog.org/pldoc/doc_for?object=sort/4) where the +Order allows the output to be ascending or descending. > Application information, sort by decreasing GPA ```sql select Student.sID, sName, GPA, Apply.cName, enrollment from Student, College, Apply where Apply.sID = Student.sID and Apply.cName = College.cName order by GPA desc; ``` </div> <div class="nb-cell query" data-chunk="21" data-tabled="true" name="q7"> order_by([desc(GPA)], ( student(ID, SName, GPA, _SizeHS), college(CName, _State, Enrollment), apply(ID, CName, _Major, _Decision) )). </div> <div class="nb-cell markdown" name="md9"> Next challenge to is to create a "tie breaker" by sorting matching GPAs by ascending enrollments. > Application information, sort by decreasing GPA Then by increasing enrollment ```sql select Student.sID, sName, GPA, Apply.cName, enrollment from Student, College, Apply where Apply.sID = Student.sID and Apply.cName = College.cName order by GPA desc, enrollment; ``` </div> <div class="nb-cell query" data-chunk="31" data-tabled="true" name="q8"> order_by([desc(GPA), asc(Enrollment)], ( student(ID, SName, GPA, _SizeHS), college(CName, _State, Enrollment), apply(ID, CName, _Major, _Decision) )). </div> <div class="nb-cell markdown" name="md10"> ### Searching for substrings > Applicants to bio majors ```sql select sID, major from Apply where major like '%bio%'; ``` </div> <div class="nb-cell query" data-chunk="11" data-tabled="true" name="q9"> apply(SID, _CName, Major, _Decision), sub_atom(Major, _, _, _, 'bio'). </div> <div class="nb-cell markdown" name="md11"> ### Arithmetic Functions > Add scaled GPA based on sizeHS ```sql select sID, sName, GPA, sizeHS, GPA*(sizeHS/1000.0) as scaledGPA from Student; ``` </div> <div class="nb-cell query" data-chunk="32" data-tabled="true" name="q10"> student(SID, SName, GPA, SizeHS), ScaledGPA is GPA * (SizeHS / 1000). </div> <div class="nb-cell markdown" name="md12"> ### Self joins Whereas in SQL you give each instance of a relation distinguishing alias -- S1 and S2 in this example for two instances of the Student table -- while attributes retain their names, in Prolog the relations retain their names while the attributes are differentiated by say SID1 vs SID2 etc, unless we want to match them for a join, as with GPA here. > Pairs of students with same GPA ```sql select S1.sID, S1.sName, S1.GPA, S2.sID, S2.sName, S2.GPA from Student S1, Student S2 where S1.GPA = S2.GPA and S1.sID < S2.sID; ``` </div> <div class="nb-cell query" data-chunk="10" data-tabled="true" name="q11"> student(SID1, SName1, GPA, _SizeHS1), student(SID2, SName2, GPA, _SizeHS2), SID1 < SID2. </div> <div class="nb-cell markdown" name="md13"> ### Union Not only do the maths symbols for _union_ *A ∪ B* and _or_ *A ∨ B* look related, they are related. As are the symbols for _intersection_ *A ∩ B* and _and_ *A ∧ B*. > List of college names and student names ```sql select cName as name from College union select sName as name from Student; ``` </div> <div class="nb-cell query" data-chunk="21" data-tabled="true" name="q12"> college(Name, _, _) ; student(_, Name, _, _). </div> <div class="nb-cell markdown" name="md14"> ### Intersection > IDs of students who applied to both CS and EE ```sql select sID from Apply where major = 'CS' intersect select sID from Apply where major = 'EE'; ``` </div> <div class="nb-cell query" data-chunk="10" data-tabled="true" name="q13"> apply(SID, _, 'CS', _) , apply(SID, _, 'EE', _). </div> <div class="nb-cell markdown" name="md15"> SID 123 appears four times because Amy made four college applications, twice for CS and twice for EE. The duplicates could be removed like this: </div> <div class="nb-cell query" data-chunk="10" data-tabled="true" name="q14"> distinct(SID, ( apply(SID, _, 'CS', _) , apply(SID, _, 'EE', _) )). </div> <div class="nb-cell markdown" name="md16"> ### Except > IDs of students who applied to CS but not EE ```sql select sID from Apply where major = 'CS' except select sID from Apply where major = 'EE'; ``` </div> <div class="nb-cell query" data-chunk="20" data-tabled="true" name="q15"> apply(SID, _, 'CS', _), \+apply(SID, _, 'EE', _). </div> <div class="nb-cell markdown" name="md18"> ## Subqueries A key advantage Prolog/Datalog has over SQL is making it easier to break big problems into smaller components. SQL scripts have a horrible habit of growing into nested `select .... from .... where` blocks. As the following examples show, each of SQL's three basic clauses in turn can contain any number of `select .... from .... where` queries, each of which in turn can contain more deeply nested queries, creating "write once, read never" code. ### Subqueries in the where clause These examples use SQL's [exists](https://www.postgresql.org/docs/12/functions-subquery.html), in, not in, any/some, and all builtins. The Prolog equivalent would typically be to translate the SQL subquery using [findall(+Template, :Goal, -List)](https://www.swi-prolog.org/pldoc/doc_for?object=findall/3), and then using [memberchk(?Elem, +List)](https://www.swi-prolog.org/pldoc/doc_for?object=memberchk/2). > IDs and names of students applying to CS ```sql select sID, sName from Student where sID in (select sID from Apply where major = 'CS'); ``` </div> <div class="nb-cell query" data-chunk="10" data-tabled="true" name="q18"> findall(_ID, apply(_ID, _, 'CS', _), _L), student(SID, SName, _, _), memberchk(SID, _L). </div> <div class="nb-cell markdown" name="md41"> > Colleges such that some other college is in the same state ```sql select cName, state from College C1 where exists (select * from College C2 where C2.state = C1.state and C2.cName <> C1.cName); ``` </div> <div class="nb-cell query" data-chunk="6" data-tabled="true" name="q37"> findall(_CName1, ( college(_CName1, _State, _), college(_CName2, _State, _), _CName1 \== _CName2 ), _L), college(CName, State, _), memberchk(CName, _L). </div> <div class="nb-cell markdown" name="md19"> > Biggest college This example has a _correlated reference_ -- C1 declared outside of the subquery is used in the subquery. Note below, this also works in Prolog, where `_Enrollment1` is declared outside the bracketed suquery, and used inside it. ```sql select cName from College C1 where not exists (select * from College C2 where C2.enrollment > C1.enrollment); ``` </div> <div class="nb-cell query" data-chunk="10" data-tabled="true" name="q16"> college(CName, _State1, _Enrollment1), \+ ( college(_CName2, _State2, _Enrollment2), _Enrollment2 > _Enrollment1 ). </div> <div class="nb-cell markdown" name="md17"> ### Subqueries in the from clause > Students whose scaled GPA changes GPA by more than 1 ```sql select * from (select sID, sName, GPA, GPA*(sizeHS/1000.0) as scaledGPA from Student) G where abs(scaledGPA - GPA) > 1.0; ``` </div> <div class="nb-cell query" data-chunk="12" data-tabled="true" name="q17"> student(SID, SName, GPA, SizeHS), ScaledGPA is GPA * (SizeHS / 1000), _X is ScaledGPA - GPA, abs(_X, _Y), _Y > 1.0. </div> <div class="nb-cell markdown" name="md20"> ### Subqueries in the select clause > Colleges paired with the highest GPA of their applicants ```sql select distinct cName, state, (select distinct GPA from Apply, Student where College.cName = Apply.cName and Apply.sID = Student.sID and GPA >= all (select GPA from Student, Apply where Student.sID = Apply.sID and Apply.cName = College.cName)) as GPA from College; ``` </div> <div class="nb-cell query" data-chunk="21" data-tabled="true" name="q19"> distinct(CName, ( college(CName, State, _), apply(_SID1, CName, _, _), student(_SID1, _, GPA1, _), \+ ( student(_SID2, _, _GPA2, _), _GPA2 > GPA1 ) )). </div> <div class="nb-cell markdown" name="md42"> ## Inner Join > Student names and majors for which they've applied This same example is used for _natural joins_ since both student/4 and apply/5 have a column labelled sID. ```sql select distinct sName, major from Student inner join Apply on Student.sID = Apply.sID; /*** Rewrite using NATURAL JOIN ***/ select distinct sName, major from Student natural join Apply; ``` </div> <div class="nb-cell query" data-chunk="26" data-tabled="true" name="q38"> distinct([SName, Major], ( student(_SID, SName, _, _), apply(_SID, _, Major, _) )). </div> <div class="nb-cell markdown" name="md43"> ### Three-way Inner Join > Application info: ID, name, GPA, college name, enrollment ```sql select Apply.sID, sName, GPA, Apply.cName, enrollment from Apply, Student, College where Apply.sID = Student.sID and Apply.cName = College.cName; ``` </div> <div class="nb-cell query" data-chunk="30" data-tabled="true" name="q39"> distinct([SID, CName], ( apply(SID, CName, _, _), student(SID, SName, GPA, _), college(CName, _, Enrollment) )). </div> <div class="nb-cell markdown" name="md44"> ### Natural Join With Additional Conditions > Names and GPAs of students with sizeHS < 1000 applying to CS at Stanford ```sql select sName, GPA from Student natural join Apply where sizeHS < 1000 and major = 'CS' and cName = 'Stanford'; /*** USING clause considered safer ***/ select sName, GPA from Student join Apply using(sID) where sizeHS < 1000 and major = 'CS' and cName = 'Stanford'; ``` </div> <div class="nb-cell query" data-chunk="32" data-tabled="true" name="q40"> student(_SID, SName, GPA, _SizeHS), apply(_SID, 'Stanford', 'CS', _), _SizeHS < 1000. </div> <div class="nb-cell markdown" name="md21"> ## Left Outer Join In this example I want to also include those students who haven't applied for admission, so their values can't be read from the Apply table. SQL would allocate *null*, as I've done by allocating _null_ to CName and Major. _null_ isn't a standard Prolog builtin. > Include students who haven't applied anywhere ```sql select sName, sID, cName, major from Student left outer join Apply using(sID); ``` </div> <div class="nb-cell query" data-chunk="40" data-tabled="true" name="q20"> student(SID, SName, _GPA, _SizeHS), ( apply(SID, CName, Major, _Decision) ; \+apply(SID, CName, Major, _Decision), CName = null, Major = null ). </div> <div class="nb-cell markdown" name="md45"> ## Right Outer Join > Include applications without matching students ```sql insert into Apply values (321, 'MIT', 'history', 'N'); insert into Apply values (321, 'MIT', 'psychology', 'Y'); select sName, sID, cName, major from Student natural right outer join Apply; ``` Here I need some examples of student IDs in the apply table with no corresponding entries in the student table to be included as _null_. Adding them with [assertz(+Term)](https://www.swi-prolog.org/pldoc/doc_for?object=assertz/1) in a code block puts them into the database temporarily for the following query. To avoid a confusing mess of *or* `;` blocks, I've split these queries into `head :- body` stanzas. </div> <div class="nb-cell program" name="p3"> :- assertz(apply(321, 'MIT', 'history', 'N')). :- assertz(apply(321, 'MIT', 'psychology', 'Y')). student_apply(SName, SID, CName, Major) :- apply(SID, CName, Major, _Decision), student(SID, SName, _GPA, _SizeHS). student_apply(null, SID, CName, Major) :- apply(SID, CName, Major, _Decision), \+student(SID, _, _, _). </div> <div class="nb-cell query" data-chunk="29" data-tabled="true" name="q41"> student_apply(SName, SID, CName, Major). </div> <div class="nb-cell markdown" name="md22"> ## Full Outer Join > Include students who haven't applied anywhere and applications without matching students. ```sql select sName, sID, cName, major from Student full outer join Apply using(sID); ``` </div> <div class="nb-cell program" name="p2"> :- assertz(apply(321, 'MIT', 'history', 'N')). :- assertz(apply(321, 'MIT', 'psychology', 'Y')). % Natural join student_apply(SName, SID, CName, Major) :- student(SID, SName, _GPA, _SizeHS), apply(SID, CName, Major, _Decision). % Left outer join student_apply(SName, SID, null, null) :- student(SID, SName, _GPA, _SizeHS), \+apply(SID, _CName, _Major, _Decision). % Right outer join student_apply(null, SID, CName, Major) :- apply(SID, CName, Major, _Decision), \+student(SID, _SName, _GPA, _SizeHS). </div> <div class="nb-cell query" data-chunk="40" data-tabled="true" name="q21"> student_apply(SName, SID, CName, Major). </div> <div class="nb-cell markdown" name="md23"> ## Aggregation I initially used [findall(+Template, :Goal, -Bag)](https://www.swi-prolog.org/pldoc/doc_for?object=findall/3) combined with SWI Prolog's [List Manipulation](https://www.swi-prolog.org/pldoc/man?section=lists) predicates to solve these, but then figured out [aggregate_all(+Template, :Goal, -Result)](https://www.swi-prolog.org/pldoc/doc_for?object=aggregate_all/3) and [aggregate_all(+Template, +Discriminator, :Goal, -Result)](https://www.swi-prolog.org/pldoc/doc_for?object=aggregate_all/4) offered more elegant solutions. > Average GPA of all students. ```sql select avg(GPA) from Student; ``` </div> <div class="nb-cell query" name="q28"> aggregate_all((sum(GPA), count), student(_, _, GPA, _), (_T, _L)), Average is _T/_L. </div> <div class="nb-cell markdown" name="md24"> > Lowest GPA of students applying to CS ```sql select min(GPA) from Student, Apply where Student.sID = Apply.sID and major = 'CS'; ``` </div> <div class="nb-cell query" name="q29"> aggregate_all(min(GPA), (student(SID, _, GPA, _), apply(SID, _, 'CS', _)), Min). </div> <div class="nb-cell markdown" name="md25"> > Average GPA of students applying to CS. ```sql select avg(GPA) from Student where sID in (select sID from Apply where major = 'CS'); ``` The optional *discriminator argument* in [aggregate_all(+Template, +Discriminator, :Goal, -Result)](https://www.swi-prolog.org/pldoc/doc_for?object=aggregate_all/4) filters out duplicates, ensuring the average isn't distorted by including students who applied for CS at multiple colleges more than once. </div> <div class="nb-cell query" name="q31"> aggregate_all((sum(GPA), count), SID, (student(SID, _, GPA, _), apply(SID, _, 'CS', _)), (_T, _L)), Average is _T/_L. </div> <div class="nb-cell markdown" name="md26"> ```sql /************************************************************** Number of colleges bigger than 15,000 **************************************************************/ select count(*) from College where enrollment > 15000; ``` </div> <div class="nb-cell query" name="q30"> aggregate_all(count, (college(_, _, Enrollment), Enrollment > 15000), NumberBigColleges). </div> <div class="nb-cell markdown" name="md27"> > Number of students applying to Cornell. ```sql select Count(Distinct sID) from Apply where cName = 'Cornell'; ``` The trap here is that Craig applied to Cornell three times, Jay twice, and Amy once. So if duplicates aren't removed, the wrong answer of 6 will appear. </div> <div class="nb-cell query" name="q27"> aggregate_all(count, SID, apply(SID, 'Cornell', _, _), Count). </div> <div class="nb-cell markdown" name="md28"> ### group by queries Though SWI Prolog has [group_by(+By, +Template, :Goal, -Bag)](https://www.swi-prolog.org/pldoc/man?section=solutionsequences), I haven't used it since it uses the somewhat arcane bagof(+Template, :Goal, -Bag) carat notation. If the attribute to be _grouped by_ is uniquely listed in another table, it can be used to set each instance before calling _aggregate_all_ as below. > Number of colleges applied to by each student. ```sql select cName, count(distinct sID) from Apply group by cName; ``` </div> <div class="nb-cell query" data-chunk="20" data-tabled="true" name="q23"> college(CName, _, _), aggregate_all(count, _SID, apply(_SID, CName, _, _), Count). </div> <div class="nb-cell markdown" name="md30"> > Minimum + maximum GPAs of applicants to each college & major. ```sql select cName, major, min(GPA), max(GPA) from Student, Apply where Student.sID = Apply.sID group by cName, major; ``` </div> <div class="nb-cell query" data-chunk="30" data-tabled="true" name="q24"> distinct([CName, Major], ( apply(_, CName, Major, _), aggregate_all((min(_GPA), max(_GPA)), ( student(_SID, _, _GPA, _), apply(_SID, CName, Major, _) ), (Min, Max)) )). </div> <div class="nb-cell markdown" name="md31"> > Number of colleges applied to by each student, including 0 for those who applied nowhere. ```sql select Student.sID, count(distinct cName) from Student, Apply where Student.sID = Apply.sID group by Student.sID union select sID, 0 from Student where sID not in (select sID from Apply); ``` </div> <div class="nb-cell query" data-chunk="20" data-tabled="true" name="q25"> student(SID, SName, _, _), aggregate_all(count, _CName, apply(SID, _CName, _, _), Colleges). </div> <div class="nb-cell markdown" name="md29"> ### having > Colleges with fewer than 5 applications. ```sql select cName from Apply group by cName having count(*) < 5; ``` </div> <div class="nb-cell query" data-chunk="10" data-tabled="true" name="q22"> college(CName, _, _), aggregate_all(count, apply(_, CName, _, _), Count), Count < 5. </div> <div class="nb-cell markdown" name="md32"> ```sql select cName from Apply group by cName having count(distinct sID) < 5; ``` </div> <div class="nb-cell query" data-chunk="10" data-tabled="true" name="q26"> college(CName, _, _), aggregate_all(count, SID, apply(SID, CName, _, _), Count), Count < 5. </div> <div class="nb-cell markdown" name="md33"> ## null In SQL, _null_ is a special value which matches any type, whereas Prolog will barf if asked `GPA > 3.5` when GPA has been set to _null_, an _atom_ with no special meaning which throws a type error when used in `+Expr1 > +Expr2`. So as not to hardwire _null_ into my code, instead of `GPA \== null` I've used [number(@Term)](https://www.swi-prolog.org/pldoc/doc_for?object=number/1) as my _guard_ to filter out _nulls_. > All students with high GPA. ```sql select sID, sName, GPA from Student where GPA > 3.5; ``` </div> <div class="nb-cell query" data-chunk="10" data-tabled="true" name="q32"> assertz(student(432, 'Kevin', null, 1500)), assertz(student(321, 'Lori', null, 2500)), student(SID, SName, GPA, _), number(GPA), GPA > 3.5. </div> <div class="nb-cell markdown" name="md34"> If I don't want to exclude unset GPAs (_nulls_, _ununified_ variables, whatever), I could use Prolog's `@Term1 @> @Term2` which uses the following precedence: _Variables < Numbers < Strings < Atoms < Compound Terms_ </div> <div class="nb-cell query" data-chunk="20" data-tabled="true" name="q33"> assertz(student(432, 'Kevin', null, 1500)), assertz(student(321, 'Lori', null, 2500)), student(SID, SName, GPA, _), GPA @> 3.5. </div> <div class="nb-cell markdown" name="md35"> ## Data Modification Note that to be able to modify data in the clausal store, I need to make a [dynamic :PredicateIndicator, ...](https://www.swi-prolog.org/pldoc/doc_for?object=(dynamic)/1) as in this example with: ```prolog :- dynamic college/3, student/4, apply/4. ``` A snag with Swish is that data modifications only take effect temporarily, and can't be carried from one example to the next (as far as I know), so I've had to combine all the examples. ### insert In above examples I've used [assertz(+Term)](https://www.swi-prolog.org/pldoc/doc_for?object=assertz/1) which adds a temporary entry which only lasts that query. Alternatively, I could have used [asserta(+Term)](https://www.swi-prolog.org/pldoc/doc_for?object=asserta/1). It makes little difference whether the new rows are added to the top or the bottom of the table in these examples. </div> <div class="nb-cell markdown" name="md36"> ```sql /************************************************************** Insert new college **************************************************************/ insert into College values ('Carnegie Mellon', 'PA', 11500); /************************************************************** Have all students who didn't apply anywhere apply to CS at Carnegie Mellon **************************************************************/ insert into Apply select sID, 'Carnegie Mellon', 'CS', null from Student where sID not in (select sID from Apply); /*** Admit to Carnegie Mellon EE all students who were turned down in EE elsewhere ***/ insert into Apply select sID, 'Carnegie Mellon', 'EE', 'Y' from Student where sID in (select sID from Apply where major = 'EE' and decision = 'N'); ``` </div> <div class="nb-cell query" data-chunk="10" data-tabled="true" name="q34"> assertz(college('Carnegie Mellon', 'PA', 11500)), findall(_X, ( student(_X, _, _, _), \+apply(_X, _, _, _) ), _Xs), forall(member(_X, _Xs), assertz(apply(_X, 'Carnegie Mellon', 'CS', null))), findall(_Y, apply(_Y, _, 'EE', 'N'), _Ys), forall(member(_Y, _Ys), assertz(apply(_Y, 'Carnegie Mellon', 'EE', 'Y'))), apply(SID, 'Carnegie Mellon', Major, Decision). </div> <div class="nb-cell markdown" name="md37"> ### delete Here we use [retract(+Term)](https://www.swi-prolog.org/pldoc/doc_for?object=retract/1). ```sql /************************************************************** Delete all students who applied to more than two different majors **************************************************************/ delete from Student where sID in (select sID from Apply group by sID having count(distinct major) > 2); ``` My initial version used the following faulty code (causing a bit of frustration). ```prolog forall(member(Student, Students), retract(Student)). ``` Many thanks to Jan Wielemaker for promptly explaining my problem was simple to solve: the exact element of Students (ie written out `student(SID, SName, GPA, SizeHS)`) is required by retract. </div> <div class="nb-cell query" data-chunk="30" data-tabled="true" name="q35"> forall(( student(SID, SName, GPA, SizeHS), aggregate_all(count, Major, apply(SID, _, Major, _), Count), Count > 2), retract(student(SID, SName, GPA, SizeHS))), student(SID, SName, GPA, SizeHS). </div> <div class="nb-cell markdown" name="md40"> ```sql /*** Delete same ones from Apply ***/ delete from Apply where sID in (select sID from Apply group by sID having count(distinct major) > 2); ``` </div> <div class="nb-cell query" data-chunk="26" data-tabled="true" name="q36"> forall((findall(SID, (apply(SID, _, _, _), aggregate_all(count, Major, apply(SID, _, Major, _), _Count), _Count > 2), _SIDs), (apply(SID, CName, Major, Decision), memberchk(SID, _SIDs))), retract(apply(SID, CName, Major, Decision))), apply(SID, CName, Major, Decision). </div> <div class="nb-cell markdown" name="md38"> ``` </div> <div class="nb-cell markdown" name="md39"> I've continued this exercise, doing the Social-Network examples from Stanford's MooC in [part 2](https://swish.swi-prolog.org/p/sql2prolog2.swinb) </div> </div>