SWISH -- SWI-Prolog for SHaring

<div class="notebook">

<div class="nb-cell markdown" name="md1">
# An introduction to Prolog for SQL programers

*By Robert Laing*

I've translated the introductory example in [Databases: Relational Databases and SQL](https://www.edx.org/course/databases-5-sql?utm_source=sailthru&amp;utm_medium=email&amp;utm_campaign=triggered_shareit) into Prolog since I've personally found thinking of Prolog as a relational database query language a handy way to get started.

I think I'm using Prolog synonymously with Datalog in these examples, but don't know Datalog that well.

The course used the following three tables to illustrate [the relational data model](http://i.stanford.edu/~ullman/focs/ch08.pdf) which written in Prolog look as follows:
</div>

<div class="nb-cell program" data-background="true" name="p1">
:- dynamic college/3, student/4, apply/4.

%! college(?CName:text, ?State:text, ?Enrollment:integer) is nondet
college('Stanford', 'CA', 15000).
college('Berkeley', 'CA', 36000).
college('MIT',      'MA', 10000).
college('Cornell',  'NY', 21000).

%! student(?SID:text, ?SName:text, ?GPA:float, ?SizeHS:integer) is nondet 
student(123, 'Amy',    3.9, 1000).
student(234, 'Bob',    3.6, 1500).
student(345, 'Craig',  3.5, 500).
student(456, 'Doris',  3.9, 1000).
student(567, 'Edward', 2.9, 2000).
student(678, 'Fay',    3.8, 200).
student(789, 'Gary',   3.4, 800).
student(987, 'Helen',  3.7, 800).
student(876, 'Irene',  3.9, 400).
student(765, 'Jay',    2.9, 1500).
student(654, 'Amy',    3.9, 1000).
student(543, 'Craig',  3.4, 2000).

%! apply(?SID:integer, ?CName:text, ?Major:text, ?Decision:text) is nondet
apply(123, 'Stanford', 'CS',             'Y').
apply(123, 'Stanford', 'EE',             'N').
apply(123, 'Berkeley', 'CS',             'Y').
apply(123, 'Cornell',  'EE',             'Y').
apply(234, 'Berkeley', 'biology',        'N').
apply(345, 'MIT',      'bioengineering', 'Y').
apply(345, 'Cornell',  'bioengineering', 'N').
apply(345, 'Cornell',  'CS',             'Y').
apply(345, 'Cornell',  'EE',             'N').
apply(678, 'Stanford', 'history',        'Y').
apply(987, 'Stanford', 'CS',             'Y').
apply(987, 'Berkeley', 'CS',             'Y').
apply(876, 'Stanford', 'CS',             'N').
apply(876, 'MIT',      'biology',        'Y').
apply(876, 'MIT',      'marine biology', 'N').
apply(765, 'Stanford', 'history',        'Y').
apply(765, 'Cornell',  'history',        'N').
apply(765, 'Cornell',  'psychology',     'Y').
apply(543, 'MIT',      'CS',             'N').
</div>

<div class="nb-cell markdown" name="md46">
With Swish, the above data is loaded and available to all following queries (including ones after subsequent program cells) because I selected the *global icon*. If you are doing these examples in the swipl repl, you could use [consult(:File)](https://www.swi-prolog.org/pldoc/doc_for?object=consult/1) to load the data.

I discovered that by clicking on the spanner icon and selecting table, Swish gives me a nice "relational" result. In the swipl repl, you would press `;` (In prolog commas are *and* and semicolons are *or*).
</div>

<div class="nb-cell query" data-chunk="4" data-tabled="true" name="q42">
college(CName, State, Enrollment).
</div>

<div class="nb-cell query" data-chunk="12" data-tabled="true" name="q44">
student(SID, SName, GPA, SizeHS).
</div>

<div class="nb-cell query" data-chunk="19" data-tabled="true" name="q43">
apply(SID, CName, Major, Decision).
</div>

<div class="nb-cell markdown" name="md2">
### Basic selection

&gt; IDs, names, and GPAs of students with GPA &gt; 3.6

```sql
select sID, sName, GPA
from Student
where GPA &gt; 3.6;
```
</div>

<div class="nb-cell query" data-chunk="7" data-tabled="true" name="q1">
student(SID, SName, GPA, _SizeHS), GPA &gt; 3.6.
</div>

<div class="nb-cell markdown" name="md3">
&gt; Same query without GPA

```sql
select sID, sName
from Student
where GPA &gt; 3.6;
```

The second example only differs from the first in that the GPA attribute isn't wanted in the output, but is still needed as a query filter. 

A trick here is using an underscore followed by a variable name (as opposed to just an underscore for "don't care variables"). Even though `_GPA` isn't output because of its leading underscore, it still links the third variable in the `student/4` clause to greater-than condition.
</div>

<div class="nb-cell query" data-chunk="10" data-tabled="true" name="q3">
student(SID, SName, _GPA, _SizeHS), _GPA &gt; 3.6.
</div>

<div class="nb-cell markdown" name="md4">
### Table joins

&gt; Student names and majors for which they've applied

```sql
select sName, major
from Student, Apply
where Student.sID = Apply.sID;
```
A nifty trick in Prolog is equality can be checked as in `Student.sID = Apply.sID` by simply placing the same variable name in both positions (`_ID` in the example below). Again if I don't want the "joining" variable in the output, I provide a leading underscore.
</div>

<div class="nb-cell query" data-chunk="20" data-tabled="true" name="q2">
student(_ID, SName, _GPA, _SizeHS), 
apply(_ID, _CName, Major, _Decision).
</div>

<div class="nb-cell markdown" name="md5">
A common gripe of maths purists is the poor support for [the set data model](http://i.stanford.edu/~ullman/focs/ch07.pdf) in programing languages. One of the basics of sets is no duplicate elements, which is surprisingly tricky as translating the following SQL example shows:

&gt; Same query with Distinct

```sql
select distinct sName, major
from Student, Apply
where Student.sID = Apply.sID;
```
SWI Prolog has a [distinct(?Witness, :Goal)](https://www.swi-prolog.org/pldoc/doc_for?object=distinct/2) predicate, something I only became aware of after the first version of this tutorial. It is part of the [solution sequences](https://www.swi-prolog.org/pldoc/man?section=solutionsequences) library.
</div>

<div class="nb-cell query" data-chunk="20" data-tabled="true" name="q4">
distinct([SName, Major], 
         (    student(_ID, SName, _GPA, _SizeHS), 
              apply(_ID, _CName, Major, _Decision)    ) ).
</div>

<div class="nb-cell markdown" name="md6">
The following SQL example uses more convoluted `where...` lines, which is where Prolog, as a language niched at predicate logic, shines.

&gt; Names and GPAs of students with sizeHS &lt; 1000 applying to CS at Stanford, and the application decision

```sql
select sname, GPA, decision
from Student, Apply
where Student.sID = Apply.sID
  and sizeHS &lt; 1000 and major = 'CS' and cname = 'Stanford';
```
</div>

<div class="nb-cell query" data-chunk="7" data-tabled="true" name="q5">
student(_ID, SName, GPA, _SizeHS), 
apply(_ID, 'Stanford', 'CS', Decision), 
_SizeHS &lt; 1000.
</div>

<div class="nb-cell markdown" name="md7">
&gt; All large campuses with CS applicants

```sql
select distinct College.cName
from College, Apply
where College.cName = Apply.cName
  and enrollment &gt; 20000 and major = 'CS';
```
</div>

<div class="nb-cell query" data-chunk="5" data-tabled="true" name="q6">
distinct(CName, 
         (    college(CName, _State, _Enrollment), 
              apply(_SID, CName, 'CS', _Decision), 
              _Enrollment &gt; 20000    
         )).
</div>

<div class="nb-cell markdown" name="md8">
### order by

Here we us [order_by(+Spec, :Goal)](https://www.swi-prolog.org/pldoc/doc_for?object=order_by/2), which I only became aware of after using [sort(+Key, +Order, +List, -Sorted)](https://www.swi-prolog.org/pldoc/doc_for?object=sort/4) where the +Order allows the output to be ascending or descending.

&gt; Application information, sort by decreasing GPA

```sql
select Student.sID, sName, GPA, Apply.cName, enrollment
from Student, College, Apply
where Apply.sID = Student.sID and Apply.cName = College.cName
order by GPA desc;
```
</div>

<div class="nb-cell query" data-chunk="21" data-tabled="true" name="q7">
order_by([desc(GPA)], 
         (    student(ID, SName, GPA, _SizeHS),
              college(CName, _State, Enrollment),
              apply(ID, CName, _Major, _Decision)    )).
</div>

<div class="nb-cell markdown" name="md9">
Next challenge to is to create a "tie breaker" by sorting matching GPAs by ascending enrollments.

&gt; Application information, sort by decreasing GPA Then by increasing enrollment

```sql
select Student.sID, sName, GPA, Apply.cName, enrollment
from Student, College, Apply
where Apply.sID = Student.sID and Apply.cName = College.cName
order by GPA desc, enrollment;
```
</div>

<div class="nb-cell query" data-chunk="31" data-tabled="true" name="q8">
order_by([desc(GPA), asc(Enrollment)],
         (    student(ID, SName, GPA, _SizeHS),
              college(CName, _State, Enrollment),
              apply(ID, CName, _Major, _Decision)    )).
</div>

<div class="nb-cell markdown" name="md10">
### Searching for substrings

&gt; Applicants to bio majors

```sql
select sID, major
from Apply
where major like '%bio%';
```
</div>

<div class="nb-cell query" data-chunk="11" data-tabled="true" name="q9">
apply(SID, _CName, Major, _Decision),
sub_atom(Major, _, _, _, 'bio').
</div>

<div class="nb-cell markdown" name="md11">
### Arithmetic Functions

&gt; Add scaled GPA based on sizeHS

```sql
select sID, sName, GPA, sizeHS, GPA*(sizeHS/1000.0) as scaledGPA
from Student;
```
</div>

<div class="nb-cell query" data-chunk="32" data-tabled="true" name="q10">
student(SID, SName, GPA, SizeHS),
ScaledGPA is GPA * (SizeHS / 1000).
</div>

<div class="nb-cell markdown" name="md12">
### Self joins

Whereas in SQL you give each instance of a relation distinguishing alias -- S1 and S2 in this example for two instances of the Student table -- while attributes retain their names, in Prolog the relations retain their names while the attributes are differentiated by say SID1 vs SID2 etc, unless we want to match them for a join, as with GPA here.

&gt; Pairs of students with same GPA

```sql
select S1.sID, S1.sName, S1.GPA, S2.sID, S2.sName, S2.GPA
from Student S1, Student S2
where S1.GPA = S2.GPA and S1.sID &lt; S2.sID;
```
</div>

<div class="nb-cell query" data-chunk="10" data-tabled="true" name="q11">
student(SID1, SName1, GPA, _SizeHS1),
student(SID2, SName2, GPA, _SizeHS2),
SID1 &lt; SID2.
</div>

<div class="nb-cell markdown" name="md13">
### Union

Not only do the maths symbols for _union_ *A ∪ B* and _or_ *A ∨ B* look related, they are related. As are the symbols for _intersection_ *A ∩ B* and _and_ *A ∧ B*.

&gt; List of college names and student names

```sql
select cName as name from College
union
select sName as name from Student;
```
</div>

<div class="nb-cell query" data-chunk="21" data-tabled="true" name="q12">
college(Name, _, _) ; student(_, Name, _, _).
</div>

<div class="nb-cell markdown" name="md14">
### Intersection

&gt; IDs of students who applied to both CS and EE

```sql
select sID from Apply where major = 'CS'
intersect
select sID from Apply where major = 'EE';
```
</div>

<div class="nb-cell query" data-chunk="10" data-tabled="true" name="q13">
apply(SID, _, 'CS', _) , apply(SID, _, 'EE', _).
</div>

<div class="nb-cell markdown" name="md15">
SID 123 appears four times because Amy made four college applications, twice for CS and twice for EE. The duplicates could be removed like this:
</div>

<div class="nb-cell query" data-chunk="10" data-tabled="true" name="q14">
distinct(SID, 
        (    apply(SID, _, 'CS', _) , apply(SID, _, 'EE', _)    )).
</div>

<div class="nb-cell markdown" name="md16">
### Except

&gt; IDs of students who applied to CS but not EE

```sql
select sID from Apply where major = 'CS'
except
select sID from Apply where major = 'EE';
```
</div>

<div class="nb-cell query" data-chunk="20" data-tabled="true" name="q15">
apply(SID, _, 'CS', _), \+apply(SID, _, 'EE', _).
</div>

<div class="nb-cell markdown" name="md18">
## Subqueries

A key advantage Prolog/Datalog has over SQL is making it easier to break big problems into smaller components. SQL scripts have a horrible habit of growing into nested `select .... from .... where` blocks. 

As the following examples show, each of SQL's three basic clauses in turn can contain any number of `select .... from .... where` queries, each of which in turn can contain more deeply nested queries, creating "write once, read never" code.

### Subqueries in the where clause

These examples use SQL's [exists](https://www.postgresql.org/docs/12/functions-subquery.html), in, not in, any/some, and all builtins. The Prolog equivalent would typically be to translate the SQL subquery using [findall(+Template, :Goal, -List)](https://www.swi-prolog.org/pldoc/doc_for?object=findall/3), and then using [memberchk(?Elem, +List)](https://www.swi-prolog.org/pldoc/doc_for?object=memberchk/2).

&gt; IDs and names of students applying to CS

```sql
select sID, sName
from Student
where sID in (select sID from Apply where major = 'CS');
```
</div>

<div class="nb-cell query" data-chunk="10" data-tabled="true" name="q18">
findall(_ID, apply(_ID, _, 'CS', _), _L),
student(SID, SName, _, _),
memberchk(SID, _L).
</div>

<div class="nb-cell markdown" name="md41">
&gt; Colleges such that some other college is in the same state

```sql
select cName, state
from College C1
where exists (select * from College C2
              where C2.state = C1.state and C2.cName &lt;&gt; C1.cName);
```
</div>

<div class="nb-cell query" data-chunk="6" data-tabled="true" name="q37">
findall(_CName1, (    college(_CName1, _State, _), 
                      college(_CName2, _State, _),
                      _CName1 \== _CName2
                 ), _L),
college(CName, State, _),
memberchk(CName, _L).
</div>

<div class="nb-cell markdown" name="md19">
&gt; Biggest college

This example has a _correlated reference_ -- C1 declared outside of the subquery is used in the subquery. Note below, this also works in Prolog, where `_Enrollment1` is declared outside the bracketed suquery, and used inside it.

```sql
select cName
from College C1
where not exists (select * from College C2
                  where C2.enrollment &gt; C1.enrollment);
```
</div>

<div class="nb-cell query" data-chunk="10" data-tabled="true" name="q16">
college(CName, _State1, _Enrollment1),
\+ (    college(_CName2, _State2, _Enrollment2),
        _Enrollment2 &gt; _Enrollment1    ).
</div>

<div class="nb-cell markdown" name="md17">
### Subqueries in the from clause

&gt; Students whose scaled GPA changes GPA by more than 1

```sql
select *
from (select sID, sName, GPA, GPA*(sizeHS/1000.0) as scaledGPA
      from Student) G
where abs(scaledGPA - GPA) &gt; 1.0;
```
</div>

<div class="nb-cell query" data-chunk="12" data-tabled="true" name="q17">
student(SID, SName, GPA, SizeHS),
ScaledGPA is GPA * (SizeHS / 1000),
_X is ScaledGPA - GPA,
abs(_X, _Y),
_Y &gt; 1.0.
</div>

<div class="nb-cell markdown" name="md20">
### Subqueries in the select clause

&gt; Colleges paired with the highest GPA of their applicants

```sql
select distinct cName, state,
  (select distinct GPA
   from Apply, Student
   where College.cName = Apply.cName
     and Apply.sID = Student.sID
     and GPA &gt;= all
           (select GPA from Student, Apply
            where Student.sID = Apply.sID
              and Apply.cName = College.cName)) as GPA
from College;
```
</div>

<div class="nb-cell query" data-chunk="21" data-tabled="true" name="q19">
distinct(CName, (   college(CName, State, _),
                    apply(_SID1, CName, _, _),
                    student(_SID1, _, GPA1, _),
                    \+ (    student(_SID2, _, _GPA2, _),
                           _GPA2 &gt; GPA1
                       )
                )).
</div>

<div class="nb-cell markdown" name="md42">
## Inner Join

&gt; Student names and majors for which they've applied

This same example is used for _natural joins_ since both student/4 and apply/5 have a column labelled sID.

```sql
select distinct sName, major
from Student inner join Apply
on Student.sID = Apply.sID;

/*** Rewrite using NATURAL JOIN ***/

select distinct sName, major
from Student natural join Apply;
```
</div>

<div class="nb-cell query" data-chunk="26" data-tabled="true" name="q38">
distinct([SName, Major], 
         (    student(_SID, SName, _, _), 
              apply(_SID, _, Major, _)
          )).
</div>

<div class="nb-cell markdown" name="md43">
### Three-way Inner Join

&gt; Application info: ID, name, GPA, college name, enrollment

```sql
select Apply.sID, sName, GPA, Apply.cName, enrollment
from Apply, Student, College
where Apply.sID = Student.sID and Apply.cName = College.cName;
```
</div>

<div class="nb-cell query" data-chunk="30" data-tabled="true" name="q39">
distinct([SID, CName],
         (    apply(SID, CName, _, _),
              student(SID, SName, GPA, _),
              college(CName, _, Enrollment)
         )).
</div>

<div class="nb-cell markdown" name="md44">
### Natural Join With Additional Conditions

&gt; Names and GPAs of students with sizeHS &lt; 1000 applying to CS at Stanford

```sql
select sName, GPA
from Student natural join Apply
where sizeHS &lt; 1000 and major = 'CS' and cName = 'Stanford';

/*** USING clause considered safer ***/

select sName, GPA
from Student join Apply using(sID)
where sizeHS &lt; 1000 and major = 'CS' and cName = 'Stanford';
```
</div>

<div class="nb-cell query" data-chunk="32" data-tabled="true" name="q40">
student(_SID, SName, GPA, _SizeHS),
apply(_SID, 'Stanford', 'CS', _),
_SizeHS &lt; 1000.
</div>

<div class="nb-cell markdown" name="md21">
## Left Outer Join

In this example I want to also include those students who haven't applied for admission, so their values can't be read from the Apply table.
SQL would allocate *null*, as I've done by allocating _null_ to CName and Major. _null_ isn't a standard Prolog builtin.

&gt; Include students who haven't applied anywhere

```sql
select sName, sID, cName, major
from Student left outer join Apply using(sID);
```
</div>

<div class="nb-cell query" data-chunk="40" data-tabled="true" name="q20">
student(SID, SName, _GPA, _SizeHS),
(   apply(SID, CName, Major, _Decision)
    ;   
    \+apply(SID, CName, Major, _Decision),
    CName = null, Major = null
).
</div>

<div class="nb-cell markdown" name="md45">
## Right Outer Join

&gt; Include applications without matching students

```sql
insert into Apply values (321, 'MIT', 'history', 'N');
insert into Apply values (321, 'MIT', 'psychology', 'Y');

select sName, sID, cName, major
from Student natural right outer join Apply;
```
Here I need some examples of student IDs in the apply table with no corresponding entries in the student table to be included as _null_.
Adding them with [assertz(+Term)](https://www.swi-prolog.org/pldoc/doc_for?object=assertz/1) in a code block puts them into the database temporarily for the following query.

To avoid a confusing mess of *or* `;` blocks, I've split these queries into `head :- body` stanzas.
</div>

<div class="nb-cell program" name="p3">
:- assertz(apply(321, 'MIT', 'history', 'N')).
:- assertz(apply(321, 'MIT', 'psychology', 'Y')).

student_apply(SName, SID, CName, Major) :-
    apply(SID, CName, Major, _Decision),
    student(SID, SName, _GPA, _SizeHS).

student_apply(null, SID, CName, Major) :-
    apply(SID, CName, Major, _Decision),
    \+student(SID, _, _, _).
</div>

<div class="nb-cell query" data-chunk="29" data-tabled="true" name="q41">
student_apply(SName, SID, CName, Major).
</div>

<div class="nb-cell markdown" name="md22">
## Full Outer Join

&gt; Include students who haven't applied anywhere and applications without matching students.

```sql
select sName, sID, cName, major
from Student full outer join Apply using(sID);
```
</div>

<div class="nb-cell program" name="p2">
:- assertz(apply(321, 'MIT', 'history', 'N')).
:- assertz(apply(321, 'MIT', 'psychology', 'Y')).

% Natural join
student_apply(SName, SID, CName, Major) :-
    student(SID, SName, _GPA, _SizeHS),
    apply(SID, CName, Major, _Decision).

% Left outer join
student_apply(SName, SID, null, null) :-
    student(SID, SName, _GPA, _SizeHS),
    \+apply(SID, _CName, _Major, _Decision).

% Right outer join
student_apply(null, SID, CName, Major) :-
    apply(SID, CName, Major, _Decision),
    \+student(SID, _SName, _GPA, _SizeHS).
</div>

<div class="nb-cell query" data-chunk="40" data-tabled="true" name="q21">
student_apply(SName, SID, CName, Major).
</div>

<div class="nb-cell markdown" name="md23">
## Aggregation

I initially used [findall(+Template, :Goal, -Bag)](https://www.swi-prolog.org/pldoc/doc_for?object=findall/3) combined with SWI Prolog's [List Manipulation](https://www.swi-prolog.org/pldoc/man?section=lists) predicates to solve these, but then figured out [aggregate_all(+Template, :Goal, -Result)](https://www.swi-prolog.org/pldoc/doc_for?object=aggregate_all/3) and [aggregate_all(+Template, +Discriminator, :Goal, -Result)](https://www.swi-prolog.org/pldoc/doc_for?object=aggregate_all/4) offered more elegant solutions.

&gt; Average GPA of all students.

```sql
select avg(GPA)
from Student;
```
</div>

<div class="nb-cell query" name="q28">
aggregate_all((sum(GPA), count), student(_, _, GPA, _), (_T, _L)),
Average is _T/_L.
</div>

<div class="nb-cell markdown" name="md24">
&gt; Lowest GPA of students applying to CS

```sql
select min(GPA)
from Student, Apply
where Student.sID = Apply.sID and major = 'CS';
```
</div>

<div class="nb-cell query" name="q29">
aggregate_all(min(GPA), 
              (student(SID, _, GPA, _), apply(SID, _, 'CS', _)), 
              Min).
</div>

<div class="nb-cell markdown" name="md25">
&gt; Average GPA of students applying to CS.

```sql
select avg(GPA)
from Student
where sID in (select sID from Apply where major = 'CS');
```
The optional *discriminator argument* in [aggregate_all(+Template, +Discriminator, :Goal, -Result)](https://www.swi-prolog.org/pldoc/doc_for?object=aggregate_all/4) filters out duplicates, ensuring the average isn't distorted by including students who applied for CS at multiple colleges more than once.
</div>

<div class="nb-cell query" name="q31">
aggregate_all((sum(GPA), count), SID,
              (student(SID, _, GPA, _), apply(SID, _, 'CS', _)), 
              (_T, _L)),
Average is _T/_L.
</div>

<div class="nb-cell markdown" name="md26">
```sql
/**************************************************************
  Number of colleges bigger than 15,000
**************************************************************/

select count(*)
from College
where enrollment &gt; 15000;
```
</div>

<div class="nb-cell query" name="q30">
aggregate_all(count, 
              (college(_, _, Enrollment), Enrollment &gt; 15000), 
              NumberBigColleges).
</div>

<div class="nb-cell markdown" name="md27">
&gt;  Number of students applying to Cornell.

```sql
select Count(Distinct sID)
from Apply
where cName = 'Cornell';
```

The trap here is that Craig applied to Cornell three times, Jay twice, and Amy once. So if duplicates aren't removed, the wrong answer of 6 will appear.
</div>

<div class="nb-cell query" name="q27">
aggregate_all(count, SID, apply(SID, 'Cornell', _, _), Count).
</div>

<div class="nb-cell markdown" name="md28">
### group by queries

Though SWI Prolog has [group_by(+By, +Template, :Goal, -Bag)](https://www.swi-prolog.org/pldoc/man?section=solutionsequences), I haven't used it since it uses the somewhat arcane bagof(+Template, :Goal, -Bag) carat notation. If the attribute to be _grouped by_ is uniquely listed in another table, it can be used to set each instance before calling _aggregate_all_ as below.

&gt; Number of colleges applied to by each student.

```sql
select cName, count(distinct sID)
from Apply
group by cName;
```
</div>

<div class="nb-cell query" data-chunk="20" data-tabled="true" name="q23">
college(CName, _, _),
aggregate_all(count, _SID, apply(_SID, CName, _, _), Count).
</div>

<div class="nb-cell markdown" name="md30">
&gt;  Minimum + maximum GPAs of applicants to each college &amp; major.

```sql
select cName, major, min(GPA), max(GPA)
from Student, Apply
where Student.sID = Apply.sID
group by cName, major;
```
</div>

<div class="nb-cell query" data-chunk="30" data-tabled="true" name="q24">
distinct([CName, Major],
         (    apply(_, CName, Major, _),
              aggregate_all((min(_GPA), max(_GPA)),
                           (    student(_SID, _, _GPA, _), 
                                apply(_SID, CName, Major, _)    ), 
                           (Min, Max))
         )).
</div>

<div class="nb-cell markdown" name="md31">
&gt; Number of colleges applied to by each student, including 0 for those who applied nowhere.

```sql
select Student.sID, count(distinct cName)
from Student, Apply
where Student.sID = Apply.sID
group by Student.sID
union
select sID, 0
from Student
where sID not in (select sID from Apply);
```
</div>

<div class="nb-cell query" data-chunk="20" data-tabled="true" name="q25">
student(SID, SName, _, _),
aggregate_all(count, _CName, apply(SID, _CName, _, _), Colleges).
</div>

<div class="nb-cell markdown" name="md29">
### having

&gt; Colleges with fewer than 5 applications.

```sql
select cName
from Apply
group by cName
having count(*) &lt; 5;
```
</div>

<div class="nb-cell query" data-chunk="10" data-tabled="true" name="q22">
college(CName, _, _),
aggregate_all(count, apply(_, CName, _, _), Count),
Count &lt; 5.
</div>

<div class="nb-cell markdown" name="md32">
```sql
select cName
from Apply
group by cName
having count(distinct sID) &lt; 5;
```
</div>

<div class="nb-cell query" data-chunk="10" data-tabled="true" name="q26">
college(CName, _, _),
aggregate_all(count, SID, apply(SID, CName, _, _), Count),
Count &lt; 5.
</div>

<div class="nb-cell markdown" name="md33">
## null

In SQL, _null_ is a special value which matches any type, whereas Prolog will barf if asked `GPA &gt; 3.5` when GPA has been set to _null_, an _atom_ with no special meaning which throws a type error when used in `+Expr1 &gt; +Expr2`.

So as not to hardwire _null_ into my code, instead of `GPA \== null` I've used [number(@Term)](https://www.swi-prolog.org/pldoc/doc_for?object=number/1) as my _guard_ to filter out _nulls_.

&gt; All students with high GPA.

```sql
select sID, sName, GPA
from Student
where GPA &gt; 3.5;
```
</div>

<div class="nb-cell query" data-chunk="10" data-tabled="true" name="q32">
assertz(student(432, 'Kevin', null, 1500)),
assertz(student(321, 'Lori', null, 2500)),
student(SID, SName, GPA, _), number(GPA), GPA &gt; 3.5.
</div>

<div class="nb-cell markdown" name="md34">
If I don't want to exclude unset GPAs (_nulls_, _ununified_ variables, whatever), I could use Prolog's `@Term1 @&gt; @Term2` which uses the following precedence:

_Variables &lt; Numbers &lt; Strings &lt; Atoms &lt; Compound Terms_
</div>

<div class="nb-cell query" data-chunk="20" data-tabled="true" name="q33">
assertz(student(432, 'Kevin', null, 1500)),
assertz(student(321, 'Lori', null, 2500)),
student(SID, SName, GPA, _), GPA @&gt; 3.5.
</div>

<div class="nb-cell markdown" name="md35">
## Data Modification

Note that to be able to modify data in the clausal store, I need to make a [dynamic :PredicateIndicator, ...](https://www.swi-prolog.org/pldoc/doc_for?object=(dynamic)/1) as in this example with:

```prolog
:- dynamic college/3, student/4, apply/4.
```

A snag with Swish is that data modifications only take effect temporarily, and can't be carried from one example to the next (as far as I know), so I've had to combine all the examples.

### insert

In above examples I've used [assertz(+Term)](https://www.swi-prolog.org/pldoc/doc_for?object=assertz/1) which adds a temporary entry which only lasts that query. Alternatively, I could have used [asserta(+Term)](https://www.swi-prolog.org/pldoc/doc_for?object=asserta/1). It makes little difference whether the new rows are added to the top or the bottom of the table in these examples.
</div>

<div class="nb-cell markdown" name="md36">
```sql
/**************************************************************
  Insert new college
**************************************************************/

insert into College values ('Carnegie Mellon', 'PA', 11500);

/**************************************************************
  Have all students who didn't apply anywhere apply to
  CS at Carnegie Mellon
**************************************************************/

insert into Apply
  select sID, 'Carnegie Mellon', 'CS', null
  from Student
  where sID not in (select sID from Apply);

/*** Admit to Carnegie Mellon EE all students who were turned down
     in EE elsewhere ***/

insert into Apply
  select sID, 'Carnegie Mellon', 'EE', 'Y'
  from Student
  where sID in (select sID from Apply
                where major = 'EE' and decision = 'N');
```
</div>

<div class="nb-cell query" data-chunk="10" data-tabled="true" name="q34">
assertz(college('Carnegie Mellon', 'PA', 11500)),
findall(_X, (    student(_X, _, _, _),
                 \+apply(_X, _, _, _)
            ), _Xs),
forall(member(_X, _Xs), 
       assertz(apply(_X, 'Carnegie Mellon', 'CS', null))),
findall(_Y, apply(_Y, _, 'EE', 'N'), _Ys),
forall(member(_Y, _Ys),
       assertz(apply(_Y, 'Carnegie Mellon', 'EE', 'Y'))),
apply(SID, 'Carnegie Mellon', Major, Decision).
</div>

<div class="nb-cell markdown" name="md37">
### delete

Here we use [retract(+Term)](https://www.swi-prolog.org/pldoc/doc_for?object=retract/1).

```sql
/**************************************************************
  Delete all students who applied to more than two different
  majors
**************************************************************/

delete from Student
where sID in
  (select sID
   from Apply
   group by sID
   having count(distinct major) &gt; 2);
```

My initial version used the following faulty code (causing a bit of frustration).

```prolog
forall(member(Student, Students), retract(Student)).
```
Many thanks to Jan Wielemaker for promptly explaining my problem was simple to solve: the exact element of Students (ie written out `student(SID, SName, GPA, SizeHS)`) is required by retract.
</div>

<div class="nb-cell query" data-chunk="30" data-tabled="true" name="q35">
forall(( student(SID, SName, GPA, SizeHS),
           aggregate_all(count, Major, apply(SID, _, Major, _),
                         Count),
           Count &gt; 2),
       retract(student(SID, SName, GPA, SizeHS))),
student(SID, SName, GPA, SizeHS).
</div>

<div class="nb-cell markdown" name="md40">
```sql
/*** Delete same ones from Apply ***/
delete from Apply
where sID in
  (select sID
   from Apply
   group by sID
   having count(distinct major) &gt; 2);
```
</div>

<div class="nb-cell query" data-chunk="26" data-tabled="true" name="q36">
forall((findall(SID, (apply(SID, _, _, _),
                        aggregate_all(count, Major, 
                                      apply(SID, _, Major, _), _Count),
                        _Count &gt; 2), _SIDs),
       (apply(SID, CName, Major, Decision), memberchk(SID, _SIDs))),
       retract(apply(SID, CName, Major, Decision))),
apply(SID, CName, Major, Decision).
</div>

<div class="nb-cell markdown" name="md38">
```
</div>

<div class="nb-cell markdown" name="md39">
I've continued this exercise, doing the Social-Network examples from Stanford's MooC in [part 2](https://swish.swi-prolog.org/p/sql2prolog2.swinb)
</div>

</div>