Significantly refactor and simplify flattening transformation, unify handling of builtins and functional predicates (#23)

## Background

### Functional predicates

I originally introduced **functional predicates** with the observation that it's pretty nice to be able to write

```
size is 3.
color 1 is red.
color 2 is blue.
color 3 is green.
color 4 is orange.
a (color N) :- b N, N < size.
```

where the last rule is a shorthand for

```
a Color :- b N, size is Size, N < Size, color N is Color.
```

Expanding out this shorthand is done in a step called the _flattening transformation_.

### Built-in operations

I also originally introduced **built-in operations** in a way that had a complicated interaction with the term-matching infrastructure. For example, matching the term `3` against the pattern `succ X`, with `succ` registered as the `NAT_SUCC` operator, would bind X to 2. This introduced complexity into a pretty core part of the system, **and** it wasn't clear what should happen if you try to write things like `succ "Hello world"`.

One correct, well-understood, and uniform way to handle this sort of operation is to treat the successor operator as an infinite relation of the form `succ X is Y`, with the following entries:

```
succ 0 is 1.
succ 1 is 2.
succ 2 is 3.
...
```

This PR unifies the way both functional predicates and built-in relations are treated by the static checker and the flattening transformation.

## Built-ins are now uniformly treated as relations

Without this PR, Dusa doesn't allow you to write built-in operations as relations. The following isn't allowed, for example:

```
#builtin NAT_SUCC succ
a 3.
b Y :- a X, succ X is Y.
```

In Dusa 0.0.14 and below, built-ins can only be referenced in their functional form, not their relational form, even though the compiler was translating everything into the relational form internally. This is fixed, and under this PR the program above is accepted.
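
Viewed as a relation, a built-in only needs to know which of its positions are already bound. The sketch below is a hypothetical model, not Dusa's actual builtin code: it assumes natural numbers are represented as bigints and uses an illustrative `natSucc` lookup function. It shows how `succ X is Y` can be answered in either direction, but not when both sides are unbound.

```typescript
// Hypothetical model of NAT_SUCC viewed as the infinite relation `succ X is Y`
// (succ 0 is 1, succ 1 is 2, ...). This is an illustration, not Dusa's code.
// `undefined` marks an argument not yet bound by a previous premise.
type Binding = bigint | undefined;

/** Returns every entry (X, Y) of `succ X is Y` consistent with the bindings. */
function natSucc(x: Binding, y: Binding): Array<[bigint, bigint]> {
  if (x !== undefined) {
    // X is an input: compute Y (or, if Y is also bound, just check it).
    if (x < 0n) return []; // X is not a natural number
    const next = x + 1n;
    return y === undefined || y === next ? [[x, next]] : [];
  }
  if (y !== undefined) {
    // Y is an input and X is an output: only Y > 0 has a predecessor.
    return y > 0n ? [[y - 1n, y]] : [];
  }
  // Neither side is bound: infinitely many entries, so refuse this mode.
  throw new Error('succ X is Y needs X or Y bound by a previous premise');
}

// b Y :- a X, succ X is Y.  With `a 3`, X = 3 is bound, so Y = 4.
const forward = natSucc(3n, undefined); // [[3n, 4n]]
// vertex N :- vertex M, s N is M.  With `vertex 6`, M = 6 is bound, so N = 5.
const backward = natSucc(undefined, 6n); // [[5n, 6n]]
```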

## Functional predicates and built-ins are now uniformly translated to relations in the flattening transformation

Functional predicates and built-ins are now turned into additional premises by the flattening transformation, a simple recursive process that mimics instruction selection in a compiler:

```
#demand f X Y, h (plus X Y).
-->
#demand f X Y, plus X Y is Z, h Z.
```

These effects stack:

```
#demand f X Y, g (plus (plus X X) (plus X Y)).
-->
#demand f X Y, plus X X is X2, plus X Y is Z, plus X2 Z is G, g G.
```

This transformation is exactly the same whether `plus` is a declared relation or the built-in `INT_PLUS`.

## Modes for functional predicates are now much more generic, but still have a special case for inequality

Modes for all built-ins, including the syntactically supported ones like equality and inequality, are now handled uniformly by a mode system. This mode system has one quirk beyond the usual "input"/"output" distinction: "wildcard" is allowed as an input mode, to handle built-ins that work like inequality. Consider the following rule:

```
a Ys :- b Xs, cons 4 Ys != Xs.
```

This rule will give the following error:

> The built-in relation NOT_EQUAL was given 2 arguments, and the argument in position 1 contains variables not bound by previous premises. This builtin does not support that mode of operation.

Intuitively, Dusa can't handle this, because there will always be countably infinitely many `Ys` such that `cons 4 Ys != Xs`. **However**, for practical logic programming it's _really really useful_ to be able to say

```
a :- b Xs, cons 4 _ != Xs.
```

which, logically, translates to "If `b Xs`, and not (exists `Ys`. `cons 4 Ys == Xs`), then `a`".

This is admittedly irregular with respect to everything else in the language! But computationally it's quite reasonable, implementation-wise it's not particularly complex, the logical meaning is entirely decidable, and without it there are many cases where you'd be stuck with a lot of sad verboseness.
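
Once `Xs` is bound, checking `cons 4 _ != Xs` is a finite test. It holds exactly when no choice for the wildcard makes the two sides equal, which means `Xs` fails to match the pattern `cons 4 _`. The sketch below assumes a hypothetical term representation; the `Term`/`Pattern` types and the `matches`/`notEqualWithWildcards` names are illustrative and are not Dusa's internal `Data` type or its builtin code.

```typescript
// Hypothetical term representation for illustration; not Dusa's internal Data type.
type Term =
  | { type: 'int'; value: bigint }
  | { type: 'const'; name: string; args: Term[] };

// Patterns are terms that may also contain the wildcard `_`.
type Pattern =
  | { type: 'wildcard' }
  | { type: 'int'; value: bigint }
  | { type: 'const'; name: string; args: Pattern[] };

/** True if some choice of values for the wildcards makes `pattern` equal to `term`. */
function matches(pattern: Pattern, term: Term): boolean {
  if (pattern.type === 'wildcard') return true;
  if (pattern.type === 'int') return term.type === 'int' && term.value === pattern.value;
  if (term.type !== 'const' || term.name !== pattern.name) return false;
  const termArgs = term.args;
  return (
    termArgs.length === pattern.args.length &&
    pattern.args.every((p, i) => {
      const t = termArgs[i];
      return t !== undefined && matches(p, t);
    })
  );
}

/**
 * `pattern != term` with a ground `term` and wildcard-only gaps in `pattern`,
 * read as "not (exists values for the wildcards. pattern == term)".
 */
function notEqualWithWildcards(pattern: Pattern, term: Term): boolean {
  return !matches(pattern, term);
}

const nil: Term = { type: 'const', name: 'nil', args: [] };
const cons = (head: Term, tail: Term): Term => ({ type: 'const', name: 'cons', args: [head, tail] });
const int = (value: bigint): Term => ({ type: 'int', value });

// The left-hand side of the premise `cons 4 _ != Xs`
const cons4Any: Pattern = {
  type: 'const',
  name: 'cons',
  args: [{ type: 'int', value: 4n }, { type: 'wildcard' }],
};

const sameHead = notEqualWithWildcards(cons4Any, cons(int(4n), nil)); // false
const otherHead = notEqualWithWildcards(cons4Any, cons(int(5n), nil)); // true
```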

The "wildcard" mode exists to support this specific use case: a "wildcard"-moded argument is one that isn't grounded by previous premises, but whose only non-ground portions are wildcard variables `_`.

## Backwards incompatibility

This PR simplifies the existing flattening transformation, but this means that some previously accepted programs must now be rejected. In particular, this PR disables a common idiom _that was used in one of the default sample programs_ (the graph generation example).

```
#builtin NAT_SUCC s
vertex 6.
vertex N :- vertex (s N).
```

This is no longer supported, and now returns this error:

> Because s is the built-in relation NAT_SUCC, for it to be used like a function symbol, all the arguments must be grounded by a previous premise. If you want to use s with a different mode, write it out as a separate premise, like 's * is *'.

This makes `NAT_SUCC` a lot less useful in general. You can still write the program, following the advice of the error message, like this:

```
#builtin NAT_SUCC s
vertex 6.
vertex N :- vertex M, s N is M.
```

Now that this functionality doesn't exist, a better idiom, in Rob's opinion, is this one:

```
#builtin INT_MINUS minus
vertex 6.
vertex (minus N 1) :- vertex N, N > 0.
```

(Chris disagrees, fwiw, and still thinks the program using NAT_SUCC is better. In any case we're not losing expressivity; both are possible.)

### Rationale for backwards incompatibility

I believe this backwards incompatibility is acceptable, at least in the medium term, because it lets us turn all relations and premises into *separate premises* with a very uniform translation. The uniform translation, on the now-rejected program, looks like this:

```
#builtin NAT_SUCC s
vertex 6.
vertex N :- s N is M, vertex M.
```

...and that clearly doesn't work, as successor with no ground inputs has infinitely many possible matches (N = 122, M = 123, or N = 123, M = 124, or N = 124, M = 125, and so on).
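
The uniform translation and the left-to-right groundness check can be sketched together. This is a hypothetical, simplified model, not Dusa's actual `compile`/`check` code. The term and premise shapes, the `#n` fresh-variable names, and the rule that a matched premise binds all of its variables are all assumptions. Conclusions, wildcards, and the relational modes of built-ins are left out. Nested function-style terms become separate `f args is V` premises, innermost first, and each such premise needs its arguments bound by an earlier premise.

```typescript
// Hypothetical, simplified model of flattening plus a left-to-right mode check.
// Not Dusa's compile/check code: conclusions, wildcards, and the relational
// modes of built-ins are deliberately left out.
type Term =
  | { type: 'var'; name: string }
  | { type: 'int'; value: bigint }
  | { type: 'call'; name: string; args: Term[] }; // function-style use, e.g. `plus X Y`

interface SourcePremise {
  name: string;
  args: Term[];
  value?: Term; // present for `name args is value`
}

interface Premise extends SourcePremise {
  fromFunction: boolean; // true if introduced for a nested `f ...` term
}

let freshCounter = 0;

/** Replaces nested calls with fresh variables, emitting `f args is V` innermost first. */
function flattenTerm(term: Term, out: Premise[]): Term {
  if (term.type !== 'call') return term;
  const args = term.args.map((arg) => flattenTerm(arg, out));
  const result: Term = { type: 'var', name: `#${freshCounter++}` };
  out.push({ name: term.name, args, value: result, fromFunction: true });
  return result;
}

function flattenPremises(premises: SourcePremise[]): Premise[] {
  const out: Premise[] = [];
  for (const p of premises) {
    const args = p.args.map((arg) => flattenTerm(arg, out));
    const value = p.value === undefined ? undefined : flattenTerm(p.value, out);
    out.push({ name: p.name, args, value, fromFunction: false });
  }
  return out;
}

function varsOf(term: Term): string[] {
  if (term.type === 'var') return [term.name];
  if (term.type === 'int') return [];
  return term.args.flatMap(varsOf);
}

/** Returns an error message, or null if every function-style premise has ground inputs. */
function checkModes(premises: Premise[]): string | null {
  const bound = new Set<string>();
  for (const p of premises) {
    if (p.fromFunction) {
      const unbound = p.args.flatMap(varsOf).filter((v) => !bound.has(v));
      if (unbound.length > 0) {
        return `${p.name} is used like a function symbol, but ${unbound.join(', ')} is not bound by a previous premise`;
      }
    }
    // Simplification: once a premise has been matched, all of its variables are bound.
    for (const t of p.value === undefined ? p.args : [...p.args, p.value]) {
      for (const v of varsOf(t)) bound.add(v);
    }
  }
  return null;
}

const v = (name: string): Term => ({ type: 'var', name });
const call = (name: string, ...args: Term[]): Term => ({ type: 'call', name, args });

// #demand f X Y, g (plus (plus X X) (plus X Y)).
// flattens to: f X Y, plus X X is #0, plus X Y is #1, plus #0 #1 is #2, g #2
const stacked = flattenPremises([
  { name: 'f', args: [v('X'), v('Y')] },
  { name: 'g', args: [call('plus', call('plus', v('X'), v('X')), call('plus', v('X'), v('Y')))] },
]);

// vertex N :- vertex (s N).  flattens to: s N is #3, vertex #3
const rejected = flattenPremises([{ name: 'vertex', args: [call('s', v('N'))] }]);
```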

### Alternatives to introducing backwards incompatibility

This was previously supported by a significantly more complex and harder-to-motivate flattening transformation that "notices" that `(s N)` is not ground in its first argument and instead translates the problematic program above like this:

```
#builtin NAT_SUCC s
vertex 6.
vertex N :- vertex M, s N is M.
```

I put a fair amount of work into that kind of before-and-after translation, and it was very complex. I think disabling this functionality is worthwhile in the name of simplicity.
robsimmons · Jun 13, 2024 · 85f7369
1 parent 14ac885

Showing 35 changed files with 1,548 additions and 1,583 deletions.
diff --git a/docs/src/content/docs/docs/language/builtin.md b/docs/src/content/docs/docs/language/builtin.md
@@ -1,19 +1,19 @@
 ---
-title: Built-in functions
+title: Built-in relations
 ---
 
 On their own, built-in numbers and strings act no different than other uninterpreted
-constants, but they can be manipualted with special constructors added by `#builtin`
+constants, but they can be manipulated with special relations added by `#builtin`
 declarations.
 
-A `#builtin` declarations change the lexer to represent certain identifiers as
-operations on built-in types. If you write
+A `#builtin` declarations connects a certain identifiers to a certain built-in
+relation. If you write
 
     #builtin INT_PLUS plus
     #builtin NAT_SUCC s
 
-then the identifiers `plus` and `s` will be parsed as a built-in definition instead
-of as a regular identifiers until those built-ins are redefined.
+then the identifiers `plus` and `s` will be treated, throughout the program, as a
+built-in definition instead of as a regular identifier.
 
 - The `NAT_ZERO` builtin takes no arguments and represents the natural number zero.
 - The `NAT_SUCC` builtin takes one natural number argument, and adds one to it. If
@@ -23,3 +23,7 @@ of as a regular identifiers until those built-ins are redefined.
 - The `INT_MINUS` builtin takes two integer arguments and returns an integer,
   subtracting the second from the first.
 - The `STRING_CONCAT` builtin takes two or more string arguments and concatenates them.
+
+## How built-in relations work
+
+All built-in relations
diff --git a/proto/busa.proto b/proto/busa.proto
@@ -40,7 +40,7 @@ message Rule {
     }
 
     /*
-    conclusion <Args> is <Values> :- prefix X0 ... XN
+    conclusion <Args> :- prefix X0 ... XN
 
      - Premise has arguments X0 ... XN in order
      - Conclusion has arbitrary patterns with vars X0...XN
@@ -54,27 +54,39 @@ message Rule {
     }
 
     /*
-    conclusion <Args> is { <Values> } :- prefix X0 ... XN (exhaustive)
-    conclusion <Args> is { <Values>? } :- prefix X0 ... XN (non-exhaustive)
+    conclusion <Args> is? { <Values> } :- prefix X0 ... XN
 
      - Premise has arguments X0 ... XN in order
      - Conclusion has arbitrary patterns with vars X0...XN
      - Conclusion is a fact with one value
     */
-    message ChoiceConclusion {
+    message OpenConclusion {
         string conclusion = 1;
         repeated Pattern args = 2;
         repeated Pattern choices = 3;
-        bool exhaustive = 4;
         string prefix = 5;
     }
 
     /*
-    conclusion Y3 X1 Z4 :- prefix X0 X1 Y2 Y3, fact X0 X1 Z2 is Z3 Z4.
+    conclusion <Args> is { <Values> } :- prefix X0 ... XN
 
-     - Prefix and fact premise share first N arguments X0...XN
-     - prefix can have additional arguments ...YM, M >= N
-     - fact can have additional arguments in args and values ...ZP, P >= N
+     - Premise has arguments X0 ... XN in order
+     - Conclusion has arbitrary patterns with vars X0...XN
+     - Conclusion is a fact with one value
+    */
+    message ClosedConclusion {
+        string conclusion = 1;
+        repeated Pattern args = 2;
+        repeated Pattern choices = 3;
+        string prefix = 5;
+    }
+
+    /*
+    conclusion Fact1 Shared1 Prefix1 FactValue :- prefix Shared0 Shared1 Prefix0 Prefix1, fact Shared0 Shared1 Fact0 Fact1 is FactValue.
+
+     - Prefix and fact premise share first N arguments Shared0...SharedN
+     - prefix can have additional arguments Prefix0...PrefixM
+     - fact can have additional arguments in args and values Fact0...FactP
      - conclusion is another prefix and has no repeat variables
     */
     message Join {
@@ -114,9 +126,12 @@ message Rule {
             INT_MINUS = 5;
             INT_TIMES = 6;
             STRING_CONCAT = 7;
-            EQUAL = 8;
-            GT = 9;
-            GEQ = 10;
+            CHECK_GT = 8;
+            CHECK_GEQ = 9;
+            CHECK_LT = 10;
+            CHECK_LEQ = 11;
+            EQUAL = 12;
+            NOT_EQUAL = 13;
         }
 
         string conclusion = 1;

diff --git a/src/client.ts b/src/client.ts
@@ -1,4 +1,4 @@
-import { Data, TRIV_DATA, getRef, hide } from './datastructures/data.js';
+import { Data, TRIVIAL, getRef, hide } from './datastructures/data.js';
 import {
   ChoiceTree,
   ChoiceTreeNode,
@@ -17,6 +17,7 @@ import {
 } from './engine/forwardengine.js';
 import { check } from './language/check.js';
 import { compile } from './language/compile.js';
+import { builtinModes } from './language/dusa-builtins.js';
 import { parse } from './language/dusa-parser.js';
 import { IndexedProgram } from './language/indexize.js';
 import { Issue } from './parsing/parser.js';
@@ -136,7 +137,7 @@ function* solutionGenerator(
 export class Dusa {
   private program: IndexedProgram;
   private debug: boolean;
-  private arities: Map<string, number>;
+  private arities: Map<string, { args: number; value: boolean }>;
   private db: Database;
   private stats: Stats;
   private cachedSolution: DusaSolution | null = null;
@@ -154,19 +155,19 @@ export class Dusa {
       throw new DusaError(parsed.errors);
     }
 
-    const { errors, arities } = check(parsed.document);
+    const { errors, arities, builtins } = check(builtinModes, parsed.document);
     if (errors.length !== 0) {
       throw new DusaError(errors);
     }
 
     this.debug = debug;
     this.arities = arities;
-    this.program = compile(parsed.document, debug);
+    this.program = compile(builtins, arities, parsed.document, debug);
     this.db = makeInitialDb(this.program);
     this.stats = { cycles: 0, deadEnds: 0 };
   }
 
-  private checkPredicateForm(pred: string, arity: number) {
+  private checkPredicateForm(pred: string, arity: { args: number; value: boolean }) {
     const expected = this.arities.get(pred);
     if (!pred.match(/^[a-z][A-Za-z0-9]*$/)) {
       throw new DusaError([
@@ -179,16 +180,24 @@ export class Dusa {
     }
     if (expected === undefined) {
       this.arities.set(pred, arity);
-    } else if (arity !== expected) {
+    } else if (arity.args !== expected.args) {
       throw new DusaError([
         {
           type: 'Issue',
           msg: `Predicate ${pred} should have ${expected} argument${
-            expected === 1 ? '' : 's'
+            expected.args === 1 ? '' : 's'
           }, but the asserted fact has ${arity}`,
           severity: 'error',
         },
       ]);
+    } else if (arity.value !== expected.value) {
+      throw new DusaError([
+        {
+          type: 'Issue',
+          msg: `Predicate ${pred} should ${expected.value ? '' : 'not '}have a value, but the asserted fact ${arity.value ? 'has' : 'does not have'} one.`,
+          severity: 'error',
+        },
+      ]);
     }
   }
 
@@ -200,11 +209,11 @@ export class Dusa {
     this.cachedSolution = null;
     this.db = { ...this.db };
     for (const { name, args, value } of facts) {
-      this.checkPredicateForm(name, args.length);
+      this.checkPredicateForm(name, { args: args?.length ?? 0, value: value !== undefined });
       insertFact(
         name,
-        args.map(termToData),
-        value === undefined ? TRIV_DATA : termToData(value),
+        args?.map(termToData) ?? [],
+        value === undefined ? TRIVIAL : termToData(value),
         this.db,
       );
     }
@@ -221,7 +230,7 @@ export class Dusa {
     this.db = { ...this.db };
 
     if (pred !== undefined) {
-      this.checkPredicateForm(pred, 2);
+      this.checkPredicateForm(pred, { args: 2, value: true });
     }
     const usedPred = pred ?? '->';
     const triples: [Data, Data, Data][] = [];

diff --git a/src/datastructures/data.test.ts b/src/datastructures/data.test.ts
@@ -3,7 +3,7 @@ import { DataView, dataToString, expose, hide } from './data.js';
 
 test('Internalizing basic types', () => {
   const testData: DataView[] = [
-    { type: 'triv' },
+    { type: 'trivial' },
     { type: 'int', value: 123n },
     { type: 'int', value: 0n },
     { type: 'string', value: 'abc' },
@@ -23,7 +23,7 @@ test('Internalizing basic types', () => {
     }
   }
 
-  expect(hide({ type: 'triv' })).toEqual(hide({ type: 'triv' }));
+  expect(hide({ type: 'trivial' })).toEqual(hide({ type: 'trivial' }));
 });
 
 test('Internalizing fibonacci-shaped structured types', () => {

diff --git a/src/datastructures/data.ts b/src/datastructures/data.ts
@@ -1,7 +1,7 @@
 export type Data = ViewsIndex | bigint;
 
 export type DataView =
-  | { type: 'triv' }
+  | { type: 'trivial' }
   | { type: 'int'; value: bigint }
   | { type: 'bool'; value: boolean }
   | { type: 'string'; value: string }
@@ -11,7 +11,7 @@ export type DataView =
 type ViewsIndex = number;
 let nextRef: number = -1;
 let views: DataView[] = [
-  { type: 'triv' },
+  { type: 'trivial' },
   { type: 'bool', value: true },
   { type: 'bool', value: false },
 ];
@@ -20,12 +20,12 @@ let structures: { [name: string]: DataTrie } = {};
 
 export function DANGER_RESET_DATA() {
   nextRef = -1;
-  views = [{ type: 'triv' }, { type: 'bool', value: true }, { type: 'bool', value: false }];
+  views = [{ type: 'trivial' }, { type: 'bool', value: true }, { type: 'bool', value: false }];
   strings = {};
   structures = {};
 }
 
-export const TRIV_DATA = 0;
+export const TRIVIAL = 0;
 export const BOOL_TRUE = 1;
 export const BOOL_FALSE = 2;
 
@@ -79,7 +79,7 @@ function setStructureIndex(name: string, args: Data[], value: ViewsIndex) {
 
 export function hide(d: DataView): Data {
   switch (d.type) {
-    case 'triv':
+    case 'trivial':
       return 0;
     case 'int':
       return d.value;
@@ -113,26 +113,27 @@ export function compareData(a: Data, b: Data): number {
   const x = expose(a);
   const y = expose(b);
   switch (x.type) {
-    case 'triv':
-      if (y.type === 'triv') return 0;
+    case 'trivial':
+      if (y.type === 'trivial') return 0;
       return -1;
     case 'int':
-      if (y.type === 'triv') return 1;
+      if (y.type === 'trivial') return 1;
       if (y.type === 'int') {
         const c = x.value - y.value;
         return c > 0n ? 1 : c < 0n ? -1 : 0;
       }
       return -1;
     case 'bool':
-      if (y.type === 'triv' || y.type === 'int') return 1;
+      if (y.type === 'trivial' || y.type === 'int') return 1;
       if (y.type === 'bool') return (x.value ? 1 : 0) - (y.value ? 1 : 0);
       return -1;
     case 'ref':
-      if (y.type === 'triv' || y.type === 'int' || y.type === 'bool') return 1;
+      if (y.type === 'trivial' || y.type === 'int' || y.type === 'bool') return 1;
       if (y.type === 'ref') return x.value - y.value;
       return -1;
     case 'string':
-      if (y.type === 'triv' || y.type === 'int' || y.type === 'bool' || y.type === 'ref') return 1;
+      if (y.type === 'trivial' || y.type === 'int' || y.type === 'bool' || y.type === 'ref')
+        return 1;
       if (y.type === 'string') return x.value > y.value ? 1 : x.value < y.value ? -1 : 0;
       return -1;
     case 'const':
@@ -181,7 +182,7 @@ export function escapeString(input: string): string {
 export function dataToString(d: Data, needsParens = true): string {
   const view = expose(d);
   switch (view.type) {
-    case 'triv':
+    case 'trivial':
       return `()`;
     case 'int':
       return `${view.value}`;