Skip to content

frequently asked questions

Ryan Wisnesky edited this page Aug 6, 2019 · 10 revisions

Transitive Closure

Transitive closure requires chasing with the transitivity implication. Reflexive, symmetric, transitive closures can be down with Sigma (the Simpsons Likes example).

Type inference cannot assign the same string to two sorts

So e.name = e is not allowed even if e:Employee and name:Employee->String and e:String can be expressed. To work around, write e.name = e@String.

Changing the Memory Limit

To run the CQL IDE with more/less than the default 64mb heap, you must use a command line option such as:

java -Xms512m -Xmx2048m -jar CQL.jar

Adding JDBC drivers to the classpath

To run the CQL IDE with a particular JDBC driver requires placing the driver on the class path using a -cp command line option and running java directly, for example:

java -cp "./mysql-connector-java-5.1.27-bin.jar:./CQL.jar" catdata.ide.IDE

On some systems, especially windows systems, the colon in the classpath may need to be replaced with a semi-colon.

Using the E prover

The E automated theorem prover can be used with CQL. Once installed, set the e_path option to the e executable, for example e_path = "/usr/local/bin/eprover" and set the prover option to e. Using e is required for the check_query command, and sometimes e will succeed where CQL's built-in prover will fail. In general, CQL will be faster for those theories that both it and CQL can decide.

HTML markup

The CQL IDE allows HTML markup to be inserted into CQL files. After a program has compiled, the emit as HTML option in the CQL menu will emit the HTML code. The HTML will contain pretty-printings of the program text, HTML tables for instances, and (possibly) javascript graphs for schemas. To insert HTML, use the keyword html, then brackets, then (*, and then a quoted string (note that because the HTML appears in a string, any quotes and backslashes within that string must be escaped). For example,

html (* "<html>Hello world <a href=\"main.html\">link</a></html>" *)

The CQL tutorial, built-in as the Tutorial example, illustrates HTML output. Alternatively, markdown (a simplified way to write HTML) can be used; use the keyword md instead of html. Note that CQL emits HTML tables that, when clicked, sort themselves by invoking javascript code from

https://categoricaldata.net/js/simple.js

and a CSS style file is at

https://categoricaldata.net/css/simple.js

Just in time compilation

The CQL IDE theorem provers benefit greatly from native compilation, which the JVM performs lazily. Experiments suggest several CQL theorem provers become 2-4x faster after JIT compilation. For this reason, if performance is critical it is suggested to run a few warm-up built-in examples to trigger JIT compilation.

Caching

The CQL IDE will cache result artifacts to save time in subsequent executions; this behavior can be disabled per-expression through the always_reload option. Disabling caching is can be useful when an CQL program contains side-effecting commands.

CQL's Typing Discipline

Officially, CQL programs are nominally typed. For example, if schema X = Y is a declaration, then id X has type X -> X which is not equal to Y -> Y (beause X and Y are different names). Unofficially, the CQL type-checker accepts many programs that are not nominally typed but are still well-typed.

Scripting with Side Effects using commands

All CQL expressions with the exception of commands are side-effect free. Many commands have side effects; for example, export_instance_jdbc can write data to a database via JDBC. Therefore, the execution order for commands matters. To establish an ordering, each command implicitly depends on all declarations before it. Consequently, commands can bottleneck CQL execution (the CQL execution engine is multi-threaded). commands are provided to execute arbitrary SQL (via JDBC), Java (via Nashorn javascript), and command-line commands, thereby allowing CQL programs to script an entire data integration / ETL flow.

Categorical Databases are Deductive Databases

CQL databases are deductive: that is, they are defined only up to provable equality. This means, for example, that an instance may contain a row whose value is 2+2 rather than 4 . From CQL's perspective, 2+2 and 4 are freely interchangeable. However, when exporting CQL databases to CSV or SQL, there is a difference between 2+2 and 4 : the former is an expression, and the latter is a constant, and they can be exported differently. To establish a true normal form (i.e., to everywhere prefer 4 over 2+2) it is necessary to use java type sides.

Java (script) typesides and inline functions

CQL typesides can contain bindings of types to java classes and of function symbols to javascript code. (In this manual, we use java and javascript interchangeably, because the Java VM's Nashorn execution engine for javascript allows the use of java functions.) The CQL execution engine will invoke this javascript code during execution, for example, to reduce 2+2 to 4 . In order to maintain completeness of CQL's theorem prover, certain equations involving java are disallowed. This restriction can be overridden using the allow_java_eqs_unsafe option. Java functions need to be side-effect free. For SQL interaction, CQL provides a special built-in typeside called sql.

Consistency (Conservativity)

It is possible for an CQL instance to be inconsistent; i.e., for 1 = 2 to be provable. If this behavior is not desired, consistency can be enforced using the require_consistency option. This option is overly conservative: it may disallow instances that are consistent.

Labelled Nulls

Nulls in CQL are labelled; i.e., two null values need not be equal. In addition, there is no way to test if a value is a labelled null. Non-labelled nulls can be defined using optional types in typesides, see the Outer join example for details.

Schema, Mapping, Query, Instance, Transform Inference

The CQL IDE provides code-assistance functions. For example, if S and T are schemas, then if literal : S -> T is highlighted, the CQL editor is right-clicked, and infer mapping is selected in the resulting pop-up menu, then the IDE will add the skeleton of a mapping declaration. In particular, it will list all of the entities, attributes, and foreign keys for S and will describe how they can be mapped into T ; and similarly for queries, transforms, etc. In addition, by pressing control-shift-space after a kind name (e.g., typeside), the CQL editor will pop up a template containing the sections for a corresponding literal.

Automated Theorem Proving

The CQL engine uses automated theorem proving technology to ensure that CQL programs cannot fail at runtime (modulo javascript errors), or materialize instances that do not satisfy their data integrity constraints. To provide this guarantee, every CQL typeside, schema, and instance must be a decidable equational theory. Not all equational theories are decidable; moreover, automated theorem proving methods can be slow or incomplete. The CQL engine provides a number of theorem provers which by default it uses automatically as appropriate; however, specific theorem provers can be chosen by using the prover option. In addition, schema mappings and transforms and queries are required to preserve provable equality; this behavior can be disabled with the dont_validate_unsafe option. To find out which prover is being used, click the info button in the DP pane in the viewer. The DP pane also allows users to decide the equality of arbitrary terms by entering them as text (useful for debugging CQL programs).

Provenance / Lineage of generated IDs

The CQL engine generates fresh IDs during execution. These IDs are exported via JDBC and CSV and are used by the engine internally. The CQL engine maintains a lineage for these IDs that is displayed in the CQL viewer. This lineage is invariably more meaningful to humans than the generated IDs. For example, rather than display fresh ID 847, the viewer might display bill.manager. It is important to remember that these lineages are not canonical, are provided for convenience only, and the real data is the generated IDs, not the lineages.