Skip to content
This repository has been archived by the owner on Mar 31, 2019. It is now read-only.

Type system

Jim Pivarski edited this page Dec 8, 2016 · 7 revisions

Femtocode type system

Femtocode has a highly granular type system: enough is known at compile-time to ensure the non-existence of runtime errors. This means, for instance, knowing that all possible values of variable x are not equal to zero so that x may appear in the denominator of a division.

This type system applies to dataset fields, which are explicitly specified as part of the dataset description, and also to intermediate Femtocode variables, which are inferred from the dataset fields. The types of intermediate Femtocode variables may be very complicated, but the user rarely needs to know them.

In the Python implementation of Femtocode, types are called "schemas" to avoid a clash with Python's built-in "type" function. In fact, all Femtocode types are named to avoid clashes with Python built-ins. Since Femtocode is quoted within Python, Femtocode types are Python values and Femtocode kinds (types of types) are Python types.

Femtocode types are specified by naming one of eight broad kinds of types. Some of these can be parameterized, some parameters are optional and others are required, and some parameters accept types, to make a tree of nested types. All types except unions can be labeled with a string alias, which can be used in place of the type (as a string "t" in Python or a {"alias": "t"} in JSON). Aliases can be used to simplify type expressions, but they can also be used to construct recursive types (such as trees). If an alias is multiply defined, it is an error to make subsequent definitions differ.

Types may be thought of as sets; the set of all values that a variable might have at runtime. They have well defined (and implemented!) set operations: union, intersection, and difference.

Impossible

Python JSON
impossible "impossible", {"type": "impossible"}
impossible("t") {"type": "impossible", "alias": "t"}

Attributes:

  • alias: name for reuse

The first kind is actually a placeholder for errors. Impossible is the "bottom type," the type which cannot be satisfied by any runtime values. In most languages, the bottom type is used to represent the return value of exceptions or infinite loops, but since Femtocode doesn't have these, impossible is used to signify a type error requiring user action.

I have no idea why you might want to make instances of this type. However, it is essential to the type-inference logic.

Null

Python JSON
null "null", {"type": "null"}
null("t") {"type": "null", "alias": "t"}

Attributes:

  • alias: name for reuse

The second kind also has marginal value: the null type has exactly one value at runtime: None. It is most useful when used in a union, such as union(null, integer) to indicate integers that might have a value or might be missing.

Boolean

Python JSON
boolean "boolean", {"type": "boolean"}
boolean("t") {"type": "boolean", "alias": "t"}

Attributes:

  • alias: name for reuse

The boolean type has two values at runtime: True and False. It is used in logical predicates and is completely distinct from integers. (That is, 0 is not the same thing as False, and if 0 is a type error.)

Number

Python JSON
integer "integer", {"type": "integer"}
integer(0, 10), integer(min=0, max=10) {"type": "integer", "min": 0, "max": 10}
integer(0, almost(inf)) {"type": "integer", "min": 0, "max": {"almost": "inf"}}
real "real", {"type": "real"}
real(0, almost(10)) {"type": "integer", "min": 0, "max": {"almost": 10}}
extended "extended", {"type": "extended"}
extended(0, inf) "extended", {"type": "extended", "min": 0, "max": "inf"}
integer(alias="t") {"type": "integer", "alias": "t"}

Attributes:

  • min: the minimum possible value, may be a number or almost(number)
  • max: the maximum possible value, may be a number or almost(number)
  • whole: True for integers, False for floating point
  • alias: name for reuse

The Number kind describes numeric intervals. It has three concrete types:

  • integer, which has min=almost(-inf), max=almost(inf), whole=True
  • real, which has min=almost(-inf), max=almost(inf), whole=False
  • extended for the "extended reals", which has min=-inf, max=inf, whole=False

Each of these concrete types can be used as a starting point for parameterizing a numeric interval: min and max are both optional. If either of these endpoints is a number, it is a closed endpoint; if it is almost(number), it is an open endpoint. Endpoints like almost(inf) include all finite numbers; inf includes infinity as well.

Details of byte-representation, such as the number of bits to use, signed versus unsigned, little-endian versus big-endian, are all the responsibility of the backend, though the numeric range provides hints for setting these parameters. The extended reals is well-described by IEEE floating point numbers, which include bit-patterns for negative and positive infinity. This representation also has NAN or "not a number," but Femtocode does not allow this value. For the equivalent, consider union(null, extended), which could raise compile-time errors if a missing value is passed into a function that requires no missing values.

String

Python JSON
string "string", {"type": "string"}
string("unicode") {"type": "string", "charset": "unicode"}
string(fewest=5) {"type": "string", "fewest": 5}
string(most=10) {"type": "string", "most": 10}
string(alias="t") {"type": "string", "alias": "t"}

Attributes:

  • charset: either "bytes" or "unicode", which are mutually exclusive; default is "bytes"
  • fewest: minimum number of characters (individual bytes or unicode codepoints, depending on charset); default is 0
  • most: maximum number of characters; default is almost(inf)
  • alias: name for reuse

The String kind specifies raw byte arrays or Unicode strings. Note that intersection(string("bytes"), string("unicode")) is impossible (no commonality whatsoever). Fewest and most specify what string lengths are allowed, which can be important for functions that expect a string to be non-empty. Note that if fewest equals most and the charset is "bytes", then each string takes an equal amount of memory. If not, the lengths will need to be stored with the strings.

Collection

Python JSON
collection(integer) {"type": "collection", "items": "integer"}
collection(integer, fewest=5) {"type": "collection", "items": "integer", "fewest": 5}
collection(integer, most=10) {"type": "collection", "items": "integer", "most": 10}
collection(integer, ordered=True) {"type": "collection", "items": "integer", "ordered": true}
vector(integer, 3) {"type": "vector", "items": "integer", "dimensions": [3]}
matrix(integer, 2, 3) {"type": "matrix", "items": "integer", "dimensions": [2, 3]}
tensor(integer, 1, 2, 3) {"type": "tensor", "items": "integer", "dimensions": [1, 2, 3]}
collection(integer, alias="t") {"type": "collection", "items": "integer", "alias": "t"}

Attributes:

  • items: concrete type of the contained items; (required)
  • fewest: minimum number of items; default is 0
  • most: maximum number of items; default is almost(inf)
  • ordered: True if the collection has a natural ordering, False otherwise
  • alias: name for reuse

The Collection kind has no unparameterized types. You must specify an items type, which represents nested data. In the above examples, the Collections all contained integers, but this could be any type. Fewest and most work the same way as for Strings.

There are three special constructors,

  • vector for ordered collections of fixed size
  • matrix for ordered collections of ordered collections of fixed size
  • tensor for arbitrarily deep nestings of ordered collections

For instance, tensor(integer, 1, 2, 3) is shorthand for

collection(collection(collection(integer, 3, 3, True), 2, 2, True), 1, 1, True)

Record

Python JSON
record(one=integer, two=real, three=string) {"type": "record", "fields": {"one": "integer", "two": "real", "three": "string"}}
record("tree", left="tree", right="tree") {"type": "record", "fields": {"left": {"alias": "tree"}, "right": {"alias": "tree"}}}

Attributes:

  • alias: name for reuse; must be the first argument, if given
  • fields: mapping from field names to concrete types, which are keyword arguments in Python; (required)

Records are structures that have a fixed set of named, typed fields. (Records are often called "product types" because they represent the cartesian product of their field types.) Aliases can be used to create records that contain themselves, such as trees.

The example above was simplified so that the JSON would be readable, but if a tree type directly contained the same tree type, the only value it could contain would be an infinite tree. A more realistic example is

record("tree", left=union(null, "tree"), right=union(null, "tree"))

which allows trees to have leaves that are null.

Union

Python JSON
union(null, string) {"type": "union", "possibilities": ["null", "string"]}
union(real(max=almost(0)), real(min=almost(0))) {"type": "union", "possibilities": [{"type": "real", "max": {"almost": 0}}, {"type": "real", "min": {"almost": 0}}]}

Attributes:

  • possibilities: list of possible types, which are given inline in Python; (required)

A variable has union type if its value can be one (and only one) of the possibilities. (Unions are often called "sum types" because they can be represented by a sum in the algebra in which records are the products.) Union types cannot be aliased.

The union constructor (in both Python and JSON) do not necessarily create an object of union type. If numerical intervals can be combined, they will be, and the result will be a single numerical interval. Also, unions cannot be nested, as the nesting level would simply be flattened anyway. The union constructor is actually a set-operation function, similar to intersection and difference (for which there are no JSON equivalents). The example above represents the real number line excluding zero. If either endpoint had been 0 (closed) rather than almost(0) (open), the result would simply be real.

Unions can also be used to specify class inheritance, in the object-oriented sense. An abstract class would be represented as the union of all of its concrete subclasses. However, unions are more general than class inheritance.

The backend representation of a union must be "tagged": both the type and value must be specified. The tag can typically be a small integer, or even the high bits of an 64-bit index. If null is included among the possibilities, data need not be allocated for its None values.

Clone this wiki locally