-
Notifications
You must be signed in to change notification settings - Fork 0
Type system
Femtocode has a highly granular type system: enough is known at compile-time to ensure the non-existence of runtime errors. This means, for instance, knowing that all possible values of variable x
are not equal to zero so that x
may appear in the denominator of a division.
This type system applies to dataset fields, which are explicitly specified as part of the dataset description, and also to intermediate Femtocode variables, which are inferred from the dataset fields. The types of intermediate Femtocode variables may be very complicated, but the user rarely needs to know them.
In the Python implementation of Femtocode, types are called "schemas" to avoid a clash with Python's built-in "type" function. In fact, all Femtocode types are named to avoid clashes with Python built-ins. Since Femtocode is quoted within Python, Femtocode types are Python values and Femtocode kinds (types of types) are Python types.
Femtocode types are specified by naming one of eight broad kinds of types. Some of these can be parameterized, some parameters are optional and others are required, and some parameters accept types, to make a tree of nested types. All types except unions can be labeled with a string alias, which can be used in place of the type (as a string "t"
in Python or a {"alias": "t"}
in JSON). Aliases can be used to simplify type expressions, but they can also be used to construct recursive types (such as trees). If an alias is multiply defined, it is an error to make subsequent definitions differ.
Types may be thought of as sets; the set of all values that a variable might have at runtime. They have well defined (and implemented!) set operations: union
, intersection
, and difference
.
Python | JSON |
---|---|
impossible |
"impossible" , {"type": "impossible"}
|
impossible("t") |
{"type": "impossible", "alias": "t"} |
Attributes:
- alias: name for reuse
The first kind is actually a placeholder for errors. Impossible is the "bottom type," the type which cannot be satisfied by any runtime values. In most languages, the bottom type is used to represent the return value of exceptions or infinite loops, but since Femtocode doesn't have these, impossible is used to signify a type error requiring user action.
I have no idea why you might want to make instances of this type. However, it is essential to the type-inference logic.
Python | JSON |
---|---|
null |
"null" , {"type": "null"}
|
null("t") |
{"type": "null", "alias": "t"} |
Attributes:
- alias: name for reuse
The second kind also has marginal value: the null type has exactly one value at runtime: None
. It is most useful when used in a union, such as union(null, integer)
to indicate integers that might have a value or might be missing.
Python | JSON |
---|---|
boolean |
"boolean" , {"type": "boolean"}
|
boolean("t") |
{"type": "boolean", "alias": "t"} |
Attributes:
- alias: name for reuse
The boolean type has two values at runtime: True
and False
. It is used in logical predicates and is completely distinct from integers. (That is, 0
is not the same thing as False
, and if 0
is a type error.)
Python | JSON |
---|---|
integer |
"integer" , {"type": "integer"}
|
integer(0, 10) , integer(min=0, max=10)
|
{"type": "integer", "min": 0, "max": 10} |
integer(0, almost(inf)) |
{"type": "integer", "min": 0, "max": {"almost": "inf"}} |
real |
"real" , {"type": "real"}
|
real(0, almost(10)) |
{"type": "integer", "min": 0, "max": {"almost": 10}} |
extended |
"extended" , {"type": "extended"}
|
extended(0, inf) |
"extended" , {"type": "extended", "min": 0, "max": "inf"}
|
integer(alias="t") |
{"type": "integer", "alias": "t"} |
Attributes:
- min: the minimum possible value, may be a number or almost(number)
- max: the maximum possible value, may be a number or almost(number)
- whole:
True
for integers,False
for floating point - alias: name for reuse
The Number kind describes numeric intervals. It has three concrete types:
-
integer
, which hasmin=almost(-inf), max=almost(inf), whole=True
-
real
, which hasmin=almost(-inf), max=almost(inf), whole=False
-
extended
for the "extended reals", which hasmin=-inf, max=inf, whole=False
Each of these concrete types can be used as a starting point for parameterizing a numeric interval: min
and max
are both optional. If either of these endpoints is a number, it is a closed endpoint; if it is almost(number), it is an open endpoint. Endpoints like almost(inf) include all finite numbers; inf includes infinity as well.
Details of byte-representation, such as the number of bits to use, signed versus unsigned, little-endian versus big-endian, are all the responsibility of the backend, though the numeric range provides hints for setting these parameters. The extended reals is well-described by IEEE floating point numbers, which include bit-patterns for negative and positive infinity. This representation also has NAN or "not a number," but Femtocode does not allow this value. For the equivalent, consider union(null, extended)
, which could raise compile-time errors if a missing value is passed into a function that requires no missing values.
Python | JSON |
---|---|
string |
"string" , {"type": "string"}
|
string("unicode") |
{"type": "string", "charset": "unicode"} |
string(fewest=5) |
{"type": "string", "fewest": 5} |
string(most=10) |
{"type": "string", "most": 10} |
string(alias="t") |
{"type": "string", "alias": "t"} |
Attributes:
- charset: either
"bytes"
or"unicode"
, which are mutually exclusive; default is"bytes"
- fewest: minimum number of characters (individual bytes or unicode codepoints, depending on charset); default is
0
- most: maximum number of characters; default is
almost(inf)
- alias: name for reuse
The String kind specifies raw byte arrays or Unicode strings. Note that intersection(string("bytes"), string("unicode"))
is impossible
(no commonality whatsoever). Fewest and most specify what string lengths are allowed, which can be important for functions that expect a string to be non-empty. Note that if fewest equals most and the charset is "bytes"
, then each string takes an equal amount of memory. If not, the lengths will need to be stored with the strings.
Python | JSON |
---|---|
collection(integer) |
{"type": "collection", "items": "integer"} |
collection(integer, fewest=5) |
{"type": "collection", "items": "integer", "fewest": 5} |
collection(integer, most=10) |
{"type": "collection", "items": "integer", "most": 10} |
collection(integer, ordered=True) |
{"type": "collection", "items": "integer", "ordered": true} |
vector(integer, 3) |
{"type": "vector", "items": "integer", "dimensions": [3]} |
matrix(integer, 2, 3) |
{"type": "matrix", "items": "integer", "dimensions": [2, 3]} |
tensor(integer, 1, 2, 3) |
{"type": "tensor", "items": "integer", "dimensions": [1, 2, 3]} |
collection(integer, alias="t") |
{"type": "collection", "items": "integer", "alias": "t"} |
Attributes:
- items: concrete type of the contained items; (required)
- fewest: minimum number of items; default is
0
- most: maximum number of items; default is
almost(inf)
- ordered: True if the collection has a natural ordering, False otherwise
- alias: name for reuse
The Collection kind has no unparameterized types. You must specify an items type, which represents nested data. In the above examples, the Collections all contained integers, but this could be any type. Fewest and most work the same way as for Strings.
There are three special constructors,
-
vector
for ordered collections of fixed size -
matrix
for ordered collections of ordered collections of fixed size -
tensor
for arbitrarily deep nestings of ordered collections
For instance, tensor(integer, 1, 2, 3)
is shorthand for
collection(collection(collection(integer, 3, 3, True), 2, 2, True), 1, 1, True)
Python | JSON |
---|---|
record(one=integer, two=real, three=string) |
{"type": "record", "fields": {"one": "integer", "two": "real", "three": "string"}} |
record("tree", left="tree", right="tree") |
{"type": "record", "fields": {"left": {"alias": "tree"}, "right": {"alias": "tree"}}} |
Attributes:
- alias: name for reuse; must be the first argument, if given
- fields: mapping from field names to concrete types, which are keyword arguments in Python; (required)
Records are structures that have a fixed set of named, typed fields. (Records are often called "product types" because they represent the cartesian product of their field types.) Aliases can be used to create records that contain themselves, such as trees.
The example above was simplified so that the JSON would be readable, but if a tree type directly contained the same tree type, the only value it could contain would be an infinite tree. A more realistic example is
record("tree", left=union(null, "tree"), right=union(null, "tree"))
which allows trees to have leaves that are null
.
Python | JSON |
---|---|
union(null, string) |
{"type": "union", "possibilities": ["null", "string"]} |
union(real(max=almost(0)), real(min=almost(0))) |
{"type": "union", "possibilities": [{"type": "real", "max": {"almost": 0}}, {"type": "real", "min": {"almost": 0}}]} |
Attributes:
- possibilities: list of possible types, which are given inline in Python; (required)
A variable has union type if its value can be one (and only one) of the possibilities. (Unions are often called "sum types" because they can be represented by a sum in the algebra in which records are the products.) Union types cannot be aliased.
The union
constructor (in both Python and JSON) do not necessarily create an object of union type. If numerical intervals can be combined, they will be, and the result will be a single numerical interval. Also, unions cannot be nested, as the nesting level would simply be flattened anyway. The union
constructor is actually a set-operation function, similar to intersection
and difference
(for which there are no JSON equivalents). The example above represents the real number line excluding zero. If either endpoint had been 0
(closed) rather than almost(0)
(open), the result would simply be real
.
Unions can also be used to specify class inheritance, in the object-oriented sense. An abstract class would be represented as the union of all of its concrete subclasses. However, unions are more general than class inheritance.
The backend representation of a union must be "tagged": both the type and value must be specified. The tag can typically be a small integer, or even the high bits of an 64-bit index. If null is included among the possibilities, data need not be allocated for its None
values.