Proposal: expand @odk/xpath function implementation to handle all aspects of expressions #17
Labels: deferred (Not actively planned, but worth keeping open for discussion and future consideration), @odk/xpath
Motivation
Despite overwhelming similarities and copious documented references between the XPath and ODK XForms specifications (as well as ODK XForms both in real world use and in existing test suites), there are crucial semantic differences in how certain node-set references are expected to be resolved. These semantic differences depend on structural information about an XForm which is out of band from its ultimate evaluation context.
There may be other cases, but the most prominent case which comes to mind is how dependencies are resolved in expressions from repeats and their descendants. Take this example, an abridged version of a test fixture ported from JavaRosa:
Strictly adhering to XPath semantics, the calculation of `position_2` will always produce `2`. This is because the reference to the absolute path `/data/repeat/position` will resolve the first matching node, in document order.
The expectation in ODK XForms is that the same path will resolve to the node nearest its context node. In a second repeat instance, the path reference should resolve to the `position` node in the same `repeat` parent. More concretely:
The closest semantic equivalent in XPath, per spec, would be as if the expression were relative:
A tempting solution is to rewrite such expressions in exactly this way, and then evaluate them according to standard XPath semantics. I've explored this, and it is promising. But I have a couple of concerns with this approach:
Having mulled this for the last few weeks, I think a more appropriate way to address this semantic discrepancy is to build on the same mechanisms already used to address other XPath/XForms semantic discrepancies: function implementations, namespacing, and shadowing/overrides.
I believe the function implementation interface is, or is very nearly, well suited to handle this particular issue and its many permutations we'll likely encounter as we fill out support for repeats. I also believe this will have tremendous potential benefits for performance optimization (more on that below).
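
To make the discrepancy concrete, here is a minimal sketch of the two resolution strategies. It is not the JavaRosa fixture or the `@odk/xpath` implementation: the `El` model, `resolveAbsolute`, `resolveNearest`, and the `position` values of 1 and 2 are assumptions for illustration, and "nearest" is approximated as "shares the deepest common ancestor with the context node".

```typescript
// Hypothetical minimal element model standing in for a DOM.
interface El {
  name: string;
  text: string;
  parent: El | null;
  children: El[];
}

function el(name: string, children: El[] = [], text = ''): El {
  const node: El = { name, text, parent: null, children };
  for (const child of children) {
    child.parent = node;
  }
  return node;
}

// Standard XPath semantics: every node matching the absolute path, in document order.
function resolveAbsolute(root: El, steps: string[]): El[] {
  if (steps.length === 0 || root.name !== steps[0]) {
    return [];
  }
  let current: El[] = [root];
  for (const step of steps.slice(1)) {
    const next: El[] = [];
    for (const node of current) {
      for (const child of node.children) {
        if (child.name === step) next.push(child);
      }
    }
    current = next;
  }
  return current;
}

// Assumed reading of the ODK XForms expectation: keep only the matches that
// share the deepest common ancestor with the context node.
function resolveNearest(root: El, steps: string[], context: El): El[] {
  const contextAncestors = new Set<El>();
  for (let node: El | null = context; node !== null; node = node.parent) {
    contextAncestors.add(node);
  }
  const sharedDepth = (match: El): number => {
    let depth = 0;
    for (let node: El | null = match; node !== null; node = node.parent) {
      if (contextAncestors.has(node)) depth += 1;
    }
    return depth;
  };
  const matches = resolveAbsolute(root, steps);
  let best = 0;
  for (const match of matches) best = Math.max(best, sharedDepth(match));
  return matches.filter((match) => sharedDepth(match) === best);
}

const position1 = el('position', [], '1');
const position2 = el('position', [], '2');
const calculated2 = el('position_2');
const data = el('data', [
  el('repeat', [position1, el('position_2')]),
  el('repeat', [position2, calculated2]),
]);
const path = ['data', 'repeat', 'position'];

// XPath: the first match in document order is used, whatever the context.
const xpathValue = Number(resolveAbsolute(data, path)[0]?.text) * 2;
// XForms: the match in the same repeat instance as the context node is used.
const xformsValue = Number(resolveNearest(data, path, calculated2)[0]?.text) * 2;
console.log(xpathValue, xformsValue);
```

With the second repeat's `position_2` as the context node, the first strategy multiplies the first repeat's `position`, while the second multiplies the sibling `position`.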
Proposal summary
In brief, this is a proposal to:
I'll address the last of these first, then the others in order.
Improved function implementation APIs
The existing `FunctionImplementation` API (and its derivatives) has proven successful enough to support a full reimplementation of the current Enketo evaluator's scope. But it has a few really glaring weaknesses:
- Type safety around arity: every `FunctionImplementation`'s host implementation (as in the function which is actually called in the JavaScript runtime) is typed as variadic, with zero required arguments. The vast majority use the unsafe `!` (non-null assertion) operator extensively to work around this limitation.
- Static and runtime handling of parameter value types: currently, parameter type information is unused (even if it is present). Parameters which should always be (or be cast to) a particular type must be manually cast in each host function. More problematic, functions operating on nodes rely on manual runtime type validation in each host function as well. The latter is a particular footgun for this proposal.
- Parameter ergonomics: parameters are always evaluated lazily. This is necessary in a few cases (e.g. `if(false(), explode(), kittens())` must return kittens without exploding them or anything else), and potentially good for performance in a few others (e.g. short-circuiting on `NaN` values), but mostly it's just a lot of unnecessary ceremony.
- Return types: the base `FunctionImplementation` class has no support for specifying a return type at all, and certainly no means of enforcing it. Its typed subclasses are somewhat of an improvement. This is currently a less pressing issue, but it's one I'd like to bring along for the ride. And it's a potential opportunity for other improvements, like automatic documentation generation, and possibly adding certain runtime checks, e.g. to ensure a given function implementation is valid to substitute/supplement another.
- Ambiguity around use of context: every host function currently takes a `context` parameter, and must use it to resolve each other parameter due to the aforementioned pervasive lazy evaluation. This makes it unclear when and how functions actually operate explicitly on their calling context.
- Alignment between signature definition and actual host function signature: there really isn't any, largely due to the other issues discussed above.
I believe all of these can be addressed with judicious use of a few complex TypeScript types, and careful consideration in particular around how/whether to keep evaluation lazy. Examples in the sections below, while pseudocode, will be at least partly based on prototyping around these API improvements.
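
As a rough sketch of how a few TypeScript types could address arity, parameter casting, opt-in laziness, and return types together, consider the following. Every name here (`Param`, `asBoolean`, `asNumber`, `asString`, `lazy`, `defineFunction`, `TypedFunction`) is hypothetical and not part of the current `FunctionImplementation` API; context handling and node-set parameters are left out, and JavaScript coercion stands in for XPath casting rules.

```typescript
// An unevaluated argument expression.
type Thunk = () => unknown;

// Describes how one declared parameter is delivered to the host function.
interface Param<Arg> {
  readonly prepare: (thunk: Thunk) => Arg;
}

// Eager parameters are evaluated and cast before the host function runs.
const asBoolean: Param<boolean> = { prepare: (thunk) => Boolean(thunk()) };
const asNumber: Param<number> = { prepare: (thunk) => Number(thunk()) };
const asString: Param<string> = { prepare: (thunk) => String(thunk()) };

// Lazy parameters are opt-in: the host receives a typed callback instead of a value.
const lazy = <Arg>(param: Param<Arg>): Param<() => Arg> => ({
  prepare: (thunk) => () => param.prepare(thunk),
});

// Maps the declared parameter tuple to the host function's argument tuple.
type HostArgs<Params extends Param<unknown>[]> = {
  [Index in keyof Params]: Params[Index] extends Param<infer Arg> ? Arg : never;
};

interface TypedFunction<Result> {
  readonly name: string;
  readonly arity: number;
  call(args: readonly Thunk[]): Result;
}

function defineFunction<Params extends Param<unknown>[], Result>(
  name: string,
  params: [...Params],
  returns: Param<Result>,
  host: (...args: HostArgs<Params>) => Result
): TypedFunction<Result> {
  return {
    name,
    arity: params.length,
    call(args) {
      if (args.length !== params.length) {
        throw new Error(`${name}() expects ${params.length} argument(s), got ${args.length}`);
      }
      const prepared = params.map((param, index) => {
        const thunk = args[index];
        if (thunk === undefined) throw new Error(`${name}() is missing argument ${index}`);
        return param.prepare(thunk);
      });
      // The declared return type is enforced at runtime by the same casting logic.
      return returns.prepare(() => host(...(prepared as unknown as HostArgs<Params>)));
    },
  };
}

// Host parameters are inferred as (boolean, () => string, () => string).
const ifFunction = defineFunction(
  'if',
  [asBoolean, lazy(asString), lazy(asString)],
  asString,
  (condition, whenTrue, whenFalse) => (condition ? whenTrue() : whenFalse())
);

// Host parameters are inferred as (number, number); no manual casts or `!` needed.
const pow = defineFunction('pow', [asNumber, asNumber], asNumber, (base, exponent) =>
  Math.pow(base, exponent)
);

const explode = (): never => {
  throw new Error('exploded');
};

// Only the selected branch is evaluated, so explode() never runs.
const kittens = ifFunction.call([() => false, explode, () => 'kittens']);
const eight = pow.call([() => '2', () => 3]);
console.log(kittens, eight);
```

The design choice worth noting is that laziness becomes part of the declared signature, so the host function's parameter types and the signature definition cannot drift apart.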
Migrate internal non-function syntax to function implementations
Each non-function aspect of an XPath expression could be expressed as a function call. For instance, this expression:
… could be expressed something like this pseudo-code:
I chose to make this example lisp-y, because it can concisely express concepts like namespacing and special keywords, and because the intent is to show the idea of syntax-as-functions in the abstract without overly tying it to how that would be implemented here.
I also took some liberties to simplify some aspects of the example. Notably absent is the concept of "context", which should be a parameter to several of the function calls. Some of the functions as expressed here would also have a more complex signature than the present `FunctionImplementation` API can express (or would with the improvements discussed above, though I'm open to accommodating that if we prefer).
But for the purposes of this proposal, implementations for the functions involved might look something like (again, this is pseudo-code):
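
As one hedged illustration of the syntax-as-functions idea, the sketch below expresses `/data/repeat/position * 2` as nested function calls in TypeScript. Only `absolute_path` is a name used in this proposal; `step`, `multiply`, `number_literal`, the `El` model, and the synthetic document node are assumptions for illustration, and the name test is deliberately oversimplified.

```typescript
// Hypothetical element model.
interface El {
  name: string;
  text: string;
  children: El[];
}

interface Context {
  // Synthetic document node whose only child is the root element.
  document: El;
}

type Value = number | El[];
type Expression = (context: Context) => Value;
type Step = (nodes: El[]) => El[];

// A child step with a (deliberately oversimplified) name test.
const step = (name: string): Step => (nodes) => {
  const matches: El[] = [];
  for (const node of nodes) {
    for (const child of node.children) {
      if (child.name === name) matches.push(child);
    }
  }
  return matches;
};

// "/a/b/c": apply each step in turn, starting from the document node.
const absolute_path = (...steps: Step[]): Expression => (context) =>
  steps.reduce((nodes, next) => next(nodes), [context.document]);

const number_literal = (value: number): Expression => () => value;

// XPath number conversion of a node-set uses its first node in document order.
const toNumber = (value: Value): number => {
  if (typeof value === 'number') return value;
  const first = value[0];
  if (first === undefined || first.text.trim() === '') return NaN;
  return Number(first.text);
};

const multiply = (left: Expression, right: Expression): Expression => (context) =>
  toNumber(left(context)) * toNumber(right(context));

// `/data/repeat/position * 2` expressed as function calls.
const expression = multiply(
  absolute_path(step('data'), step('repeat'), step('position')),
  number_literal(2)
);

const leaf = (name: string, text: string): El => ({ name, text, children: [] });
const root: El = {
  name: 'data',
  text: '',
  children: [
    { name: 'repeat', text: '', children: [leaf('position', '1')] },
    { name: 'repeat', text: '', children: [leaf('position', '2')] },
  ],
};
const result = expression({ document: { name: '#document', text: '', children: [root] } });
console.log(result);
```

Evaluated this way, the expression follows standard XPath semantics and uses the first `position` in document order, which is exactly the behavior an override would need to change for repeats.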
Provide interfaces for function implementations to conditionally override, supplement, and defer to, their internal/default counterparts.
This section is going to be somewhat handwavy. I've thought about the API aspects of this least of all. The general idea is that a consumer of the primary `Evaluator` can provide specialized syntax/function implementations suitable for specific conditions, and fall back to standard behavior when those specific conditions are not met.
I think the best way to illustrate this is to come right back to the motivation for this proposal. Assuming we have:
position_2
/data/repeat[2]/position_2
/data/repeat/position * 2
We can provide a function overriding `absolute_path`, something like:
Fallback?
Calling `fallback()` here is doing a lot of work to carry this example (and again, it's pseudo-code to illustrate the idea, not an actual API proposal), without much explanation of how it would work. Basically it's kinda sorta like a continuation: it instructs the evaluator to try the next candidate syntax function capable of handling the given syntax node in the given context.
To make explicit something that's only been implied up to this point: currently (as of #13), functions are looked up by name (local, optionally namespaced) in each of the `FunctionLibrary`s available to the current `Evaluator` context. Once a matching function is found, it's called, and that's that. This proposal would depend on potentially finding multiple implementations:
... calling each in turn until it does not "fall back", essentially until one of the candidate functions returns a value satisfying its `returns` annotation.
(All matching implementations "falling back" would be considered an error.)
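
The lookup-and-fall-back loop described above might be sketched as follows. This is an assumption-laden simplification: `FALLBACK`, `Candidate`, `evaluateCandidates`, and `PathContext` are hypothetical names, a sentinel value stands in for "does not satisfy its `returns` annotation", and plain strings stand in for resolved nodes.

```typescript
// Hypothetical sentinel signalling "try the next candidate".
const FALLBACK = Symbol('fallback');
type Fallback = typeof FALLBACK;

interface Candidate<Context, Result> {
  // The library the implementation was found in, for diagnostics only.
  readonly library: string;
  readonly call: (context: Context, fallback: () => Fallback) => Result | Fallback;
}

// Calls each matching implementation in lookup order until one produces a value.
function evaluateCandidates<Context, Result>(
  name: string,
  candidates: readonly Candidate<Context, Result>[],
  context: Context
): Result {
  for (const candidate of candidates) {
    const result = candidate.call(context, () => FALLBACK);
    if (result !== FALLBACK) {
      return result;
    }
  }
  // All matching implementations falling back is treated as an error.
  throw new Error(`Every implementation of ${name} fell back`);
}

interface PathContext {
  readonly insideRepeat: boolean;
}

const candidates: Candidate<PathContext, string>[] = [
  {
    // Specialized override: only handles contexts inside a repeat.
    library: 'xforms',
    call: (context, fallback) =>
      context.insideRepeat ? 'nearest repeat instance' : fallback(),
  },
  // Default implementation: always produces a value.
  { library: 'xpath', call: () => 'first match in document order' },
];

const inRepeat = evaluateCandidates('absolute_path', candidates, { insideRepeat: true });
const elsewhere = evaluateCandidates('absolute_path', candidates, { insideRepeat: false });
console.log(inRepeat, elsewhere);
```

Ordering the candidates from most to least specialized keeps the default behavior intact wherever the override declines to act.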
Other cases will be more complicated, of course. For XForms semantics, we'll need to handle at least path references with predicates between steps. I believe elements of the above provide some direction for how this might be handled within the scope of this proposal. I'm hesitant to go too much deeper into examples for this use case without prototyping further, with more team visibility and discussion.
But to fill out some other ideas alluded to above…
Performance opportunity: named child steps
Name tests, as handled internally in `@odk/xpath`, are extraordinarily general in order to satisfy some edge cases around the XPath and XML namespace specs that we don't really need to accommodate in most (if not all) XForms usage. An earlier example included an "oversimplification" of such a name test, but with this proposal that simplification can be employed only in conditions where we know it is safe.
Suppose we take an upfront step to ensure the XForms namespace is used as the default namespace, and can ignore any more complicated namespace testing for unqualified names within a form entry's primary instance:
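
A minimal sketch of that conditional fast path, under stated assumptions: `generalChildStep` is a hypothetical stand-in for the existing general-purpose name test (not its real implementation), `xformsChildStep`, `childStep`, and the `primaryInstanceIsXForms` flag are invented for illustration, and the `ex` prefix with its `http://example.com/other` URI is made up.

```typescript
const XFORMS_NAMESPACE = 'http://www.w3.org/2002/xforms';
const FALLBACK = Symbol('fallback');

interface El {
  namespaceURI: string | null;
  localName: string;
  children: El[];
}

interface NameTest {
  prefix: string | null;
  localName: string;
}

type NamespaceResolver = (prefix: string | null) => string | null;

// Hypothetical general-purpose step: resolves the test's namespace on every call.
function generalChildStep(parent: El, test: NameTest, resolve: NamespaceResolver): El[] {
  const namespaceURI = resolve(test.prefix);
  return parent.children.filter(
    (child) => child.localName === test.localName && child.namespaceURI === namespaceURI
  );
}

// Specialized step: only valid for unqualified names, and only when the caller
// has established up front that the primary instance uses the XForms namespace.
function xformsChildStep(
  parent: El,
  test: NameTest,
  primaryInstanceIsXForms: boolean
): El[] | typeof FALLBACK {
  if (!primaryInstanceIsXForms || test.prefix !== null) {
    return FALLBACK;
  }
  return parent.children.filter((child) => child.localName === test.localName);
}

// Try the specialized step first, then defer to the general one.
function childStep(
  parent: El,
  test: NameTest,
  primaryInstanceIsXForms: boolean,
  resolve: NamespaceResolver
): El[] {
  const fast = xformsChildStep(parent, test, primaryInstanceIsXForms);
  return fast === FALLBACK ? generalChildStep(parent, test, resolve) : fast;
}

const resolve: NamespaceResolver = (prefix) =>
  prefix === null ? XFORMS_NAMESPACE : prefix === 'ex' ? 'http://example.com/other' : null;

const name: El = { namespaceURI: XFORMS_NAMESPACE, localName: 'name', children: [] };
const extra: El = { namespaceURI: 'http://example.com/other', localName: 'extra', children: [] };
const dataElement: El = { namespaceURI: XFORMS_NAMESPACE, localName: 'data', children: [name, extra] };

const unqualified = childStep(dataElement, { prefix: null, localName: 'name' }, true, resolve);
const qualified = childStep(dataElement, { prefix: 'ex', localName: 'extra' }, true, resolve);
console.log(unqualified.length, qualified.length);
```

The fast path is only equivalent to the general one when every element in the primary instance really is in the default namespace, which is why the upfront check matters.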
Performance opportunity: arbitrary static subtrees
Itemsets populated by `instance()/...` expressions are by far the biggest performance issue in the web-forms work so far. This is largely because the same expressions and sub-expressions are evaluated repeatedly, redundantly, without any notion of their static-ness. An example also feels redundant at this point; it would be a slight variation on the previous two. In any case, I believe that a similar lookup/fallback approach for secondary instance data would be a significant improvement to performance in this case… particularly combined with application of the approach for primary instance references in those same itemset predicates.
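
Although a full example would largely repeat the earlier ones, the "evaluate static sub-expressions once" half of the idea can be sketched briefly. This is plain memoization, not a proposed API: `createStaticCache`, `isStaticExpression`, and the `cities` instance id are hypothetical, and the static-ness check (rooted in `instance(` with no predicate) is a crude assumption for illustration.

```typescript
// Hypothetical cache: results of expressions known to be static are computed once.
function createStaticCache<Result>(isStatic: (expression: string) => boolean) {
  const results = new Map<string, Result>();
  return (expression: string, evaluate: () => Result): Result => {
    if (!isStatic(expression)) {
      // Not known to be static: defer to normal evaluation every time.
      return evaluate();
    }
    if (!results.has(expression)) {
      results.set(expression, evaluate());
    }
    return results.get(expression) as Result;
  };
}

// Assumption for illustration only: a path rooted in a secondary instance
// with no predicates does not depend on form state.
const isStaticExpression = (expression: string): boolean =>
  expression.startsWith('instance(') && !expression.includes('[');

let evaluations = 0;
const evaluateItems = (): string[] => {
  evaluations += 1;
  return ['a', 'b', 'c'];
};

const cached = createStaticCache<string[]>(isStaticExpression);
const itemsExpression = "instance('cities')/root/item";

// The second call is served from the cache, so evaluateItems runs once.
cached(itemsExpression, evaluateItems);
cached(itemsExpression, evaluateItems);
const staticEvaluations = evaluations;

// A primary instance reference is not treated as static and is re-evaluated.
cached('/data/name', evaluateItems);
cached('/data/name', evaluateItems);
console.log(staticEvaluations, evaluations);
```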