We want to define a data structure specification for a query that can become canonical within tech.ml.dataset. This will help make query-related functions smarter because it will be introspectable.
A concrete example of a function that can use this query definition is tech.v3.dataset.base/filter-column, whose signature is currently (dataset colname predicate) -> dataset. predicate can be a value or an instance of IFn. If filter-column were to take a query specification instead of a function, it could decide how to execute the filter and choose the best path for the data, for example binary search for ordered data or the new column index-structure for unordered data. This is the behavior we want to unlock with this change.
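To make the idea concrete, here is a minimal sketch in plain Clojure of how a data-described range filter could pick its own execution path. Nothing here is tech.ml.dataset API: `range-indices`, the `:lower`/`:upper` keys, the inclusive bounds, and the `ordered?` flag are all hypothetical names and choices made for illustration.

```clojure
;; Hypothetical sketch, not tech.ml.dataset code.
;; Because the bounds are plain data, the filter can choose a strategy;
;; an opaque predicate function would force a full scan.

(defn- lower-bound
  "Index of the first element of sorted vector v that is >= x."
  [v x]
  (loop [lo 0 hi (count v)]
    (if (< lo hi)
      (let [mid (quot (+ lo hi) 2)]
        (if (neg? (compare (nth v mid) x))
          (recur (inc mid) hi)
          (recur lo mid)))
      lo)))

(defn- upper-bound
  "Index of the first element of sorted vector v that is > x."
  [v x]
  (loop [lo 0 hi (count v)]
    (if (< lo hi)
      (let [mid (quot (+ lo hi) 2)]
        (if (pos? (compare (nth v mid) x))
          (recur lo mid)
          (recur (inc mid) hi)))
      lo)))

(defn range-indices
  "Returns the indices of values within [lower, upper] (inclusive, assumed).
  Uses binary search when the caller says the data is ordered,
  otherwise falls back to a linear scan."
  [values {:keys [lower upper]} ordered?]
  (if ordered?
    (vec (range (lower-bound values lower) (upper-bound values upper)))
    (into []
          (keep-indexed (fn [i x]
                          (when (and (<= (compare lower x) 0)
                                     (<= (compare x upper) 0))
                            i)))
          values)))

(range-indices [1 2 2 3 5 8] {:lower 2 :upper 5} true)  ;; binary search path
(range-indices [5 1 8 2 3 2] {:lower 2 :upper 5} false) ;; linear scan path
```

In a real implementation the ordered/unordered decision would presumably come from column metadata or the presence of an index structure rather than from a flag passed by the caller.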
@cnuernber laid out a draft of what this might look like in a PR (see here). In it there are two simple query types: :any-of and :range. The filter data structures are maps that include a special key :filter-type and then other keys as needed based on the type of filter.
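As a rough illustration of the shape being discussed, the sketch below writes the two query types as maps and dispatches on `:filter-type` with a multimethod. Only `:filter-type`, `:any-of`, and `:range` come from the draft described above; the `:values`, `:lower`, and `:upper` keys, the inclusive range semantics, and the `query->predicate` name are assumptions and may not match the PR.

```clojure
;; Hypothetical query maps. The extra keys are assumed for illustration.
(def any-of-query {:filter-type :any-of :values [1 3 5]})
(def range-query  {:filter-type :range :lower 2 :upper 4})

;; Dispatching on :filter-type keeps the query introspectable: callers can
;; inspect the map before deciding how (or whether) to compile it.
(defmulti query->predicate :filter-type)

(defmethod query->predicate :any-of
  [{:keys [values]}]
  (let [value-set (set values)]
    (fn [x] (contains? value-set x))))

(defmethod query->predicate :range
  [{:keys [lower upper]}]
  ;; Inclusive on both ends; this is an assumption, not the draft's rule.
  (fn [x] (and (<= (compare lower x) 0)
               (<= (compare x upper) 0))))

(filterv (query->predicate any-of-query) [1 2 3 4 5])
(filterv (query->predicate range-query) [1 2 3 4 5])
```

Compiling a query down to a predicate is only the fallback path. The point of keeping the query as data is that a function like filter-column could look at `:filter-type` first and pick an index or binary search when one applies.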
It also bears mentioning that this data structure resonates a bit with the signature of the tech.v3.dataset.column-index-structure.select-from-index function that can be used to query a column's index structure. That function takes a mode that at the moment is either :slice or :pick and then a hash map of key-value pairs specifying the query based on the mode (see here). Whatever data structure we end up creating, we could change select-from-index to take that query structure. This would be a case of this query data structure becoming universal among TMD functions.
Another thing to keep in mind is that @ribelo and @genmeblog are working on "lifting" tech.ml.dataset column functions into tablecloth in this PR, which may be something to consider. I'm not sure yet whether the work being done there could influence how we define the query data structure here, or vice versa.
Closing - I do think we could use data-structure-based queries for things like filter-column, filter, and such, but no one is moving in that direction, and I would rather see users come up with their own pathways and then take some of those pathways and move them up the chain.