LINQ
With the cell selectors, we can select all cells of a specific type in the local memory storage, wrapped in an IEnumerable<CellType> or IEnumerable<CellType_Accessor>. This interface exposes basic enumeration capabilities. By itself, an IEnumerable<T> is nothing more than a container that pumps out elements one after another, similar to iterating through a whole database with a cursor in other database systems. It does not provide an indexer, so we cannot fetch an element by subscript; it has no rewind facility, so turning back and revisiting an element is impossible.
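The forward-only behavior can be illustrated with a minimal, self-contained sketch in plain C#. The Cells function below is a hypothetical stand-in for a cell selector and is not part of GE; it only shows how an IEnumerable<T> hands out elements one at a time and, for compiler-generated iterators, refuses to rewind.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a cell selector: yields three values in order.
static IEnumerable<int> Cells()
{
    yield return 10;
    yield return 20;
    yield return 30;
}

var visited = new List<int>();
bool canRewind = true;

using (IEnumerator<int> cursor = Cells().GetEnumerator())
{
    // The only way forward is MoveNext(); there is no indexer on the cursor.
    while (cursor.MoveNext())
        visited.Add(cursor.Current);

    // Iterators generated from 'yield return' do not support Reset().
    try { cursor.Reset(); }
    catch (NotSupportedException) { canRewind = false; }
}

Console.WriteLine(string.Join(", ", visited));
Console.WriteLine(canRewind);
```

To visit the elements again, the sequence has to be enumerated from the start with a fresh enumerator.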
Custom logic can be performed on the cells when iterating through them. The .NET framework provides a set of static methods for querying enumerable collections. For a complete list of query methods, refer to MSDN.
With the extension methods provided by System.Linq.Enumerable, we can use the cell selectors to manipulate data in a succinct style. Instead of writing data processing logic in a foreach loop, we can use the query interfaces to extract and aggregate information in a declarative way. For example, instead of writing:
var sum = 0;
foreach (var n in Global.LocalStorage.Node_Selector())
    sum += n.val;
We can simply write:
var sum = Global.LocalStorage.Node_Selector().Sum(n=>n.val);
Or:
var sum = Global.LocalStorage.Node_Selector().Select(n=>n.val).Sum();
The code is kept away from intermediate states (e.g., the sum variable in this example) and internal implementations. In GE, certain query optimizations can be done automatically by the query execution engine to leverage the indexes defined in TSL. More specifically, it inspects the filters, extracts the substring queries, and redirects them to the proper substring query interfaces generated by the TSL compiler. The basic rules of expression rewriting are as follows:
- Select operators are not allowed to return accessors.
- For a Where operator, if there is an invocation of String.Contains on a string field of a cell and the field is indexed, the invocation is sent to the inverted index module as a substring query.
- If a string container field (such as a list of strings or an array of strings) is marked as indexed, the TSL compiler will generate extension methods ContainerType.Contains, which accept the same parameters as those on System.String. Invocations of these methods are also executed as inverted index queries.
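To give a feel for what "inspecting the filters and extracting the substring queries" might involve, here is a minimal sketch built on standard .NET expression trees. It is not GE's actual implementation: the CollectContains helper and the sample filter are hypothetical, and the sketch only gathers the constant arguments of String.Contains calls joined by logical AND/OR, skipping anything under a logical NOT.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical helper: walks a predicate and records the constant string
// arguments of String.Contains calls. Sub-expressions under NOT are skipped.
static void CollectContains(Expression e, List<string> found)
{
    switch (e)
    {
        case LambdaExpression lambda:
            CollectContains(lambda.Body, found);
            break;
        case UnaryExpression { NodeType: ExpressionType.Not }:
            break; // negated conditions are left to the ordinary filter
        case BinaryExpression b
            when b.NodeType is ExpressionType.AndAlso or ExpressionType.OrElse:
            CollectContains(b.Left, found);
            CollectContains(b.Right, found);
            break;
        case MethodCallExpression m
            when m.Method.DeclaringType == typeof(string)
              && m.Method.Name == "Contains"
              && m.Arguments.Count == 1
              && m.Arguments[0] is ConstantExpression { Value: string s }:
            found.Add(s);
            break;
    }
}

// Sample filter over a string field (a plain string stands in for the field).
Expression<Func<string, bool>> filter = name =>
    (name.Contains("foo") && !name.Contains("bar"))
    || (name.Length > 3 && name.Contains("baz"));

var substrings = new List<string>();
CollectContains(filter, substrings);

Console.WriteLine(string.Join(", ", substrings));
```

A real engine would also need to check that the target field is indexed and to keep the remaining conditions (such as the length test) as an ordinary filter; the sketch leaves both out.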
{% comment %} Note: This subsection covers some system implementation details and you can safely skip it at your first reading.
GE translates the query on a selector as an action performed over every cell of a specific type. Logically, there is not much difference from implementing the logic imperatively. However, when a certain pattern is found in the query expression, GE will rewrite the expression for optimization.
Let's view a query expression as a chain S->E_1->E_2->...->E_n, where S denotes a selector and E_i denotes a query operator (a method from System.Linq.Enumerable). GE ignores all query operators after the first Select operator in the chain, since after a Select operator the data is projected into something that is not defined in the TSL (projecting accessor to accessor is not allowed), and is thus not covered by any substring indices defined in TSL. Now, let W_1,...,W_m denote all the Where operators before that first Select operator (not necessarily consecutive). These are all the conditional filters applied to the native cells (without projection into other types), so we combine them as W_1 and W_2 and ... and W_m and regard this expression as a whole. GE then examines the expression and aggregates String.Contains invocations on cell fields into an expression tree. All the expressions under a NOT operator are ignored. This is because making a substring query and then obtaining its complement set would usually yield too many results to process, in which case it is better to skip the rewriting.
{% endcomment %}
LINQ is a convenient way to query a data collection. The expressive power of LINQ is equivalent to that of the extension methods provided by the System.Linq.Enumerable class, only more convenient to use. The following example demonstrates LINQ in GE versus its imperative equivalent:
/*========================== LINQ version ==============================*/
var result = from node in Global.LocalStorage.Node_Accessor_Selector()
             where node.color == Color.Red && node.degree > 5
             select node.CellID.Value;
/*========================== Imperative version ========================*/
var result = Global.LocalStorage.Node_Accessor_Selector()
             .Where( node => node.color == Color.Red && node.degree > 5 )
             .Select( node => node.CellID.Value );
Both versions will be translated to the same binary code; the elements in the LINQ expression will eventually be mapped one-to-one to the imperative interfaces provided in the System.Linq.Enumerable class. However, with LINQ we can write cleaner code. For example, if we try to write an imperative equivalent for the following LINQ expression, a nested lambda expression must be used.
var positive_feedbacks = from user in Global.LocalStorage.User_Accessor_Selector()
                         from comment in user.comments
                         where comment.rating == Rating.Excellent
                         select new
                         {
                             uid = user.CellID,
                             pid = comment.ProductID
                         };
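For comparison, the nested-lambda form might look like the following sketch. It runs over plain in-memory tuples rather than GE cell accessors: the users array, its field names, and the string-valued rating are illustrative assumptions, and the method-syntax version is one possible equivalent rather than the exact translation the compiler produces.

```csharp
using System;
using System.Linq;

// Stand-in data: tuples instead of GE cell accessors (hypothetical shapes).
var users = new[]
{
    (CellID: 1L, comments: new[]
    {
        (ProductID: 100L, rating: "Excellent"),
        (ProductID: 101L, rating: "Poor"),
    }),
    (CellID: 2L, comments: new[]
    {
        (ProductID: 100L, rating: "Excellent"),
    }),
};

// Query syntax: two 'from' clauses flatten users and their comments.
var viaQuery =
    from user in users
    from comment in user.comments
    where comment.rating == "Excellent"
    select new { uid = user.CellID, pid = comment.ProductID };

// Method syntax: the inner lambda is nested so it can still see 'user'.
var viaMethods = users.SelectMany(user =>
    user.comments
        .Where(comment => comment.rating == "Excellent")
        .Select(comment => new { uid = user.CellID, pid = comment.ProductID }));

foreach (var feedback in viaMethods)
    Console.WriteLine($"{feedback.uid} -> {feedback.pid}");
```

The nesting is needed because the final projection refers to both the outer user and the inner comment.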
PLINQ (MSDN) is a parallel implementation of LINQ. It runs the query on multiple processors simultaneously whenever possible. Calling AsParallel() on a selector will turn it into a parallel enumerable container that works with PLINQ.
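As a minimal sketch of what AsParallel() does, the snippet below runs the same query sequentially and in parallel over a plain integer array. The array is only a stand-in for a selector; no GE types are involved.

```csharp
using System;
using System.Linq;

// Plain in-memory data standing in for a selector.
int[] values = Enumerable.Range(1, 1000).ToArray();

// Sequential LINQ query.
long sequential = values.Where(v => v % 2 == 0).Sum(v => (long)v);

// The same query through PLINQ; the work may be split across cores.
long parallel = values.AsParallel().Where(v => v % 2 == 0).Sum(v => (long)v);

Console.WriteLine(sequential == parallel);
```

Both queries produce the same aggregate; only the execution strategy differs.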
{% comment %}
However, due to the limitations described below, PLINQ cannot be used over cell accessors natively, so the AsParallel() interface of cell accessor selectors is overridden and returns a Trinity.Linq.PLINQWrapper that delays an unsupported PLINQ query to the next query operator (until it is supported).
{% endcomment %}
There is a limitation of IEnumerable<T>: IDisposable elements are not disposed along the enumeration. However, disposing a cell accessor after use is crucial in GE, and a non-disposed cell accessor will result in the target cell being locked permanently.
This has led to the design decision made in GE, that we actively dispose a cell accessor when the user code finishes using the accessor in the enumeration loop. As a result, it is not allowed for a user to capture the value/reference of an accessor during an enumeration and store it somewhere for later use. Because the reference will be destroyed and the value will be invalidated immediately after the enumeration loop body, any operation done to the stored value/reference will cause data corruption or system crash. This is the root cause for the following limitations:
- The Select operator cannot return cell accessors, because the accessors are disposed as soon as the loop is done.
- LINQ operators that cache elements (such as join, group by) are not supported.
- PLINQ caches some elements and then distributes them to multiple cores, therefore it will not work with cell accessors. It does work with cell object selectors, though.
- Although an enumeration operation will not block the whole database, it does employ trunk-level locks. Compound LINQ selectors with join operations are not supported, because the inner loop will try to obtain the trunk lock already taken by the outer one.
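The dispose-after-use behavior behind these limitations can be imitated in plain C#. In the sketch below, a MemoryStream plays the role of a cell accessor and the hypothetical DisposingSelector iterator disposes each element as soon as the consumer's loop body finishes; none of this is GE code, and a disposed stream merely throws where a stale accessor could corrupt data.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical selector: hands out a resource, then disposes it as soon as
// the consumer asks for the next element (or stops enumerating).
static IEnumerable<MemoryStream> DisposingSelector(int count)
{
    for (int i = 0; i < count; i++)
    {
        using var accessor = new MemoryStream(new byte[i + 1]);
        yield return accessor; // the consumer's loop body runs here
    }
}

var lengths = new List<long>();
var captured = new List<MemoryStream>();

foreach (var accessor in DisposingSelector(3))
{
    lengths.Add(accessor.Length); // safe: used inside the loop body
    captured.Add(accessor);       // unsafe: the reference outlives the body
}

bool capturedIsUsable = true;
try { _ = captured[0].Length; }
catch (ObjectDisposedException) { capturedIsUsable = false; }

Console.WriteLine(string.Join(", ", lengths));
Console.WriteLine(capturedIsUsable);
```

Copying the needed values out inside the loop body (as done with lengths) is the safe pattern; operators that buffer the elements themselves end up holding references that have already been disposed.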