ReferenceCounting.tex

\section{Reference Counting}
\label{reference-counting}

Actors systems can span complex communication graphs that make it hard to decide when actors are no longer needed. As a result, manually managing lifetime of actors is merely impossible. For this reason, \lib implements a garbage collection strategy for actors based on weak and strong reference counts.

\subsection{Shared Ownership in C++}

The C++ standard library already offers \lstinline^shared_ptr^ and \lstinline^weak_ptr^ to manage objects with complex shared ownership. The standard implementation is a solid general purpose design that covers most use cases. Weak and strong references to an object are stored in a \emph{control block}. However, \lib uses a slightly different design. The reason for this is twofold. First, we need the control block to store the identity of an actor. Second, we wanted a design that requires less indirections, because actor handles are used extensively copied for messaging, and this overhead adds up.

Before discussing the approach to shared ownership in \lib, we look at the design of shared pointers in the C++ standard library.

\singlefig{shared_ptr}{Shared pointer design in the C++ standard library}{shared-ptr}

The figure above depicts the default memory layout when using shared pointers. The control block is allocated separately from the data and thus stores a pointer to the data. This is when using manually-allocated objects, for example \lstinline^shared_ptr<int> iptr{new int}^. The benefit of this design is that one can destroy \lstinline^T^ independently from its control block. While irrelevant for small objects, it can become an issue for large objects. Notably, the shared pointer stores two pointers internally. Otherwise, dereferencing it would require to get the data location from the control block first.

\singlefig{make_shared}{Memory layout when using \lstinline^std::make_shared^}{make-shared}

When using \lstinline^make_shared^ or \lstinline^allocate_shared^, the standard library can store reference count and data in a single memory block as shown above. However, \lstinline^shared_ptr^ still has to store two pointers, because it is unaware where the data is allocated.

\singlefig{enable_shared_from_this}{Memory layout with \lstinline^std::enable_shared_from_this^}{enable-shared-from-this}

Finally, the design of the standard library becomes convoluted when an object should be able to hand out a \lstinline^shared_ptr^ to itself. Classes must inherit from \lstinline^std::enable_shared_from_this^ to navigate from an object to its control block. This additional navigation path is required, because \lstinline^std::shared_ptr^ needs two pointers. One to the data and one to the control block. Programmers can still use \lstinline^make_shared^ for such objects, in which case the object is again stored along with the control block.

\subsection{Smart Pointers to Actors}

In \lib, we use a different approach than the standard library because (1) we always allocate actors along with their control block, (2) we need additional information in the control block, and (3) we can store only a single raw pointer internally instead of the two raw pointers \lstinline^std::shared_ptr^ needs. The following figure summarizes the design of smart pointers to actors.

\singlefig{refcounting}{Shared pointer design in \lib}{actor-pointer}

\lib uses \lstinline^strong_actor_ptr^ instead of \lstinline^std::shared_ptr<...>^ and \lstinline^weak_actor_ptr^ instead of \lstinline^std::weak_ptr<...>^. Unlike the counterparts from the standard library, both smart pointer types only store a single pointer.

Also, the control block in \lib is not a template and stores the identity of an actor (\lstinline^actor_id^ plus \lstinline^node_id^). This allows \lib to access this information even after an actor died. The control block fits exactly into a single cache line (64 Bytes). This makes sure no \emph{false sharing} occurs between an actor and other actors that have references to it. Since the size of the control block is fixed and \lib \emph{guarantees} the memory layout enforced by \lstinline^actor_storage^, \lib can compute the address of an actor from the pointer to its control block by offsetting it by 64 Bytes. Likewise, an actor can compute the address of its control block.

The smart pointer design in \lib relies on a few assumptions about actor types. Most notably, the actor object is placed 64 Bytes after the control block. This starting address is cast to \lstinline^abstract_actor*^. Hence, \lstinline^T*^ must be convertible to \lstinline^abstract_actor*^ via \lstinline^reinterpret_cast^. In practice, this means actor subclasses must not use virtual inheritance, which is enforced in \lib with a \lstinline^static_assert^.

\subsection{Strong and Weak References}

A \emph{strong} reference manipulates the \lstinline^strong refs^ counter as shown above. An actor is destroyed if there are \emph{zero} strong references to it. If two actors keep strong references to each other via member variable, neither actor can ever be destroyed because they produce a cycle \see{breaking-cycles}. Strong references are formed by \lstinline^strong_actor_ptr^, \lstinline^actor^, and \lstinline^typed_actor<...>^ \see{actor-reference}.

A \emph{weak} reference manipulates the \lstinline^weak refs^ counter. This counter keeps track of how many references to the control block exist. The control block is destroyed if there are \emph{zero} weak references to an actor (which cannot occur before \lstinline^strong refs^ reached \emph{zero} as well). No cycle occurs if two actors keep weak references to each other, because the actor objects themselves can get destroyed independently from their control block.  A weak reference is only formed by \lstinline^actor_addr^ \see{actor-address}.

\subsection{Converting Actor References with \texttt{actor\_cast}}
\label{actor-cast}

The function \lstinline^actor_cast^ converts between actor pointers and handles. The first common use case is to convert a \lstinline^strong_actor_ptr^ to either \lstinline^actor^ or \lstinline^typed_actor<...>^ before being able to send messages to an actor. The second common use case is to convert \lstinline^actor_addr^ to \lstinline^strong_actor_ptr^ to upgrade a weak reference to a strong reference. Note that casting \lstinline^actor_addr^ to a strong actor pointer or handle can result in invalid handles. The syntax for \lstinline^actor_cast^ resembles builtin C++ casts. For example, \lstinline^actor_cast<actor>(x)^ converts \lstinline^x^ to an handle of type \lstinline^actor^.

\subsection{Breaking Cycles Manually}
\label{breaking-cycles}

Cycles can occur only when using class-based actors when storing references to other actors via member variable. Stateful actors \see{stateful-actor} break cycles by destroying the state when an actor terminates, \emph{before} the destructor of the actor itself runs. This means an actor releases all references to others automatically after calling \lstinline^quit^. However, class-based actors have to break cycles manually, because references to others are not released until the destructor of an actor runs. Two actors storing references to each other via member variable produce a cycle and neither destructor can ever be called.

Class-based actors can break cycles manually by overriding \lstinline^on_exit()^ and calling \lstinline^destroy(x)^ on each handle~\see{actor-handle}. Using a handle after destroying it is undefined behavior, but it is safe to assign a new value to the handle.

%TODO: Add use case for the following casting scenario.
%There is one requirement of this design: `static_cast<abstract_actor*>(self)` must return the same pointer as `reinterpret_cast<abstract_actor*>(self)` for any actor `self`. Otherwise, our assumption that the actor object starts exactly 64 Bytes after its control block would break. Luckily, this boils down to a single limitation in practice: User-defined actors must not use virtual inheritance. When trying to spawn actors that do make use of virtual inheritance, \lib generates a compile-time error: `"actor subtype has illegal memory alignment (probably due to virtual inheritance)"`.