diff --git a/docs/source/API/core/Graph.axpby.kokkos.graph.cpp b/docs/source/API/core/Graph.axpby.kokkos.graph.cpp
new file mode 100644
index 000000000..24cc178ac
--- /dev/null
+++ b/docs/source/API/core/Graph.axpby.kokkos.graph.cpp
@@ -0,0 +1,12 @@
+auto graph = Kokkos::Experimental::create_graph(exec_A, [&](auto root){
+    auto node_xpy = root.then_parallel_for(N, MyAxpby{x, y, alpha, beta});
+    auto node_zpy = root.then_parallel_for(N, MyAxpby{z, y, gamma, beta});
+
+    auto node_dotp = Kokkos::Experimental::when_all(node_xpy, node_zpy).then_parallel_reduce(
+        N, MyDotp{x, z}, dotp
+    );
+});
+
+graph.submit(exec_A);
+
+exec_A.fence();
diff --git a/docs/source/API/core/Graph.axpby.kokkos.graph.p2300.cpp b/docs/source/API/core/Graph.axpby.kokkos.graph.p2300.cpp
new file mode 100644
index 000000000..3d129d2a4
--- /dev/null
+++ b/docs/source/API/core/Graph.axpby.kokkos.graph.p2300.cpp
@@ -0,0 +1,15 @@
+auto graph = Kokkos::construct_graph();
+
+auto node_xpy = Kokkos::then(graph, Kokkos::parallel_for(N, MyAxpby{x, y, alpha, beta}));
+auto node_zpy = Kokkos::then(graph, Kokkos::parallel_for(N, MyAxpby{z, y, gamma, beta}));
+
+auto node_dotp = Kokkos::then(
+    Kokkos::when_all(node_xpy, node_zpy),
+    Kokkos::parallel_reduce(N, MyDotp{x, z}, dotp)
+);
+
+graph.instantiate();
+
+graph.submit(exec_A);
+
+exec_A.fence();
diff --git a/docs/source/API/core/Graph.axpby.kokkos.vanilla.cpp b/docs/source/API/core/Graph.axpby.kokkos.vanilla.cpp
new file mode 100644
index 000000000..3789ba4d7
--- /dev/null
+++ b/docs/source/API/core/Graph.axpby.kokkos.vanilla.cpp
@@ -0,0 +1,8 @@
+Kokkos::parallel_for(policy_t(exec_A, 0, N), MyAxpby{x, y, alpha, beta});
+Kokkos::parallel_for(policy_t(exec_B, 0, N), MyAxpby{z, y, gamma, beta});
+
+exec_B.fence();
+
+Kokkos::parallel_reduce(policy_t(exec_A, 0, N), MyDotp{x, z}, dotp);
+
+exec_A.fence();
diff --git a/docs/source/API/core/Graph.rst b/docs/source/API/core/Graph.rst
index 1a9f3df6a..42beb0860 100644
--- a/docs/source/API/core/Graph.rst
+++ b/docs/source/API/core/Graph.rst
@@ -4,10 +4,10 @@ Graph and related
 Usage
 -----
 
-:code:`Kokkos::Graph` is an abstraction that can be used to define a group of asynchronous workloads that are organised as a direct acyclic graph.
-A :code:`Kokkos::Graph` is defined separatly from its execution, allowing it to be re-executed multiple times.
+:cppkokkos:`Kokkos::Graph` is an abstraction that can be used to define a group of asynchronous workloads that are organised as a directed acyclic graph.
+A :cppkokkos:`Kokkos::Graph` is defined separately from its execution, allowing it to be re-executed multiple times.
 
-:code:`Kokkos::Graph` is a powerful way of describing workload dependencies. It is also a good opportunity to present all workloads
+:cppkokkos:`Kokkos::Graph` is a powerful way of describing workload dependencies. It is also a good opportunity to present all workloads
 at once to the driver, and allow some optimizations [ref].
 
 .. note::
@@ -16,18 +16,18 @@ at once to the driver, and allow some optimizations [ref].
 
 For small workloads that need to be sumitted several times, it might save you some overhead [reference to some presentation / paper].
 
-:code:`Kokkos::Graph` is specialized for some backends:
+:cppkokkos:`Kokkos::Graph` is specialized for some backends:
 
-* :code:`Cuda`: [ref to vendor doc]
-* :code:`HIP`: [ref to vendor doc]
-* :code:`SYCL`: [ref to vendor doc] -> https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
+* :cppkokkos:`Cuda`: [ref to vendor doc]
+* :cppkokkos:`HIP`: [ref to vendor doc]
+* :cppkokkos:`SYCL`: [ref to vendor doc] -> https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
 
 For other backends, Kokkos provides a defaulted implementation [ref to file].
 
 Philosophy
 ----------
 
-As mentioned earlier, the :code:`Kokkos::Graph` is first defined, and then executed. In fact, before the graph can be executed,
+As mentioned earlier, the :cppkokkos:`Kokkos::Graph` is first defined, and then executed. In fact, before the graph can be executed,
 it needs to be *instantiated*.
 
 During the *instantiation* phase, the topology of the graph is **locked**, and an *executable graph* is created.
@@ -40,53 +40,23 @@ In short, we have 3 phases:
 
 "Splitting command construction from execution is a proven solution." (https://www.iwocl.org/wp-content/uploads/iwocl-2023-Ewan-Crawford-4608.pdf)
 
-Basic example
--------------
-
-This example showcases how three workloads can be organised as a :code:`Kokkos::Graph`.
-
-Workloads A and B are independent, but workload C needs the completion of A and B.
-
-.. code-block:: cpp
-
-    int main()
-    {
-        auto graph = Kokkos::Experimental::create_graph<Exec>([&](auto root) {
-            const auto node_A = root.then_parallel_for(...label..., ...policy..., ...body...);
-            const auto node_B = root.then_parallel_for(...label..., ...policy..., ...body...);
-            const auto ready  = Kokkos::Experimental::when_all(node_A, node_B);
-            const auto node_C = ready.then_parallel_for(...label..., ...policy..., ...body...);
-        });
-
-        for(int irep = 0; irep < nrep; ++irep)
-            graph.submit();
-    }
-
-Advanced example
-----------------
-
-To be done soon.
-
-References
-----------
-
-* https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf
-* https://github.com/intel/llvm/blob/sycl/sycl/doc/syclgraph/SYCLGraphUsageGuide.md
-* https://developer.nvidia.com/blog/a-guide-to-cuda-graphs-in-gromacs-2023/
-
-
 Use cases
 ---------
 
 Diamond with closure, don't care about `exec`
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Create a simple diamond-like graph within a closure, no caring about execution space instances.
+Create a simple diamond-like graph within a closure, not caring too much about execution space instances.
 
 This use case demonstrates how a graph can be created from inside a closure, and how it could look like in the future.
 It is a very simple use case.
 
-Note that I'm not sure why we should support the closure anyway.
+.. note::
+
+    I'm not sure why we should support the closure anyway. I don't see the benefit of forcing the
+    user to create the whole graph in there.
+
+    See :ref:`no_root_node` for discussion.
 
 .. graphviz::
     :caption: Diamond topology
@@ -99,9 +69,9 @@ Note that I'm not sure why we should support the closure anyway.
     }
 
 .. code-block:: c++
-    :caption: Current pseudo-code
+    :caption: Current `Kokkos` pseudo-code.
 
-    auto graph = Kokkos::create_graph([&](const auto& root){
+    auto graph = Kokkos::create_graph([&](auto root){
         auto node_A = root.then_parallel_...(...label..., ...policy..., ...functor...);
 
         auto node_B = node_A.then_parallel_...(...label..., ...policy..., ...functor...);
@@ -113,9 +83,9 @@ Note that I'm not sure why we should support the closure anyway.
     graph.submit()
 
 .. code-block:: c++
-    :caption: P2300 (but really I don't like that because `graph` itself is already a *sender*)
+    :caption: *à la* P2300 (but really I don't like that because `graph` itself is already a *sender*).
 
-    auto graph = Kokkos::create_graph([&](const auto& root){
+    auto graph = Kokkos::create_graph([&](auto root){
         auto node_A = then(root, parallel_...(...label..., ...policy..., ...functor...));
 
         auto node_B = then(node_A, parallel_...(...label..., ...policy..., ...functor...));
@@ -129,7 +99,7 @@ Note that I'm not sure why we should support the closure anyway.
 Diamond, caring about `exec`
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Create a simple diamond-like graph, caring about execution space instances.
+Create a simple diamond-like graph, caring about execution space instances. No closure.
 
 This use case demonstrates how a graph can be created without a closure, and how it could look like in the future.
 It also focuses on where steps occur.
@@ -147,9 +117,9 @@ Graph topology is known at compile, thus enabling a lot of optimizations (kernel
     }
 
 .. code-block:: c++
-    :caption: Current pseudo-code
+    :caption: Current `Kokkos` pseudo-code.
 
-    auto graph = Kokkos::create_graph(exec_A, [&](const auto& root){});
+    auto graph = Kokkos::create_graph(exec_A, [&](auto root){});
     auto root  = Kokkos::Impl::GraphAccess::create_root_node_ref(graph);
 
     auto node_A = root.then_parallel_...(...label..., ...policy..., ...functor...);
@@ -161,19 +131,17 @@ Graph topology is known at compile, thus enabling a lot of optimizations (kernel
 
     graph.instantiate();
     exec_A.fence("The graph might make some async to-device copies.");
+
     graph.submit(exec_B);
 
 .. code-block:: c++
-    :caption: P2300 + defer when Kokkos performs internal async to-device copies
+    :caption: *à la* P2300, deferring the internal async to-device copies performed by `Kokkos` to the `instantiate` step.
 
-    // Step 1: define topology (no execution space instance required)
+    // Step 1: define graph topology (note that no execution space instance is required).
     auto graph = Kokkos::create_graph<execution_space>();
 
     auto node_A = then(graph, parallel_...(...label..., ...policy..., ...functor...));
 
-    // what happens to an exec space instance passed to the policy ? is it used somehow or just ignored ?
-    // when dispatching the driver to global memory, what exec space instance is used for the async copies ?
-
     auto node_B = then(node_A, parallel_...(...label..., ...policy..., ...functor...));
     auto node_C = then(node_A, parallel_...(...label..., ...policy..., ...functor...));
 
@@ -186,15 +154,17 @@ Graph topology is known at compile, thus enabling a lot of optimizations (kernel
     // Step 3: execute
     graph.submit(exec_B)
 
-No "root" node
-~~~~~~~~~~~~~~
+.. _no_root_node:
 
-Currently, the :code:`Kokkos::Graph` would expose to the user a "root node" concept that is not needed
+To root or not to root?
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Currently, the :cppkokkos:`Kokkos::Graph` API exposes to the user a "root node" concept that is not strictly needed
 by any backend (but might be needed by the default implementation that works with *sinks*).
 
-The "root node" might be confusing. It sould not appear in the API for 2 reasons:
+I think the "root node" might be confusing. IMO, it should not appear in the API for 2 reasons:
 
-1. It can be misleading, as the user might think it's necessary though I think it's an artifact of how :code:`Kokkos::Graph`
+1. It can be misleading, as the user might think it's necessary though I think it's an artifact of how :cppkokkos:`Kokkos::Graph`
    is currently implemented for graph construction, and because of the *sink*-based defaulted implementation.
 2. With P2300, it's clear that *root* is an empty useless sender that can be thrown away at compile time.
 
@@ -208,15 +178,15 @@ The "root node" might be confusing. It sould not appear in the API for 2 reasons
     }
 
 .. code-block:: c++
-    :caption: P2300
+    :caption: *à la* P2300.
 
-    auto graph = construct_graph();
+    auto graph = Kokkos::construct_graph();
 
-    auto A1 = then(graph, ...);
-    auto A2 = then(graph, ...);
-    auto A3 = then(graph, ...);
+    auto A1 = Kokkos::then(graph, Kokkos::parallel_...(...));
+    auto A2 = Kokkos::then(graph, Kokkos::parallel_...(...));
+    auto A3 = Kokkos::then(graph, Kokkos::parallel_...(...));
 
-    auto B = then(when_all(A1, A2, A3), ...);
+    auto B = Kokkos::then(Kokkos::when_all(A1, A2, A3), Kokkos::parallel_...(...));
 
 Complex DAG topology
 ~~~~~~~~~~~~~~~~~~~~
@@ -234,13 +204,13 @@ Any complex-but-valid DAG topology should work.
         A2 -> B1;
         A2 -> B3;
         A3 -> B4;
-        
+
         B1 -> C1;
         B3 -> C1;
-        
+
         B2 -> C2;
         B4 -> C2;
-        
+
         // Enfore ordering of nodes with invisible edges.
         {
             rank = same;
@@ -255,59 +225,58 @@ Changing scheduler
 
 This is the purpose of PR https://github.com/kokkos/kokkos/pull/7249, and should be further documented.
 
-Towards https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2300r10.html#design-sender-adaptor-starts_on.
+This is a step towards https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2300r10.html#design-sender-adaptor-starts_on.
 
 .. code-block:: c++
+    :caption: *à la* P2300.
 
-    auto graph = construct()
-
-    auto node_1 = ...
+    // Step 1: construct.
+    auto graph = Kokkos::construct_graph();
 
+    auto node_1 = Kokkos::then(graph, ...);
     ...
 
+    // Step 2: instantiate.
     graph.instantiate();
 
+    // Step 3: execute, again and again.
     graph.submit(exec_A);
-
     ...
-
     graph.submit(exec_C);
-
     ...
-
     graph.submit(exec_D);
 
 Interoperability
 ~~~~~~~~~~~~~~~~
 
-Why interoperability matters (helps adoption of :code:`Kokkos::Graph`, extensibility, corner cases):
+Why interoperability matters (helps adoption of :cppkokkos:`Kokkos::Graph`, extensibility, corner cases):
 
-1. Attract users that already use some backend graph (*e.g.* `cudaGraph_t`) towards `Kokkos`. It helps them transition smoothly.
-2. Help user integrate backend-specific graph capabilities that are not part of the :code:`Kokkos::Graph` API for whatever reason.
+1. Attract users that already use some backend graph (*e.g.* :code:`cudaGraph_t`) towards `Kokkos`. It helps them transition smoothly.
+2. Help users integrate backend-specific graph capabilities that are not part of the :cppkokkos:`Kokkos::Graph` API for whatever reason.
 
 Since `Kokkos` might run some stuff linked to its internals at *instantiation* stage, and since in PR https://github.com/kokkos/kokkos/pull/7240
 we decided to ensure that before the submission, the graph needs to be instantiated in `Kokkos`, interoperability implies that the user
-passes through `Kokkos` for both *instantiation* and *submission*.
+relies on `Kokkos` for both *instantiation* and *submission*.
 
 .. graphviz::
-    :caption: Dark nodes/edges are added through :code:`Kokkos::Graph`.
+    :caption: Dark nodes/edges are added through the :cppkokkos:`Kokkos::Graph` API; the rest is pre-existing.
 
     digraph interoperability {
 
         A[color=darksalmon];
-        
+
         B1[color=darksalmon];
         B2[color=darksalmon];
         B3[color=darksalmon];
-        
+
         C3[color=darksalmon];
 
         A -> B1[color=darksalmon];
         A -> B2[color=darksalmon];
         A -> B3[color=darksalmon];
-        
+
         B3 -> C3[color=darksalmon];
-        
+
         // Enfore ordering of nodes with invisible edges.
         {
             rank = same;
@@ -315,50 +284,102 @@ passes through `Kokkos` for both *instantiation* and *submission*.
             B1 -> B2 -> B3 ;
             rankdir = LR;
         }
-        
+
         B1 -> C1;
         B2 -> C1;
-        
+
         C1 -> D1;
         C3 -> D1;
-    } 
+    }
 
 .. code-block:: c++
-    :caption: interoperability pseudo-code P2300
+    :caption: Interoperability pseudo-code *à la* P2300.
 
+    // The user starts creating their graph with a backend API for some reason.
     cudaGraph_t graph;
     cudaGraphCreate(&graph, ...);
 
     cudaGraphNode_t A, B1, B2, B3, C3;
     ... create kernel nodes and add dependencies ...
 
-    auto kokkos_graph = construct(graph);
+    // But at some point, they want interoperability with Kokkos.
+    auto kokkos_graph = Kokkos::construct_graph(graph);
 
-    auto C1 = then(when_all(B1, B2), ...);
-    auto D1 = then(when_all(C1, C3), ...);
+    auto C1 = Kokkos::then(Kokkos::when_all(B1, B2), ...);
+    auto D1 = Kokkos::then(Kokkos::when_all(C1, C3), ...);
 
+    // The user is now bound to Kokkos for instantiation and submission.
     kokkos_graph.instantiate();
     kokkos_graph.submit();
 
 Graph update
 ~~~~~~~~~~~~
 
-From reading `Cuda`, `HIP` and `SYCL` documentations, all have some *executable graph update* mechanisms.
+According to the :cppkokkos:`Cuda`, :cppkokkos:`HIP` and :cppkokkos:`SYCL` documentation, all three have some *executable graph update* mechanism.
 
-For instance, disabling a node from host (:code:`hipGraphNodeSetEnabled`, not in `HIP` yet) can support complex graphs that might slightly change from one submission to another.
+For instance, disabling a node from host (:code:`hipGraphNodeSetEnabled`) can support complex graphs that might slightly change from one submission to another.
 
     Updates to a graph will be scheduled after any in-flight executions of the same graph and will not affect previous submissions of the same graph.
     The user is not required to wait on any previous submissions of a graph before updating it.
 
-As the topology is fixed, we can only reasonably update kernel parameters.
+As the topology is fixed, we can only reasonably update kernel parameters or skip a node.
+
+.. graphviz::
+    :caption: Some iterative loop that needs to seed under some condition (to be enhanced).
+
+    digraph graph_update {
+
+        S[label="start", shape=diamond];
+
+        A[label="seed"];
+        B[label="compute"];
+        C[label="solve"];
+
+        S -> A[color=green];
+
+        A -> B[color=green];
+
+        B -> C;
+
+        C -> S;
+
+        S -> B[color="red"];
+
+    }
+
+Iterative processes
+~~~~~~~~~~~~~~~~~~~
 
-Iterative process
------------------
+Plenty of opportunities for :cppkokkos:`Kokkos::Graph` to lean in:
 
-- iterative solver (our assembly case)
+- iterative solver
 - line search in optimization
+- you name it
+
+Let's take the `AXPBY` micro-benchmark from https://hihat.opencommons.org/images/1/1a/Kokkos-graphs-presentation.pdf:
+
+.. graphviz::
+    :caption: Two `AXPBY` followed by a dot product.
+
+    digraph axpby {
+        A[label="axpby"];
+        B[label="axpby"];
+        C[label="dotp"];
+        A->C;
+        B->C;
+    }
+
+.. literalinclude:: Graph.axpby.kokkos.vanilla.cpp
+    :language: c++
+    :caption: Vanilla `Kokkos`.
 
+.. literalinclude:: Graph.axpby.kokkos.graph.cpp
+    :language: c++
+    :caption: Current :cppkokkos:`Kokkos::Graph`.
 
+.. literalinclude:: Graph.axpby.kokkos.graph.p2300.cpp
+    :language: c++
+    :caption: *à la* P2300.
 
 They also use graphs...
 -----------------------
@@ -366,11 +387,29 @@ They also use graphs...
 * `PyTorch` https://pytorch.org/blog/accelerating-pytorch-with-cuda-graphs/
 * `GROMACS` https://developer.nvidia.com/blog/a-guide-to-cuda-graphs-in-gromacs-2023/
 
+Design choices
+--------------
+
+Questions we need to answer before going further in the :cppkokkos:`Graph` refactor.
+
+Dispatching
+~~~~~~~~~~~
 
-Homework
+- Do we allow node policies to have a user-provided execution space instance?
+- When does `Kokkos` perform its to-device dispatching (*e.g.* to global memory)?
 
-- what does Kokkos during dispatching ? (HIP CUDA SYCL) Execution space instance from the policy, used or ignored ?
-- for each example 3 columns how to write it in CUDA SYCL P2300 Kokkos
-- développer l'update
-- essayer de démontrer qu'on peut écrire un seul code, et dire si on veut que ce soit un graph ou pas
-  (why it matters: write single source code , kokkos premise 'single source code')
\ No newline at end of file
+Write a single source code, but allow skipping backend graph
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We should be able to write a single source code and decide if we want the graph to map to the backend graph or just
+execute nodes.
+
+This would greatly benefit adoption, and respect the `Kokkos` single-source promise.
+
+References
+----------
+
+* https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf
+* https://github.com/intel/llvm/blob/sycl/sycl/doc/syclgraph/SYCLGraphUsageGuide.md
+* https://developer.nvidia.com/blog/a-guide-to-cuda-graphs-in-gromacs-2023/
+* https://hihat.opencommons.org/images/1/1a/Kokkos-graphs-presentation.pdf