todo

meta should be bytes, not an integer.


use the batched version of hash_points.
* in store:verified.
* in verify.


for multiplication, we could use the faster version of the doubling and adding algorithms, because Z1 is always equal to 1.


in stem2:serialize, we could batch this to make it faster.
%look into serialize_stems, there is some commented out code that should be a faster version.
batched version of stem:put.


12% (or 24%?) speedup.
in verify2:update, we are taking the hash of stems. This involves changing an extended point into an affine point, which is slow, and should be done in batches if possible.
We can do it in batches here.
update_batch2 and update_merge are currently doing a depth first update of the proof tree. We need to switch to breadth first to fully take advantage of batching.
(batch chunkify is taking up like 20% for comparison, and clump by path is 15%)
* in verify2:update_merge, we are only calculating the stem2:hash_point of an extended point in one place. instead of immediately calculating it, we should pass the variables back to the calling function instead of the diff. the calling function can calculate the diff later when it is convenient. 


which parts could be re-written in c to make it fast.

* for storage. store:pme2 + secp256k1:me3. We want to pass in a list of points, and have it add them all up fast.
 - pme2 accepts a matrix as input, it is looking up points from a precompute, and then summing them all. the precompute is powers of the generator points.
 - me3 is alternatively adding, and then (doubling the accumulated amount 8 times).

* for proving. multiproof:calc_G_e. it is important not to list the entire As, it has a lot of repetitions. Have a compact non-repeating list of As, and a seperate list of pointers to show which of these corresponds to which Zs/Y combination.
parameters:div_e can be made shorter if we are willing to do a few extra finite field multiplications. This will be a good investment because then the parameters fit into the L1 cache.
involves a polynomial addition, multiplying the polynomials by scalars, multiplying precompute vectors together.

* for verifying: secp256k1:multi_exponent. The bucket method. involves adding up lots of generator points, then doing me3.
  - we specifically want to implement the calculation of Ss. this is very similar to pme2 from storage, but instead of looking up precomputed powers of generator points, we do that in step bucketify2/4

libraries we need in a fast language:

* group order polynomials
  - vector addition.
  - vector multiplied by scalar.
  - div_e, looking up constants and lots of multiplication.


we can make inner product arguments around 2x faster by using the same pre-compute strategy that we use for storage. Store powers of the base point, so we can skip that step of the bucket strategy.


maybe stems should store their data as lists, because in store.erl we only access the data in order.
This could potentially make storage about 1 second faster.
but it looks like it would make it slower to look up data.